最近遇到了一个取数场景–比较两个字符串有多少位相同,之所以有这个需求,是因为一个是OCR识别的,一个是用户输入的,需要比对分析下从而做出改进。下面就总结下 SQL比较两个字符串相同的位数 具体如何实现:

假设我们已有如下源表 table1,我们要得到字段 info_ocr 和字段 info_input 相同的位数和不同的位数:

示例代码:

--方法1:截取字符并逐一比较
select
    id,
    info_ocr,
    info_input,
    
    case when substr(info_ocr, 1, 1) = substr(info_input, 1, 1) then 1 else 0 end
        + case when substr(info_ocr, 2, 1) = substr(info_input, 2, 1) then 1 else 0 end
        + case when substr(info_ocr, 3, 1) = substr(info_input, 3, 1) then 1 else 0 end
        + case when substr(info_ocr, 4, 1) = substr(info_input, 4, 1) then 1 else 0 end
        + case when substr(info_ocr, 5, 1) = substr(info_input, 5, 1) then 1 else 0 end
        + case when substr(info_ocr, 6, 1) = substr(info_input, 6, 1) then 1 else 0 end
        + case when substr(info_ocr, 7, 1) = substr(info_input, 7, 1) then 1 else 0 end
        + case when substr(info_ocr, 8, 1) = substr(info_input, 8, 1) then 1 else 0 end
        + case when substr(info_ocr, 9, 1) = substr(info_input, 9, 1) then 1 else 0 end
        + case when substr(info_ocr, 10, 1) = substr(info_input, 10, 1) then 1 else 0 end as same_chr_cnt,
    
    case when substr(info_ocr, 1, 1) <> substr(info_input, 1, 1) then 1 else 0 end
        + case when substr(info_ocr, 2, 1) <> substr(info_input, 2, 1) then 1 else 0 end
        + case when substr(info_ocr, 3, 1) <> substr(info_input, 3, 1) then 1 else 0 end
        + case when substr(info_ocr, 4, 1) <> substr(info_input, 4, 1) then 1 else 0 end
        + case when substr(info_ocr, 5, 1) <> substr(info_input, 5, 1) then 1 else 0 end
        + case when substr(info_ocr, 6, 1) <> substr(info_input, 6, 1) then 1 else 0 end
        + case when substr(info_ocr, 7, 1) <> substr(info_input, 7, 1) then 1 else 0 end
        + case when substr(info_ocr, 8, 1) <> substr(info_input, 8, 1) then 1 else 0 end
        + case when substr(info_ocr, 9, 1) <> substr(info_input, 9, 1) then 1 else 0 end
        + case when substr(info_ocr, 10, 1) <> substr(info_input, 10, 1) then 1 else 0 end as not_same_chr_cnt

from table1;



--方法2:将单行字符串拆成多行字符后判断并汇总
with
    tmp1 as (
        select
            id,
            info_ocr,
            info_input,
            info_ocr_chr,
            row_number() over () as row_num
        from table1
        cross join unnest(split(info_ocr,'')) as t (info_ocr_chr)
    ),
    
    tmp2 as (
        select
            id,
            info_ocr,
            info_input,
            info_input_chr,
            row_number() over () as row_num
        from table1
        cross join unnest(split(info_input,'')) as t (info_input_chr)
    ),
    
    tmp3 as (
        select
            tmp1.id,
            tmp1.info_ocr,
            tmp1.info_input,
            tmp1.info_ocr_chr,
            tmp2.info_input_chr,
            case when tmp1.info_ocr_chr = tmp2.info_input_chr then 1 else 0 end as is_chr_same
        from tmp1
        left join tmp2
        on tmp1.id = tmp2.id and tmp1.row_num = tmp2.row_num
    )
select
    id,
    info_ocr,
    info_input,
    sum(is_chr_same) as same_chr_cnt,
    10 - sum(is_chr_same) as not_same_chr_cnt
from tmp3
where info_ocr_chr <> ''
group by 1,2,3;

代码结果:

以上就是 SQL比较两个字符串相同的位数或不同的位数 的代码实现,方法1更直观通用,方法2也是一种思路,代码使用Presto SQL执行,其他SQL语言可能会略有差异。


原创文章,欢迎转载,转载请注明出处并留下原文链接。

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注