本文总结一下Presto SQL的优化小技巧,简化代码或者提高执行效率。

1. 使用with语句减少嵌套子查询和中间表

在做关联判断或者打标签的时候,需要嵌套子查询或者创建中间表,此时可以考虑使用with语句避免嵌套子查询或者创建过多的过程表。

with 
    t1 as (
        select
            cust_id
        from loan
        where apply_date < current_date and apply_date >= current_date - interval '30' day
        group by 1
    )
select
    t2.cust_id,
    case when t1.cust_id is not null then 1 else 0 end as is_apply_30d
from detail as t2
left join t1
on t2.cust_id = t1.cust_id;
with
    a as (select cust_id from table1 where ...),
    b as (select cust_id from table2 where ...)
select
    ...
from table3 as t
left join a
on t.cust_id = a.cust_id
left join b
on t.cust_id = b.cust_id;

2. 提高row_number() over ()的执行效率

偶尔可能会碰到row_number() over ()执行很慢甚至内存溢出报错的情况,此时可以考虑加上where条件 or rand() < 0 以提高执行效率。Presto默认使用的是TopNRowNumberOperator执行器,加上 or rand() < 0 能诱导Presto使用Window执行器,从而在不影响结果的同时提高执行效率。

select
    cust_id,
    order_no,
    order_type,
    order_date,
    order_time
from 
    (
    select
        cust_id,
        order_no,
        order_type,
        order_date,
        order_time,
        row_number() over (partition by cust_id, order_date order by order_time) as row_n
    from order_detail
    ) as t
where t.row_n= 1 or rand() < 0;

3. 减少类型转换

--假设order_detail表中的order_date类型为字符, 存储格式为yyyy-mm-dd, 第1种写法要优于第2种写法

--第1种写法
select
    order_no,
    order_date
from order_detail
where order_date >= '2021-09-01'

--第2种写法
select
    order_no,
    order_date
from order_detail
where date(order_date) >= date'2021-09-01'  --字段order_date每一个值都要做一次date()转换

4. 使用join语句代替where in和子查询

where in ()where not in () 所用到的表或所产生的结果数据量较大时,语句执行效率就会明显下降,可以考虑改写为 join 语句,同时还可以结合使用前文说的 with 语句。

--where in
select
    ...
from table1
where cust_id in (select cust_id from table2);

--可改写为:
select
    ...
from table1 as t1
inner join table2 as t2
on t1.cust_id = t2.cust_id;


--where not in
select
    ...
from table1
where cust_id not in (select cust_id from table2 where type = 'xxx');

--可改写为:
select
    ...
from table1 as t1
left join table2 as t2
on t1.cust_id = t2.cust_id and t2.type = 'xxx'
where t2.cust_id is null;


原创文章,欢迎转载,转载请注明出处并留下原文链接。

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注