Presto SQL-优化小技巧
本文总结一下Presto SQL的优化小技巧,简化代码或者提高执行效率。
1. 使用with语句减少嵌套子查询和中间表
在做关联判断或者打标签的时候,需要嵌套子查询或者创建中间表,此时可以考虑使用with语句避免嵌套子查询或者创建过多的过程表。
with t1 as ( select cust_id from loan where apply_date < current_date and apply_date >= current_date - interval '30' day group by 1 ) select t2.cust_id, case when t1.cust_id is not null then 1 else 0 end as is_apply_30d from detail as t2 left join t1 on t2.cust_id = t1.cust_id;
with a as (select cust_id from table1 where ...), b as (select cust_id from table2 where ...) select ... from table3 as t left join a on t.cust_id = a.cust_id left join b on t.cust_id = b.cust_id;
2. 提高row_number() over ()的执行效率
偶尔可能会碰到row_number() over ()执行很慢甚至内存溢出报错的情况,此时可以考虑加上where条件 or rand() < 0 以提高执行效率。Presto默认使用的是TopNRowNumberOperator执行器,加上 or rand() < 0 能诱导Presto使用Window执行器,从而在不影响结果的同时提高执行效率。
select cust_id, order_no, order_type, order_date, order_time from ( select cust_id, order_no, order_type, order_date, order_time, row_number() over (partition by cust_id, order_date order by order_time) as row_n from order_detail ) as t where t.row_n= 1 or rand() < 0;
3. 减少类型转换
--假设order_detail表中的order_date类型为字符, 存储格式为yyyy-mm-dd, 第1种写法要优于第2种写法 --第1种写法 select order_no, order_date from order_detail where order_date >= '2021-09-01' --第2种写法 select order_no, order_date from order_detail where date(order_date) >= date'2021-09-01' --字段order_date每一个值都要做一次date()转换
4. 使用join语句代替where in和子查询
当 where in () 或 where not in () 所用到的表或所产生的结果数据量较大时,语句执行效率就会明显下降,可以考虑改写为 join 语句,同时还可以结合使用前文说的 with 语句。
--where in select ... from table1 where cust_id in (select cust_id from table2); --可改写为: select ... from table1 as t1 inner join table2 as t2 on t1.cust_id = t2.cust_id; --where not in select ... from table1 where cust_id not in (select cust_id from table2 where type = 'xxx'); --可改写为: select ... from table1 as t1 left join table2 as t2 on t1.cust_id = t2.cust_id and t2.type = 'xxx' where t2.cust_id is null;
原创文章,欢迎转载,转载请注明出处并留下原文链接。