SQL-窗口函数
窗口函数的应用非常广泛,涉及跨行计算的都需要应用窗口函数来实现,对窗口函数的掌握程度能够体现一个数据分析师的取数功底,因此,本文总结一下窗口函数。
我们先来看一下窗口函数的定义:
--函数定义 function(args) OVER ([PARTITION BY expression] [ORDER BY expression [ASC|DESC]]) --变种1 function(args) OVER () --变种2 function(args) OVER (PARTITION BY expression) --变种3 function(args) OVER (ORDER BY expression [ASC|DESC]) --变种4 function(args) OVER (PARTITION BY expression ORDER BY expression [ASC|DESC])
从语法定义来看,函数 + over () 就构成了窗口函数,但这里的函数并不支持所有函数,而是仅支持两类函数,其一是本身属于窗口函数的,其二是聚合函数。子语句PARTITION BY用于指定窗口的分组依据,子语句ORDER BY用于指定窗口的排序依据。
下面我们就列举下Presto SQL常用的天生的窗口函数,天生的意思是这些函数必须有 over 窗口才行:
- row_number() → bigint
返回序列号,常用于分组去重。
- rank() → bigint
返回排名,排名取值不连续。
- dense_rank() → bigint
返回排名,排名取值连续。
- first_value(x) → [same as input]
返回第1个值。
- last_value(x) → [same as input]
返回最后1个值。
- nth_value(x, offset) → [same as input]
返回第N个值。
- lag(x[, offset[, default_value]]) → [same as input]
返回当前行的前N行的值,默认 offset 值为1,也就是默认取前1个值。
- lead(x[, offset[, default_value]]) → [same as input]
返回当前行的后N行的值,默认 offset 值为1,也就是默认取后1个值。
应用举例→
假设我们有如下 table1,左边是 table1 原始数据,为了便于观察代码结果,我们将 table1 排序后的结果也展示出来,即右边的 table1_sorted:
示例代码:
select id, product, time, amount, row_number() over (partition by id order by time) as row_num, rank() over (order by amount) as rank_num, dense_rank() over (order by amount) as dense_rank_num, first_value(product) over (partition by id order by time) as first_product, last_value(product) over (partition by id order by time) as last_product, nth_value(product, 1) over (partition by id order by time) as nth_product_1, nth_value(product, 2) over (partition by id order by time) as nth_product_2, lag(product) over (partition by id order by time) as lag_product_1, lag(product,3) over (partition by id order by time) as lag_product_3, lead(product) over (partition by id order by time) as lead_product_1, lead(product,3) over (partition by id order by time) as lead_product_3 from table1 order by 1,2,3,4;
代码结果:
实际应用中,可按需使用 PARTITION BY、ORDER BY [ASC | DESC]、IGNORE NULLS, 而且除了以上天生的窗口函数外,窗口函数还支持聚合函数,包括:count()、count_if()、sum()、avg()、 array_agg()、 map_agg() 等等,我们可按需组合使用这些函数。
原创文章,欢迎转载,转载请注明出处并留下原文链接。