窗口函数的应用非常广泛,涉及跨行计算的都需要应用窗口函数来实现,对窗口函数的掌握程度能够体现一个数据分析师的取数功底,因此,本文总结一下窗口函数。

我们先来看一下窗口函数的定义:

--函数定义
function(args) OVER ([PARTITION BY expression]  [ORDER BY expression [ASC|DESC]])


--变种1
function(args) OVER ()


--变种2
function(args) OVER (PARTITION BY expression)


--变种3
function(args) OVER (ORDER BY expression [ASC|DESC])


--变种4
function(args) OVER (PARTITION BY expression  ORDER BY expression [ASC|DESC])

从语法定义来看,函数 + over () 就构成了窗口函数,但这里的函数并不支持所有函数,而是仅支持两类函数,其一是本身属于窗口函数的,其二是聚合函数。子语句PARTITION BY用于指定窗口的分组依据,子语句ORDER BY用于指定窗口的排序依据。

下面我们就列举下Presto SQL常用的天生的窗口函数,天生的意思是这些函数必须有 over 窗口才行:

  • row_number() → bigint

返回序列号,常用于分组去重。

  • rank() → bigint

返回排名,排名取值不连续。

  • dense_rank() → bigint

返回排名,排名取值连续。

  • first_value(x) → [same as input]

返回第1个值。

  • last_value(x) → [same as input]

返回最后1个值。

  • nth_value(x, offset) → [same as input]

返回第N个值。

  • lag(x[, offset[, default_value]]) → [same as input]

返回当前行的前N行的值,默认 offset 值为1,也就是默认取前1个值。

  • lead(x[, offset[, default_value]]) → [same as input]

返回当前行的后N行的值,默认 offset 值为1,也就是默认取后1个值。


应用举例→

假设我们有如下 table1,左边是 table1 原始数据,为了便于观察代码结果,我们将 table1 排序后的结果也展示出来,即右边的 table1_sorted:

示例代码:

select
	id,
	product,
	time,
	amount,

	row_number() over (partition by id order by time) as row_num,
	rank() over (order by amount) as rank_num,
	dense_rank() over (order by amount) as dense_rank_num,

	first_value(product) over (partition by id order by time) as first_product,
	last_value(product) over (partition by id order by time) as last_product,
	nth_value(product, 1) over (partition by id order by time) as nth_product_1,
	nth_value(product, 2) over (partition by id order by time) as nth_product_2,

	lag(product) over (partition by id order by time) as lag_product_1,
	lag(product,3) over (partition by id order by time) as lag_product_3,
	lead(product) over (partition by id order by time) as lead_product_1,
	lead(product,3) over (partition by id order by time) as lead_product_3
from table1
order by 1,2,3,4;

代码结果:


实际应用中,可按需使用 PARTITION BY、ORDER BY [ASC | DESC]、IGNORE NULLS, 而且除了以上天生的窗口函数外,窗口函数还支持聚合函数,包括:count()、count_if()、sum()、avg()、 array_agg()、 map_agg() 等等,我们可按需组合使用这些函数。


原创文章,欢迎转载,转载请注明出处并留下原文链接。

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注