1 窗口函数语法
常用的分析函数
常用的专用窗口函数
窗口函数
窗口函数的3个组成部分可以单独使用,也可以混合使用,也可以全部不用
partition by
对指定的字段进行分组,后续都会以组为单位,把每个分组单独作为一个窗口进行统计分析操作。
with temp as(
select 'A' as col1,1 as col2
union all
select 'A' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select
col1
,sum(col2) over(partition by col1) as '对窗口中的数据求和'
from temp
输出结果:
col 对窗口中的数据求和
A 2
A 2
B 1
with temp as(
select 'A' as col1,1 as col2
union all
select 'A' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select
col1
,sum(col2) over() as '对整体数据求和'
from temp
输出结果:
col 对整体数据求和
A 3
A 3
B 3
order by
order by 与 partition by 连用的时候,可以对各个分组内的数据,按照指定的字段进行排序。如果没有 partition by 指定分组字段,那么会对全局的数据进行排序。
with temp as(
select 'A' as col1,1 as col2
union all
select 'C' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select col1,row_number() over(order by col1 desc) as 排序 from temp
输出结果:
col1 排序
C 1
C 2
B 3
A 4
with temp_01 as(
select 'A' as col1,1 as col2
union all
select 'D' as col1,1 as col2
union all
select 'C' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select col1,sum(col2) over(order by col1) as 求累计 from temp_01
输出结果:
col1 求累计
A 1
B 2
C 3
D 4
with temp_02 as(
select 'A' as col1,1 as col2
union all
select 'C' as col1,1 as col2
union all
select 'C' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select col1,sum(col2) over(order by col1) as 求累计 from temp_02
输出结果:
col1 求累计
A 1
B 2
C 4
C 4
with temp_01 as(
select 'A' as col1,'b' as col2,1 as col3
union all
select 'A' as col1,'a' as col2,1 as col3
union all
select 'C' as col1,'a' as col2,1 as col3
union all
select 'C' as col1,'b' as col2,1 as col3
)
select col1,sum(col3) over(partition by col1 order by col1,col2) as 求累计 from temp_01
输出结果:
col1 求累计
A 1
A 2
C 1
C 2
with temp_02 as(
select 'A' as col1,'b' as col2,1 as col3
union all
select 'A' as col1,'a' as col2,1 as col3
union all
select 'C' as col1,'a' as col2,1 as col3
union all
select 'C' as col1,'a' as col2,1 as col3
)
select col1,sum(col3) over(partition by col1 order by col1,col2) as 求累计 from temp_02
输出结果:
col1 求累计
A 1
A 2
C 2
C 2
rows between 开始位置 and 结束位置
rows between 是用来划分窗口中,函数发挥作用的数据范围。我们用如下例子加深 rows between 的理解。
rows between 常用的参数如下:
① n preceding:往前
② n following:往后
③ current row:当前行
④ unbounded:起点(一般结合preceding,following使用)
使用例子如下:
rows between unbounded preceding and current row与 partition by 、order by 连用,可以产生对窗口中的数据求累计数的效果。
with temp as(
select 'A' as col1,1 as col2
union all
select 'A' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select
col1
,sum(col2) over(partition by col1 order by col2 desc rows between unbounded preceding and current row) as '对窗口中的数据求和'
from temp
输出结果:
col1 对窗口中的数据求和
A 1
A 2
B 1
- 排序窗口函数
2.1 排序并产生自增编号,自增编号不重复且连续
我们可以使用函数:row_number() over()
数据样例:
col1 ranks
a 1
b 2
b 3
b 4
c 5
d 6
具体语法如下:
> row_number() over(partition by 列名 order by 列名 rows between 开始位置 and
> 结束位置)
案例如下:
>with temp as(
select 'a' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'c' as col1
union all
select 'd' as col1
)
>
>select col1,row_number() over(order by col1) as ranks from temp
输出结果:
col1 rank
a 1
b 2
b 3
b 4
c 5
d 6
2.2 排序并产生自增编号,自增编号会重复且不连续
我们可以使用函数:rank() over()
数据样例:
col1 ranks
a 1
b 2
b 2
b 2
c 5
d 6
具体语法如下:
案例如下:
with temp as(
select 'a' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'c' as col1
union all
select 'd' as col1
)
select col1,rank() over(order by col1) as ranks from temp
输出结果:
col1 rank
a 1
b 2
b 2
b 2
c 5
d 6
2.3 排序并产生自增编号,自增编号会重复且连续
我们可以使用函数:dense_rank() over()
数据样例:
col1 ranks
a 1
b 2
b 2
b 2
c 3
d 4
具体语法如下:
案例如下:
with temp as(
select 'a' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'b' as col1
union all
select 'c' as col1
union all
select 'd' as col1
)
select col1,dense_rank() over(order by col1) as ranks from temp
输出结果:
col1 ranks
a 1
b 2
b 2
b 2
c 3
d 4
聚合窗口函数
3.1 求窗口中的累计值
我们可以使用:sum() over()
with temp as(
select 'A' as col1,1 as col2
union all
select 'A' as col1,1 as col2
union all
select 'A' as col1,1 as col2
union all
select 'B' as col1,1 as col2
union all
select 'B' as col1,1 as col2
)
select
col1
,sum(col2) over(partition by col1 order by col2 desc rows between unbounded preceding and current row) as '对窗口中的数据求和'
from temp
输出结果:
col1 对窗口中的数据求和
A 1
A 2
A 3
B 1
B 1
3.2 求窗口中 3 天的平均价格
我们可以使用 avg() over()
with temp as(
select 'A' as col1,'2022-11-01' as date_time,50 as price
union all
select 'A' as col1,'2022-11-02' as date_time,60 as price
union all
select 'A' as col1,'2022-11-03' as date_time,45 as price
union all
select 'A' as col1,'2022-11-04' as date_time,70 as price
union all
select 'A' as col1,'2022-11-05' as date_time,40 as price
union all
select 'A' as col1,'2022-11-06' as date_time,40 as price
union all
select 'B' as col1,'2022-11-01' as date_time,40 as price
union all
select 'B' as col1,'2022-11-02' as date_time,30 as price
union all
select 'B' as col1,'2022-11-03' as date_time,50 as price
union all
select 'B' as col1,'2022-11-04' as date_time,50 as price
)
select
col1
,date_time
,price
,avg(price) over(partition by col1 order by date_time rows between 2 preceding and current row) as '3天的平均价格'
from temp
输出结果:
col1 date_time price 3天的平均价格
A 2022-11-01 50 50
A 2022-11-02 60 55
A 2022-11-03 45 51.666666666666664
A 2022-11-04 70 58.333333333333336
A 2022-11-05 40 51.666666666666664
A 2022-11-06 40 50
B 2022-11-01 40 40
B 2022-11-02 30 35
B 2022-11-03 50 40
B 2022-11-01 50 43.333333333333336
3.3 求分组中的最大值/最小值
with temp_01 as(
select 'A' as col1,10 as col2
union all
select 'C' as col1,10 as col2
union all
select 'C' as col1,20 as col2
union all
select 'A' as col1,20 as col2
union all
select 'A' as col1,20 as col2
)
select
col1
,col2
,max(col2) over(partition by col1) as 窗口中的最大值
,min(col2) over(partition by col1) as 窗口中的最小值
from temp_01
输出结果:
col1 col2 窗口中的最大值 窗口中的最小值
A 10 20 10
A 20 20 10
A 20 20 10
C 10 20 10
C 20 20 10
3.4 求分组中的总记录数
with temp_01 as(
select 'A' as col1,'a' as col2
union all
select 'C' as col1,'a' as col2
union all
select 'C' as col1,'a' as col2
union all
select 'A' as col1,'b' as col2
union all
select 'A' as col1,'b' as col2
)
select
col1
,col2
,count(col2) over(partition by col1) as 分组中的记录数
from temp_01
输出结果:
col1 col2 分组中的记录数
A a 3
A b 3
A b 3
C a 2
C a 2
- 位移窗口函数
4.1 获取分组中往前 n 行的值
基础语法:
语法解析:
-
field 是指定的列名
-
n 是往前的行数
-
行往前导致的,最后的 n 行值为 null,可以用 default_value 代替。
使用案例:
with temp_01 as(
select 'A' as col1,'2022-12-01' as date_time
union all
select 'C' as col1,'2022-12-01' as date_time
union all
select 'C' as col1,'2022-12-02' as date_time
union all
select 'A' as col1,'2022-12-02' as date_time
union all
select 'A' as col1,'2022-12-03' as date_time
)
select
col1
,date_time
,lead(date_time,1,'2022-12-30') over(partition by col1 order by date_time) as 往前n行的值
from temp_01
输出结果:
col1 date_time 往前n行的值
A 2022-12-01 2022-12-02
A 2022-12-02 2022-12-03
A 2022-12-03 2022-12-30
C 2022-12-01 2022-12-02
C 2022-12-02 2022-12-30
4.2 获取分组中往后 n 行的值
基础语法:
语法解析:
-
field 是指定的列名
-
n 是往前的行数
-
行往后导致的,前面的 n 行值为 null,可以用 default_value 代替。
使用案例:
with temp_01 as(
select 'A' as col1,'2022-12-01' as date_time
union all
select 'C' as col1,'2022-12-01' as date_time
union all
select 'C' as col1,'2022-12-02' as date_time
union all
select 'A' as col1,'2022-12-02' as date_time
union all
select 'A' as col1,'2022-12-03' as date_time
)
select
col1
,date_time
,lag(date_time,1,'2022-12-30') over(partition by col1 order by date_time) as 往前n行的值
from temp_01
输出结果:
col1 date_time 往前n行的值
A 2022-12-01 2022-12-30
A 2022-12-02 2022-12-01
A 2022-12-03 2022-12-02
C 2022-12-01 2022-12-30
C 2022-12-02 2022-12-01
极值窗口函数
5.1 获取分组内第一行的值
我们可以使用 first_value(col,true/false) over(),作用是:取分组内排序后,截止到当前行,第一个值。
with temp_01 as(
select 'A' as col1,'b' as col2
union all
select 'C' as col1,'a' as col2
union all
select 'C' as col1,'b' as col2
union all
select 'A' as col1,'a' as col2
union all
select 'A' as col1,'b' as col2
)
select
col1
,first_value(col2) over(partition by col1 order by col2) as 第一个值
from temp_01
输出结果:
col1 第一个值
A a
A a
A a
C a
C a
select
col1
,first_value(col2) over(partition by col1) as 第一个值
from temp_01
输出结果:
col1 第一个值
A b
A b
A b
C a
C a
5.2 获取分组内最后一行的值
我们可以使用 last_value(col,true/false) over(),作用是:取分组内排序后,截止到当前行,最后一个值。所以,如果使用 order by 排序的时候,想要取最后一个值,需要与 rows between unbounded preceding and unbounded following 连用。
with temp_01 as(
select 'A' as col1,'b' as col2
union all
select 'C' as col1,'a' as col2
union all
select 'C' as col1,'b' as col2
union all
select 'A' as col1,'a' as col2
union all
select 'A' as col1,'b' as col2
)
select
col1
,last_value(col2) over(partition by col1 order by col2 rows between unbounded preceding and unbounded following) as 第一个值
from temp_01
输出结果:
col1 第一个值
A b
A b
A b
C b
C b
相信大家都发现了,在本案例中,我们使用 order by 的时候与 rows between unbounded preceding and unbounded following 连用了,这是需要注意的一个点,如果不连用,将会产生以下效果:
with temp_01 as(
select 'A' as col1,'b' as col2
union all
select 'C' as col1,'a' as col2
union all
select 'C' as col1,'b' as col2
union all
select 'A' as col1,'a' as col2
union all
select 'A' as col1,'b' as col2
)
select
col1
,last_value(col2) over(partition by col1 order by col2) as 第一个值
from temp_01
输出结果:
col1 第一个值
A a
A b
A b
C a
C b
- 分箱窗口函数
ntile() over() 分箱窗口函数,用于将分组数据按照顺序切分成 n 片,返回当前切片值,如果切片不均匀,默认增加到第一个切片中。
案例:查询成绩前 20% 的人。
with temp as(
select 'A' as col1,90 as grade
union all
select 'B' as col1,80 as grade
union all
select 'C' as col1,82 as grade
union all
select 'D' as col1,99 as grade
union all
select 'E' as col1,100 as grade
union all
select 'F' as col1,92 as grade
union all
select 'G' as col1,93 as grade
union all
select 'H' as col1,85 as grade
union all
select 'I' as col1,95 as grade
union all
select 'J' as col1,70 as grade
)
select
col1
,grade
from
(select
col1
,grade
,ntile(5) over(order by grade desc) as level
from temp
)t1
where t1.level = 1
输出结果:
col1 grade
E 100
D 99
转载:https://zhuanlan.zhihu.com/p/587440793