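I. Who ranks first in each subject?
0 Problem description
Given a student score table, output the student who ranks first in each subject.
1 Data preparation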
with t1 as
(
select
'zs' as name,
'[{"Chinese":80},{"Math":70},{"English":60}]' as score_ext
union all
select
'ls' as name,
'[{"Chinese":90},{"Math":70},{"English":90}]' as score_ext
union all
select
'ww' as name,
'[{"Chinese":60},{"Math":90},{"English":80}]' as score_ext
),
t2 as
(
select
name,
-- strip [ ] { } and " so the string is easy to split later
regexp_replace(score_ext, '\\[|\\{|\\}|\\"|\\]', '') as scores
from t1
),
t3 as (
select
name,
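split(score, ":")[0] as course,
-- cast to int so the later order by compares numbers rather than strings (e.g. 100 vs 90)
cast(split(score, ":")[1] as int) as score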
from t2
lateral view explode(split(scores, ',')) expl as score
)
select
name,
course,
score
from (
select
course,
name,
score,
row_number() over (partition by course order by score desc) as rn
from t3
) t4
where rn = 1;
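2 Analysis
Step 1: t2 applies regexp_replace(score_ext, '\\[|\\{|\\}|\\"|\\]', '') as scores to strip the square brackets, braces, and double quotes. Each student's scores become one comma-separated string of course:score pairs, e.g. Chinese:80,Math:70,English:60 for zs.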
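Step 1 can be checked in isolation. The sketch below reuses the sample rows and the same expression from the query above; the expected rows in the trailing comment follow from deleting those five characters, and the row order is not guaranteed.

```sql
-- Step 1 in isolation: strip [ ] { } and " from the raw score string.
with t1 as (
  select 'zs' as name, '[{"Chinese":80},{"Math":70},{"English":60}]' as score_ext
  union all
  select 'ls' as name, '[{"Chinese":90},{"Math":70},{"English":90}]' as score_ext
  union all
  select 'ww' as name, '[{"Chinese":60},{"Math":90},{"English":80}]' as score_ext
)
select
  name,
  regexp_replace(score_ext, '\\[|\\{|\\}|\\"|\\]', '') as scores
from t1;
-- Expected rows (order may vary):
-- zs    Chinese:80,Math:70,English:60
-- ls    Chinese:90,Math:70,English:90
-- ww    Chinese:60,Math:90,English:80
```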
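Step 2: t3 applies lateral view explode to t2. The scores string is split on commas and each element becomes its own row, which is then split on the colon into course and score, e.g. (zs, Chinese, 80).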
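A minimal sketch of this step on a single student, assuming the cleaned string from step 1 as input. The column alias kv is a name chosen here for illustration, and the cast to int is an addition so that the score is numeric.

```sql
-- Step 2 in isolation: one input row becomes one output row per course.
with t2 as (
  select 'zs' as name, 'Chinese:80,Math:70,English:60' as scores
)
select
  name,
  split(kv, ':')[0] as course,              -- text before the colon
  cast(split(kv, ':')[1] as int) as score   -- text after the colon, as a number
from t2
lateral view explode(split(scores, ',')) expl as kv;
-- Expected rows:
-- zs    Chinese    80
-- zs    Math       70
-- zs    English    60
```

explode on its own only returns the array elements; lateral view is what joins each element back to the name column of the row it came from.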
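Step 3: t4 applies row_number() over (partition by course order by score desc) as rn to t3, a windowed ranking within each course (per-group top N). Keeping only rn = 1 gives the first place in each course: ls for Chinese (90), ww for Math (90), and ls for English (90).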
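The ranking can be inspected before the rn = 1 filter is applied. In the sketch below the t3 rows are written out by hand from the sample data, with scores as integers (an assumption made here so that the ordering is numeric).

```sql
-- Step 3 in isolation: number the students inside each course, best score first.
with t3 as (
  select 'zs' as name, 'Chinese' as course, 80 as score union all
  select 'zs' as name, 'Math'    as course, 70 as score union all
  select 'zs' as name, 'English' as course, 60 as score union all
  select 'ls' as name, 'Chinese' as course, 90 as score union all
  select 'ls' as name, 'Math'    as course, 70 as score union all
  select 'ls' as name, 'English' as course, 90 as score union all
  select 'ww' as name, 'Chinese' as course, 60 as score union all
  select 'ww' as name, 'Math'    as course, 90 as score union all
  select 'ww' as name, 'English' as course, 80 as score
)
select
  course,
  name,
  score,
  row_number() over (partition by course order by score desc) as rn
from t3;
-- Rows with rn = 1:
-- Chinese    ls    90
-- Math       ww    90
-- English    ls    90
-- zs and ls tie at 70 in Math; row_number() assigns them rn 2 and 3
-- in an unspecified order.
```

Because row_number() always produces distinct numbers, a tie for first place would return only one of the tied students. If every tied student should be listed, rank() or dense_rank() could be substituted for row_number(); that is a variation, not part of the original query.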
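3 Summary
This problem is solved with explode (turning each array element into its own row) followed by a window function. The Hive regular expressions used are covered in: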
(06) Hive: Regular Expressions (CSDN blog)