0
点赞
收藏
分享

微信扫一扫

Hive--笔试题05_2--求TopN

witmy 2023-03-16 阅读 111


现在有这样一份数据:

1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e

 

字段的意义:

id,name,age,favors
id,姓名,年龄,爱好

其中需要注意的是:每一条记录中的爱好有多个值,以"-"分隔

 

需求:

求出每种爱好中,年龄最大的两个人(爱好,年龄,姓名)
注意思考一个问题:如果某个爱好中的第二大年龄有多个相同的怎么办?

 

解题:

建表准备:
 

create database if not exists exercise;
use exercise;
drop table if exists exercise5;
create table exercise5(id int, name string, age int, favors string) row format delimited fields terminated by ",";
load data local inpath "/home/hadoop/exercise5.txt" into table exercise5;
select * from exercise5;
desc exercise5;

 

思路分析

需要把这种数据:

6,liuyifei,35,a-d-e

变成:

6,liuyifei,35,a
6,liuyifei,35,d
6,liuyifei,35,e

SQL实现的结果:

select explode(split("a-d-e", "-"));   √√√√√√√√
select id,name,age, explode(split(fovors), "-") from exercise5; xxxxxxx

必须要借助于虚拟视图技术:
leteral view

改写:

select a.id as id, a.name as name, a.age as age,  favor_view.favor 
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor;

求出每种爱好的最大的年龄的最终的SQL:

select aa.favor, max(aa.age) as maxage 
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
group by aa.favor;

结果:

a       45
b 36
c 45
d 45
e 41
f 45

但是,如果需求扩展:
两个需求:
1、你如何帮我把这个年龄的姓名拿出来呢?
2、如果要去每种爱好中的前2名呢?

拓展一下:如果能给每一组中的每个人按照年龄降序排序,然后分配一个组内的序号。那么将来查询数据的时候根据序号去查询将变的非常容易

select aa.favor, aa.age 
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa order by aa.favor, aa.age desc;

// 这是需求数据格式,但是上面的SQL语句实现不了

id    age    rank
a 45 1
a 35 2
a 22 3
b 36 1
c 45 1
c 41 2
c 36 3
d 45 1
d 41 2
d 39 3
d 36 4
d 35 5
d 22 6
e 41 1
e 39 2
e 36 3
e 35 4
e 22 5
f 45 1
f 39 2

如果有上面的数据了,那么要筛选出每种爱好中的年龄前2名的人的信息,就容易了。

select id, name from table where rank <= 2;

// 利用窗口分析函数,添加序号:

select aa.id, aa.name, aa.age, aa.favor, 
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa ;

结果数据:

1       huangbo        45      a       1
6 liuyifei 35 a 2
4 liushishi 22 a 3
2 xuzheng 36 b 1
1 huangbo 45 c 1
3 huanglei 41 c 2
2 xuzheng 36 c 3
1 huangbo 45 d 1
3 huanglei 41 d 2
5 liudehua 39 d 3
2 xuzheng 36 d 4
6 liuyifei 35 d 5
4 liushishi 22 d 6
3 huanglei 41 e 1
5 liudehua 39 e 2
2 xuzheng 36 e 3
6 liuyifei 35 e 4
4 liushishi 22 e 5
1 huangbo 45 f 1
5 liudehua 39 f 2

// 最终SQL的具体实现:利用窗口分析函数去做

select c.favor, c.name, c.age from 
(
select aa.id, aa.name, aa.age, aa.favor,
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
) c
where c.index <= 2;

每种爱好中,年龄最大的前2个人:

a       huangbo        45
a liuyifei 35
b xuzheng 36
c huangbo 45
c huanglei 41
d huangbo 45
d huanglei 41
e huanglei 41
e liudehua 39
f huangbo 45
f liudehua 39

 

至此结束!!!!希望大家有所收获

 

举报

相关推荐

0 条评论