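Case requirements:
Review of the RFM model
Extension to the LRFCM model for the commercial aviation industry:
This exercise extracts the five LRFCM indicators (L, R, F, C, M) from the data source and uses them to analyze the commercial value of airline customers.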
Data source: air_data.csv (the file loaded into Hive below)
Description of the data fields:
Hive operations:
Create the database and table:
create database air_data_base;
use air_data_base;
create table air_data_table(
member_no string,
ffp_date string,
first_flight_date string,
gender string,
ffp_tier int,
work_city string,
work_province string,
work_country string,
age int,
load_time string,
flight_count int,
bp_sum bigint,
ep_sum_yr_1 int,
ep_sum_yr_2 bigint,
sum_yr_1 bigint,
sum_yr_2 bigint,
seg_km_sum bigint,
weighted_seg_km double,
last_flight_date string,
avg_flight_count double,
avg_bp_sum double,
begin_to_first int,
last_to_end int,
avg_interval float,
max_interval int,
add_points_sum_yr_1 bigint,
add_points_sum_yr_2 bigint,
exchange_count int,
avg_discount float,
p1y_flight_count int,
l1y_flight_count int,
p1y_bp_sum bigint,
l1y_bp_sum bigint,
ep_sum bigint,
add_point_sum bigint,
eli_add_point_sum bigint,
l1y_eli_add_points bigint,
points_sum bigint,
l1y_points_sum float,
ration_l1y_flight_count float,
ration_p1y_flight_count float,
ration_p1y_bps float,
ration_l1y_bps float,
point_notflight int
)
row format delimited fields terminated by ',';
Upload the data source to a local Linux directory, then load it from there into the Hive table:
load data local inpath '/home/hadoop/air_data.csv' overwrite into table air_data_table;
select * from air_data_table limit 20;
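As a rough illustration of what the comma-delimited row format does when the file is read, the Python sketch below splits a line on commas and casts each field, turning anything that does not fit the declared type into None. Hive behaves similarly and returns NULL for such values, which is why a header line (if the CSV has one) shows up as a row of mostly NULLs. The three-column schema and the sample lines are made up for illustration; the real table has 44 columns.

```python
# Hypothetical three-column slice of the table schema: (column name, Python type).
SCHEMA = [("member_no", str), ("flight_count", int), ("avg_discount", float)]

def cast_field(raw, cast):
    """Cast one raw text field; return None when it does not fit the type."""
    if raw is None:
        return None
    if cast is str:
        return raw  # strings are kept as-is, including empty strings
    try:
        return cast(raw)
    except ValueError:
        return None

def parse_line(line, schema=SCHEMA):
    """Split a line on ',' and cast each field according to the schema."""
    fields = line.rstrip("\n").split(",")
    row = {}
    for i, (name, cast) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None  # missing trailing field
        row[name] = cast_field(raw, cast)
    return row

# Invented sample lines: a header and one data record.
header = parse_line("MEMBER_NO,FLIGHT_COUNT,avg_discount")
record = parse_line("54993,210,0.96")
print(header)
print(record)
```

The split is a plain split on the delimiter, so a field that itself contains a comma would shift every column after it.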
-- Data exploration: count the NULLs in the fare, mileage, and discount columns.
create table sum_seg_avg_null as select * from
(select count(*) as sum_yr_1_null_count from air_data_table where sum_yr_1 is null) sum_yr_1,
(select count(*) as seg_km_sum_null from air_data_table where seg_km_sum is null) seg_km_sum,
(select count(*) as avg_discount_null from air_data_table where avg_discount is null) avg_discount;
-- Data exploration: find the minimum of the same three columns.
create table sum_seg_avg_min as select
min(sum_yr_1) as sum_yr_1,
min(seg_km_sum) as seg_km_sum,
min(avg_discount) as avg_discount
from air_data_table;
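The two exploration queries above amount to counting missing values and taking column minimums. A minimal Python sketch of the same checks on a few invented rows (the values are not taken from the dataset) could look like this; like SQL's min(), it skips missing values.

```python
COLUMNS = ("sum_yr_1", "seg_km_sum", "avg_discount")

# Invented sample rows; None stands in for SQL NULL.
rows = [
    {"sum_yr_1": 1200, "seg_km_sum": 3500, "avg_discount": 0.8},
    {"sum_yr_1": None, "seg_km_sum": 900, "avg_discount": 0.0},
    {"sum_yr_1": 0, "seg_km_sum": 1500, "avg_discount": 0.65},
]

def null_counts(rows, columns=COLUMNS):
    """Number of missing values per column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}

def column_minimums(rows, columns=COLUMNS):
    """Minimum of each column, ignoring missing values."""
    mins = {}
    for c in columns:
        values = [r[c] for r in rows if r[c] is not None]
        mins[c] = min(values) if values else None
    return mins

print(null_counts(rows))
print(column_minimums(rows))
```

A minimum of 0 in the fare or discount column is the kind of result that motivates the cleaning rules in the next step.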
Data cleaning:
create table sas_not_0 as
select * from air_data_table
where sum_yr_1 is not null and
avg_discount <> 0 and
seg_km_sum > 0;
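The WHERE clause can be read as a per-row predicate. One detail worth spelling out: in SQL a comparison against NULL is never true, so rows whose avg_discount or seg_km_sum is NULL are dropped as well, even though only sum_yr_1 is tested explicitly. The Python sketch below mirrors that behaviour on invented rows, with None standing in for NULL.

```python
def keep_row(row):
    """Return True when the row would pass the cleaning filter."""
    fare = row["sum_yr_1"]
    discount = row["avg_discount"]
    km = row["seg_km_sum"]
    return (
        fare is not None                              # sum_yr_1 is not null
        and discount is not None and discount != 0    # avg_discount <> 0
        and km is not None and km > 0                 # seg_km_sum > 0
    )

# Invented sample rows.
rows = [
    {"sum_yr_1": 1200, "avg_discount": 0.8, "seg_km_sum": 3500},   # kept
    {"sum_yr_1": None, "avg_discount": 0.8, "seg_km_sum": 3500},   # fare missing
    {"sum_yr_1": 500, "avg_discount": 0.0, "seg_km_sum": 3500},    # zero discount
    {"sum_yr_1": 500, "avg_discount": None, "seg_km_sum": 3500},   # discount missing
    {"sum_yr_1": 500, "avg_discount": 0.7, "seg_km_sum": 0},       # zero mileage
]
cleaned = [r for r in rows if keep_row(r)]
print(len(cleaned))
```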
Extract the useful fields:
create table flfasl as select ffp_date,load_time,flight_count,avg_discount,seg_km_sum,last_to_end from sas_not_0;
select * from flfasl limit 10;
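-- Build the five indicators: l (membership length in months), r (months since the last flight),
-- f (flight count), m (total kilometres flown), c (average discount).
create table lrfmc as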
select
round((unix_timestamp(load_time,'yyyy/MM/dd')-unix_timestamp(ffp_date,'yyyy/MM/dd'))/(30*24*60*60),2) as l,
round(last_to_end/30,2) as r,
flight_count as f,
seg_km_sum as m,
round(avg_discount,2) as c
from flfasl;
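To make the arithmetic concrete, here is a small Python sketch that computes the five indicators for one made-up record. It follows the query step by step: L is the time between ffp_date and load_time expressed in 30-day months, R is last_to_end divided by 30, and F, M, C are taken from flight_count, seg_km_sum, and avg_discount. The sample values are invented, and the sketch uses plain calendar dates, whereas unix_timestamp in Hive works in the session time zone.

```python
from datetime import datetime

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # same 30-day month as in the query
DATE_FORMAT = "%Y/%m/%d"               # counterpart of 'yyyy/MM/dd'

def lrfmc(record):
    """Compute the l, r, f, m, c values for one record."""
    load = datetime.strptime(record["load_time"], DATE_FORMAT)
    joined = datetime.strptime(record["ffp_date"], DATE_FORMAT)
    seconds = (load - joined).total_seconds()
    return {
        "l": round(seconds / SECONDS_PER_MONTH, 2),
        "r": round(record["last_to_end"] / 30, 2),
        "f": record["flight_count"],
        "m": record["seg_km_sum"],
        "c": round(record["avg_discount"], 2),
    }

# Invented sample record.
sample = {
    "ffp_date": "2013/01/01",
    "load_time": "2014/03/31",
    "last_to_end": 45,
    "flight_count": 12,
    "seg_km_sum": 18000,
    "avg_discount": 0.718,
}
print(lrfmc(sample))  # 454 days of membership give l = 15.13
```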
Data standardization (min-max normalization):
create table standard_lrfmc as
select (lrfmc.l-minlrfmc.l)/(maxlrfmc.l-minlrfmc.l) as l,
(lrfmc.r-minlrfmc.r)/(maxlrfmc.r-minlrfmc.r) as r,
(lrfmc.f-minlrfmc.f)/(maxlrfmc.f-minlrfmc.f) as f,
(lrfmc.m-minlrfmc.m)/(maxlrfmc.m-minlrfmc.m) as m,
(lrfmc.c-minlrfmc.c)/(maxlrfmc.c-minlrfmc.c) as c
from lrfmc,
(select max(l) as l,max(r) as r,max(f) as f,max(m) as m,max(c) as c from lrfmc) as maxlrfmc,
(select min(l) as l,min(r) as r,min(f) as f,min(m) as m,min(c) as c from lrfmc) as minlrfmc;
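Each column is rescaled with x' = (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1 so that the five indicators become comparable. A minimal Python sketch of that formula on invented values follows. If a column is constant, max - min is zero; Hive returns NULL for a division by zero, and the sketch returns None in the same situation.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: the formula is undefined, so report missing values.
        return [None] * len(values)
    return [(v - lo) / span for v in values]

# Invented flight counts for four customers.
print(min_max_scale([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```

Because the minimum and maximum come from the data itself, a single extreme value compresses all other values toward 0, which is worth checking before clustering.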
Data mining (customer segmentation): to be continued...
References:
26 Data Analysis Cases, Stop 2: Hive-Based Customer Value Analysis for Civil Aviation
The LRFCM Model for Airline Customer Value Analysis