Requirements:
1. Hive: write the create-table statement and the load statement, importing the data from the local Linux file system
create table student(
stu_name string,
course string,
score int
)
row format delimited
fields terminated by ',';
load data local inpath '/root/stu.txt' into table student;
select * from default.student;
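The table definition above implies that each line of the input file holds three comma-separated fields. The following is only a minimal sketch of that layout: the sample rows and the helper name parse_rows are hypothetical, since the actual contents of /root/stu.txt are not shown here.

```python
# Hypothetical sample in the layout expected by "fields terminated by ','".
# The real /root/stu.txt is not shown in these notes.
SAMPLE = "zhangsan,语文,80\nzhangsan,数学,90\nlisi,英语,70\n"


def parse_rows(text):
    """Split each non-empty line on ',' into (stu_name, course, score)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stu_name, course, score = line.split(",")
        rows.append((stu_name, course, int(score)))  # score is declared as int
    return rows


rows = parse_rows(SAMPLE)
```

Each tuple corresponds to one row that select * from default.student would return after the load.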
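2. Use Spark SQL to query the table above, produce the expected result shown below, and write the computed result to MySQL
[Note: prefix the table name with its database, e.g. default.student]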
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Build a Hive-enabled session that uses the remote metastore
    # (the app name 测试 means "test").
    spark = SparkSession.builder \
        .appName("测试") \
        .config("hive.metastore.uris", 'thrift://hadoop11:9083') \
        .enableHiveSupport() \
        .getOrCreate()

    # Pivot one row per (student, course) into one row per student.
    # The course values must match the data: 语文 = Chinese, 数学 = math, 英语 = English.
    df = spark.sql("""
        select stu_name,
               max(case when course='语文' then score else null end) chinese,
               max(case when course='数学' then score else null end) math,
               max(case when course='英语' then score else null end) english
        from default.student
        group by stu_name
    """)

    # Write the result to the MySQL table test1.df_student over JDBC.
    props = {'user': 'root', 'password': '123456', 'driver': 'com.mysql.jdbc.Driver'}
    df.write.jdbc(url='jdbc:mysql://hadoop11:3306/test1', table='df_student', properties=props)

    spark.stop()
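To make the max(case when ...) pivot easier to follow, here is a rough pure-Python sketch of the same grouping logic. It is not how Spark executes the query; the function name pivot_scores and the sample rows are hypothetical and used only for illustration.

```python
# Maps each course value in the data to the output column used in the query.
COLUMNS = {"语文": "chinese", "数学": "math", "英语": "english"}


def pivot_scores(rows):
    """Group (stu_name, course, score) rows by student, keeping the max score per course."""
    result = {}
    for stu_name, course, score in rows:
        # Every student gets a row, as with "group by stu_name";
        # columns with no matching course stay None (SQL NULL).
        entry = result.setdefault(stu_name, {col: None for col in COLUMNS.values()})
        col = COLUMNS.get(course)
        if col is None or score is None:
            continue  # no CASE branch matches, or max() ignores NULL
        if entry[col] is None or score > entry[col]:
            entry[col] = score
    return result


# Hypothetical rows, not taken from the real table.
sample = [("zhangsan", "语文", 80), ("zhangsan", "数学", 90), ("lisi", "英语", 70)]
pivoted = pivot_scores(sample)
```

With these sample rows, zhangsan gets values for chinese and math and None for english, which mirrors the NULL that the SQL query would return for a course a student has no row for.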
SQL result: