The following uses Hive 3.x, with a matching Hadoop 3.x.
Installation
Download
➜ ~ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Extract
➜ ~ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/Apache/
Configure environment variables
vim /etc/profile
...
export HIVE_HOME=/opt/Apache/apache-hive-3.1.2-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
...
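Reload the profile so the variables take effect, then sanity-check the installation (hive --version should print the Hive build info once HIVE_HOME/bin is on the PATH):
➜ ~ source /etc/profile
➜ ~ hive --version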
Startup
Initialize the database
By default, Hive manages its metadata with an embedded Derby database. This has a serious flaw: only one Hive client can be connected at a time. We will switch to MySQL for metadata management later on.
- Initialize the metastore schema
➜ apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema
Note: this step will very likely fail; for the fix, see "Derby database initialization error" in the Troubleshooting section below.
- Start the client
➜ apache-hive-3.1.2-bin bin/hive
which: no hbase in (/opt/Java/jdk1.8.0_261/bin:/opt/Apache/apache-maven-3.6.3/bin:/opt/node-v12.18.4-linux-x64/bin:/opt/Apache/apache-ant-1.9.15/bin:/opt/Apache/hadoop-3.2.1/bin:/opt/Apache/apache-hive-3.1.2-bin/bin:/usr/local/bin:/usr/bin:/home/sairo/bin:/usr/local/sbin:/usr/sbin)
Hive Session ID = 9ea641f6-4c3b-49db-877e-93cf945cea77
Logging initialized using configuration in jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 5dcd28fe-648a-4e5c-99cd-853719674c78
hive>
Basic SQL operations
hive> show databases;
OK
default
Time taken: 0.864 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.056 seconds
hive> show tables;
OK
Time taken: 0.054 seconds
hive> create table test (id string, name string);
OK
Time taken: 0.837 seconds
hive> insert into test values('aaa', 'Tom');
Query ID = sairo_20201126194635_ab618e32-953d-4fd2-983b-dfc2d8abd4d2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1606377510588_0001, Tracking URL = http://dev-jsj.com:8088/proxy/application_1606377510588_0001/
Kill Command = /opt/Apache/hadoop-3.2.1/bin/mapred job -kill job_1606377510588_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-11-26 19:46:51,787 Stage-1 map = 0%, reduce = 0%
2020-11-26 19:46:58,055 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.96 sec
2020-11-26 19:47:05,333 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 9.81 sec
MapReduce Total cumulative CPU time: 9 seconds 810 msec
Ended Job = job_1606377510588_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://dev-jsj.com:9000/user/hive/warehouse/test/.hive-staging_hive_2020-11-26_19-46-35_648_9001647310701991693-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 9.81 sec HDFS Read: 14565 HDFS Write: 243 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 810 msec
OK
Time taken: 32.519 seconds
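To confirm the insert, query the table back; the single row aaa/Tom should come back almost immediately, since a plain select * is usually served by a fetch task without launching a MapReduce job:
hive> select * from test;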
Using MySQL to manage metadata
Install MySQL
Routine work; skipped here.
Create the Hive metastore database in advance:
mysql> create database hive_metastore default charset utf8;
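The configuration below connects as root for simplicity. If you would rather use a dedicated account, a sketch like this works (hive_user and its password are placeholders, not from the original setup); point javax.jdo.option.ConnectionUserName/Password at that account instead:
mysql> create user 'hive_user'@'%' identified by '123456';
mysql> grant all privileges on hive_metastore.* to 'hive_user'@'%';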
Add the MySQL driver JAR to Hive
➜ apache-hive-3.1.2-bin cp /opt/Apache/repository/mysql/mysql-connector-java/8.0.13/mysql-connector-java-8.0.13.jar ./lib
Configure the JDBC connection parameters
The conf directory contains many configuration file templates. Here we edit hive-default.xml.template and rename it to hive-site.xml.
hive-site.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JDBC connection URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://dev-jsj.com:3306/hive_metastore?useSSL=false&amp;characterEncoding=utf8</value>
</property>
<!-- JDBC connection driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<!-- JDBC connection username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- JDBC connection password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<!-- Hive metastore schema version verification -->
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<!-- Metastore authorization -->
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
</configuration>
Note:
- When configuring the JDBC URL, write & as &amp; (the & character has special meaning in XML and must be escaped).
- If you start from the official configuration template, delete the <description> child tag of every <property>. These only explain what each property does, but many descriptions contain special characters and Hive will fail to start if they are left in.
- Some tutorials may ask you to set Hive's storage path in HDFS, for example:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
Unless you have special requirements, there is no need to set this; the default is already /user/hive/warehouse.
Initialize the database
➜ apache-hive-3.1.2-bin bin/schematool -dbType mysql -initSchema -verbose
-verbose: prints the execution progress instead of the long stretch of silence you would otherwise get.
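If initialization succeeded, the metastore schema now lives in MySQL. A quick way to verify (expect dozens of tables, including names like DBS, TBLS and VERSION that schematool typically creates):
mysql> use hive_metastore;
mysql> show tables;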
At this point the very basic setup is done and you can work with Hive directly from the command line. Next we configure remote access to Hive, similar to running Hadoop's background services.
Accessing Hive via the metastore service
- Step 1: edit the configuration file hive-site.xml
➜ apache-hive-3.1.2-bin vim conf/hive-site.xml
# add the following configuration
<!-- Address to connect to for metadata storage -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://dev-jsj.com:9083</value>
</property>
- Step 2: start the metastore service
➜ apache-hive-3.1.2-bin bin/hive --service metastore
# or
➜ apache-hive-3.1.2-bin nohup bin/hive --service metastore &
Tip: the metastore service is a foreground process and occupies the current session window by default; use the nohup command to run it in the background.
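To confirm the service is up, check that something is listening on port 9083 (the port from hive.metastore.uris); this assumes ss is available, but netstat works too:
➜ apache-hive-3.1.2-bin ss -lnt | grep 9083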
Accessing Hive via JDBC
JDBC access means starting a Hive service on the server side, i.e. exposing a port that clients can connect to remotely, much like MySQL and Hadoop.
- Step 1: edit the configuration file hive-site.xml
➜ apache-hive-3.1.2-bin vim conf/hive-site.xml
# add the following configuration
<!-- Host that hiveserver2 binds to -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>dev-jsj.com</value>
</property>
<!-- Port that hiveserver2 listens on -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
- Step 2: start the hiveserver2 service
➜ apache-hive-3.1.2-bin bin/hive --service hiveserver2
# or
➜ apache-hive-3.1.2-bin nohup bin/hive --service hiveserver2 &
Tips:
- Start the metastore service before starting hiveserver2.
- The hiveserver2 service is a foreground process and occupies the current session window by default; use the nohup command to run it in the background.
- Step 3: simulate a remote connection from the command line
➜ apache-hive-3.1.2-bin bin/beeline -u jdbc:hive2://dev-jsj.com:10000 -n sairo
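Beyond the interactive shell, beeline's -e flag runs a single statement and exits, which makes a handy end-to-end check of the metastore plus hiveserver2 chain:
➜ apache-hive-3.1.2-bin bin/beeline -u jdbc:hive2://dev-jsj.com:10000 -n sairo -e "show databases;"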
Other configuration
Logging configuration
The $HIVE_HOME/conf/ directory contains many configuration file templates. Find hive-log4j2.properties.template, set the relevant properties, and rename it to hive-log4j2.properties when you are done.
This is where you configure where logs are stored; by default they go under the /tmp/{user} directory.
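A minimal sketch of the change (the property name below matches the 3.1.2 template, but double-check your copy; the log directory is just an example path):
➜ apache-hive-3.1.2-bin cp conf/hive-log4j2.properties.template conf/hive-log4j2.properties
➜ apache-hive-3.1.2-bin vim conf/hive-log4j2.properties
...
property.hive.log.dir = /opt/Apache/apache-hive-3.1.2-bin/logs
...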
Command-line prompt configuration
By default the prompt in the Hive CLI is bare, just hive>. You can add configuration so that the prompt carries the current database name (and query results include column headers). The configuration is as follows:
➜ apache-hive-3.1.2-bin vim conf/hive-site.xml
...
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
...
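With hive.cli.print.current.db enabled, the prompt shows the current database, along the lines of:
hive (default)>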
Aside
After finishing the steps above, interested readers can try connecting HUE to Hive. Expect plenty of pitfalls and all sorts of errors; it took me an entire afternoon to get it working, so I recommend it only if you have time and enjoy tinkering. When I find the time, I will also write an article on setting up a HUE environment.
Troubleshooting
- Derby database initialization error
➜ apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Cause: the guava*.jar bundled with Hive conflicts with Hadoop's guava jar. Delete the lower version and replace it with the higher one.
Fix:
- Locate guava*.jar inside Hive
➜ apache-hive-3.1.2-bin find ./ -name "*guava*"
./lib/guava-19.0.jar
./lib/jersey-guava-2.25.1.jar
- Locate guava*.jar inside Hadoop, usually under the /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/ directory
➜ apache-hive-3.1.2-bin find /opt/Apache/hadoop-3.2.1 -name "*guava*"
/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
- Remove the low-version guava from Hive and replace it with Hadoop's higher version
➜ apache-hive-3.1.2-bin mv ./lib/guava-19.0.jar ./lib/guava-19.0.jar.bak
➜ apache-hive-3.1.2-bin cp /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar ./lib
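After swapping the jar, rerun the initialization; it should now finish without the NoSuchMethodError:
➜ apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema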