Big Data Technology: Hive 3

AbrahamW · 2023-10-09

Hive 3+ is used throughout; the matching Hadoop version is also 3+.

Installation


Download

➜  ~ wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Extract

➜  ~ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/Apache/

Configure environment variables

vim /etc/profile
...
export HIVE_HOME=/opt/Apache/apache-hive-3.1.2-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
...

Startup

Initialize the database

By default Hive manages its metadata with an embedded Derby database, which has a serious limitation: only one Hive client can be open at a time. We will switch to MySQL for metadata management later on.

  • Initialize the metastore schema

➜  apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema

Note: this step will most likely fail. For the fix, see "Derby schema initialization error" under Troubleshooting below.

  • Start the CLI

➜  apache-hive-3.1.2-bin bin/hive
which: no hbase in (/opt/Java/jdk1.8.0_261/bin:/opt/Apache/apache-maven-3.6.3/bin:/opt/node-v12.18.4-linux-x64/bin:/opt/Apache/apache-ant-1.9.15/bin:/opt/Apache/hadoop-3.2.1/bin:/opt/Apache/apache-hive-3.1.2-bin/bin:/usr/local/bin:/usr/bin:/home/sairo/bin:/usr/local/sbin:/usr/sbin)
Hive Session ID = 9ea641f6-4c3b-49db-877e-93cf945cea77

Logging initialized using configuration in jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 5dcd28fe-648a-4e5c-99cd-853719674c78
hive>

Basic SQL operations

hive> show databases;
OK
default
Time taken: 0.864 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.056 seconds
hive> show tables;
OK
Time taken: 0.054 seconds
hive> create table test (id string, name string);
OK
Time taken: 0.837 seconds
hive> insert into test values('aaa', 'Tom');
Query ID = sairo_20201126194635_ab618e32-953d-4fd2-983b-dfc2d8abd4d2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1606377510588_0001, Tracking URL = http://dev-jsj.com:8088/proxy/application_1606377510588_0001/
Kill Command = /opt/Apache/hadoop-3.2.1/bin/mapred job  -kill job_1606377510588_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-11-26 19:46:51,787 Stage-1 map = 0%,  reduce = 0%
2020-11-26 19:46:58,055 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.96 sec
2020-11-26 19:47:05,333 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 9.81 sec
MapReduce Total cumulative CPU time: 9 seconds 810 msec
Ended Job = job_1606377510588_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://dev-jsj.com:9000/user/hive/warehouse/test/.hive-staging_hive_2020-11-26_19-46-35_648_9001647310701991693-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 9.81 sec   HDFS Read: 14565 HDFS Write: 243 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 810 msec
OK
Time taken: 32.519 seconds
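A quick select confirms the row was written (transcript trimmed to the essentials; timings will vary):

hive> select * from test;
OK
aaa	Tom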

Using MySQL to manage metadata

Install MySQL

Routine work, skipped here.

Create the Hive metastore database ahead of time

mysql> create database hive_metastore default charset utf8;
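Optionally, create a dedicated account instead of connecting as root (a minimal sketch, assuming MySQL 8; the user name and password here are placeholders — if you go this route, use these credentials in hive-site.xml below):

mysql> create user 'hive'@'%' identified by '123456';
mysql> grant all privileges on hive_metastore.* to 'hive'@'%';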

Add the MySQL driver JAR to Hive

➜  apache-hive-3.1.2-bin cp /opt/Apache/repository/mysql/mysql-connector-java/8.0.13/mysql-connector-java-8.0.13.jar ./lib

Configure the JDBC connection

The conf directory contains many configuration-file templates. Here, edit hive-default.xml.template and rename it to hive-site.xml.
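In shell terms, that is roughly the following (using cp instead of an in-place rename keeps the original template around):

➜  apache-hive-3.1.2-bin cp conf/hive-default.xml.template conf/hive-site.xml
➜  apache-hive-3.1.2-bin vim conf/hive-site.xml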


hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- JDBC connection URL -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dev-jsj.com:3306/hive_metastore?useSSL=false&amp;characterEncoding=utf8</value>
  </property>
  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <!-- JDBC username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- JDBC password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <!-- skip metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- metastore event DB notification API auth -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>

Notes:

  • In the JDBC URL, write & as &amp; (as done above); the & character has special meaning in XML and must be escaped.
  • If you start from the official template, delete every <description> child of <property>. These tags document each property, but many contain special characters, and leaving them in will cause errors at startup.
  • Some tutorials ask you to set Hive's storage path in HDFS, e.g.:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>

Unless you have a special requirement, this is unnecessary; the default is already /user/hive/warehouse.

Initialize the database

➜  apache-hive-3.1.2-bin  bin/schematool -dbType mysql -initSchema -verbose

-verbose: prints the execution progress, instead of the long silent stretch you saw earlier.
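If it succeeds, the metastore tables now live in MySQL; a quick look confirms it (output omitted, there are several dozen tables):

mysql> use hive_metastore;
mysql> show tables;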

At this point the basic setup is done and you can operate Hive directly from the command line. Next, we configure remote access to Hive, similar to Hadoop's long-running background services.

Accessing Hive through the metastore service


  • Step 1: edit hive-site.xml

➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
# add the following
<!-- address of the metastore service -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://dev-jsj.com:9083</value>
</property>

  • Step 2: start the metastore service

➜  apache-hive-3.1.2-bin bin/hive --service metastore
# or
➜  apache-hive-3.1.2-bin nohup bin/hive --service metastore &

Tip: the metastore service is a foreground process and occupies the current session window by default; use nohup to run it in the background.
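To keep its output out of nohup.out, you can also redirect the log explicitly (a sketch; the logs directory is my choice, create it first):

➜  apache-hive-3.1.2-bin mkdir -p logs
➜  apache-hive-3.1.2-bin nohup bin/hive --service metastore > logs/metastore.log 2>&1 &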

Accessing Hive via JDBC

JDBC access means running a Hive service on the server that exposes a port for remote clients to connect to, much like MySQL or Hadoop.


  • Step 1: edit hive-site.xml

➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
# add the following
<!-- host that hiveserver2 binds to -->
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>dev-jsj.com</value>
</property>
<!-- port that hiveserver2 listens on -->
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>

  • Step 2: start the hiveserver2 service

➜  apache-hive-3.1.2-bin bin/hive --service hiveserver2
# or
➜  apache-hive-3.1.2-bin nohup bin/hive --service hiveserver2 &

Tips:

  • Start the metastore service before starting hiveserver2.
  • hiveserver2 is also a foreground process that occupies the current session window by default; use nohup to run it in the background.
  • Step 3: test the remote connection from the command line

➜  apache-hive-3.1.2-bin bin/beeline -u jdbc:hive2://dev-jsj.com:10000 -n sairo
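Once connected, beeline accepts the same SQL as the local CLI, for example (output trimmed):

0: jdbc:hive2://dev-jsj.com:10000> show databases;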


Other configuration

Logging

The $HIVE_HOME/conf/ directory holds many configuration templates. Find hive-log4j2.properties.template and set the relevant properties; remember to rename it to hive-log4j2.properties when you are done.

Configure where logs are stored; the default is the /tmp/{user} directory.
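For example, to keep logs under the install directory instead (a sketch: property.hive.log.dir comes from the template; the logs path is my choice):

➜  apache-hive-3.1.2-bin cp conf/hive-log4j2.properties.template conf/hive-log4j2.properties
➜  apache-hive-3.1.2-bin vim conf/hive-log4j2.properties
...
property.hive.log.dir = /opt/Apache/apache-hive-3.1.2-bin/logs
...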


CLI prompt and output format

By default the Hive CLI prompt is minimal, just hive>.


The following settings make the prompt show the current database (hive.cli.print.current.db) and make query results include column headers (hive.cli.print.header):

➜  apache-hive-3.1.2-bin vim conf/hive-site.xml 
...
<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>
...
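After restarting the CLI, the prompt carries the current database, e.g.:

hive (default)>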


Aside

After completing the steps above, interested readers can try connecting to Hive with HUE. Expect plenty of pitfalls and assorted errors; it took me a whole afternoon to get it working. Recommended if you have time and like to tinker. I may also write a follow-up article on setting up a HUE environment.


Troubleshooting

  1. Derby schema initialization error

➜  apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/Apache/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
        at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
        at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
        at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
        at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
        at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

Cause: the guava*.jar bundled with Hive conflicts with the guava*.jar version shipped with Hadoop; delete the older one and replace it with the newer one.

Fix:

  • Find Hive's guava*.jar:

➜  apache-hive-3.1.2-bin find ./ -name  "*guava*"                           
./lib/guava-19.0.jar
./lib/jersey-guava-2.25.1.jar

  • Find Hadoop's guava*.jar, usually under the /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/ directory:

➜  apache-hive-3.1.2-bin find /opt/Apache/hadoop-3.2.1 -name "*guava*"
/opt/Apache/hadoop-3.2.1/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar
/opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar

  • Remove Hive's older guava and replace it with Hadoop's newer one:

➜  apache-hive-3.1.2-bin mv ./lib/guava-19.0.jar ./lib/guava-19.0.jar.bak
➜  apache-hive-3.1.2-bin cp /opt/Apache/hadoop-3.2.1/share/hadoop/hdfs/lib/guava-27.0-jre.jar ./lib
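With the JAR replaced, re-run the initialization and it should complete:

➜  apache-hive-3.1.2-bin bin/schematool -dbType derby -initSchema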
