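Hive lets you query data stored in HDFS with SQL. Put simply, it delivers MapReduce functionality through SQL: Hive is not actually a database, it translates SQL (HiveQL) queries into MapReduce jobs. This post briefly covers how to install Hive and work with Hive tables.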
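Choosing a version
Visit https://hive.apache.org/downloads.html to see the available Hive releases and pick a suitable one; the release information for each version states which Hadoop versions it works with.
Mirror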
http://mirror.bit.edu.cn/apache/hive/
#We download version 2.3.4
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
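Prerequisites
Before installing Hive, Java and Hadoop must already be installed. A MySQL server is also required, since it holds the Hive metastore. Hive itself is installed last.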
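The prerequisite check can be scripted. The following is a minimal sketch, assuming the three tools are reachable as `java`, `hadoop` and `mysql` on `PATH`; the function name `missing_tools` is made up for this example. Note that finding the `mysql` client only shows the client is installed, the server itself may run on another host.

```shell
#!/bin/sh
# Print the name of every required command that is not on PATH.
# The tool list is passed as arguments so it can be adapted per setup.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed command names; adjust them to your installation.
missing=$(missing_tools java hadoop mysql)
if [ -n "$missing" ]; then
  echo "Install these before Hive:" $missing
else
  echo "All prerequisites found."
fi
```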
Installation
#Extract the archive
tar -zxvf apache-hive-2.3.4-bin.tar.gz -C /usr/local/
#Rename the directory
mv /usr/local/apache-hive-2.3.4-bin /usr/local/hive
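Mirror downloads are occasionally truncated, so it can be worth confirming that the archive is readable before extracting it. A small sketch, assuming the tarball sits in the current directory; `archive_ok` and the `HIVE_TARBALL` variable are names invented for this example.

```shell
#!/bin/sh
# Succeed only if the file exists and tar can list its gzip-compressed contents.
archive_ok() {
  [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Assumed file name, matching the download above.
archive="${HIVE_TARBALL:-apache-hive-2.3.4-bin.tar.gz}"
if archive_ok "$archive"; then
  echo "$archive looks intact; safe to extract"
else
  echo "$archive is missing or corrupt; download it again"
fi
```

This only proves the archive can be read end to end; it is not a substitute for comparing against a published checksum.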
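Configure environment variables
vim /etc/profile
#Add the following two lines to /etc/profile
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
#Check the Hive version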
hive --version
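Editing the profile by hand works, but repeating the step tends to leave duplicate lines. The sketch below appends a line only when it is not already present. `append_once` is a hypothetical helper, and the target file defaults to a snippet in the current directory (an assumption made so the example needs no root access); point `PROFILE_FILE` at the real profile if that is what you want.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
append_once() {
  line=$1
  file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical default target; override with PROFILE_FILE=/etc/profile.
profile="${PROFILE_FILE:-./hive-env.sh}"
append_once 'export HIVE_HOME=/usr/local/hive' "$profile"
append_once 'export PATH=$PATH:$HIVE_HOME/bin' "$profile"
echo "Hive variables are in $profile; source it to apply them."
```

Running the script twice leaves the file unchanged the second time, which is the point of the exact-line match (`grep -x -F`).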
Configuring Hive
#Go to the Hive configuration directory
cd /usr/local/hive/conf
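#Create an empty hive-site.xml
touch hive-site.xml
1 Configure MySQL
Put the following into hive-site.xml. If the file already contains a property with the same name, comment out the existing one.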
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--><configuration>
<!-- Metastore connection settings (MySQL) -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://ba-k8s-master-node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
</configuration>
This configures MySQL as the data source for the Hive metastore. For details, see the official documentation: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration#AdminManualMetastoreAdministration-RemoteMetastoreDatabase
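Because hive-site.xml is XML, a literal `&` in a value such as the JDBC URL has to be written as `&amp;`, otherwise the file is not well-formed. The sketch below shows one way to generate a property element with the value escaped automatically. `xml_escape` and `property` are helper names invented for this example; the URL is the one used in this post.

```shell
#!/bin/sh
# Escape the characters that are special in XML text content.
# '&' is handled first so the entities added afterwards are not re-escaped.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Emit one <property> element in the hive-site.xml layout.
property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$1" "$(xml_escape "$2")"
}

url='jdbc:mysql://ba-k8s-master-node1:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
property javax.jdo.option.ConnectionURL "$url"
```

The printed element can be pasted between the `<configuration>` tags; the same `property` call works for the user name, password and driver entries.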
2 Add the driver dependency
#Go to Hive's lib directory
cd /usr/local/hive/lib
#Download the MySQL JDBC driver
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.47/mysql-connector-java-5.1.47.jar
3 Initialize the database
schematool -dbType mysql -initSchema
This initializes the Hive metastore schema in MySQL.
Afterwards, the MySQL database contains the initialized metastore tables.
4.1 Start the client
hive --service cli
4.2 Basic Hive usage
--Create a database
create database demo;
--Switch to the demo database
use demo;
--Show the current database
select current_database();
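The same statements can also be run without an interactive session by saving them to a file and passing it to `hive -f`. A minimal sketch: `write_demo_script` and the `demo.hql` file name are assumptions for this example, and `if not exists` is added so the script can be re-run without failing on the existing database.

```shell
#!/bin/sh
# Write the HiveQL statements from this section to the given file.
write_demo_script() {
  cat > "$1" <<'EOF'
-- create the database only if it is missing, so the script can be re-run
create database if not exists demo;
use demo;
select current_database();
EOF
}

script="${HQL_FILE:-./demo.hql}"   # hypothetical file name
write_demo_script "$script"

# Run the script only when the hive launcher is available on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f "$script"
else
  echo "hive not found on PATH; wrote $script only"
fi
```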
4.3 Importing local data into Hive
--Create the table
create table student(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
--Load a local text file into the table
load data local inpath "/data/demo/student.txt" into table student;
--Query the data
select * from student;
This creates the student table and loads the file into it.
Test data (the contents of student.txt)
95002,刘晨,女,19,IS
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95012,孙花,女,20,CS
95010,孔小涛,男,19,CS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95021,周二,男,17,MA
95022,郑明,男,20,MA
95001,李勇,男,20,CS
95011,包小柏,男,18,MA
95009,梦圆圆,女,18,MA
95015,王君,男,18,MA
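Hive does not validate rows when a file is loaded; a row with missing or unparsable fields simply shows up with NULL columns at query time. A quick pre-load check of the field count can catch such rows early. This is a sketch only: `count_bad_rows` is a made-up helper, the expected column count of 5 matches the student table, and empty lines are counted as bad.

```shell
#!/bin/sh
# Count the lines of a comma-delimited file whose number of fields
# differs from the expected column count (default 5).
count_bad_rows() {
  awk -F',' -v want="${2:-5}" 'NF != want { bad++ } END { print bad + 0 }' "$1"
}

# Assumed location, the same path used in the load statement.
data="${STUDENT_FILE:-/data/demo/student.txt}"
if [ -f "$data" ]; then
  echo "rows with an unexpected field count: $(count_bad_rows "$data")"
else
  echo "$data not found"
fi
```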
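4.4 Importing a file from HDFS into Hive
#Create the target directory if it does not exist yet
hadoop fs -mkdir -p /demo
#Upload the file to HDFS
hadoop fs -put student.txt /demo/student.txt
#List the uploaded file
hadoop fs -ls /demo
Loading the HDFS data
--Create the table
create table student2(id int, name string, sex string, age int, department string) row format delimited fields terminated by ",";
--Load the data (this moves /demo/student.txt into the table's warehouse directory)
load data inpath '/demo/student.txt' into table student2;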
Common problems
1 Metastore state would be inconsistent !!
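I first ran into this because the MySQL data source was set up incorrectly: a MySQL server has to be installed, and the connection settings must point at the machine it runs on.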
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
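The real cause turned out to be the hive-site.xml configuration. The fix was to delete everything in that file that follows the MySQL settings, leaving only the properties listed in step 1.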
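The post does not say which entry in hive-site.xml was at fault. One thing that is cheap to rule out is a bare `&` in a value (for example in the JDBC URL), which makes the file malformed XML. The sketch below is a rough heuristic, not an XML parser: `find_bare_ampersands` is a name invented here, and it will also flag an `&` that appears inside a comment or CDATA section.

```shell
#!/bin/sh
# Print (with line numbers) every line that still contains '&' once all
# well-formed entity references have been stripped. Exit status is 0 when
# at least one such line exists, non-zero when the file looks clean.
find_bare_ampersands() {
  sed -E 's/&(amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);//g' "$1" | grep -n '&'
}

# Path from this post's install layout; override with HIVE_SITE if needed.
conf="${HIVE_SITE:-/usr/local/hive/conf/hive-site.xml}"
if [ -f "$conf" ] && find_bare_ampersands "$conf"; then
  echo "unescaped '&' found in $conf; write it as &amp;"
fi
```

Running `schematool` with `--verbose`, as the error output suggests, remains the most direct way to see the underlying exception.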