This document covers integrating Griffin into a CDH cluster. The base services Griffin depends on, such as HDFS, Hive, and Spark, are assumed to be already in place and are not covered below.
1. Version Selection
- Griffin: the latest release, 0.6.0
- Livy (Griffin dependency): 0.5.0 (the official docs use 0.3.0)
- Elasticsearch (Griffin dependency): 6.4.1 (the version in the official Maven build is 6.4.1; choosing another version may cause the ES index creation later to fail)
2. Deploying the Dependency Services
2.1 Livy Deployment
- Build the Livy parcel
Since Livy is deployed on top of an existing CDH environment, packaging it as a parcel keeps its management unified under Cloudera Manager.
For the detailed steps to build the Livy parcel, see the guide linked here. If the livy_zeppelin_cdh_csd_parcels repository it references cannot be downloaded, use the mirror address; cm_ext can likewise be downloaded from its mirror address.
The parcel packaging steps themselves are not repeated here; just follow the document above.
- Deploy the Livy parcel
Publish the Livy parcel to a self-hosted httpd service (for setup, see the post 离线安装 CM/CDH 问题记录), then add the parcel distribution URL on the CDH parcel management page.
Also copy the Livy csd jar into /opt/cloudera/csd/, change both its owner and group to cloudera-scm, and restart the server with systemctl restart cloudera-scm-server, as sketched below.
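A minimal sketch of the csd step, assuming the csd jar is named LIVY-0.5.0.jar (use whatever name your build actually produced):
```
# Install the Livy csd jar where Cloudera Manager picks it up
cp LIVY-0.5.0.jar /opt/cloudera/csd/
# CM requires the csd to be owned by the cloudera-scm user and group
chown cloudera-scm:cloudera-scm /opt/cloudera/csd/LIVY-0.5.0.jar
# Restart CM server so it loads the new service descriptor
systemctl restart cloudera-scm-server
```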
2.2 ElasticSearch Deployment
The official ES documentation is thorough (and available in Chinese), so deploying by following it directly is enough. For reference, here are my config/elasticsearch.yml settings:
```
# ES cluster name
cluster.name: enbrands-sky
# Bind address for cluster access; with the default (localhost) the node
# is only reachable via localhost:9200
network.host: 192.168.1.xxx
# Recommended ES setting: index patterns that may be auto-created
action.auto_create_index: .monitoring*,.watches,.triggered_watches,.watcher-history*,.ml*
# Data directory; the elasticsearch user needs read/write access
path.data: /data/var/lib/elasticsearch/
# Log directory; the elasticsearch user needs read/write access
path.logs: /data/var/log/elasticsearch/
# This is a single-node deployment, hence the single-node discovery type
discovery.type: single-node
# Enable HTTP CORS
http.cors.enabled: true
http.cors.allow-origin: "*"
# Disable security checks; this ES is for internal use only
xpack.security.enabled: false
```
Once configured, start ES:
```
./bin/elasticsearch -d -p pid
```
Then verify the service is running:
```
curl -X GET "http://192.168.1.xxx:9200/?pretty"
```
A healthy node answers with something like:
```
{
  "name" : "qTMjIqp",
  "cluster_name" : "enbrands-sky",
  "cluster_uuid" : "Ci_zl6s9SWiYlBR1PSUz-w",
  "version" : {
    "number" : "6.4.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "e36acdb",
    "build_date" : "2018-09-13T22:18:07.696808Z",
    "build_snapshot" : false,
    "lucene_version" : "7.4.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
```
3. Deploying Griffin
3.1 Create the ES Index
```
curl -H "Content-Type: application/json" -XPUT http://192.168.1.xxx:9200/griffin -d '
{
  "aliases": {},
  "mappings": {
    "accuracy": {
      "properties": {
        "name": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "tmst": {
          "type": "date"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_replicas": "2",
      "number_of_shards": "5"
    }
  }
}
'
```
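Since this ES deployment is single-node, the two replicas configured above cannot be allocated and the index health will stay yellow; the index still works. To confirm the index, mapping, and settings were created as expected, fetch the index back:
```
curl -X GET "http://192.168.1.xxx:9200/griffin?pretty"
```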
3.2 Create the Database Tables Used by Griffin
In MySQL, create the database and tables Griffin needs: a new database named quartz with utf8 encoding. The table DDL is in Init_quartz_mysql_innodb.sql; a sketch of the commands follows.
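A minimal sketch of those steps, assuming the MySQL client runs on the Griffin host and the DDL file sits at its usual location in the Griffin source tree (adjust host, user, and path to your setup):
```
# Create the database with utf8 encoding
mysql -h 192.168.1.xx -u xxx -p -e "CREATE DATABASE quartz DEFAULT CHARACTER SET utf8;"
# Load the Quartz table DDL shipped with the Griffin source
mysql -h 192.168.1.xx -u xxx -p quartz < service/src/main/resources/Init_quartz_mysql_innodb.sql
```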
3.3 Configure Griffin
- service/src/main/resources/application.properties:
Note: even configuration items you do not need must stay in the file; commenting them out prevents the application from starting.
```
# (standard Apache License header omitted)
# Apache Griffin server port
server.port = 8123
spring.application.name=griffin_service
# MySQL data source
spring.datasource.url=jdbc:mysql://192.168.1.xx:3306/quartz?autoReconnect=true&useSSL=false
spring.datasource.username=xxx
spring.datasource.password=xxx
spring.jpa.generate-ddl=true
#spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.show-sql=true
# Hive metastore thrift address
hive.metastore.uris=thrift://192.168.1.xx:9083
hive.metastore.dbname=metastore
hive.hmshandler.retry.attempts=15
hive.hmshandler.retry.interval=2000ms
# HiveServer2 address
hive.jdbc.className=org.apache.hive.jdbc.HiveDriver
hive.jdbc.url=jdbc:hive2://192.168.1.xx:10000/
hive.need.kerberos=false
hive.keytab.user=xxx@xx.com
hive.keytab.path=/path/to/keytab/file
# Hive cache time
cache.evict.hive.fixedRate.in.milliseconds=900000
# Kafka schema registry
kafka.schema.registry.url=
# Update job instance state at regular intervals
jobInstance.fixedDelay.in.milliseconds=60000
# Expired time of job instance which is 7 days, i.e. 604800000 milliseconds. Time unit only supports milliseconds
jobInstance.expired.milliseconds=604800000
# Schedule predicate job every 5 minutes and repeat 12 times at most
# Interval time units s:second m:minute h:hour d:day; only these four units are supported
predicate.job.interval=5m
predicate.job.repeat.count=12
# external properties directory location
external.config.location=
# external BATCH or STREAMING env
external.env.location=
# login strategy ("default" or "ldap")
login.strategy=default
# ldap
ldap.url=ldap://hostname:port
ldap.email=@example.com
ldap.searchBase=DC=org,DC=example
ldap.searchPattern=(sAMAccountName={0})
# hdfs default name
fs.defaultFS=hdfs://sky-ns
# Elasticsearch server address
elasticsearch.host=192.168.1.xx
elasticsearch.port=9200
elasticsearch.scheme=http
# elasticsearch.user = user
# elasticsearch.password = password
# Livy server address
livy.uri=http://192.168.1.xx:8998/batches
#livy.need.queue=false
#livy.task.max.concurrent.count=20
#livy.task.submit.interval.second=3
#livy.task.appId.retry.count=3
#livy.need.kerberos=false
#livy.server.auth.kerberos.principal=livy/kerberos.principal
#livy.server.auth.kerberos.keytab=/path/to/livy/keytab/file
# YARN ResourceManager address
yarn.uri=http://192.168.1.xx:8088
# griffin event listener
internal.event.listeners=GriffinJobEventHook
logging.file=./logs/griffin-service.log
```
- service/src/main/resources/quartz.properties:
```
# (standard Apache License header omitted)
org.quartz.scheduler.instanceName=spring-boot-quartz
org.quartz.scheduler.instanceId=AUTO
org.quartz.threadPool.threadCount=5
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
# If you use postgresql as your database, set this property value to org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
# If you use mysql as your database, set this property value to org.quartz.impl.jdbcjobstore.StdJDBCDelegate
# If you use h2 as your database, it's ok to set this property value to StdJDBCDelegate, PostgreSQLDelegate or others
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.useProperties=true
org.quartz.jobStore.misfireThreshold=60000
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000
```
- service/src/main/resources/sparkProperties.json:
Note: upload hive-site.xml to HDFS first (see the sketch after this block).
```
{
  "file": "hdfs:///griffin/griffin-measure.jar",
  "className": "org.apache.griffin.measure.Application",
  "queue": "default",
  "numExecutors": 2,
  "executorCores": 2,
  "driverMemory": "1g",
  "executorMemory": "4g",
  "conf": {
    "spark.yarn.dist.files": "hdfs:///home/griffin_spark_conf/hive-site.xml"
  },
  "files": []
}
```
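A sketch of that upload, assuming hive-site.xml is at the default CDH client path /etc/hive/conf/hive-site.xml (the HDFS target matches the spark.yarn.dist.files path above):
```
# Create the target directory and push the Hive client config to HDFS
hadoop fs -mkdir -p /home/griffin_spark_conf
hadoop fs -put /etc/hive/conf/hive-site.xml /home/griffin_spark_conf/
```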
- service/src/main/resources/env/env_batch.json:
{ "spark": { "log.level": "INFO" }, "sinks": [ { "name": "console", "type": "CONSOLE", "config": { "max.log.lines": 10 } }, { "name": "hdfs", "type": "HDFS", "config": { "path": "hdfs:///griffin/persist", "max.persist.lines": 10000, "max.lines.per.file": 10000 } }, { "name": "elasticsearch", "type": "ELASTICSEARCH", "config": { "method": "post", "api": "http://192.168.1.xxx:9200/griffin/accuracy", "connection.timeout": "1m", "retry": 10 } } ], "griffin.checkpoint": [] }
- service/pom.xml:
Modify the build plugins, otherwise the packaged jar cannot locate the main class:
```
<build>
  <plugins>
    <plugin>
      <groupId>com.ethlo.persistence.tools</groupId>
      <artifactId>eclipselink-maven-plugin</artifactId>
      <version>2.7.0</version>
      <executions>
        <execution>
          <phase>process-classes</phase>
          <goals>
            <goal>weave</goal>
          </goals>
        </execution>
      </executions>
      <dependencies>
        <dependency>
          <groupId>org.eclipse.persistence</groupId>
          <artifactId>org.eclipse.persistence.jpa</artifactId>
          <version>${eclipselink.version}</version>
        </dependency>
      </dependencies>
    </plugin>
    <!-- The stock maven-jar-plugin block and the spring-boot-maven-plugin
         build-info block are commented out here; they are replaced by the
         repackage configuration below, which declares the main class. -->
    <plugin>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-maven-plugin</artifactId>
      <version>${spring-boot-maven-plugin.version}</version>
      <configuration>
        <mainClass>org.apache.griffin.core.GriffinWebApplication</mainClass>
      </configuration>
      <executions>
        <execution>
          <goals>
            <goal>repackage</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <appendAssemblyId>false</appendAssemblyId>
        <skipAssembly>false</skipAssembly>
        <outputDirectory>../target</outputDirectory>
        <descriptors>
          <descriptor>src/main/resources/assembly/assembly.xml</descriptor>
        </descriptors>
      </configuration>
      <executions>
        <execution>
          <id>assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
    </plugin>
  </plugins>
</build>
```
- service/src/main/java/org/apache/griffin/core/metastore/hive/HiveMetaStoreProxy.java:
In the initHiveMetastoreClient method, add the setting below. It fixes the error where HiveMetaStoreClient#open fails inside UserGroupInformation while fetching user and group information, by disabling user/group permission resolution on metastore operations; without it the service cannot connect to the metastore.
```
// Disable set_ugi on metastore calls: HiveMetaStoreClient#open otherwise
// fails in UserGroupInformation when resolving user/group info, and the
// service cannot connect to the metastore.
hiveConf.setBoolVar(HiveConf.ConfVars.METASTORE_EXECUTE_SET_UGI, false);
```
- If kafka-schema-registry-client-3.2.0.jar is missing, see the workaround linked here.
3.4 Start Griffin
- Package the project:
```
mvn -Dmaven.test.skip=true clean install
```
- Upload measure/target/measure-0.6.0.jar to HDFS (see the rename sketch below): when Spark executes a job on the YARN cluster, it loads griffin-measure.jar from the /griffin directory on HDFS, and without it the job fails with a ClassNotFoundException for org.apache.griffin.measure.Application.
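Note that sparkProperties.json above points at hdfs:///griffin/griffin-measure.jar, so rename the built jar to match before uploading; a sketch:
```
# Rename the built measure jar to the name sparkProperties.json expects
mv measure/target/measure-0.6.0.jar griffin-measure.jar
hadoop fs -mkdir -p /griffin
hadoop fs -put griffin-measure.jar /griffin/
```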
- Copy the service jar to the server and start it:
```
nohup java -jar service-0.6.0.jar > service.out 2>&1 &
```
Then open http://192.168.1.xxx:8123/#/measures in a browser.
Login username: admin
Login password: admin
For usage instructions, refer to the official documentation: user-guide.md.