Quickly Testing Apache Pinot Batch Data Import and Querying in Docker


Pinot is a realtime distributed OLAP datastore built to deliver ultra-low-latency analytics, even at extremely high throughput. If you are new to Pinot, start with the earlier article 《Apache Pinot基本介绍》. This post walks through running Pinot with Docker, which is the easiest way to get started for anyone already familiar with Docker.

Pull the image

docker pull apachepinot/pinot:latest

Or pin a specific Pinot version:

docker pull apachepinot/pinot:0.9.3

Run all components in a single Docker container

docker run \
-p 9000:9000 \
apachepinot/pinot:latest QuickStart \
-type batch

Then open http://localhost:9000 in your browser and you should see the Pinot console.


The command above launches Pinot in batch QuickStart mode; refer to the Pinot documentation to try the other QuickStart modes.
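
For example, the streaming QuickStart can be launched the same way. This is only a sketch: it assumes the stream QuickStart type is available in the image version you pulled (it consumes from a Kafka instance bundled inside the container):

docker run \
-p 9000:9000 \
apachepinot/pinot:latest QuickStart \
-type stream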

Running Pinot across multiple containers with Docker Compose

The docker-compose.yml looks like this:

version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.5.6
    hostname: zookeeper
    container_name: manual-zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  pinot-controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress manual-zookeeper:2181"
    container_name: "manual-pinot-controller"
    restart: unless-stopped
    ports:
      - "9000:9000"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-broker"
    ports:
      - "8099:8099"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-server"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
    depends_on:
      - pinot-broker

Save the above as a local docker-compose.yml file, then start the cluster with:

docker-compose --project-name pinot-demo up
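
If you would rather not keep a terminal attached, the stack can also be started in the background using Docker Compose's standard -d (detached) flag:

docker-compose --project-name pinot-demo up -d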

Check that the containers are running:

docker ps
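
As an alternative to the UI, you can also hit the controller's health endpoint from the command line. This assumes the standard /health endpoint exposed by the Pinot controller, which should simply return OK:

curl http://localhost:9000/health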


As before, open http://localhost:9000 in your browser to reach the Pinot console.


Importing batch data

With the steps above, the Pinot environment is now up and running in Docker, so we can import some data and query it.

Change into the /tmp directory and run the following commands:

mkdir -p pinot-quick-start/rawdata
vim pinot-quick-start/rawdata/transcript.csv

Fill the newly created CSV file with the following data:

studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000
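
The timestampInEpoch column holds epoch timestamps in milliseconds. If you want to generate values for rows of your own, GNU date can produce them; this is just a sketch, and the %N specifier is GNU-specific (it is not available in BSD/macOS date):

date -d "2019-10-12 00:00:00" +%s%3N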

Create the schema file

vim pinot-quick-start/transcript-schema.json

Fill it with the following contents:

{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}

Create the table config

vim pinot-quick-start/transcript-table-offline.json

Fill it with the following contents:

{
  "tableName": "transcript",
  "segmentsConfig": {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "replication": "1",
    "schemaName": "transcript"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": [],
    "loadMode": "MMAP"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableType": "OFFLINE",
  "metadata": {}
}

Create the table by running the following command. Note the --network flag: the container needs to join the compose network (pinot-demo_default) so that the manual-pinot-controller hostname can be resolved.

docker run --rm -ti \
--network=pinot-demo_default \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-batch-table-creation \
apachepinot/pinot:latest AddTable \
-schemaFile /tmp/pinot-quick-start/transcript-schema.json \
-tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
-controllerHost manual-pinot-controller \
-controllerPort 9000 -exec
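
Optionally, you can confirm that the schema and table config were registered by calling the controller's REST API. These are standard controller endpoints, used here only as a sanity check:

curl http://localhost:9000/schemas/transcript
curl http://localhost:9000/tables/transcript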

The command should print output confirming that the schema and table were created.


Data in a Pinot table is stored as Pinot segments.

To generate segments, we first need to create a job spec YAML file. The job spec contains all the information about the data format, the input data location, and the Pinot cluster coordinates. Save the spec below as /tmp/pinot-quick-start/docker-job-spec.yml (this is the path the ingestion command refers to later). If you are using your own data, make sure to 1) replace transcript with your table name and 2) set the correct recordReaderSpec.

executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://manual-pinot-controller:9000/tables/transcript/schema'
  tableConfigURI: 'http://manual-pinot-controller:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://manual-pinot-controller:9000'

Generate the segments and push the data with the following command:

docker run --rm -ti \
--network=pinot-demo_default \
-v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
--name pinot-data-ingestion-job \
apachepinot/pinot:latest LaunchDataIngestionJob \
-jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml
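
After the job finishes, you can optionally verify that segments were uploaded to the table through the controller's segments endpoint (again, just a sanity check against the standard REST API):

curl http://localhost:9000/segments/transcript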

Once the data has been imported, you can query it from the Query Console in the web UI.

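If you prefer the command line, the same SQL can be sent to the broker's /query/sql endpoint on port 8099, which the compose file maps to the host. The query below is only an illustrative example against the transcript table:

curl -H "Content-Type: application/json" -X POST \
-d '{"sql":"SELECT subject, AVG(score) FROM transcript GROUP BY subject"}' \
http://localhost:8099/query/sql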


