In a ClickHouse cluster, every machine is a standalone instance, and any one of them can serve as the query entry point. The question is how to load-balance queries across them, and chproxy is a tool built for exactly that.
Chproxy is an HTTP proxy and load balancer for the ClickHouse database.
How to use chproxy
Step 1: Download chproxy. You can grab it from the link below, or fetch it directly from the command line:
$ mkdir -p /data/chproxy
$ cd /data/chproxy
$ wget https://github.com/Vertamedia/chproxy/releases/download/v1.14.0/chproxy-linux-amd64-v1.14.0.tar.gz
$ tar -xzvf chproxy-*.gz
Directory layout:
|-- cache
|   |-- longterm
|   |   `-- cachedir
|   |       |-- 6b0533288f9a1f93023110b8cfa9b921
|   |       `-- 6d9798e2b0a16f6b841d9f6fd786a038
|   `-- shortterm
|       `-- cachedir
|-- chproxy
|-- config
|   `-- config.yml
|-- logs
|   `-- chproxy.out
`-- startup.sh
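If you are setting the proxy up from scratch, the layout above can be created with a few commands. This is just a sketch; `/tmp/chproxy-demo` is a hypothetical root used for illustration, so substitute your actual install directory (e.g. /data/chproxy from step 1):

```shell
# Create the cache, config and log directories that the config below expects.
CHPROXY_HOME=/tmp/chproxy-demo   # hypothetical root; use your install dir
mkdir -p "$CHPROXY_HOME"/cache/longterm/cachedir \
         "$CHPROXY_HOME"/cache/shortterm/cachedir \
         "$CHPROXY_HOME"/config \
         "$CHPROXY_HOME"/logs
ls "$CHPROXY_HOME"
```

The `cachedir` subdirectories are where chproxy writes its cache files; the hashed filenames shown in the tree above are created by chproxy itself at runtime.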
Step 2: Configure chproxy (config/config.yml):
log_debug: false      # enable debug logging
hack_me_please: true  # disable security-related config checks

# Cache settings; a long-term and a short-term cache, defined per group
caches:
  - name: "longterm"
    dir: "/home/work/tools/chproxy/cache/longterm/cachedir"
    max_size: 100Gb
    expire: 1h
    grace_time: 60s
  - name: "shortterm"
    dir: "/home/work/tools/chproxy/cache/shortterm/cachedir"
    max_size: 1000Mb
    expire: 15s
# Network whitelist groups; only these networks may access chproxy
network_groups:
  - name: "cluster_online"
    networks: ["10.173.1.0/24", "10.173.2.0/24", "10.173.3.0/24"]
  - name: "cluster_offline"
    networks: ["10.12.1.0/24"]
# Parameter groups; each group is a named set of ClickHouse query settings
param_groups:
  - name: "cron-job"
    params:
      - key: "max_memory_usage"
        value: "20000000000"
      - key: "max_bytes_before_external_group_by"
        value: "20000000000"
  - name: "web_param"
    params:
      - key: "max_memory_usage"
        value: "5000000000"
      - key: "max_columns_to_read"
        value: "30"
      - key: "max_execution_time"
        value: "30"
# chproxy server settings, usually split into http, https and metrics
server:
  http:
    listen_addr: ":8090"  # port the chproxy service listens on
    allowed_networks: ["cluster_offline", "cluster_online"]  # whitelist for access to the proxy
    read_timeout: 5m
    write_timeout: 10m
    idle_timeout: 20m
  metrics:
    allowed_networks: ["cluster_offline", "cluster_online"]  # whitelist for the Prometheus metrics endpoint
# chproxy users, defined per group
users:
  - name: "dev"                # chproxy user name
    password: "dev******"      # chproxy password
    to_cluster: "offline_suzhou_bigdata01"  # cluster this user may access (must match a cluster name below)
    to_user: "admin"           # ClickHouse user this chproxy user maps to (must match a name under that cluster's users)
    deny_http: false           # whether to reject HTTP requests
    allow_cors: true
    requests_per_minute: 20    # per-minute request limit for this user
    cache: "shortterm"         # cache to use; cached queries are answered from the cache rather than routed round-robin
    params: "cron-job"         # apply the "cron-job" parameter group
    max_queue_size: 100        # maximum request queue length
    max_queue_time: 35s        # maximum time a request may wait in the queue
  - name: "default"            # another chproxy user
    to_cluster: "online_alluxio_cluster_10shards_2replicas"  # different chproxy users can map to different clusters
    to_user: "admin"
    allowed_networks: ["cluster_offline", "cluster_online"]  # adding raw IPs here (e.g. "10.197.158.162", "10.104.100.255") did not seem to work; use network groups
    max_concurrent_queries: 50
    max_execution_time: 1m
    deny_https: false
    deny_http: false
    cache: "longterm"
    params: "web_param"
# Logical cluster definitions, per group
clusters:
  - name: "offline_suzhou_bigdata01"  # chproxy cluster name
    scheme: "http"                    # request protocol, http or https
    nodes: ["suzhou-bigdata01.domain.com:8123"]  # reachable ClickHouse nodes; default ports are 8123 (http) and 8443 (https), see the ClickHouse config.xml
    heartbeat:                        # health-check settings for nodes in this cluster
      interval: 1m
      timeout: 30s
      request: "/?query=SELECT%201%2B1"
      response: "2\n"
    kill_query_user:                  # user chproxy uses to kill queries that exceed their limits
      name: "default"
      password: ""
    users:
      - name: "admin"                 # ClickHouse credentials for this cluster
        password: "yyy**********"
        max_concurrent_queries: 10
        max_execution_time: 1m
  # Configuration for a cluster with multiple replicas
  - name: "online_alluxio_cluster_10shards_2replicas"  # a second chproxy cluster; several logical clusters can be defined
    scheme: "http"
    replicas:                         # replica node sets for this cluster; two replicas here
      - name: "replica1"
        nodes: ["bigdata-work1.domain.com:8123", "bigdata-work2.domain.com:8123", "bigdata-work3.domain.com:8123", "bigdata-work4.domain.com:8123", "bigdata-work5.domain.com:8123"]
      - name: "replica2"
        nodes: ["bigdata-work2.domain.com:8123", "bigdata-work3.domain.com:8123", "bigdata-work4.domain.com:8123", "bigdata-work5.domain.com:8123", "bigdata-work1.domain.com:8123"]
    users:
      - name: "admin"
        password: "xxx**********"
        max_concurrent_queries: 30
        max_execution_time: 1m
For the full set of options, see the official documentation.
Step 3: Start via script (startup.sh):
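The heartbeat request in the cluster config above is URL-encoded. Decoding it shows the actual query chproxy sends to each node; as a sketch, python3 is used here purely as a convenient URL decoder:

```shell
# %20 is a space and %2B is '+', so the heartbeat request is just "SELECT 1+1",
# whose result must match the configured response "2\n".
python3 -c "from urllib.parse import unquote; print(unquote('SELECT%201%2B1'))"
# prints: SELECT 1+1
```

You can run the same decoded query against any node with curl to confirm the expected heartbeat response before pointing chproxy at it.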
#!/bin/bash
cd "$(dirname "$0")"
# Stop any running chproxy instance, then restart it in the background
ps -ef | grep chproxy | grep -v grep | awk '{print $2}' | xargs -r kill -9
nohup ./chproxy -config=./config/config.yml >> ./logs/chproxy.out 2>&1 &
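Pattern-killing every process whose command line matches "chproxy" is blunt; a slightly safer variant records the background PID in a file and kills exactly that process. A sketch, demonstrated with a placeholder `sleep` standing in for the chproxy binary:

```shell
# start_daemon <pidfile> <cmd...> : kill the previously recorded PID (if any),
# start the command in the background, and record the new PID.
start_daemon() {
  pidfile=$1; shift
  [ -f "$pidfile" ] && kill "$(cat "$pidfile")" 2>/dev/null
  nohup "$@" >/dev/null 2>&1 &
  echo $! > "$pidfile"
}

# Placeholder demo; for chproxy this would be:
#   start_daemon chproxy.pid ./chproxy -config=./config/config.yml
start_daemon /tmp/demo.pid sleep 60
kill -0 "$(cat /tmp/demo.pid)" && echo "running"
```

This avoids the failure mode where `ps | grep chproxy` also matches an editor or a second operator's session and `kill -9` takes it down.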
Step 4: Test
1) Query a ClickHouse instance directly:
echo 'SELECT * from system.clusters limit 5' | curl 'http://bigdata-work1.domain.com:8123/?user=admin&password=*****' --data-binary @-
2) Query through the chproxy proxy:
echo 'SELECT * from system.clusters limit 10' | curl 'http://bigdata-work19.domain.com:8090/?user=dev&password=*****' --data-binary @-
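Besides POSTing the query body as above, the query can also be sent as a URL parameter, percent-encoded the same way as the heartbeat request in the config. A sketch, with python3 used only as a URL encoder and the hostname being the placeholder from the example above:

```shell
# Percent-encode the query (spaces, '*', etc. must be escaped in a URL)
# and build a GET-style request URL for the proxy.
q=$(python3 -c "from urllib.parse import quote; print(quote('SELECT * from system.clusters limit 10'))")
echo "http://bigdata-work19.domain.com:8090/?user=dev&query=$q"
```

The printed URL can then be fetched with curl (adding the password parameter), which is handy for one-off checks from tools that can only issue GET requests.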