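1 Kafka monitoring approaches
A Java process such as Kafka can expose monitoring data through a JMX agent or a third-party agent (kafka_exporter, KMinion, and so on). Prometheus then scrapes that data, and a Grafana dashboard template displays it.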
2 kafka_exporter
2.1 Install and configure kafka_exporter
2.1.1 Download kafka_exporter
Download page:
https://github.com/danielqsj/kafka_exporter/releases/
# Download through a mirror for faster access
wget https://mirror.ghproxy.com/https://github.com/danielqsj/kafka_exporter/releases/download/v1.7.0/kafka_exporter-1.7.0.linux-amd64.tar.gz
2.1.2 Install Kafka
tar -xf kafka_2.13-3.7.0.tgz -C /app/module/
mv /app/module/kafka_2.13-3.7.0/ /app/module/kafka
2.1.3 Edit the configuration
cd /app/module/kafka/config/
vim server.properties
listeners=PLAINTEXT://192.168.137.131:9092
advertised.listeners=PLAINTEXT://192.168.137.131:9092
2.1.4 Start Kafka
cd /app/module/kafka/bin/
sh kafka-server-start.sh -daemon ../config/server.properties
2.1.5 Extract kafka_exporter
tar -xf kafka_exporter-1.7.0.linux-amd64.tar.gz -C /app/module/
ln -s /app/module/kafka_exporter-1.7.0.linux-amd64/ /app/module/kafka_exporter
2.1.6 Configure the kafka_exporter systemd unit file
vim /usr/lib/systemd/system/kafka_exporter.service
[Unit]
Description=kafka_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
ExecStart=/app/module/kafka_exporter/kafka_exporter \
  --web.listen-address=:9308 \
  --kafka.server=192.168.137.131:9092
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target
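# kafka_exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]
The --kafka.server flag can be repeated, so a single kafka_exporter can be configured with multiple Kafka servers.
# Avoid --kafka.server=localhost:9092 where possible; partition data sometimes cannot be retrieved with it.
2.1.7 Start kafka_exporter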
systemctl daemon-reload
systemctl start kafka_exporter.service
2.2 Configure Prometheus
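Before adding the scrape job, it can help to confirm that the exporter is actually serving metrics. The following is a minimal sketch, not part of kafka_exporter itself: `has_metric` is a hypothetical helper name, and the address is assumed to be the `--web.listen-address` port from the unit file above.

```shell
# has_metric NAME
# Reads Prometheus exposition text on stdin and succeeds (exit 0) if at
# least one sample line for metric NAME is present.
# Hypothetical helper for illustration only.
has_metric() {
    # Sample lines look like `name{labels} value` or `name value`.
    # Lines starting with '#' are HELP/TYPE comments and are skipped.
    grep -v '^#' | grep -q -E "^$1(\{| )"
}
```

Possible usage against the exporter started above:
curl -s http://192.168.137.131:9308/metrics | has_metric kafka_brokers && echo "exporter is serving kafka_brokers"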
1. Edit the Prometheus configuration file to add kafka_exporter as a scrape target
  - job_name: "kafka-exporter"  # must match the job="kafka-exporter" selector used in the alerting rules
    metrics_path: "/metrics"
    static_configs:
    - targets: ["192.168.137.131:9308"]
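2. Reload the Prometheus configuration file (the /-/reload endpoint only works when Prometheus was started with --web.enable-lifecycle)
curl -X POST http://192.168.137.131:9090/-/reload
2.3 Common Kafka metrics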
2.3.1 Broker metrics
Metric name | Type | Description
kafka_brokers | gauge | Number of brokers in the Kafka cluster
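2.3.2 Topic metrics
Metric name | Type | Description
kafka_topic_partitions | gauge | Number of partitions for this topic
kafka_topic_partition_current_offset | gauge | Current offset of the broker at this topic/partition
kafka_topic_partition_oldest_offset | gauge | Oldest offset of the broker at this topic/partition
kafka_topic_partition_in_sync_replica | gauge | Number of in-sync replicas for this topic/partition
kafka_topic_partition_leader | gauge | Leader broker ID of this topic/partition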
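As a worked example of how two of these metrics combine, the difference between the current and oldest offset of a partition gives a rough count of the messages still retained there (for topics that are not compacted). The sketch below computes it from raw exporter output; `retained_by_partition` is a hypothetical helper name, and only the two metric names are taken from the table above.

```shell
# retained_by_partition
# Reads Prometheus exposition text on stdin and prints, per topic partition,
# "<label set> <current_offset - oldest_offset>".
# Hypothetical helper for illustration only.
retained_by_partition() {
    awk '
        /^kafka_topic_partition_(current|oldest)_offset\{/ {
            # Use the text between the braces (the label set) as the join key.
            # Assumes label values do not themselves contain "}".
            lb = index($0, "{")
            rb = index($0, "}")
            key = substr($0, lb + 1, rb - lb - 1)
            if ($0 ~ /^kafka_topic_partition_current_offset/) cur[key] = $NF
            else old[key] = $NF
        }
        END {
            # Only print partitions for which both offsets were seen.
            for (k in cur)
                if (k in old) printf "%s %.0f\n", k, cur[k] - old[k]
        }
    ' | sort
}
```

Possible usage:
curl -s http://192.168.137.131:9308/metrics | retained_by_partition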
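2.3.3 Consumer group metrics
Metric name | Type | Description
kafka_consumergroup_current_offset | gauge | Current offset of the consumer group at this topic/partition
kafka_consumergroup_lag | gauge | Current approximate lag of the consumer group at this topic/partition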
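To get a quick feel for consumer lag without going through Prometheus, the per-partition kafka_consumergroup_lag samples can be summed per consumer group directly from the exporter output. This is only a sketch under the assumption that each sample carries a consumergroup label; `lag_by_group` is a hypothetical helper name.

```shell
# lag_by_group
# Reads Prometheus exposition text on stdin and prints "<consumergroup> <total lag>",
# summing kafka_consumergroup_lag over all topics and partitions of each group.
# Hypothetical helper for illustration only.
lag_by_group() {
    awk '
        # The trailing "{" keeps kafka_consumergroup_lag_sum out of the match.
        /^kafka_consumergroup_lag\{/ {
            if (match($0, /consumergroup="[^"]*"/)) {
                # Skip the 15 characters of consumergroup=" and drop the closing quote.
                group = substr($0, RSTART + 15, RLENGTH - 16)
                lag[group] += $NF
            }
        }
        END { for (g in lag) printf "%s %.0f\n", g, lag[g] }
    ' | sort
}
```

Possible usage:
curl -s http://192.168.137.131:9308/metrics | lag_by_group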
2.4 Kafka alerting rules file
2.4.1 Alerting rules file
vim /app/module/prometheus/rules/kafka_rules.yml
groups:
- name: Kafka alerting rules
  rules:
  - alert: kafka brokers abnormal
    expr: kafka_broker_info != 1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} broker abnormal: {{ $labels.address }}"
  - alert: kafka overall message backlog
    expr: sum(kafka_consumergroup_lag_sum{job="kafka-exporter"}) by (name,consumergroup, topic)>5000
    for: 2m
    labels:
      severity: critical
    annotations:
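      description: "[Environment] {{ $labels.name }}\n[Consumer group] {{ $labels.consumergroup }}\n[Topic] {{ $labels.topic }} [Lag] {{ $value | printf \"%.2f\" }}"
  - alert: kafka partition message backlog
    # hour() returns the UTC hour, so (hour()+8)%24 is the hour in UTC+8;
    # the rule is only active from 07:00 through 21:59 in that time zone.
    expr: (sum(kafka_consumergroup_lag{job="kafka-exporter"}) by (name,consumergroup, topic, partition)>1500) AND ON() (hour()+8)%24 >= 7 <= 21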
    for: 3m
    labels:
      severity: critical
    annotations:
      description: "[Environment] {{ $labels.name }}\n[Consumer group] {{ $labels.consumergroup }}\n[Topic] {{ $labels.topic }} [Partition] {{ $labels.partition }} [Lag] {{ $value | printf \"%.2f\" }}"
  - alert: kafka too many partitions
    expr: sum by(name)(kafka_topic_partitions{job="kafka-exporter",topic !~"__.*"})>1500
    for: 2m
    labels:
      severity: critical
    annotations:
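      description: "{{ $labels.name }} current partition count: {{ $value | printf \"%.2f\" }}"
  - alert: kafka_brokers missing
    # The threshold 3 assumes a three-broker cluster; adjust it to the expected broker count.
    expr: kafka_brokers{job="kafka-exporter"} < 3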
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} current broker count: {{ $value | printf \"%.2f\" }}"
  - alert: kafka_TopicsReplicas
    expr: sum(kafka_topic_partition_in_sync_replica{job="kafka-exporter"}) by (name,topic) <1
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.name }} kafka topic in-sync replicas: {{ $value | printf \"%.2f\" }}"
2.4.2 Check the rules syntax
/app/module/prometheus/promtool check rules /app/module/prometheus/rules/kafka_rules.yml
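promtool only validates syntax, so the time-window arithmetic in the partition backlog rule, (hour()+8)%24 >= 7 <= 21, is worth sanity-checking by hand. The sketch below mirrors that arithmetic in shell; `in_alert_window` is a hypothetical helper name, and its argument is assumed to be the UTC hour (0 to 23), which is what PromQL's hour() returns.

```shell
# in_alert_window UTC_HOUR
# Mirrors the PromQL filter (hour()+8)%24 >= 7 <= 21:
# succeeds when the UTC+8 hour falls in the range 7..21 inclusive.
# Hypothetical helper for illustration only.
in_alert_window() {
    h=$(( ($1 + 8) % 24 ))          # convert the UTC hour to UTC+8
    [ "$h" -ge 7 ] && [ "$h" -le 21 ]
}
```

Possible usage with the current UTC hour (the leading 1 and the subtraction avoid octal parsing of values such as 08):
in_alert_window $(( 1$(date -u +%H) - 100 )) && echo "rule is active now"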
2.4.3 Reload Prometheus
curl -X POST http://192.168.137.131:9090/-/reload
2.4.4 Verify the alerting rules
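Besides the Alerts page in the Prometheus web UI, one way to check from the command line that the rules were loaded is the /api/v1/rules endpoint of the Prometheus HTTP API. The sketch below pulls the names out of its JSON response with grep and sed rather than a JSON parser, so it is only an approximation; `rule_names` is a hypothetical helper name, and the Prometheus address is assumed to be the one used for the reload above.

```shell
# rule_names
# Reads the JSON body of /api/v1/rules on stdin and prints every "name"
# value, one per line. This covers rule group names and rule names alike,
# and would also pick up a label that is literally called "name".
# Hypothetical helper for illustration only.
rule_names() {
    grep -o '"name":"[^"]*"' | sed -e 's/^"name":"//' -e 's/"$//'
}
```

Possible usage:
curl -s http://192.168.137.131:9090/api/v1/rules | rule_names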

2.5 Import Kafka dashboards
2.5.1 Import dashboard ID 7589

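This template builds its variables from kafka_consumergroup_current_offset. On a lab Kafka instance with little data, that metric may not exist, and when it is missing the job drop-down in the dashboard stays empty.
2.5.2 Import dashboard ID 21078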










