Change log
- 2022-06-17 Added the node_exporter section
- 2022-06-15 Added the mysqld_exporter section
node_exporter
1. Binary deployment
1.1 Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar -zxf node_exporter-1.3.1.linux-amd64.tar.gz
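mv node_exporter-1.3.1.linux-amd64 /data/monitor/node_exporter
# The start script below expects bin/, scripts/ and log/ under the install directory
mkdir -p /data/monitor/node_exporter/bin /data/monitor/node_exporter/scripts /data/monitor/node_exporter/log
mv /data/monitor/node_exporter/node_exporter /data/monitor/node_exporter/bin/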
1.2 Configure systemd to start node_exporter
cat node_exporter-9400.service
[Unit]
Description=node_exporter service
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
LimitNOFILE=1000000
#LimitCORE=infinity
LimitSTACK=10485760
ExecStart=/data/monitor/node_exporter/scripts/run_node_exporter.sh
Restart=always
RestartSec=15s
[Install]
WantedBy=multi-user.target
Start script
cat run_node_exporter.sh
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/data/monitor/node_exporter
cd "${DEPLOY_DIR}" || exit 1
exec > >(tee -i -a "/data/monitor/node_exporter/log/node_exporter.log")
exec 2>&1
exec bin/node_exporter \
--web.listen-address=":9400" \
--collector.tcpstat \
--collector.systemd \
--collector.mountstats \
--collector.meminfo_numa \
--collector.interrupts \
--collector.vmstat.fields="^.*" \
--log.level="info"
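The start script above changes into the install directory, runs bin/node_exporter, and appends to a file under log/. A minimal sketch of a pre-flight check for that layout follows; check_layout is a hypothetical helper name, not part of node_exporter, and it only tests for the paths the script references.

```shell
# check_layout DIR NAME (hypothetical helper): verify the directory layout
# the start script relies on. Prints one line per problem and returns
# non-zero if anything is missing.
check_layout() {
    dir=$1
    name=$2
    status=0
    if [ ! -x "$dir/bin/$name" ]; then
        echo "missing or not executable: $dir/bin/$name"
        status=1
    fi
    if [ ! -d "$dir/log" ]; then
        echo "missing log directory: $dir/log"
        status=1
    fi
    return "$status"
}

# Example, using the paths from the start script:
#   check_layout /data/monitor/node_exporter node_exporter
```

Running the check before enabling the systemd unit surfaces a missing bin/ or log/ directory early, instead of leaving the service in a restart loop.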
2. Kubernetes deployment
2.1 Write the node_exporter DaemonSet YAML
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter-9400
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter-9400
  template:
    metadata:
      labels:
        app: node-exporter-9400
    spec:
      containers:
      - args:
        - --web.listen-address=0.0.0.0:9400
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        image: harbor.foxchan.com/prom/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        name: node-exporter-9400
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
      tolerations:
      # Tolerate the master taint so that node exporter can also be scheduled onto master nodes
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
  updateStrategy:
    type: OnDelete
2.2 Write the Service
The node exporter Pods are spread across all nodes. To let Prometheus collect their Pod IPs in one place, an Endpoints object is needed; creating a Service generates that Endpoints object automatically.
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9400"
    prometheus.io/path: /metrics
  name: node-exporter-9400
  namespace: kube-system
  labels:
    app: node-exporter-9400
spec:
  ports:
  - name: http
    port: 9400
    protocol: TCP
    targetPort: 9400
  selector:
    app: node-exporter-9400
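2.3 Add the exporter to Prometheus service discovery
On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five main service discovery roles: Node, Service, Pod, Endpoints, and Ingress.
Add the following to the Prometheus configuration: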
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s
rule_files:
- /prometheus-rules/*.rules.yml
scrape_configs:
- job_name: 'kubernetes-node-exporter'
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://k8s.foxchan.com:8443
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_endpoints_name]
    regex: 'node-exporter-9400'
    action: keep
mysqld_exporter
1. Install mysqld_exporter
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz
tar -zxf mysqld_exporter-0.14.0.linux-amd64.tar.gz
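mv mysqld_exporter-0.14.0.linux-amd64 /data/monitor/mysqld_exporter
# The start script below expects bin/, scripts/ and log/ under the install directory
mkdir -p /data/monitor/mysqld_exporter/bin /data/monitor/mysqld_exporter/scripts /data/monitor/mysqld_exporter/log
mv /data/monitor/mysqld_exporter/mysqld_exporter /data/monitor/mysqld_exporter/bin/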
2. Create a MySQL user and grant privileges
If the password contains special characters, put them in the middle of the password so that the exporter can parse it
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter!12ASzx' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
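The password advice above (keep special characters away from the ends) can be checked before the user is created. This is a minimal sketch; password_edges_ok is a hypothetical helper, and treating only ASCII letters and digits as safe edge characters is an assumption, not a rule documented by mysqld_exporter. The second sample password is made up for illustration.

```shell
# password_edges_ok PASSWORD (hypothetical helper): succeed only if the
# first and last characters are ASCII letters or digits, so that any
# special characters sit in the middle of the password.
password_edges_ok() {
    case $1 in
        "") return 1 ;;
        [!A-Za-z0-9]*|*[!A-Za-z0-9]) return 1 ;;
        *) return 0 ;;
    esac
}

# Try the password used in this document and a made-up one ending in '#'.
for pw in 'exporter!12ASzx' 'exporter12ASzx#'; do
    if password_edges_ok "$pw"; then
        echo "edges look fine: $pw"
    else
        echo "move special characters inward: $pw"
    fi
done
```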
3. Write a configuration file for mysqld_exporter
cat /data/monitor/mysqld_exporter/.my.cnf
[client]
host=localhost
user=exporter
password=exporter!12ASzx
socket=/var/lib/mysql/mysql.sock
4. Configure systemd to start mysqld_exporter
cat mysqld_exporter-9401.service
[Unit]
Description=mysqld_exporter
After=network.target
[Service]
ExecStart=/data/monitor/mysqld_exporter/scripts/run_mysqld_exporter.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
Start script
cat run_mysqld_exporter.sh
#!/bin/bash
set -e
# WARNING: This file was auto-generated. Do not edit!
# All your edit might be overwritten!
DEPLOY_DIR=/data/monitor/mysqld_exporter
cd "${DEPLOY_DIR}" || exit 1
exec > >(tee -i -a "/data/monitor/mysqld_exporter/log/mysqld_exporter.log")
exec 2>&1
exec bin/mysqld_exporter \
--web.listen-address=":9401" \
--config.my-cnf /data/monitor/mysqld_exporter/.my.cnf \
--log.level="info" \
--collect.info_schema.processlist \
--collect.info_schema.innodb_metrics \
--collect.info_schema.innodb_tablespaces \
--collect.info_schema.innodb_cmp \
--collect.info_schema.innodb_cmpmem
5. Confirm that the metrics are being exposed
curl http://localhost:9401/metrics|grep mysql_up
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 295k 0 295k 0 0 9.9M 0 --:--:-- --:--:-- --:--:-- 10.3M
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
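The same check can be scripted, for example in a deployment smoke test. The sketch below is one possible approach: metric_value and mysql_is_up are hypothetical helper names, and the default URL is the listen address used in this document. curl -s suppresses the progress meter that appears in the output above.

```shell
# metric_value NAME (hypothetical helper): read Prometheus text-format
# metrics on stdin and print the value of the unlabelled sample called NAME.
# "# HELP" and "# TYPE" lines never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# mysql_is_up [URL] (hypothetical helper): succeed only if the exporter
# reports "mysql_up 1".
mysql_is_up() {
    url=${1:-http://localhost:9401/metrics}
    [ "$(curl -s "$url" | metric_value mysql_up)" = "1" ]
}

# Example:
#   mysql_is_up && echo "MySQL is reachable through the exporter"
```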
6. Add a scrape job on the Prometheus server
vim prometheus.yml
scrape_configs:
- file_sd_configs:
  - files:
    - mysql.yml
  job_name: MySQL
  metrics_path: /metrics
  relabel_configs:
  - source_labels: [__address__]
    regex: (.*)
    target_label: __address__
    replacement: $1
mysql.yml
- labels:
    instance: mysql1:3306 # alias of the instance shown in Grafana
  targets:
  - 172.18.0.23:9401 # port exposed by mysqld_exporter
- labels:
    instance: mysql2:3306 # alias of the instance shown in Grafana
  targets:
  - 172.18.0.23:9402 # port exposed by mysqld_exporter
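A file_sd target file such as mysql.yml is a YAML list in which each entry carries a labels map and a targets list. When there are many instances, the entries can be generated rather than written by hand. A minimal sketch, assuming the alias has the form name:port; sd_entry is a hypothetical helper name.

```shell
# sd_entry ALIAS TARGET (hypothetical helper): print one file_sd entry.
# ALIAS becomes the instance label shown in Grafana; TARGET is the
# host:port that mysqld_exporter listens on.
sd_entry() {
    printf '%s\n' \
        "- labels:" \
        "    instance: \"$1\"" \
        "  targets:" \
        "  - \"$2\""
}

# Example: regenerate mysql.yml for the two exporters in this document.
#   { sd_entry mysql1:3306 172.18.0.23:9401
#     sd_entry mysql2:3306 172.18.0.23:9402; } > mysql.yml
```

Quoting both values keeps YAML from misreading entries that contain colons.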
7. Import the MySQL dashboard into Grafana
Click Import, enter 7362 in the dialog, and select Prometheus as the data source
8. Configure alert rules for mysqld_exporter
cat mysql_rules.yml
groups:
- name: mysql.rules
  rules:
  - alert: MysqlDown
    expr: mysql_up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL down'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL instance is down"
  - alert: MysqlRestarted
    expr: mysql_global_status_uptime < 60
    for: 0m
    labels:
      severity: info
    annotations:
      title: 'MySQL Restarted'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL has just been restarted, less than one minute ago"
  - alert: MysqlTooManyConnections(>80%)
    expr: avg by (instance) (max_over_time(mysql_global_status_threads_connected[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many connections (> 80%)'
      description: "MySQL instance [{{ $labels.instance }}]: More than 80% of MySQL connections are in use, Current Value: {{ $value }}%"
  - alert: MysqlThreadsRunningHigh
    expr: mysql_global_status_threads_running > 40
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL Threads_Running High'
      description: "MySQL instance [{{ $labels.instance }}]: Threads_Running above the threshold(40), Current Value: {{ $value }}"
  - alert: MysqlQpsHigh
    expr: sum by (instance) (rate(mysql_global_status_queries[2m])) > 500
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL QPS High'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL QPS above 500"
  - alert: MysqlSlowQueries
    expr: increase(mysql_global_status_slow_queries[1m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL slow queries'
      description: "MySQL instance [{{ $labels.instance }}]: has some new slow query."
  - alert: MysqlTooManyAbortedConnections
    expr: round(increase(mysql_global_status_aborted_connects[5m])) > 20
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many Aborted connections in 5 minutes'
      description: "MySQL instance [{{ $labels.instance }}]: {{ $value }} Aborted connections within 5 minutes"
  - alert: MysqlTooManyAbortedClients
    expr: round(increase(mysql_global_status_aborted_clients[120m])) > 10
    for: 2m
    labels:
      severity: warning
    annotations:
      title: 'MySQL too many Aborted clients in 2 hours'
      description: "MySQL instance [{{ $labels.instance }}]: {{ $value }} Aborted Clients within 2 hours"
  - alert: MysqlSlaveIoThreadNotRunning
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_io_running == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave IO thread not running'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL Slave IO thread not running"
  - alert: MysqlSlaveSqlThreadNotRunning
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) mysql_slave_status_slave_sql_running == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave SQL thread not running'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL Slave SQL thread not running"
  - alert: MysqlSlaveReplicationLag
    expr: mysql_slave_status_master_server_id > 0 and ON (instance) (mysql_slave_status_seconds_behind_master - mysql_slave_status_sql_delay) > 30
    for: 1m
    labels:
      severity: critical
    annotations:
      title: 'MySQL Slave replication lag'
      description: "MySQL instance [{{ $labels.instance }}]: MySQL replication lag"
  - alert: MysqlInnodbLogWaits
    expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10
    for: 0m
    labels:
      severity: warning
    annotations:
      title: 'MySQL InnoDB log waits'
      description: "MySQL instance [{{ $labels.instance }}]: innodb log writes stalling"
9. Integrate the alert rules into Prometheus
prometheus.yml
rule_files:
- "/rules/*.yml"
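10. Monitor multiple databases with mysqld_exporter
Run one exporter per database, each with its own config file and its own log
Make two copies of the mysqld_exporter package
cp -r mysqld_exporter-0.14.0.linux-amd64 mysqld_exporter_3306
cp -r mysqld_exporter-0.14.0.linux-amd64 mysqld_exporter_3307
Edit each config file to match its database instance
cd /data/mysqld_exporter_3307
cat .my.cnf
[client]
host = 127.0.0.1
user = exporter
password = expoter12ssdc3
socket = /tmp/mysql_3307.sock
Adjust each start script to match as well (for example the listen port, the config file path, and the log path)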
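With several instances, the per-instance .my.cnf files differ only in the socket path and credentials, so they can be generated. A minimal sketch, assuming the socket naming /tmp/mysql_PORT.sock seen in this section; write_my_cnf is a hypothetical helper name, and restricting the file to its owner is an added precaution, not something the exporter requires.

```shell
# write_my_cnf DIR PORT USER PASSWORD (hypothetical helper): write
# DIR/.my.cnf for the MySQL instance whose socket is /tmp/mysql_PORT.sock.
# The subshell keeps the stricter umask from leaking into the caller.
write_my_cnf() {
    (
        umask 077
        cat > "$1/.my.cnf" <<EOF
[client]
host = 127.0.0.1
user = $3
password = $4
socket = /tmp/mysql_$2.sock
EOF
    )
}

# Example, using the directories created above:
#   write_my_cnf /data/mysqld_exporter_3306 3306 exporter 'expoter12ssdc3'
#   write_my_cnf /data/mysqld_exporter_3307 3307 exporter 'expoter12ssdc3'
```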
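11. Common errors
- level=error msg="Error pinging mysqld: Error 1045: Access denied for user 'root'@'localhost' (using password: YES)" source="exporter.go:146"
1. This error is a privilege problem. Make sure the user has been granted access in the database, and check the settings in the .my.cnf under the mysqld_exporter directory
2. If the grants and the password are both correct, make sure the password does not end with a special character, especially #. mysqld_exporter does not parse it, so the password is wrong and the login fails
- level=error msg="Error pinging mysqld: this user requires old password authentication. If you still want to use it, please add 'allowOldPasswords=1' to your DSN. See also https://github.com/go-sql-driver/mysql/wiki/old_passwords" source="exporter.go:146"
Seen on old databases and caused by the old_passwords parameter. Start the exporter with the DSN set in an environment variable instead of using the config file. Format:
DATA_SOURCE_NAME='user:password@unix(socket path)/?param=value&param=value' or
DATA_SOURCE_NAME='user:password@(IP address:port)/?param=value&param=value'
DATA_SOURCE_NAME=exporter:expoter12Ssdc3@unix\(/tmp/mysql_3306.sock\)/?allowOldPasswords=true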
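The two DSN shapes above are easy to mistype by hand, so a small builder can help. This is a sketch only: dsn_unix and dsn_tcp are hypothetical helper names, and they do no escaping, so they assume the user name and password contain no characters that are special inside a DSN.

```shell
# dsn_unix USER PASSWORD SOCKET [PARAMS] (hypothetical helper): print a DSN
# for a Unix socket connection. PARAMS is an optional "k=v&k=v" string.
dsn_unix() {
    printf '%s:%s@unix(%s)/%s\n' "$1" "$2" "$3" "${4:+?$4}"
}

# dsn_tcp USER PASSWORD HOST:PORT [PARAMS] (hypothetical helper): print a DSN
# for a TCP connection, in the "@(address)" form used in this section.
dsn_tcp() {
    printf '%s:%s@(%s)/%s\n' "$1" "$2" "$3" "${4:+?$4}"
}

# Build the socket DSN from the example above.
dsn_unix exporter expoter12Ssdc3 /tmp/mysql_3306.sock allowOldPasswords=true

# Example start, passing the DSN through the environment:
#   DATA_SOURCE_NAME=$(dsn_unix exporter expoter12Ssdc3 /tmp/mysql_3306.sock allowOldPasswords=true) \
#       bin/mysqld_exporter --web.listen-address=":9401"
```

Because the result is captured in a quoted command substitution, the parentheses need no backslash escaping.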