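Component 1: Prometheus collects time-series metrics and evaluates the expressions you write to trigger alerts
Component 2: alertmanager receives the alerts and sends notifications to email, DingTalk, and so on
Component 3: node_exporter, the client, which answers scrape requests with the host's time-series metrics
Download: https://prometheus.io/download/
Environment: 3 Alibaba Cloud hosts
OS: CentOS 7.5
I have already downloaded the tarballs to /home; extracting them yields the binaries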
#tar -xzf prometheus-2.34.0-rc.2.linux-amd64.tar.gz
#tar -zxf alertmanager-0.23.0.linux-amd64.tar.gz
scp node_exporter-1.3.1.linux-amd64.tar.gz root@k8s-node101:/home
scp node_exporter-1.3.1.linux-amd64.tar.gz root@k8s-node102:/home
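With more nodes, the two scp commands above can be folded into a loop. This is only a minimal sketch, assuming root SSH access to each node already works. The distribute function and the DRY_RUN switch are hypothetical helpers of my own, not part of any tool.

```shell
NODES="k8s-node101 k8s-node102"                   # hostnames taken from the scp commands above
TARBALL="node_exporter-1.3.1.linux-amd64.tar.gz"

distribute() {
  # DRY_RUN=1 prints each command instead of running it (hypothetical switch).
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  for node in $NODES; do
    $run scp "$TARBALL" "root@${node}:/home"
  done
}

# Usage: DRY_RUN=1 distribute   (preview the commands), then: distribute
```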
Run on all 3 nodes
#tar -zxf node_exporter-1.3.1.linux-amd64.tar.gz
Move the extracted directories wherever you like. I put them under /usr/local/, but any other location works too
#mv prometheus-2.34.0-rc.2.linux-amd64 /usr/local/prometheus
#mv alertmanager-0.23.0.linux-amd64 /usr/local/alertmanager
Run on all 3 nodes
#mv node_exporter-1.3.1.linux-amd64 /usr/local/node_exporter
Set up systemd units for the binaries. The point of this step is to manage the services with systemctl and to enable them at boot
#vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=https://prometheus.io
After=network.target
[Service]
WorkingDirectory=/usr/local/prometheus
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus
ExecReload=/bin/kill -HUP $MAINPID
Type=simple
KillMode=control-group
RestartSec=10
[Install]
WantedBy=multi-user.target
#vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=https://prometheus.io
After=network.target
[Service]
WorkingDirectory=/usr/local/alertmanager
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager
ExecStop=/bin/kill -KILL $MAINPID
Type=simple
#ExecReload=/bin/kill -HUP $MAINPID
KillMode=control-group
RestartSec=10
[Install]
WantedBy=multi-user.target
Run on all 3 nodes (same username and password on each)
#vim /usr/local/node_exporter/web-config.yml
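basic_auth_users:
  k8s: $2y$10$1ak5ZyK2wfbnMsv0/L78aOOAEiR3JI6PC2AnTf/Ictdh6prB4Um0.
Mind the format: this is YAML, so the user entry is indented under basic_auth_users.
The password must be stored as a bcrypt hash. Generate it with: htpasswd -nBC 10 "k8s" , then enter the password at the prompt.
The htpasswd command comes from httpd-tools: yum install -y httpd-tools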
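To avoid hand-copying the hash, the "user:hash" line that htpasswd prints can be turned into the YAML layout directly. A minimal sketch: to_web_config is a hypothetical helper name, and it assumes the input is the plain htpasswd -n output.

```shell
to_web_config() {
  # Reads "user:hash" lines on stdin and prints exporter web-config YAML.
  printf 'basic_auth_users:\n'
  # Split on the first colon only (a bcrypt hash contains no colon);
  # blank lines from htpasswd are dropped because of -n ... p.
  sed -n 's/^\([^:][^:]*\):\(.*\)$/  \1: \2/p'
}

# Usage (assumes htpasswd is installed):
#   htpasswd -nBC 10 k8s | to_web_config > /usr/local/node_exporter/web-config.yml
```

Review the generated file before restarting node_exporter, since a malformed web-config.yml will stop the exporter from starting.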
If you want HTTPS, see:
https://github.com/prometheus/exporter-toolkit/blob/v0.1.0/https/README.md
Run on all 3 nodes
#vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io
After=network.target
[Service]
WorkingDirectory=/usr/local/node_exporter
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --web.config=web-config.yml
ExecStop=/bin/kill -KILL $MAINPID
Type=simple
#ExecReload=/bin/kill -HUP $MAINPID
KillMode=control-group
RestartSec=10
[Install]
WantedBy=multi-user.target
#systemctl daemon-reload
Component 1 listens on port 9090, with no authentication
#systemctl enable prometheus
Component 2 listens on port 9093 (the alertmanager default)
#systemctl enable alertmanager
Component 3 listens on port 9100
Run on all 3 nodes
#systemctl start node_exporter
#systemctl enable node_exporter
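Once node_exporter is running, it is worth checking that basic auth is actually enforced before pointing Prometheus at it. A minimal sketch using curl. check_exporter is a hypothetical helper name, and the password is whatever you typed into htpasswd.

```shell
check_exporter() {
  # $1 = host:port, $2 = user, $3 = password; prints only the HTTP status code.
  curl -s -o /dev/null -w '%{http_code}' -u "$2:$3" "http://$1/metrics"
}

# Usage:
#   check_exporter localhost:9100 k8s 'your-password'   # expect 200
#   check_exporter localhost:9100 k8s 'wrong-password'  # expect 401
```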
Add the nodes as scrape targets; mind the YAML format
#vim /usr/local/prometheus/prometheus.yml
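The notes do not show the file contents, so here is a sketch of what the scrape section might look like for this setup. Assumptions: the Prometheus host is the third node_exporter node (reached as localhost), the other two are k8s-node101 and k8s-node102, and CHANGE_ME stands for the plaintext password (Prometheus needs the plaintext here, not the bcrypt hash). The job names and the 15s interval are my own picks. The block writes a sample file in the current directory rather than touching the real config.

```shell
# Write a sample next to the real config; merge it into prometheus.yml by hand.
cat > ./prometheus.yml.sample <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node
    basic_auth:
      username: k8s
      password: CHANGE_ME        # plaintext password matching web-config.yml
    static_configs:
      - targets:
          - 'localhost:9100'
          - 'k8s-node101:9100'
          - 'k8s-node102:9100'
EOF
```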
Check the syntax
#cd /usr/local/prometheus
#./promtool check config prometheus.yml
Start prometheus
#systemctl start prometheus
Open the web UI at http://IP:9090
Let's add 3 monitoring expressions
#Memory usage
node_memory_Active_bytes/node_memory_MemTotal_bytes*100
#CPU usage, averaged across cores (irate over a 5-minute window); 100 minus the idle percentage
100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))by(instance)*100
#Disk usage
(1-node_filesystem_avail_bytes{mountpoint=~"/|/run",device="rootfs"}/node_filesystem_size_bytes{mountpoint=~"/|/run",device="rootfs"})*100
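The expressions can also be tried from the command line through the Prometheus HTTP API (/api/v1/query), which is handy on a host without a browser. A minimal sketch: promql is a hypothetical helper name, and PROM_URL assumes Prometheus is running locally on port 9090.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"

promql() {
  # $1 = PromQL expression; prints the raw JSON response from /api/v1/query.
  curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   promql 'node_memory_Active_bytes/node_memory_MemTotal_bytes*100'
```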
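I have run out of time for alerting, so that will come later.
Alerting means configuring Prometheus to send alerts to alertmanager, then configuring email and DingTalk delivery in alertmanager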
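Until that follow-up exists, here is a rough sketch of the Prometheus side only. Assumptions: alertmanager runs on the same host on its default port 9093, rule files live under rules/ next to prometheus.yml, and the alert name, the 80% threshold, and the 5m duration are arbitrary examples. The email and DingTalk receivers in alertmanager are not covered here.

```shell
# Fragment to merge into prometheus.yml: where to send alerts, which rules to load.
cat > ./prometheus-alerting.sample.yml <<'EOF'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files:
  - 'rules/*.yml'
EOF

# Example rule file built on the memory expression above (hypothetical alert name).
cat > ./node-rules.sample.yml <<'EOF'
groups:
  - name: node
    rules:
      - alert: HighMemoryUsage
        expr: node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: 'Memory usage above 80% on {{ $labels.instance }}'
EOF
```

After copying the rule file into place, ./promtool check rules on that file should catch syntax mistakes before a restart.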