1. Deploy Prometheus
### Create the installation directory
mkdir -p /usr/local/prometheus
### Download the Prometheus package
wget https://github.com/prometheus/prometheus/releases/download/v2.48.1/prometheus-2.48.1.linux-amd64.tar.gz
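## The download can be very slow; the package below can be used instead
Link: installation package, extraction code: q5t0
Then upload the package to the server manually.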
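Whichever way the archive reaches the server, it may be worth checking its integrity before extracting it. The following is a minimal sketch, assuming `sha256sum` and `awk` are available; `verify_sha256` is a hypothetical helper name, and the expected digest has to be copied from the `sha256sums.txt` file published with the release.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper, not part of Prometheus)
# Returns 0 only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep the digest only.
    actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH or unreadable file: $file" >&2
        return 1
    fi
}

# Usage (the digest is a placeholder, copy the real one from sha256sums.txt):
#   verify_sha256 prometheus-2.48.1.linux-amd64.tar.gz "<expected digest>"
```

A mismatch usually means a truncated download or a tampered mirror copy, so it is safer to download again than to extract the archive.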
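### Extract the package into the installation directory
tar -xf prometheus-2.48.1.linux-amd64.tar.gz -C /usr/local/prometheus
### Rename the extracted directory
cd /usr/local/prometheus
mv prometheus-2.48.1.linux-amd64 prometheus
### Configure Prometheus
## Back up the default configuration file
cd /usr/local/prometheus/prometheus
cp prometheus.yml prometheus.yml-bak
### Replace the contents of prometheus.yml with the following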
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
### Validate the Prometheus configuration file
cd /usr/local/prometheus/prometheus
./promtool check config ./prometheus.yml
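### Start Prometheus. There are two ways: start it manually, or manage it with systemctl
## Method 1: start manually
cd /usr/local/prometheus/prometheus
nohup ./prometheus --config.file=./prometheus.yml \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle \
  --storage.tsdb.retention.time=90d \
  --storage.tsdb.path=./data &
Check the startup log to confirm that Prometheus started successfully.
Check that the process is running and that the port is listening.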
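The log and port checks can also be scripted. The sketch below polls an HTTP endpoint until it answers; `wait_for_http` is a hypothetical helper name, and it assumes `curl` is installed. Prometheus exposes `/-/ready`, which returns a non-2xx status while the server is still starting up.

```shell
#!/bin/sh
# wait_for_http URL [TRIES]   (hypothetical helper)
# Polls URL once per second until it answers with a 2xx status.
wait_for_http() {
    url=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl fail on HTTP errors such as 503 (still starting).
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            echo "ready after $i retries: $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "not ready after $tries attempts: $url" >&2
    return 1
}

# Usage:
#   wait_for_http http://localhost:9090/-/ready 60
#   ss -lntp | grep ':9090'     # confirm the listener
#   tail -n 20 nohup.out        # startup log written by nohup
```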
### Quick command to force-kill the service
ps -ef|grep prometheus|grep -v grep |awk '{print $2}'|xargs kill -9
### Or stop it with curl (requires --web.enable-lifecycle)
curl -XPOST http://localhost:9090/-/quit
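`kill -9` gives the process no chance to shut down cleanly. A gentler variant, sketched here under the assumption that the PID is known (for example from `pgrep -x prometheus`), sends SIGTERM first and only falls back to SIGKILL after a timeout. `stop_pid` is a hypothetical helper name.

```shell
#!/bin/sh
# stop_pid PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Sends SIGTERM, waits for the process to exit, and uses SIGKILL as a last resort.
stop_pid() {
    pid=$1
    timeout=${2:-30}
    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "pid $pid ignored SIGTERM, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage:
#   for p in $(pgrep -x prometheus); do stop_pid "$p" 60; done
```

The same helper works for node_exporter, Alertmanager, and Grafana, since it only depends on the PID.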
### Reload the Prometheus configuration (requires --web.enable-lifecycle)
curl -XPOST http://localhost:9090/-/reload
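Because the reload endpoint applies whatever is on disk, it can help to run `promtool check config` first and reload only when the file is valid. The following is a minimal sketch; `check_then_reload` and the `PROMTOOL` and `PROM_URL` variables are hypothetical names, with defaults that match the paths and port used above.

```shell
#!/bin/sh
# check_then_reload CONFIG_FILE   (hypothetical helper)
# PROMTOOL and PROM_URL are illustrative overrides, not Prometheus settings.
check_then_reload() {
    config=$1
    if "${PROMTOOL:-./promtool}" check config "$config"; then
        # Only reached when promtool exits with status 0.
        curl -fsS -X POST "${PROM_URL:-http://localhost:9090}/-/reload"
    else
        echo "config check failed, not reloading: $config" >&2
        return 1
    fi
}

# Usage:
#   cd /usr/local/prometheus/prometheus
#   check_then_reload ./prometheus.yml
```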
### Method 2: run Prometheus as a service
### Start and stop Prometheus with systemctl
## Create the systemd unit file for Prometheus
cat > /usr/lib/systemd/system/prometheus.service <<EOF
[Unit]
Description=The Prometheus Server
After=network.target
[Service]
ExecStart=/usr/local/prometheus/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus/prometheus.yml \
  --web.listen-address=0.0.0.0:9090 \
  --web.enable-lifecycle \
  --storage.tsdb.retention.time=90d \
  --storage.tsdb.path=/usr/local/prometheus/prometheus/data/
Restart=on-failure
RestartSec=15s
[Install]
WantedBy=multi-user.target
EOF
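### Reload the systemd configuration
systemctl daemon-reload
## Enable the service at boot
systemctl enable prometheus
## Start or stop the Prometheus service
systemctl start prometheus
systemctl stop prometheus
Once the service is running, open http://IP:9090 in a browser.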
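2. Deploy node_exporter
### Download the package
cd /usr/local/prometheus
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
### Or download it manually and upload it to the server
### Extract it into the installation directory
tar -xf node_exporter-1.7.0.linux-amd64.tar.gz -C /usr/local/prometheus/
## Rename the extracted directory
mv node_exporter-1.7.0.linux-amd64 node_exporter
### As with Prometheus, there are two ways to start and stop the service: manually, or through systemd
## Start the service manually
cd /usr/local/prometheus/node_exporter
nohup ./node_exporter --web.listen-address=0.0.0.0:4220 &
### --web.listen-address=0.0.0.0:4220 sets the port node_exporter listens on (the default is 9100)
### Quick command to force-kill the service
ps -ef|grep node_exporter|grep -v grep |awk '{print $2}'|xargs kill -9
### Manage node_exporter with systemd
## Create the systemd unit file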
cat > /usr/lib/systemd/system/node_exporter.service <<EOF
[Unit]
Description=The node_exporter Server
After=network.target
[Service]
ExecStart=/usr/local/prometheus/node_exporter/node_exporter \
  --web.listen-address=0.0.0.0:4220 \
  --collector.systemd \
  --collector.systemd.unit-include=(sshd|docker).service
Restart=on-failure
RestartSec=15s
SyslogIdentifier=node_exporter
[Install]
WantedBy=multi-user.target
EOF
### Reload the systemd configuration
systemctl daemon-reload
## Enable the service at boot
systemctl enable node_exporter
## Start or stop the service
systemctl start node_exporter
systemctl stop node_exporter
Port 4220 should now be listening.
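For Prometheus to collect these metrics, the exporter has to be listed under `scrape_configs` in prometheus.yml. The sketch below prints a job entry that can be pasted there; `node_scrape_job` is a hypothetical helper, and the job name `node_exporter` is an arbitrary choice. The indentation assumes the entry sits directly under `scrape_configs:`.

```shell
#!/bin/sh
# node_scrape_job HOST:PORT [HOST:PORT...]   (hypothetical helper)
# Prints a scrape_configs entry for the given node_exporter targets.
node_scrape_job() {
    printf '  - job_name: "node_exporter"\n'
    printf '    static_configs:\n'
    printf '      - targets: ['
    sep=''
    for t in "$@"; do
        printf '%s"%s"' "$sep" "$t"
        sep=', '
    done
    printf ']\n'
}

# Usage:
#   curl -s http://localhost:4220/metrics | head -n 5   # exporter answers?
#   node_scrape_job localhost:4220                      # print the job entry
```

After adding the entry, validate the file with promtool and reload Prometheus. The new target should then appear on the Targets page of the Prometheus web UI.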
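3. Deploy Alertmanager
### Download the package
cd /usr/local/prometheus
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
## Or download it manually and upload it to the server
### Extract the package
tar -xf alertmanager-0.26.0.linux-amd64.tar.gz -C /usr/local/prometheus
## Rename the extracted directory
mv alertmanager-0.26.0.linux-amd64 alertmanager
### The configuration has six parts: global, templates, route, receivers, inhibit_rules, and silences
## Back up the configuration file
cd /usr/local/prometheus/alertmanager
cp alertmanager.yml alertmanager.yml-bak
### Detailed configuration; adjust it to your environment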
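# global section
global:
  resolve_timeout: 5m # An alert is treated as resolved only if it has not fired again within this window, so the recovery message is not sent immediately
  smtp_smarthost: 'smtp.exmail.qq.com:465' # SMTP server of the sender's mail provider; this is the Tencent Exmail SMTP endpoint
  smtp_from: 'xxx@company.com' # Sender address
  smtp_auth_username: 'xxx@company.com' # Sender login username, normally the same as the sender address
  smtp_auth_password: 'xxxxxxx' # Sender login password; an authorization code also works
  smtp_require_tls: false # Whether TLS is required; the default is true
# templates section
templates:
  - '/usr/local/soft/alertmanager/email.tmpl' # Directory or file holding the custom notification templates
# route section
route: # Every incoming alert enters at the root route
  group_by: ['alertname','cluster','service'] # Incoming alerts that share the values of these labels are grouped together; for example, cluster=A and alertname=LatencyHigh form one group
  group_wait: 30s # How long to wait after a group is created before sending its first notification, so that more alerts can be batched into it
  group_interval: 5m # After the first notification, how long to wait before sending a notification about new alerts in the same group
  repeat_interval: 1h # After a notification has been sent successfully, how long to wait before sending it again
  receiver: 'email' # Default receiver
# receivers section
receivers:
  - name: 'email' # Receiver name, referenced by route.receiver
    email_configs:
      - to: 'xxx@xxx.com' # Address that receives the alerts
        send_resolved: true # Whether to send a mail when the alert is resolved
        html: '{{ template "email.htm" . }}' # Template
        headers: { Subject: "{{ .CommonLabels.severity }} {{ .CommonAnnotations.summary }}" } # Subject line
# Inhibition rules
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance'] # With this rule, a critical alert suppresses warning alerts whose alertname, dev, and instance labels have the same values
# Silences
# Silences are configured in the web UI. They are typically used during service upgrades or long outages, so that the same alert is not received again for the chosen period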
### Start and stop Alertmanager manually
cd /usr/local/prometheus/alertmanager
nohup ./alertmanager --config.file="alertmanager.yml" --web.listen-address=":9093" &
## Stop the service
ps -ef |grep alertmanager |grep -v grep |awk '{print $2}' | xargs kill -9
## Reload the configuration
curl -XPOST http://localhost:9093/-/reload
### Manage Alertmanager with systemd
## Create the systemd unit file
cat > /usr/lib/systemd/system/alertmanager.service <<EOF
[Unit]
Description=The Alertmanager Server
After=network.target
[Service]
ExecStart=/usr/local/prometheus/alertmanager/alertmanager \
--config.file=/usr/local/prometheus/alertmanager/alertmanager.yml \
--web.listen-address=0.0.0.0:9093
Restart=on-failure
RestartSec=15s
[Install]
WantedBy=multi-user.target
EOF
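### Reload the systemd configuration
systemctl daemon-reload
## Enable the service at boot
systemctl enable alertmanager
## Start or stop the service
systemctl start alertmanager
systemctl stop alertmanager
Open http://IP:9093 in a browser to reach the web UI.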
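To check the routing and mail settings end to end, a test alert can be posted to Alertmanager's v2 API. The sketch below only builds the JSON payload; `test_alert_json`, the alert name, and the `instance` value are made up for illustration. Since no end time is set, the alert should resolve on its own once `resolve_timeout` has passed.

```shell
#!/bin/sh
# test_alert_json [ALERTNAME] [SEVERITY]   (hypothetical helper)
# Prints a one-element JSON array in the format accepted by POST /api/v2/alerts.
test_alert_json() {
    name=${1:-ManualTestAlert}
    severity=${2:-warning}
    printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"},' \
        "$name" "$severity"
    printf '"annotations":{"summary":"Manual test alert, safe to ignore"}}]\n'
}

# Usage:
#   test_alert_json | curl -fsS -X POST -H 'Content-Type: application/json' \
#       --data-binary @- http://localhost:9093/api/v2/alerts
```

With the route shown earlier, the first mail for the group goes out after `group_wait` (30s) at the earliest.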
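4. Deploy Grafana
### Download the package
cd /usr/local/prometheus
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.2.3.linux-amd64.tar.gz
## Or download it manually and upload it to the server
## Extract it into the installation directory
tar -xf grafana-enterprise-10.2.3.linux-amd64.tar.gz -C /usr/local/prometheus/
## Rename the extracted directory
mv grafana-v10.2.3 grafana
## Start Grafana manually
cd /usr/local/prometheus/grafana/bin/
nohup /usr/local/prometheus/grafana/bin/grafana-server &
## Stop Grafana manually
ps -ef |grep grafana |grep -v grep |awk '{print $2}'|xargs kill -9
Open Grafana in a browser:
http://IP:3000
Default login username and password: admin/admin
On first login, Grafana prompts you to change the password.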
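Before opening the browser, Grafana's `/api/health` endpoint can confirm that the server is up and can reach its database. The following is a minimal sketch, assuming `curl` and `grep` are available; `grafana_db_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# grafana_db_ok [BASE_URL]   (hypothetical helper)
# Succeeds when /api/health reports "database": "ok".
grafana_db_ok() {
    base=${1:-http://localhost:3000}
    curl -fsS "$base/api/health" |
        grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage:
#   grafana_db_ok && echo "Grafana is healthy"
#   grafana_db_ok http://IP:3000
```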