0
点赞
收藏
分享

微信扫一扫

Open-Falcon 监控系统监控 MySQL/Redis/MongoDB 状态监控

背景:

Open-Falcon 是小米运维部开源的一款互联网企业级监控系统解决方案,具体的安装和使用说明请见官网:​​http://open-falcon.org/​​,是一款比较全的监控。而且提供各种API,只需要把数据按照规定给出就能出图,以及报警、集群支持等等。

监控:

1) MySQL 收集信息脚本(mysql_monitor.py)

#!/bin/env python
# -*- encoding: utf-8 -*-

from __future__ import division
import MySQLdb
import datetime
import time
import os
import sys
import fileinput
import requests
import json
import re


class MySQLMonitorInfo():

def __init__(self,host,port,user,password):
self.host = host
self.port = port
self.user = user
self.password = password

def stat_info(self):
try:
m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
query = "SHOW GLOBAL STATUS"
cursor = m.cursor()
cursor.execute(query)
Str_string = cursor.fetchall()
Status_dict = {}
for Str_key,Str_value in Str_string:
Status_dict[Str_key] = Str_value
cursor.close()
m.close()
return Status_dict

except Exception, e:
print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
print e
Status_dict = {}
return Status_dict

def engine_info(self):
try:
m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')
_engine_regex = re.compile(ur'(History list length) ([0-9]+\.?[0-9]*)\n')
query = "SHOW ENGINE INNODB STATUS"
cursor = m.cursor()
cursor.execute(query)
Str_string = cursor.fetchone()
a,b,c = Str_string
cursor.close()
m.close()
return dict(_engine_regex.findall(c))
except Exception, e:
print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
print e
return dict(History_list_length=0)

if __name__ == '__main__':

open_falcon_api = 'http://192.168.200.86:1988/v1/push'

db_list= []
for line in fileinput.input():
db_list.append(line.strip())
for db_info in db_list:
# host,port,user,password,endpoint,metric = db_info.split(',')
host,port,user,password,endpoint = db_info.split(',')

timestamp = int(time.time())
step = 60
# tags = "port=%s" %port
tags = ""

conn = MySQLMonitorInfo(host,int(port),user,password)
stat_info = conn.stat_info()
engine_info = conn.engine_info()

mysql_stat_list = []
monitor_keys = [
('Com_select','COUNTER'),
('Qcache_hits','COUNTER'),
('Com_insert','COUNTER'),
('Com_update','COUNTER'),
('Com_delete','COUNTER'),
('Com_replace','COUNTER'),
('MySQL_QPS','COUNTER'),
('MySQL_TPS','COUNTER'),
('ReadWrite_ratio','GAUGE'),
('Innodb_buffer_pool_read_requests','COUNTER'),
('Innodb_buffer_pool_reads','COUNTER'),
('Innodb_buffer_read_hit_ratio','GAUGE'),
('Innodb_buffer_pool_pages_flushed','COUNTER'),
('Innodb_buffer_pool_pages_free','GAUGE'),
('Innodb_buffer_pool_pages_dirty','GAUGE'),
('Innodb_buffer_pool_pages_data','GAUGE'),
('Bytes_received','COUNTER'),
('Bytes_sent','COUNTER'),
('Innodb_rows_deleted','COUNTER'),
('Innodb_rows_inserted','COUNTER'),
('Innodb_rows_read','COUNTER'),
('Innodb_rows_updated','COUNTER'),
('Innodb_os_log_fsyncs','COUNTER'),
('Innodb_os_log_written','COUNTER'),
('Created_tmp_disk_tables','COUNTER'),
('Created_tmp_tables','COUNTER'),
('Connections','COUNTER'),
('Innodb_log_waits','COUNTER'),
('Slow_queries','COUNTER'),
('Binlog_cache_disk_use','COUNTER')
]

for _key,falcon_type in monitor_keys:
if _key == 'MySQL_QPS':
_value = int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0))
elif _key == 'MySQL_TPS':
_value = int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))
elif _key == 'Innodb_buffer_read_hit_ratio':
try:
_value = round((int(stat_info.get('Innodb_buffer_pool_read_requests',0)) - int(stat_info.get('Innodb_buffer_pool_reads',0)))/int(stat_info.get('Innodb_buffer_pool_read_requests',0)) * 100,3)
except ZeroDivisionError:
_value = 0
elif _key == 'ReadWrite_ratio':
try:
_value = round((int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0)))/(int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))),2)
except ZeroDivisionError:
_value = 0
else:
_value = int(stat_info.get(_key,0))

falcon_format = {
'Metric': '%s' % (_key),
'Endpoint': endpoint,
'Timestamp': timestamp,
'Step': step,
'Value': _value,
'CounterType': falcon_type,
'TAGS': tags
}
mysql_stat_list.append(falcon_format)

#_key : History list length
for _key,_value in engine_info.items():
_key = "Undo_Log_Length"
falcon_format = {
'Metric': '%s' % (_key),
'Endpoint': endpoint,
'Timestamp': timestamp,
'Step': step,
'Value': int(_value),
'CounterType': "GAUGE",
'TAGS': tags
}
mysql_stat_list.append(falcon_format)

print json.dumps(mysql_stat_list,sort_keys=True,indent=4)
requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))

指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

指标

类型

说明

 Undo_Log_Length

 GAUGE

未清除的Undo事务数

 Com_select

 COUNTER

 select/秒=QPS

 Com_insert

 COUNTER

 insert/秒

 Com_update

 COUNTER

 update/秒

 Com_delete

 COUNTER

 delete/秒

 Com_replace

 COUNTER

 replace/秒

 MySQL_QPS

 COUNTER

 QPS

 MySQL_TPS

 COUNTER

 TPS 

 ReadWrite_ratio

 GAUGE

 读写比例

 Innodb_buffer_pool_read_requests

 COUNTER

 innodb buffer pool 读次数/秒

 Innodb_buffer_pool_reads

 COUNTER

 Disk 读次数/秒

 Innodb_buffer_read_hit_ratio

 GAUGE

 innodb buffer pool 命中率

 Innodb_buffer_pool_pages_flushed

 COUNTER

 innodb buffer pool 刷写到磁盘的页数/秒

 Innodb_buffer_pool_pages_free

 GAUGE

 innodb buffer pool 空闲页的数量

 Innodb_buffer_pool_pages_dirty

 GAUGE

 innodb buffer pool 脏页的数量

 Innodb_buffer_pool_pages_data

 GAUGE

 innodb buffer pool 数据页的数量

 Bytes_received

 COUNTER

 接收字节数/秒

 Bytes_sent

 COUNTER

 发送字节数/秒

 Innodb_rows_deleted

 COUNTER

 innodb表删除的行数/秒

 Innodb_rows_inserted

 COUNTER 

 innodb表插入的行数/秒

 Innodb_rows_read

 COUNTER 

 innodb表读取的行数/秒

 Innodb_rows_updated 

 COUNTER 

 innodb表更新的行数/秒

 Innodb_os_log_fsyncs

 COUNTER 

 Redo Log fsync次数/秒 

 Innodb_os_log_written

 COUNTER 

 Redo Log 写入的字节数/秒

 Created_tmp_disk_tables

 COUNTER 

 创建磁盘临时表的数量/秒

 Created_tmp_tables

 COUNTER 

 创建内存临时表的数量/秒

 Connections

 COUNTER 

 连接数/秒

 Innodb_log_waits

 COUNTER 

 innodb log buffer不足等待的数量/秒

 Slow_queries

 COUNTER 

 慢查询数/秒

 Binlog_cache_disk_use

 COUNTER 

 Binlog Cache不足的数量/秒

使用说明:读取配置到都数据库列表执行,配置文件格式如下(mysqldb_list.txt):

 IP,Port,User,Password,endpoint

192.168.2.21,3306,root,123,mysql-21:3306
192.168.2.88,3306,root,123,mysql-88:3306

最后执行:

python mysql_monitor.py mysqldb_list.txt

2) Redis 收集信息脚本(redis_monitor.py)

#!/bin/env python
#-*- coding:utf-8 -*-

import json
import time
import re
import redis
import requests
import fileinput
import datetime

class RedisMonitorInfo():

def __init__(self,host,port,password):
self.host = host
self.port = port
self.password = password

def stat_info(self):
try:
r = redis.Redis(host=self.host, port=self.port, password=self.password)
stat_info = r.info()
return stat_info
except Exception, e:
print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
print e
return dict()

def cmdstat_info(self):
try:
r = redis.Redis(host=self.host, port=self.port, password=self.password)
cmdstat_info = r.info('Commandstats')
return cmdstat_info
except Exception, e:
print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
print e
return dict()

if __name__ == '__main__':

open_falcon_api = 'http://192.168.200.86:1988/v1/push'

db_list= []
for line in fileinput.input():
db_list.append(line.strip())
for db_info in db_list:
# host,port,password,endpoint,metric = db_info.split(',')
host,port,password,endpoint = db_info.split(',')

timestamp = int(time.time())
step = 60
falcon_type = 'COUNTER'
# tags = "port=%s" %port
tags = ""

conn = RedisMonitorInfo(host,port,password)

#查看各个命令每秒执行次数
redis_cmdstat_dict = {}
redis_cmdstat_list = []
cmdstat_info = conn.cmdstat_info()
for cmdkey in cmdstat_info:
redis_cmdstat_dict[cmdkey] = cmdstat_info[cmdkey]['calls']
for _key,_value in redis_cmdstat_dict.items():
falcon_format = {
'Metric': '%s' % (_key),
'Endpoint': endpoint,
'Timestamp': timestamp,
'Step': step,
'Value': int(_value),
'CounterType': falcon_type,
'TAGS': tags
}
redis_cmdstat_list.append(falcon_format)

#查看Redis各种状态,根据需要增删监控项,str的值需要转换成int
redis_stat_list = []
monitor_keys = [
('connected_clients','GAUGE'),
('blocked_clients','GAUGE'),
('used_memory','GAUGE'),
('used_memory_rss','GAUGE'),
('mem_fragmentation_ratio','GAUGE'),
('total_commands_processed','COUNTER'),
('rejected_connections','COUNTER'),
('expired_keys','COUNTER'),
('evicted_keys','COUNTER'),
('keyspace_hits','COUNTER'),
('keyspace_misses','COUNTER'),
('keyspace_hit_ratio','GAUGE'),
('keys_num','GAUGE'),
]
stat_info = conn.stat_info()
for _key,falcon_type in monitor_keys:
#计算命中率
if _key == 'keyspace_hit_ratio':
try:
_value = round(float(stat_info.get('keyspace_hits',0))/(int(stat_info.get('keyspace_hits',0)) + int(stat_info.get('keyspace_misses',0))),4)*100
except ZeroDivisionError:
_value = 0
#碎片率是浮点数
elif _key == 'mem_fragmentation_ratio':
_value = float(stat_info.get(_key,0))
#拿到key的数量
elif _key == 'keys_num':
_value = 0
for i in range(16):
_key = 'db'+str(i)
_num = stat_info.get(_key)
if _num:
_value += int(_num.get('keys'))
_key = 'keys_num'
#其他的都采集成counter,int
else:
try:
_value = int(stat_info[_key])
except:
continue
falcon_format = {
'Metric': '%s' % (_key),
'Endpoint': endpoint,
'Timestamp': timestamp,
'Step': step,
'Value': _value,
'CounterType': falcon_type,
'TAGS': tags
}
redis_stat_list.append(falcon_format)

load_data = redis_stat_list+redis_cmdstat_list
print json.dumps(load_data,sort_keys=True,indent=4)
requests.post(open_falcon_api, data=json.dumps(load_data))

指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

指标

类型

说明

 connected_clients

 GAUGE

连接的客户端个数

 blocked_clients

 GAUGE

被阻塞客户端的数量

 used_memory

 GAUGE

 Redis分配的内存的总量

 used_memory_rss

 GAUGE

 OS分配的内存的总量

 mem_fragmentation_ratio

 GAUGE

 内存碎片率,used_memory_rss/used_memory

 total_commands_processed

 COUNTER

 每秒执行的命令数,比较准确的QPS

 rejected_connections

 COUNTER

 被拒绝的连接数/秒

 expired_keys

 COUNTER

 过期KEY的数量/秒 

 evicted_keys

 COUNTER

 被驱逐KEY的数量/秒

 keyspace_hits

 COUNTER

 命中KEY的数量/秒

 keyspace_misses

 COUNTER

 未命中KEY的数量/秒

 keyspace_hit_ratio

 GAUGE

 KEY的命中率

 keys_num

 GAUGE

 KEY的数量

 cmd_*

 COUNTER

 各种名字都执行次数/秒

使用说明:读取配置到都数据库列表执行,配置文件格式如下(redisdb_list.txt):

 IP,Port,Password,endpoint

192.168.1.56,7021,zhoujy,redis-56:7021
192.168.1.55,7021,zhoujy,redis-55:7021

最后执行:

python redis_monitor.py redisdb_list.txt

3) MongoDB 收集信息脚本(mongodb_monitor.py)

...后续添加

 

4)其他相关的监控(需要装上agent),比如下面的指标:

告警项

触发条件

备注

load.1min

all(#3)>10

Redis服务器过载,处理能力下降

cpu.idle

all(#3)<10

CPU idle过低,处理能力下降

df.bytes.free.percent

all(#3)<20

磁盘可用空间百分比低于20%,影响从库RDB和AOF持久化

mem.memfree.percent

all(#3)<15

内存剩余低于15%,Redis有OOM killer和使用swap的风险

mem.swapfree.percent

all(#3)<80

使用20% swap,Redis性能下降或OOM风险

net.if.out.bytes

all(#3)>94371840

网络出口流量超90MB,影响Redis响应

net.if.in.bytes

all(#3)>94371840

网络入口流量超90MB,影响Redis响应

disk.io.util

all(#3)>90

磁盘IO可能存负载,影响从库持久化和阻塞写

 

相关文档:

​​https://github.com/iambocai/falcon-monit-scripts​​(redis monitor)

​​https://github.com/ZhuoRoger/redismon​​(redis monitor)

 

~~~~~~~~~~~~~~~ 万物之中,希望至美 ~~~~~~~~~~~~~~~

举报

相关推荐

0 条评论