The biggest bottleneck in running Zabbix is the database: manage Zabbix's data storage and alerting well and you can build a solid monitoring system with it. Zabbix currently stores its data mainly in the history and trends tables; as time goes on these tables become very large, performance degrades badly, and the monitoring system becomes hard to use. Tuning MySQL can therefore improve Zabbix's performance dramatically.
Problems with the traditional Zabbix setup:
1. MySQL runs on a single node and therefore cannot hold really large volumes of data (on the order of several terabytes);
2. The number of monitored objects and the data retention period cannot both be large at the same time.
Existing Zabbix optimization approaches and their problems
1. Splitting tables and databases
Splitting Zabbix's tables across databases relieves the pressure on a single MySQL instance, but once the number of monitored devices reaches a certain scale you still have to scale out horizontally, which drives up the cost of monitoring.
2. Table partitioning
Partition the history and trends tables by date, one partition per day, keeping 90 days of partitions (a sketch of this approach follows at the end of this subsection).
Even with partitioning, however, the total data volume is still there, so it does not solve the problem of ever-growing storage.
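For reference, the partition-maintenance routine behind this approach usually looks something like the sketch below: the history/trends tables are RANGE-partitioned on the clock column, a partition is added for the next day, and the partition that has fallen outside the 90-day window is dropped. This is a minimal sketch only; the connection details, the maintain_partitions helper and the pYYYYMMDD naming are assumptions, and it presumes the tables have already been converted to RANGE partitioning (with the Zabbix housekeeper disabled for them).

import datetime
import pymysql

RETENTION_DAYS = 90  # keep 90 daily partitions (assumed value from the text above)

def maintain_partitions(table):
    # Host and credentials are illustrative assumptions.
    db = pymysql.connect(host='10.1.100.4', port=3306, user='root',
                         password='123456', db='server', charset='utf8')
    cursor = db.cursor()
    today = datetime.date.today()
    # Add tomorrow's partition: it holds rows whose clock is below the start
    # of the day after tomorrow.
    tomorrow = today + datetime.timedelta(days=1)
    upper = tomorrow + datetime.timedelta(days=1)
    cursor.execute(
        "ALTER TABLE %s ADD PARTITION (PARTITION p%s "
        "VALUES LESS THAN (UNIX_TIMESTAMP('%s 00:00:00')))"
        % (table, tomorrow.strftime('%Y%m%d'), upper.isoformat()))
    # Drop the daily partition that has just left the retention window.
    expired = today - datetime.timedelta(days=RETENTION_DAYS)
    cursor.execute("ALTER TABLE %s DROP PARTITION p%s"
                   % (table, expired.strftime('%Y%m%d')))
    cursor.close()
    db.close()

if __name__ == '__main__':
    for t in ('history', 'history_uint', 'trends', 'trends_uint'):
        maintain_partitions(t)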
Zabbix database-splitting scheme
Goal:
Keep alerting and historical data in separate databases: the alerting database retains only a short window of data and anything outside that window is deleted, so the live data stays at a fixed size; historical data accumulates in a second database that is used only for queries.
zabbix-server stores the collected data mainly in the history and trends tables. The seven tables involved and the value types they hold are:
history - numeric (float) values
history_uint - numeric (unsigned integer) values
history_str - character (string) values
history_text - text values
history_log - log values
trends - hourly aggregates (num, value_min, value_avg, value_max) of float values
trends_uint - hourly aggregates of unsigned integer values
Implementation steps:
1. Set up two zabbix-server instances (zabbix-server_A and zabbix-server_B) and two databases (DB_A and DB_B).
2. zabbix-server_A is configured normally, with the monitored hosts reporting to it. zabbix_server_B is configured differently:
a. In zabbix_server_B's zabbix_server.conf, the database account must have read-only access to the Zabbix server database; do not give it write permission (a privilege sketch follows after the web configuration in step b).
[root@localhost ~]# cat /usr/local/zabbix-server/etc/zabbix_server.conf|egrep -v "^$|#"
LogFile=/tmp/zabbix_server.log
DBName=server
DBUser=zabbix
DBPassword=zabbix
DBSocket=/var/lib/mysql/mysql.sock
DBPort=3306
Timeout=4
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1
StartVMwareCollectors=5
VMwareFrequency=60
VMwarePerfFrequency=60
VMwareCacheSize=8
b. In zabbix_server_B's zabbix.conf.php, the database account needs read-write access to the Zabbix server database (in testing, a read-only account made it impossible to log in to zabbix_server_B's web frontend).
[root@localhost ~]# cat /usr/local/nginx/html/conf/zabbix.conf.php|egrep -v "^$|#"
<?php
// Zabbix GUI configuration file.
$DB['TYPE'] = 'MYSQL';
$DB['SERVER'] = '10.1.100.6';
$DB['PORT'] = '3306';
$DB['DATABASE'] = 'server';
$DB['USER'] = 'root'; # account with read-write access to the server database
$DB['PASSWORD'] = '123456';
// Schema name. Used for PostgreSQL.
$DB['SCHEMA'] = '';
// Used for TLS connection.
$DB['ENCRYPTION'] = false;
$DB['KEY_FILE'] = '';
$DB['CERT_FILE'] = '';
$DB['CA_FILE'] = '';
$DB['VERIFY_HOST'] = false;
$DB['CIPHER_LIST'] = '';
// Use IEEE754 compatible value range for 64-bit Numeric (float) history values.
// This option is enabled by default for new Zabbix installations.
// For upgraded installations, please read database upgrade notes before enabling this option.
$DB['DOUBLE_IEEE754'] = true;
$ZBX_SERVER = 'localhost';
$ZBX_SERVER_PORT = '10051';
$ZBX_SERVER_NAME = 'linfan';
$IMAGE_FORMAT_DEFAULT = IMAGE_FORMAT_PNG;
// Uncomment this block only if you are using Elasticsearch.
// Elasticsearch url (can be string if same url is used for all types).
//$HISTORY['url'] = [
// 'uint' => 'http://localhost:9200',
// 'text' => 'http://localhost:9200'
//];
// Value types stored in Elasticsearch.
//$HISTORY['types'] = ['uint', 'text'];
// Used for SAML authentication.
// Uncomment to override the default paths to SP private key, SP and IdP X.509 certificates, and to set extra settings.
//$SSO['SP_KEY'] = 'conf/certs/sp.key';
//$SSO['SP_CERT'] = 'conf/certs/sp.crt';
//$SSO['IDP_CERT'] = 'conf/certs/idp.crt';
//$SSO['SETTINGS'] = [];
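To make the privilege split described in steps a and b concrete, the two accounts on DB_B could be provisioned roughly as follows. This is a minimal sketch under assumptions: the user names, host pattern and passwords are illustrative (the configuration above simply reuses root for the web frontend), and only the read-only account is strictly required by step a.

import pymysql

# Connect to DB_B as an administrative user; host/credentials are illustrative.
db = pymysql.connect(host='10.1.100.6', port=3306, user='root',
                     password='123456', charset='utf8')
cursor = db.cursor()

# Read-only account referenced by DBUser/DBPassword in zabbix_server.conf.
cursor.execute("CREATE USER IF NOT EXISTS 'zabbix'@'%' IDENTIFIED BY 'zabbix'")
cursor.execute("GRANT SELECT ON server.* TO 'zabbix'@'%'")

# Read-write account that could be used by $DB['USER'] in zabbix.conf.php instead of root.
cursor.execute("CREATE USER IF NOT EXISTS 'zabbix_web'@'%' IDENTIFIED BY 'zabbix_web_pass'")
cursor.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON server.* TO 'zabbix_web'@'%'")

cursor.close()
db.close()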
3. Configure OGG (GoldenGate) to replicate every table from DB_A to DB_B except the seven history/trends tables listed above.
4. Write a Python script that synchronizes the history and trends tables:
import pymysql
import time
import threading

# The seven tables use three different schemas, so there are three function bodies.
def update(tablename):
    # DB_B (target, query-only history database)
    To_db = pymysql.connect(host='10.1.100.6', port=3306, user='root', password='123456', db='server', charset='utf8')
    To_cursor = To_db.cursor()
    # DB_A (source, live monitoring database)
    From_db = pymysql.connect(host='10.1.100.4', port=3306, user='root', password='123456', db='server', charset='utf8')
    From_cursor = From_db.cursor()
    result_list = []
    # The initial load was done with a full mysqldump, so DB_B's history and trends
    # tables already contain data; fetch the timestamp of the newest row.
    To_sql = "select clock from %s order by clock DESC limit 1" % tablename
    To_cursor.execute(To_sql)
    To_results = To_cursor.fetchall()
    for clock_lasts in To_results:
        for clock_last in clock_lasts:
            # Query DB_A for all rows newer than that timestamp.
            From_sql = "select * from %s where clock > '%s' " % (tablename, clock_last)
            From_cursor.execute(From_sql)
            From_results = From_cursor.fetchall()
            if From_results:
                for values in From_results:
                    one_list = list(values)
                    result_list.append(one_list)
                count = 0
                for lists in result_list:
                    itemid_values = lists[0]
                    clock_values = lists[1]
                    value_values = lists[2]
                    ns_values = lists[3]
                    # To avoid slowing the database down with one huge write,
                    # sleep 10 s after every 3000 inserted rows.
                    count += 1
                    if count % 3000 == 0:
                        time.sleep(10)
                    # Write the rows fetched above into DB_B.
                    i_sql = "insert into %s (itemid,clock,value,ns) values ('%s','%s','%s','%s')" % (tablename, itemid_values, clock_values, value_values, ns_values)
                    To_cursor.execute(i_sql)
                To_db.commit()
    To_cursor.close()
    To_db.close()
    From_cursor.close()
    From_db.close()
def update_history_log(tablename):
    To_db = pymysql.connect(host='10.1.100.6', port=3306, user='root', password='123456', db='server', charset='utf8')
    To_cursor = To_db.cursor()
    From_db = pymysql.connect(host='10.1.100.4', port=3306, user='root', password='123456', db='server', charset='utf8')
    From_cursor = From_db.cursor()
    result_list = []
    To_sql = "select clock from %s order by clock DESC limit 1" % tablename
    To_cursor.execute(To_sql)
    To_results = To_cursor.fetchall()
    for clock_lasts in To_results:
        for clock_last in clock_lasts:
            From_sql = "select * from %s where clock > '%s' " % (tablename, clock_last)
            From_cursor.execute(From_sql)
            From_results = From_cursor.fetchall()
            if From_results:
                for values in From_results:
                    one_list = list(values)
                    result_list.append(one_list)
                count = 0
                for lists in result_list:
                    itemid_values = lists[0]
                    clock_values = lists[1]
                    timestamp_values = lists[2]
                    source_values = lists[3]
                    severity_values = lists[4]
                    value_values = lists[5]
                    logeventid_values = lists[6]
                    ns_values = lists[7]
                    count += 1
                    if count % 3000 == 0:
                        time.sleep(10)
                    i_sql = "insert into %s (itemid,clock,timestamp,source,severity,value,logeventid,ns) values ('%s','%s','%s','%s','%s','%s','%s','%s')" % (tablename, itemid_values, clock_values, timestamp_values, source_values, severity_values, value_values, logeventid_values, ns_values)
                    To_cursor.execute(i_sql)
                To_db.commit()
    To_cursor.close()
    To_db.close()
    From_cursor.close()
    From_db.close()
def update_trends(tablename):
    To_db = pymysql.connect(host='10.1.100.6', port=3306, user='root', password='123456', db='server', charset='utf8')
    To_cursor = To_db.cursor()
    From_db = pymysql.connect(host='10.1.100.4', port=3306, user='root', password='123456', db='server', charset='utf8')
    From_cursor = From_db.cursor()
    result_list = []
    To_sql = "select clock from %s order by clock DESC limit 1" % tablename
    To_cursor.execute(To_sql)
    To_results = To_cursor.fetchall()
    for clock_lasts in To_results:
        for clock_last in clock_lasts:
            From_sql = "select * from %s where clock > '%s' " % (tablename, clock_last)
            From_cursor.execute(From_sql)
            From_results = From_cursor.fetchall()
            if From_results:
                for values in From_results:
                    one_list = list(values)
                    result_list.append(one_list)
                count = 0
                for lists in result_list:
                    itemid_values = lists[0]
                    clock_values = lists[1]
                    num_values = lists[2]
                    value_min_values = lists[3]
                    value_avg_values = lists[4]
                    value_max_values = lists[5]
                    count += 1
                    if count % 3000 == 0:
                        time.sleep(10)
                    i_sql = "insert into %s (itemid,clock,num,value_min,value_avg,value_max) values ('%s','%s','%s','%s','%s','%s')" % (tablename, itemid_values, clock_values, num_values, value_min_values, value_avg_values, value_max_values)
                    To_cursor.execute(i_sql)
                To_db.commit()
    To_cursor.close()
    To_db.close()
    From_cursor.close()
    From_db.close()
def main():
    # Run the sync for the seven tables in parallel, one thread per table,
    # so the script finishes faster.
    jobs = [
        threading.Thread(target=update, args=("history",)),
        threading.Thread(target=update, args=("history_str",)),
        threading.Thread(target=update, args=("history_uint",)),
        threading.Thread(target=update, args=("history_text",)),
        threading.Thread(target=update_history_log, args=("history_log",)),
        threading.Thread(target=update_trends, args=("trends",)),
        threading.Thread(target=update_trends, args=("trends_uint",)),
    ]
    for t in jobs:
        t.daemon = False
        t.start()
    # Wait for every table to finish so the elapsed time printed below is meaningful.
    for t in jobs:
        t.join()

if __name__ == '__main__':
    starttime = time.time()
    main()
    endtime = time.time()
    print(endtime - starttime)
5. Write a Python script that deletes trends and history data outside the retention window on zabbix-server_A
The script deletes rows based on their timestamps and is written much like the script above, so the author's version is not reproduced here.
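For orientation only, a minimal sketch of what such a cleanup could look like; it is not the original script. The 90-day retention value, the batch size, and the per-batch pause are assumptions chosen to mirror the pacing of the sync script above.

import time
import pymysql

RETENTION_DAYS = 90   # assumed retention window on DB_A
BATCH_SIZE = 3000     # delete in small chunks to keep transactions short

TABLES = ("history", "history_str", "history_uint", "history_text",
          "history_log", "trends", "trends_uint")

def purge(tablename, cutoff):
    # DB_A (live monitoring database); host/credentials taken from the sync script.
    db = pymysql.connect(host='10.1.100.4', port=3306, user='root',
                         password='123456', db='server', charset='utf8')
    cursor = db.cursor()
    while True:
        # Remove at most BATCH_SIZE expired rows per round.
        deleted = cursor.execute(
            "delete from %s where clock < '%s' limit %d"
            % (tablename, cutoff, BATCH_SIZE))
        db.commit()
        if deleted < BATCH_SIZE:
            break
        time.sleep(1)  # brief pause between batches
    cursor.close()
    db.close()

if __name__ == '__main__':
    cutoff = int(time.time()) - RETENTION_DAYS * 24 * 3600
    for table in TABLES:
        purge(table, cutoff)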