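We use HAProxy + Keepalived to provide front-end load balancing and high availability for our game servers, so we need to monitor HAProxy's running state in real time.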

The HAProxy version used in this article is 1.4.24.

For reference, see the "Statistics and monitoring" section of the official documentation.

1. How the monitoring works

HAProxy exposes its status information in two ways, through an HTTP page and through a Unix stats socket, and both can export the data in CSV format.

The HTTP page can be viewed in a browser at its configured stats URL.

The Unix socket can be queried with a command such as:

echo "show info;show stat" | sudo socat stdio unix-connect:/tmp/haproxy  

This article mainly uses the second method to obtain HAProxy's status information.

Enable the stats socket in the haproxy.cfg configuration file:

stats socket  /tmp/haproxy level admin 
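The socket configured above can also be queried without socat. The following is a minimal Python sketch, not part of the original scripts: the function name query_haproxy and its timeout parameter are assumptions made for illustration, and the caller must have permission to open the socket file (the article relies on sudo for that). It depends on HAProxy closing the connection once it has answered a non-interactive command.

```python
import socket


def query_haproxy(command, socket_path="/tmp/haproxy", timeout=5.0):
    """Send one command to the HAProxy stats socket and return the reply as text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        # Commands are plain text terminated by a newline.
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

For example, query_haproxy("show stat") would return the same CSV text that the socat command prints.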

 

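The level keyword accepts one of three values: user, operator, or admin.

user is the lowest privilege level and can only see non-sensitive information.

operator can see all information but can only change non-sensitive settings.

admin can see and change everything, so use it with care.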

$ echo "show help" | sudo socat stdio unix-connect:/tmp/haproxy

Unknown command. Please enter one of the following commands only :

  clear counters : clear max statistics counters (add 'all' for all counters)

  help           : this message

  prompt         : toggle interactive mode with prompt

  quit           : disconnect

  show info      : report information about the running process

  show stat      : report counters for each proxy and server

  show errors    : report last request and response errors for each proxy

  show sess [id] : report the list of current sessions or dump this session

  get weight     : report a server's current weight

  set weight     : change a server's weight

  set timeout    : change a timeout setting

  disable server : set a server in maintenance mode

  enable server  : re-enable a server that was previously in maintenance mode

show info reports information about the running HAProxy process:

Name: HAProxy

Version: 1.4.24

Release_date: 2013/06/17

Nbproc: 1

Process_num: 1

Pid: 7020

Uptime: 110d 16h25m55s

Uptime_sec: 9563155

Memmax_MB: 0

Ulimit-n: 131101

Maxsock: 131101

Maxconn: 65536

Maxpipes: 0

CurrConns: 14

PipesUsed: 0

PipesFree: 0

Tasks: 26

Run_queue: 1

node: master_loadbalance1

description: lb1
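Since every line of this output is a "Key: value" pair, it is easy to turn into a dictionary. Here is a small Python sketch under that assumption; parse_show_info is a name invented for this example and the sample text is copied from the output above.

```python
def parse_show_info(text):
    """Turn the 'Key: value' lines of "show info" into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


sample = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
"""

info = parse_show_info(sample)
print(info["Uptime"])      # 110d 16h25m55s
print(info["Uptime_sec"])  # 9563155
```

Looking values up by exact key avoids confusing fields whose names share a prefix, such as Uptime and Uptime_sec.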

show stat reports the counters for each proxy and server:

# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,

login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,

login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,

login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963

The fields are numbered from 0 and have the following meanings:

  0. pxname: proxy name
  1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name for server)
  2. qcur: current queued requests
  3. qmax: max queued requests
  4. scur: current sessions
  5. smax: max sessions
  6. slim: sessions limit
  7. stot: total sessions
  8. bin: bytes in
  9. bout: bytes out
 10. dreq: denied requests
 11. dresp: denied responses
 12. ereq: request errors
 13. econ: connection errors
 14. eresp: response errors (among which srv_abrt)
 15. wretr: retries (warning)
 16. wredis: redispatches (warning)
 17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
 18. weight: server weight (server), total weight (backend)
 19. act: server is active (server), number of active servers (backend)
 20. bck: server is backup (server), number of backup servers (backend)
 21. chkfail: number of failed checks
 22. chkdown: number of UP->DOWN transitions
 23. lastchg: last status change (in seconds)
 24. downtime: total downtime (in seconds)
 25. qlimit: queue limit
 26. pid: process id (0 for first instance, 1 for second, ...)
 27. iid: unique proxy id
 28. sid: service id (unique inside a proxy)
 29. throttle: warm up status
 30. lbtot: total number of times a server was selected
 31. tracked: id of proxy/server if tracking is enabled
 32. type (0=frontend, 1=backend, 2=server, 3=socket)
 33. rate: number of sessions per second over last elapsed second
 34. rate_lim: limit on new sessions per second
 35. rate_max: max number of new sessions per second
 36. check_status: status of last health check, one of:
        UNK     -> unknown
        INI     -> initializing
        SOCKERR -> socket error
        L4OK    -> check passed on layer 4, no upper layers testing enabled
        L4TMOUT -> layer 1-4 timeout
        L4CON   -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp)
        L6OK    -> check passed on layer 6
        L6TOUT  -> layer 6 (SSL) timeout
        L6RSP   -> layer 6 invalid response - protocol error
        L7OK    -> check passed on layer 7
        L7OKC   -> check conditionally passed on layer 7, for example 404 with disable-on-404
        L7TOUT  -> layer 7 (HTTP/SMTP) timeout
        L7RSP   -> layer 7 invalid response - protocol error
        L7STS   -> layer 7 response error, for example HTTP 5xx
 37. check_code: layer5-7 code, if available
 38. check_duration: time in ms took to finish last health check
 39. hrsp_1xx: http responses with 1xx code
 40. hrsp_2xx: http responses with 2xx code
 41. hrsp_3xx: http responses with 3xx code
 42. hrsp_4xx: http responses with 4xx code
 43. hrsp_5xx: http responses with 5xx code
 44. hrsp_other: http responses with other codes (protocol error)
 45. hanafail: failed health checks details
 46. req_rate: HTTP requests per second over last elapsed second
 47. req_rate_max: max number of HTTP requests per second observed
 48. req_tot: total number of HTTP requests received
 49. cli_abrt: number of data transfers aborted by the client
 50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
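Because the first line of the CSV output names every column, the numeric positions do not have to be hard-coded. The sketch below is one possible way to read the output by column name in Python; parse_show_stat is a name made up for this example, and the header and FRONTEND row are copied from the sample output in this article.

```python
import csv

HEADER = (
    "# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,"
    "econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,"
    "downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,"
    "rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,"
    "hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,"
    "req_tot,cli_abrt,srv_abrt,"
)
FRONTEND_ROW = (
    "login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,"
    "121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,"
    "195071390,0,1619236,28338,2034,,93,611,196721000,,,"
)


def parse_show_stat(text):
    """Parse "show stat" CSV output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("expected a header line starting with '# '")
    # Drop the leading "# " and the trailing comma before splitting the names.
    header = lines[0][2:].rstrip(",").split(",")
    return [dict(zip(header, fields)) for fields in csv.reader(lines[1:])]


rows = parse_show_stat(HEADER + "\n" + FRONTEND_ROW + "\n")
print(rows[0]["scur"])      # 24
print(rows[0]["status"])    # OPEN
print(rows[0]["req_rate"])  # 93
```

Note that awk counts fields from 1 while the list above counts from 0, so field N in the list is $(N+1) in awk.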

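Note that if HAProxy is started in multi-process mode, that is, with nbproc set to a value other than 1, every process can answer on the socket with its own statistics, so the status you see switches between processes from one query to the next.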

2. Writing the monitoring scripts

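There are three monitoring scripts:

haproxy_info.sh: collects basic information about the HAProxy process.

haproxy_pool_discovery.py: lets Zabbix discover each pool:server pair, such as login_pool:BACKEND or login_pool:web1_80, through its low-level discovery (LLD) feature. With LLD, the state of each backend host is monitored dynamically, following whatever backends are defined in the configuration file.

haproxy_stat.sh: collects each metric by sending the show stat command to the stats socket. The script splits each row on commas and checks the value of the second field (svname), because some fields exist only on FRONTEND or BACKEND rows, while others exist on every row except FRONTEND or BACKEND.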

haproxy_info.sh

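#!/bin/bash
# Print one field of HAProxy's "show info" output, such as Version, Uptime_sec or Nbproc.
metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv
echo "show info" | /usr/bin/sudo /usr/bin/socat unix-connect:$stats_socket stdio > $info_file
# Match the field name exactly, so that "Uptime" does not also match "Uptime_sec",
# and print the whole value after ": " rather than only its first word.
awk -F': ' -v m="$metric" '$1==m {print $2}' $info_file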

haproxy_pool_discovery.py

socat must be installed, and the Zabbix agent user must be allowed to run socat through sudo.

Run visudo and change the sudoers file as follows:

#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
#         You have to run "ssh -t hostname sudo <cmd>".
#
Defaults    !requiretty
zabbixagent   ALL=(root)      NOPASSWD:/usr/bin/socat

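#!/usr/bin/python
# Discover HAProxy pool:server pairs for Zabbix low-level discovery (LLD).
import json
import subprocess

# Print "pxname:svname" for every row of "show stat", skipping the header and blank lines.
args = '''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''
# universal_newlines=True makes the output text, so the script runs on Python 2 and 3.
t = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE, universal_newlines=True).communicate()[0]
pools = []
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}': pool})
print(json.dumps({'data': pools}, indent=4, separators=(',', ':')))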

Output:

{
    "data":[
        {
            "{#POOL_NAME}":"login_game_pool:FRONTEND"
        },
        {
            "{#POOL_NAME}":"login_pool:web1_80"
        },
        {
            "{#POOL_NAME}":"login_pool:web2_80"
        },
        {
            "{#POOL_NAME}":"login_pool:BACKEND"
        }
    ]
}
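The discovery step can also be done in pure Python, without the egrep and awk pipeline. This is only a sketch of that idea: build_lld is a hypothetical helper, and the shortened sample rows stand in for real show stat output.

```python
import json


def build_lld(stat_text):
    """Return Zabbix LLD JSON listing each pxname:svname pair in "show stat" output."""
    pools = []
    for line in stat_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the header line and blank lines
        fields = line.split(",")
        if len(fields) < 2:
            continue  # not a statistics row
        pools.append({"{#POOL_NAME}": fields[0] + ":" + fields[1]})
    return json.dumps({"data": pools}, indent=4)


# Shortened rows for illustration; real rows carry about fifty columns.
sample = """# pxname,svname,qcur,qmax,
login_game_pool,FRONTEND,,,
login_pool,web1_80,0,0,

login_pool,BACKEND,0,0,
"""
print(build_lld(sample))
```

Building the JSON with json.dumps guarantees valid output, which matters because Zabbix rejects malformed discovery data.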

haproxy_stat.sh

#!/bin/bash
# Usage: haproxy_stat.sh <pool>:<server> <metric>
# Example first argument: login_game_pool:FRONTEND
pool_name=$(echo "$1" | awk -F':' '{print $1}')
server_name=$(echo "$1" | awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv
echo "show stat" | sudo socat stdio unix-connect:$stat_socket > $stat_file

# Print CSV column $1 (counted from 1) of the row matching pool_name and server_name.
field() {
    awk -F"," -v p="$pool_name" -v s="$server_name" -v n="$1" '$1==p && $2==s {print $n}' "$stat_file"
}

case $metric in
    qcur)
        # current queued requests; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 3; else echo 0; fi
        ;;
    qmax)
        # max queued requests; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 4; else echo 0; fi
        ;;
    scur)
        # current sessions
        field 5
        ;;
    smax)
        # max sessions
        field 6
        ;;
    slim)
        # sessions limit
        field 7
        ;;
    stot|stol)
        # total sessions ("stol" is the original misspelled key, kept so existing items still work)
        field 8
        ;;
    bin)
        # bytes in
        field 9
        ;;
    bout)
        # bytes out
        field 10
        ;;
    dreq)
        # denied requests; only FRONTEND and BACKEND have this field
        if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ]; then field 11; else echo 0; fi
        ;;
    dresp)
        # denied responses
        field 12
        ;;
    ereq)
        # request errors; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then field 13; else echo 0; fi
        ;;
    econ)
        # connection errors; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 14; else echo 0; fi
        ;;
    eresp)
        # response errors; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 15; else echo 0; fi
        ;;
    status)
        # status
        field 18
        ;;
    chkfail)
        # number of failed checks; FRONTEND and BACKEND do not have this field
        if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ]; then echo 0; else field 22; fi
        ;;
    chkdown)
        # number of UP->DOWN transitions; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 23; else echo 0; fi
        ;;
    lastchg)
        # last status change in seconds; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 24; else echo 0; fi
        ;;
    downtime)
        # total downtime in seconds; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 25; else echo 0; fi
        ;;
    lbtot)
        # total number of times a server was selected; FRONTEND does not have this field
        if [ "$server_name" != "FRONTEND" ]; then field 31; else echo 0; fi
        ;;
    rate)
        # number of sessions per second over last elapsed second
        field 34
        ;;
    rate_lim|rate_limit)
        # limit on new sessions per second; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then field 35; else echo 0; fi
        ;;
    rate_max)
        # max number of new sessions per second
        field 36
        ;;
    check_status)
        # status of last health check; FRONTEND and BACKEND do not have this field
        if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ]; then echo "NULL"; else field 37; fi
        ;;
    hrsp_1xx)
        # http responses with 1xx code
        field 40
        ;;
    hrsp_2xx)
        # http responses with 2xx code
        field 41
        ;;
    hrsp_3xx)
        # http responses with 3xx code
        field 42
        ;;
    hrsp_4xx)
        # http responses with 4xx code
        field 43
        ;;
    hrsp_5xx)
        # http responses with 5xx code
        field 44
        ;;
    req_rate)
        # HTTP requests per second over last elapsed second; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then field 47; else echo 0; fi
        ;;
    req_rate_max)
        # max number of HTTP requests per second observed; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then field 48; else echo 0; fi
        ;;
    req_tot)
        # total number of HTTP requests received; only FRONTEND has this field
        if [ "$server_name" == "FRONTEND" ]; then field 49; else echo 0; fi
        ;;
    *)
        echo "please input the correct argument"
        ;;
esac
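The per-metric rules in this script can also be written as data rather than as one case branch per metric. The Python sketch below illustrates that design; the set names and the metric_value function are invented for this example, the groupings simply mirror the branches of the shell script, and a row is assumed to be a dict mapping column names to strings.

```python
# Which row types carry which columns, as encoded in haproxy_stat.sh.
# These groupings mirror the script; they are not taken from the HAProxy docs.
ONLY_FRONTEND = {"ereq", "rate_lim", "req_rate", "req_rate_max", "req_tot"}
NOT_FRONTEND = {"qcur", "qmax", "econ", "eresp", "chkdown", "lastchg",
                "downtime", "lbtot"}
ONLY_FRONTEND_OR_BACKEND = {"dreq"}
ONLY_SERVER = {"chkfail", "check_status"}


def metric_value(row, metric):
    """Return the metric from a parsed row, or a default when it does not apply."""
    svname = row["svname"]
    if metric in ONLY_FRONTEND:
        applies = svname == "FRONTEND"
    elif metric in NOT_FRONTEND:
        applies = svname != "FRONTEND"
    elif metric in ONLY_FRONTEND_OR_BACKEND:
        applies = svname in ("FRONTEND", "BACKEND")
    elif metric in ONLY_SERVER:
        applies = svname not in ("FRONTEND", "BACKEND")
    else:
        applies = True
    if not applies:
        # Same defaults as the shell script: "NULL" for check_status, 0 otherwise.
        return "NULL" if metric == "check_status" else "0"
    return row.get(metric, "")


frontend = {"pxname": "login_game_pool", "svname": "FRONTEND",
            "qcur": "", "scur": "24", "req_rate": "93"}
server = {"pxname": "login_pool", "svname": "web1_80",
          "qcur": "0", "scur": "0", "check_status": "L7OK"}

print(metric_value(frontend, "qcur"))        # 0
print(metric_value(frontend, "req_rate"))    # 93
print(metric_value(server, "check_status"))  # L7OK
print(metric_value(server, "req_rate"))      # 0
```

With this layout, supporting a new metric means adding its name to at most one set instead of writing another branch.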

3. Changing the Zabbix agent configuration

Add a file named haproxy_status.conf:

### Option: UserParameter
#	User-defined parameter to monitor. There can be several user-defined parameters.
#	Format: UserParameter=<key>,<shell command>
#	See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2

4. Adding the Zabbix template

See the attachment for the complete template.