Tuesday, 6 November 2012

Monitoring a Linux System from Nagios


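Monitoring a Linux system from Nagios is simple and straightforward. Install the NRPE agent and the Nagios Plugins on the system to be monitored, then edit the nrpe.cfg file: list the Nagios server (collector) IP address in allowed_hosts, and set dont_blame_nrpe=1 so that the collector is allowed to pass arguments to commands, as shown below.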

allowed_hosts=<nagios server ip>,localhost
dont_blame_nrpe=1
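
Before starting the agent, it can help to confirm that both settings actually made it into the file. The following is only a minimal sketch: nrpe_cfg_ok is a hypothetical helper name (not part of NRPE), and the location of nrpe.cfg is passed in as an argument because it varies between installations.

```shell
#!/bin/sh
# nrpe_cfg_ok is a hypothetical helper, not shipped with NRPE.
# It succeeds only if the given nrpe.cfg lists the collector IP in
# allowed_hosts and has dont_blame_nrpe set to 1.
nrpe_cfg_ok() {
    cfg=$1            # path to nrpe.cfg (assumption: supplied by the caller)
    collector_ip=$2   # IP address of the Nagios server / collector
    # -F matches the IP literally; -w avoids matching 10.0.0.5 inside 10.0.0.50
    grep '^allowed_hosts=' "$cfg" | grep -qwF "$collector_ip" &&
        grep -q '^dont_blame_nrpe=1' "$cfg"
}
```

It could be called as nrpe_cfg_ok followed by the path to nrpe.cfg and the collector's IP; a non-zero exit status means one of the two settings is missing.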


Start the agent (nrpe) service:

sudo /etc/init.d/nrpe start

Run the command below on the collector's terminal to verify communication between the agent and the collector. If everything is OK, the command returns the agent version details, as shown below.

/usr/local/groundwork/nagios/libexec/check_nrpe -H <monitored system IP>
NRPE v2.12


Refer to the table below to troubleshoot communication issues.

Error in communication | Troubleshooting
CHECK_NRPE: Socket timeout after 10 seconds. | Telnet to the monitored host from the collector on port 5666. If it times out, check the hardware and software firewall rules; TCP port 5666 must be open from the collector to the monitored host.
CHECK_NRPE: Error - Could not complete SSL handshake. | Check that the nrpe.cfg file on the monitored host lists the collector's IP address.
Connection refused by host | Check whether the nrpe process is running on the monitored host. If it is not running, start the agent (nrpe) service.

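
The troubleshooting table above can also be expressed as a small lookup, which is handy when the check is run from a script. This is a rough sketch only: nrpe_hint is a hypothetical helper name, and it simply pattern-matches the messages listed in the table.

```shell
#!/bin/sh
# nrpe_hint is a hypothetical helper, not part of NRPE or GroundWork.
# It maps the text printed by check_nrpe to the matching troubleshooting step.
nrpe_hint() {
    case "$1" in
        *"Socket timeout"*)
            echo "Check firewall rules: TCP 5666 must be open from collector to monitored host" ;;
        *"Could not complete SSL handshake"*)
            echo "Check nrpe.cfg on the monitored host for the collector's IP" ;;
        *"Connection refused"*)
            echo "Check that the nrpe process is running on the monitored host" ;;
        NRPE\ v*)
            echo "OK: agent reachable" ;;
        *)
            echo "Unrecognized output" ;;
    esac
}

# Example with a message copied from the table:
nrpe_hint "CHECK_NRPE: Socket timeout after 10 seconds."
```

In practice the argument would be the captured output of the check_nrpe command shown earlier, rather than a literal string.
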
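Once the above check succeeds, specify the commands required for plugin execution in the nrpe.cfg file, as shown below. The $ARG1$, $ARG2$, ... macros are filled in from the arguments the collector passes with each check.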

command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$ $ARG3$
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$ $ARG4$
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$
....
command[check_mem]=/usr/lib/nagios/plugins/check_mem -w $ARG1$ -c $ARG2$
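
To see how the $ARGn$ macros get their values, the command definitions above can be exercised by hand from the collector with check_nrpe's -c (command name) and -a (argument list) options. The sketch below only prints the command line instead of running it; nrpe_cmd is a hypothetical helper, 192.0.2.10 is a placeholder address, and the thresholds are illustrative assumptions.

```shell
#!/bin/sh
# Path taken from the verification step earlier; adjust if check_nrpe lives elsewhere.
CHECK_NRPE=/usr/local/groundwork/nagios/libexec/check_nrpe

# nrpe_cmd is a hypothetical helper: it prints the check_nrpe command line
# for a given host, command name, and argument list without running it.
nrpe_cmd() {
    host=$1
    name=$2
    shift 2
    # -c selects the command[...] entry in nrpe.cfg; -a supplies $ARG1$, $ARG2$, ...
    echo "$CHECK_NRPE -H $host -c $name -a $*"
}

# Assumed example thresholds: warn at 5 logged-in users, critical at 10.
nrpe_cmd 192.0.2.10 check_users 5 10
```

Running the printed command on the collector should return the plugin's output from the monitored host, provided dont_blame_nrpe=1 is set there.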

Finally, configure all required services for this host using the GroundWork Configuration tab.
