How-To: Fix service check timeouts in Nagios + NRPE deployed on CentOS/RHEL 5
Once you get used to writing plug-ins for Nagios and the plug-ins you write grow more complex, you may encounter this error: service check timed out.
If some of your service checks have this problem, you can trace it to one of these three values:
1. how slow is the plugin
- This is the first thing you should do. Check how much time your plugin needs to finish checking and return an exit status. Log on to the server you’re monitoring and run the plugin locally. Use the time command to measure.
$ time /usr/lib/nagios/plugins/check_service
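If you want the run time as a plain number of seconds along with the exit status, something like the sketch below might help. The function name time_plugin is my own invention, not part of Nagios or NRPE, and it only measures whole seconds using date.

```shell
#!/bin/bash
# Hypothetical helper: run a plugin, then report its exit status and its
# wall-clock run time in whole seconds.
# Nagios plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
time_plugin() {
    local start end status
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # discard plugin output, we only want timing
    status=$?
    end=$(date +%s)
    echo "exit_status=$status elapsed_seconds=$((end - start))"
}

# Example call, using the plugin path from the step above:
# time_plugin /usr/lib/nagios/plugins/check_service
```

Run it a few times, preferably while the server is busy, and keep the slowest result as your value for this step.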
2. how short is NRPE’s patience
- Once you have the value (in seconds) from step #1, check the NRPE configuration on that same server. The default location of NRPE’s configuration is /etc/nagios/nrpe.cfg
- Find the command_timeout parameter. Its value, in seconds, must be greater than the value you got in step #1.
- Once the parameter has been set, restart the NRPE service (service nrpe restart).
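To see what command_timeout is currently set to without opening an editor, a small helper along these lines could work. The function name nrpe_command_timeout is hypothetical, and picking the last uncommented occurrence when the parameter appears more than once is my assumption about how duplicates are handled.

```shell
#!/bin/bash
# Hypothetical helper: print the command_timeout value (in seconds) from an
# NRPE config file. Commented-out lines are ignored. If the parameter appears
# more than once, the last occurrence is printed (an assumption).
nrpe_command_timeout() {
    local cfg=${1:-/etc/nagios/nrpe.cfg}
    sed -n 's/^[[:space:]]*command_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" \
        | tail -n 1
}

# Example call, using the default location mentioned above:
# nrpe_command_timeout /etc/nagios/nrpe.cfg
```

If it prints nothing, the parameter is missing or commented out in that file.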
3. how short is Nagios’ patience
- Nagios executes a command, check_nrpe, to connect to an NRPE service. check_nrpe has a timeout parameter, -t. This parameter must have a bigger value than the one you set in step #2.
- Log on to your Nagios server and open the commands configuration file, /etc/nagios/objects/commands.cfg
- Find the check_nrpe command definition and edit its command_line to set the -t parameter. For instance, if you want the timeout value to be 500 seconds, it will look like this:
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 500
- Restart the Nagios service afterwards (service nagios restart).
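To double-check which -t value Nagios will actually pass to check_nrpe, you could pull it out of the commands file with a sketch like this. The function name check_nrpe_timeout is made up for illustration, and it assumes the command_line sits on a single line, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: print the -t value from the check_nrpe command_line
# in a Nagios commands file. Prints nothing if no -t is set.
check_nrpe_timeout() {
    local cfg=${1:-/etc/nagios/objects/commands.cfg}
    grep -E '^[[:space:]]*command_line[[:space:]].*check_nrpe' "$cfg" \
        | sed -n 's/.*[[:space:]]-t[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        | tail -n 1
}

# Example call, using the file mentioned above:
# check_nrpe_timeout /etc/nagios/objects/commands.cfg
```

An empty result means the command_line has no -t at all, so check_nrpe falls back to its built-in default, which is 10 seconds in the versions I have seen.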
In most cases, these three steps should do.
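As a quick sanity check, the rule behind the three steps is that each value must be smaller than the next: plugin run time, then command_timeout, then -t. Here is a minimal sketch of that comparison. The function name timeouts_in_order is hypothetical, and you pass in the three numbers (whole seconds) yourself.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when
#   plugin run time < NRPE command_timeout < check_nrpe -t
# All three arguments are whole seconds.
timeouts_in_order() {
    local plugin=$1 nrpe=$2 nagios=$3
    if [ "$plugin" -lt "$nrpe" ] && [ "$nrpe" -lt "$nagios" ]; then
        echo "OK: $plugin < $nrpe < $nagios"
        return 0
    fi
    echo "PROBLEM: expected plugin ($plugin) < command_timeout ($nrpe) < -t ($nagios)"
    return 1
}

# Example call with made-up numbers: a 45 second plugin, command_timeout=60, -t 500
# timeouts_in_order 45 60 500
```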