
check_generic in 30 minutes - step by step tutorial

check_generic can be the Swiss Army knife in your monitoring toolbox, but it takes some effort to discover its real capabilities and strengths. Let's have a short session and see what we can do with check_generic.

1. Always test as the 'nagios' user

First, do what you should always do when exploring plugins: log in as the 'nagios' user. Many errors occur because people test as root and then wonder why the plugins behave differently under the UID 'nagios' in production.

# su - nagios
$ _
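
If you later wrap your tests in a small script, a guard like the following can catch an accidental run as root. This is only a sketch: require_user is a hypothetical helper name, not something shipped with Nagios or check_generic.

```shell
# Hypothetical helper for a test wrapper script (not part of check_generic).
# Usage: require_user EXPECTED [ACTUAL]; ACTUAL defaults to the current user name.
require_user() {
  expected=$1
  actual=${2:-$(id -un)}
  if [ "$actual" != "$expected" ]; then
    echo "run this as '$expected', not '$actual'" >&2
    return 1
  fi
}

# Example: stop a test script early unless it runs as 'nagios'.
# require_user nagios || exit 1
```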

Second, to get a brief overview, just call the plugin without any options:

$ ./check_generic 
check_generic error: no commandline specified

check_generic -e <cmdline> -o|u|w|c <expression> [-f false_state] [-n name] [-t timeout] [-r level]
check_generic [-h | --help]
check_generic [-V | --version]

Good start. ;-) It complains about a missing command line. Time to fix it…

2. check_generic --execute "command" --critical "perl expr"

OK, let's begin with the two parts you need for every call of check_generic:

  1. -e/--execute to select the command to be run and
  2. -c/--critical to define the condition which makes the plugin's state critical.

It can also be -w/--warning or -u/--unknown, but anyway: we start small and grow later on.

3. Nagiostats to monitor Nagios itself

Now we're looking for something to monitor. (Normally you should know this before you begin writing or configuring a plugin ;-)). How about monitoring Nagios itself? There's a small program called nagiostats which is part of every Nagios installation. Now let's see what we can do with it.
If you start nagiostats on a running Nagios system, it prints lots of figures which describe the number of checks and the performance of the whole Nagios system.
We want to concentrate on the performance, and this is described by the latency. Nagios schedules each service check for a certain time, but the check usually runs a bit later than scheduled. The difference between the scheduled time and the actual execution time is the Service Check Latency.
The next step for our check is to extract this figure from the nagiostats output. We could do this with a small Unix pipeline:

$ /usr/local/nagios/bin/nagiostats  | grep "Active Service Latency:" | awk '{print $8}'
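
To see why field 8 is the right one, it helps to count the whitespace-separated fields on that line. The sketch below assumes nagiostats prints the latency as "min / max / avg" followed by the unit; the sample line and its numbers are made up for illustration and are not real output. It also folds the grep into awk's own pattern match.

```shell
# Hypothetical sample line; the numbers are made up for illustration.
sample='Active Service Latency:                 0.000 / 1.276 / 0.533 sec'

# awk splits on whitespace:
#   1=Active 2=Service 3=Latency: 4=min 5=/ 6=max 7=/ 8=avg 9=sec
avg=$(printf '%s\n' "$sample" | awk '/Active Service Latency:/ {print $8}')
echo "$avg"
```

Note that this value would be in seconds, as the unit on the line indicates.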

But there is a better way: we can use the MRTG output option

$ /usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT

It returns the average latency in milliseconds. So now try it:

$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT"

Sh… it's still complaining, with something like
Sorry Dave. No evaluation expression specified.
Okidok - now comes the trick:

4. Perl expression to evaluate the command output

Any Perl expression can be used to evaluate the command output.

  • You can use ">100" or "<50" for a numerical comparison.
  • If you want to check a string, just use "eq abc".
  • Regular expressions are allowed: "=~/perl-regex/"
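
The idea is that the command output and your expression are put together and evaluated as one comparison. The following shell function is only a simplified stand-in to illustrate that idea: matches is a hypothetical name, it handles just ">N" and "<N" with integers, and it does not evaluate real Perl expressions the way check_generic does.

```shell
# Simplified stand-in for the evaluation step (hypothetical, not check_generic code).
# Supports only ">N" and "<N" with integer values.
matches() {
  result=$1
  rule=$2
  threshold=${rule#?}   # drop the leading operator character
  case $rule in
    '>'*) [ "$result" -gt "$threshold" ] ;;
    '<'*) [ "$result" -lt "$threshold" ] ;;
    *)    return 2 ;;   # rule not supported by this sketch
  esac
}

# 533 is not greater than 60000, so this prints "no match".
if matches 533 '>60000'; then echo "match"; else echo "no match"; fi
```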

For our example we begin with 1 minute latency, which is 60000 milliseconds:

$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000"

By the way: in my opinion this threshold notation is much easier and much less confusing than the original Nagios threshold syntax with -c "2:5"… But no more time to lose, let's see what our plugin does:

$ ./check_generic -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000"
CHANGEME OK - result:533 match:none

What does it mean? Our plugin has done a simple Perl evaluation: "533>60000" → false
Now look at the details of the result CHANGEME OK - result:533 match:none:

  • CHANGEME - you can define a name here with the -n option
  • OK - there is no critical state, everything is fine
  • result:533 - the current service check latency is 533ms
  • match:none - there is no match against any (here: the critical) threshold

5. Congratulations, your first check_generic check is running

Now let's enhance it a little bit. First of all we also want a warning threshold. Just add -w ">30000". But we want to see something happen, so for this test we lower the warning threshold to ">500" and try the following command line (our check now also has a name!):

$ ./check_generic -n nagios_service_latency -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -c ">60000" -w ">500"
nagios_service_latency WARNING - result:616 match:>500 severities:warning
  

Wow, there's something new:

  • The plugin has a name: nagios_service_latency (due to the -n option)
  • It has the state WARNING
  • It matches against the rule ">500" (see match:>500)
  • severities:warning means that the warning rule matched here. More than one rule can match at the same time.
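
How a warning and a critical rule combine into one state can be sketched as follows. The exit codes 0, 1 and 2 for OK, WARNING and CRITICAL are the usual Nagios plugin convention; that critical wins when both rules match is an assumption of this sketch. exceeds and classify are hypothetical helpers that only understand ">N" integer rules.

```shell
# Hypothetical helpers, not part of check_generic. Only ">N" integer rules are handled.
exceeds() {
  [ "$1" -gt "${2#\>}" ]
}

# classify RESULT WARN_RULE CRIT_RULE: print the state, return the Nagios exit code.
# Assumption: the critical rule takes precedence when both rules match.
classify() {
  if exceeds "$1" "$3"; then
    echo CRITICAL; return 2
  elif exceeds "$1" "$2"; then
    echo WARNING; return 1
  fi
  echo OK; return 0
}

# Same numbers as in the run above: prints WARNING, then the non-zero exit code.
classify 616 '>500' '>60000' || echo "exit code: $?"
```

Lowering the thresholds like this is also a handy way to provoke a WARNING or CRITICAL on purpose while testing.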

6. Now you can add this check to your Nagios config

That's it. It took some time, but tell me: was it really difficult? ;-)
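
A minimal sketch of what the object definitions could look like. The names check_nagios_latency, localhost and generic-service are assumptions for illustration (the latter two appear in typical sample configurations), and it assumes check_generic is installed in the plugin directory that $USER1$ points to. Adapt all of them to your setup.

```
# Assumed names; adjust host, template and paths to your installation.
define command {
    command_name    check_nagios_latency
    command_line    $USER1$/check_generic -n nagios_service_latency -e "/usr/local/nagios/bin/nagiostats -m -d AVGACTSVCLAT" -w ">30000" -c ">60000"
}

define service {
    use                     generic-service
    host_name               localhost
    service_description     Nagios Service Latency
    check_command           check_nagios_latency
}
```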

Finally, some hints for 'configuring' your check_generic settings:

  • Log in as 'nagios'. Just play with the plugin until your config fits.
  • If you are running a more complicated command, first test it outside check_generic until it works.
  • Play with the thresholds to provoke warning and critical events, just to see that everything is working.
  • Enjoy (and have a look at the Linux page).
projects/check_generic/tutorial.txt · Last modified: 2007/10/19 14:12 by flackem