Nagios plugin to count apache threads

Overview

At work I have a misbehaving web server. Sometimes it spawns the maximum number of apache threads (capped at 256 on this host, no matter what I configure) and then pins the processor at 100%. The normal nagios checks for the http site, ssh, and so on don’t catch this, so they aren’t good enough for monitoring this host.

So I wrote my own simple nagios check. And then I put it in an rpm for easy deployment.

The nagios check

Here is the code for check_apache_threads, although you can check the latest version at my github page.

#!/bin/bash
# File: /usr/lib64/nagios/plugins/check_apache_threads
# Author: bgstack15@gmail.com
# Startdate: 2017-01-09 15:53
# Title: Nagios Check for Apache Threads
# Purpose: For a troublesome dmz wordpress host
# Package: nagios-plugins-apache-threads
# History:
# Usage:
# In nagios/nconf, use this check command line: $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
# Reference: general design /usr/lib64/nagios/plugins/check_sensors
# general design http://www.kernel-panic.it/openbsd/nagios/nagios6.html
# case -w http://www.linuxquestions.org/questions/programming-9/ash-test-is-string-a-contained-in-string-b-671773/
# Improve:
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION="0.0.1"
. $PROGPATH/utils.sh
print_usage() {
    cat <<EOF
Usage: $PROGNAME -w <thresh_warn> -c <thresh_crit>
EOF
}
print_help() {
    print_revision $PROGNAME $REVISION
    echo ""
    print_usage
    echo ""
    echo "This plugin checks for the number of active apache threads."
    echo ""
    support
    exit $STATE_OK
}
# MAIN
# Total httpd threads
tot_apache_threads="$( ps -ef | grep -ciE "httpd$" )"
verbosity=0
thresh_warn=
thresh_crit=
while test -n "${1}";
do
    case "$1" in
        --help|-h)
            print_help
            exit $STATE_OK
            ;;
        --version|-V)
            print_revision $PROGNAME $REVISION
            exit $STATE_OK
            ;;
        -v | --verbose)
            verbosity=$(( verbosity + 1 ))
            shift
            ;;
        -w | --warning | -c | --critical)
            if [[ -z "$2" || "$2" = -* ]];
            then
                # Threshold not provided
                echo "$PROGNAME: Option '$1' requires an argument."
                print_usage
                exit $STATE_UNKNOWN
            elif [[ "$2" = +([0-9]) ]];
            then
                # Threshold is a number
                thresh="$2"
            # use for a percentage template, from reference 2
            #elif [[ "$2" = +([0-9])% ]]; then
            # # Threshold is a percentage
            # thresh=$(( tot_mem * ${2%\%} / 100 ))
            else
                # Threshold is not a number or other valid input
                echo "$PROGNAME: Threshold must be an integer."
                print_usage
                exit $STATE_UNKNOWN
            fi
            case "$1" in *-w*) thresh_warn=$thresh;; *) thresh_crit=$thresh;; esac
            shift 2
            ;;
        -?)
            print_usage
            exit $STATE_OK
            ;;
        *)
            echo "$PROGNAME: Invalid option '$1'"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
done
if test -z "$thresh_warn" || test -z "$thresh_crit";
then
    # One or both values were unspecified
    echo "$PROGNAME: Threshold not set"
    print_usage
    exit $STATE_UNKNOWN
elif test "$thresh_crit" -le "$thresh_warn";
then
    echo "$PROGNAME: Critical value must be greater than warning value."
    print_usage
    exit $STATE_UNKNOWN
fi
if test "$verbosity" -ge 2;
then
    # Print debugging information
    /bin/cat <<EOF
Debugging information:
Warning threshold: $thresh_warn
Critical threshold: $thresh_crit
Verbosity level: $verbosity
Apache threads: ${tot_apache_threads}
EOF
fi
if test "${tot_apache_threads}" -gt "${thresh_crit}";
then
    # too many apache threads
    echo "APACHE CRITICAL - $tot_apache_threads"
    exit $STATE_CRITICAL
elif test "${tot_apache_threads}" -gt "${thresh_warn}";
then
    echo "APACHE WARNING - $tot_apache_threads"
    exit $STATE_WARNING
else
    # fine
    echo "APACHE OK - $tot_apache_threads"
    exit $STATE_OK
fi

Walking through the code

I included the code above so it gets cached by web crawlers. Look at the code on github for the canonical formatting and for the line numbers I refer to below.

I took the general format of this script from a local plugin, check_sensors, and from Reference 1 below.

Sourcing utils.sh provides nagios-related definitions, including the exit codes you see used, like $STATE_OK.
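
For reference, those exit codes follow the standard Nagios plugin return codes. Here is a minimal sketch of a fallback, assuming you want a script to keep working on a host where utils.sh is missing. The fallback and the PLUGINDIR variable are illustrations of mine and are not in the packaged script.

```sh
# Sketch: source utils.sh if it is readable, otherwise define the
# standard Nagios plugin return codes by hand.
# PLUGINDIR is a hypothetical variable used only in this example.
PLUGINDIR="${PLUGINDIR:-/usr/lib64/nagios/plugins}"

if [ -r "$PLUGINDIR/utils.sh" ]; then
    . "$PLUGINDIR/utils.sh"
fi

# ${VAR:=default} assigns only when VAR is unset or empty,
# so any value utils.sh already set is kept.
: "${STATE_OK:=0}"
: "${STATE_WARNING:=1}"
: "${STATE_CRITICAL:=2}"
: "${STATE_UNKNOWN:=3}"

echo "OK=$STATE_OK WARNING=$STATE_WARNING CRITICAL=$STATE_CRITICAL UNKNOWN=$STATE_UNKNOWN"
```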

The shell script is pretty self-explanatory. The variables are initialized and the value to check is calculated (ps -ef | grep -ciE "httpd$", which counts the process lines ending in httpd). About half the script (lines 51-100) parses the parameters. This hand-rolled loop is a nice, simple solution when the input is predictable (like from nagios), as long as you don’t need full option parsing that accepts forms like -XvalueofXhere, with no space between the flag and the value.
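
If you ever do need the attached form (-w50 with no space), the getopts shell builtin accepts both spellings. The following is a minimal sketch, not the parsing that check_apache_threads uses. It assumes short options only, and the function name parse_thresholds is hypothetical.

```sh
# Sketch: parse -w and -c with getopts, which accepts "-w 50" and "-w50" alike.
# parse_thresholds is a hypothetical helper, not part of check_apache_threads.
# It returns 3 on bad input, the standard Nagios UNKNOWN code.
parse_thresholds() {
    OPTIND=1            # reset so the function can be called more than once
    thresh_warn=
    thresh_crit=
    # The leading ":" in the option string selects silent error reporting
    while getopts ":w:c:" opt "$@"; do
        case "$opt" in
            w) thresh_warn="$OPTARG" ;;
            c) thresh_crit="$OPTARG" ;;
            :) echo "Option -$OPTARG requires an argument." >&2; return 3 ;;
            *) echo "Invalid option -$OPTARG" >&2; return 3 ;;
        esac
    done
    # Reject anything that is empty or not a plain non-negative integer
    for value in "$thresh_warn" "$thresh_crit"; do
        case "$value" in
            ''|*[!0-9]*) echo "Threshold must be an integer." >&2; return 3 ;;
        esac
    done
    return 0
}

parse_thresholds -w50 -c 150 && echo "warn=$thresh_warn crit=$thresh_crit"
```

The trade-off is that getopts does not understand long options such as --warning, which the hand-written loop in the plugin does.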

Then come some sanity checks for the thresholds (102-113), debugging information if given enough verbosity (115-125, which needs -v twice), and finally the comparison that determines the result (127-140).
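
The comparison in that last step is small enough to isolate. Here is a minimal sketch of it as a function, assuming the standard Nagios return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The name classify_threads is hypothetical, and the real script does this inline rather than in a function.

```sh
# Sketch: map a thread count and two thresholds to a Nagios status line.
# classify_threads is a hypothetical name; the numeric codes are the
# standard Nagios plugin return codes that utils.sh normally supplies.
classify_threads() {
    count="$1"; warn="$2"; crit="$3"
    if [ "$crit" -le "$warn" ]; then
        echo "Critical value must be greater than warning value."
        return 3    # UNKNOWN
    elif [ "$count" -gt "$crit" ]; then
        echo "APACHE CRITICAL - $count"
        return 2    # CRITICAL
    elif [ "$count" -gt "$warn" ]; then
        echo "APACHE WARNING - $count"
        return 1    # WARNING
    fi
    echo "APACHE OK - $count"
    return 0        # OK
}

# 75 threads against thresholds of 50 and 150 lands in the warning band
classify_threads 75 50 150 || echo "exit code: $?"
```

Because the comparisons use -gt, a count exactly equal to a threshold still falls in the lower state, which matches what the plugin does.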

Final thoughts

The hardest part of this plugin is not writing or deploying the shell script. The hardest part is getting nagios to run it on the remote host. To use this check properly, you need to define a nagios checkcommand like so:
$USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
The arguments are the numbers for your thresholds. I used 50 for warning and 150 for critical.
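
If you edit the nagios object files by hand instead of using nconf, the definitions might look roughly like the sketch below. The host name webserver01 and the generic-service template are assumptions for illustration; substitute whatever your own configuration uses.

```
# Sketch of nagios object definitions; webserver01 and generic-service
# are placeholders, not values from my setup.
define command {
    command_name    check_apache_threads
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$USER1$/check_apache_threads -w $ARG1$ -c $ARG2$"
}

define service {
    use                    generic-service
    host_name              webserver01
    service_description    Apache threads
    check_command          check_apache_threads!50!150
}
```

Nagios expands $USER1$ on the monitoring server before the command runs, so the plugin has to live at the same path on the web server as the plugin directory on the nagios server.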

Any questions?

References

Weblinks

  1. General design http://www.kernel-panic.it/openbsd/nagios/nagios6.html
  2. case -w http://www.linuxquestions.org/questions/programming-9/ash-test-is-string-a-contained-in-string-b-671773/