Nagios on Debian primer

Nagios is useful for monitoring almost any kind of network service, with a wide variety of community-made plugins to test nearly anything you might need. However, its configuration and interface can be a little cryptic to the uninitiated. Fortunately, Nagios is well-packaged in Debian and Ubuntu and provides a basic default configuration that is instructive to read and extend.

There’s a reason that a lot of system administrators turn into monitoring fanatics when tools like Nagios are available. The rapid feedback of things going wrong and being fixed and the pleasant sea of green when all your services are up can get addictive for any halfway dedicated administrator.

In this article I’ll walk you through installing a very simple monitoring setup on a Debian or Ubuntu server. We’ll assume you have two computers in your home network, a workstation on 192.168.1.1 and a server on 192.168.1.2, and that you maintain a web service of some sort on a remote server, for which I’ll use www.example.com. We’ll install a Nagios instance on the server that monitors both local services and the remote webserver, and emails you if it detects any problems.

For those running a GNU/Linux distribution that isn’t Debian-based, or perhaps BSD, much of the configuration here will still apply, but the initial setup will probably be peculiar to your ports or packaging system unless you’re compiling from source.

Installing the packages

We’ll work on a freshly installed Debian Stable box as the server, which at the time of writing is version 6.0.3 “Squeeze”. If you don’t have it working already, you should start by installing Apache HTTPD:

# apt-get install apache2

Visit the server on http://192.168.1.2/ and check that you get the “It works!” page; that should be all you need. Note that by default this installation of Apache is not terribly secure, so you shouldn’t allow access to it from outside your private network until you’ve locked it down a bit, which is outside the scope of this article.

Next we’ll install the nagios3 package, which will include a default set of useful plugins, and a simple configuration. The list of packages it needs to support these is quite long so you may need to install a lot of dependencies, which apt-get will manage for you.

# apt-get install nagios3

The installation procedure will ask for a password for the administration area; provide it with a suitable one. You may also be prompted to configure a workgroup for the samba-common package. Don’t worry, you aren’t installing a Samba service by doing this; it’s just information for the smbclient program in case you want to monitor any SMB/CIFS services.

That should provide you with a basic self-monitoring Nagios setup. Visit http://192.168.1.2/nagios3/ in your browser to verify this; use the username nagiosadmin and the password you gave during the install process. If you see something like the below, you’re in business; this is the Nagios web reporting and administration panel.

The Nagios administration area's front page

Default setup

To start with, click the Services link in the left menu. You should see something like the below, which is the monitoring for localhost and the service monitoring that the packager set up for you by default:

Default Nagios monitoring hosts and services

Note that on my system, monitoring for the already-existing HTTP and SSH daemons was automatically set up for me, along with the default checks for load average, user count, and process count. If any of these pass a threshold, they’ll turn yellow for WARNING, and red for CRITICAL states.

This is already somewhat useful, though a server monitoring itself is a bit problematic because of course it won’t be able to tell you if it goes completely down. So for the next step, we’re going to set up monitoring for the remote host www.example.com, which means firing up your favourite text editor to edit a few configuration files.

Default configuration

Nagios configuration is at first blush a bit complex, because monitoring setups need to be quite finely tuned in order to be useful long term, particularly if you’re managing a large number of hosts. Take a look at the files in /etc/nagios3/conf.d.

# ls /etc/nagios3/conf.d
contacts_nagios2.cfg
extinfo_nagios2.cfg
generic-host_nagios2.cfg
generic-service_nagios2.cfg
hostgroups_nagios2.cfg
localhost_nagios2.cfg
services_nagios2.cfg
timeperiods_nagios2.cfg

You can actually arrange a Nagios configuration any way you like, including one big well-ordered file, but it makes some sense to break it up into sections if you can. In this case, the default setup includes the following files:

contacts_nagios2.cfg defines the people and groups of people who should receive notifications and alerts when Nagios detects problems or resolutions.
extinfo_nagios2.cfg makes some miscellaneous enhancements to other configurations, kept in a separate file for clarity.
generic-host_nagios2.cfg is Debian’s host template, defining a few common variables that you’re likely to want for most hosts, which saves you from repeating yourself when writing host definitions.
generic-service_nagios2.cfg is the same idea, but it’s a template service to monitor.
hostgroups_nagios2.cfg defines groups of hosts in case it’s valuable for you to monitor individual groups of hosts, which the Nagios administration area allows you to do.
localhost_nagios2.cfg is where the monitoring for the localhost host we were just looking at is defined.
services_nagios2.cfg is where further services are defined that might be applied to groups.
timeperiods_nagios2.cfg defines periods of time for monitoring services; for example, you might want to be paged at any hour, 24/7, if a webserver dies, but you might not care as much about 5% packet loss on some international link at 2am on Saturday morning.
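
One quick way to get a feel for how the configuration is split up is to count the object definitions in each file. This is only a sketch: count_definitions is a hypothetical helper of my own naming, and it assumes each definition starts on its own line with the word define, as the stanzas shown in this article do.

```shell
#!/bin/sh
# Hypothetical helper: print how many "define ..." stanzas each .cfg file holds.
count_definitions() {
    dir="$1"
    for f in "$dir"/*.cfg; do
        # Skip the unexpanded pattern if the directory has no .cfg files.
        [ -e "$f" ] || continue
        printf '%s %s\n' \
            "$(grep -c '^[[:space:]]*define[[:space:]]' "$f")" \
            "$(basename "$f")"
    done
}

# Directory used by the Debian packaging described above.
count_definitions /etc/nagios3/conf.d
```

Each output line is a count followed by a file name, which shows at a glance where most of the default objects live.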

This isn’t my favourite method of organising Nagios configuration, but it’ll work fine for us. We’ll start by defining a remote host, and add services to it.

Testing services

First of all, let’s check we actually have connectivity to the host we’re monitoring from this server for both of the services we intend to check; ICMP ECHO (PING) and HTTP.

$ ping -n -c 1 www.example.com
PING www.example.com (192.0.43.10) 56(84) bytes of data.
64 bytes from 192.0.43.10: icmp_req=1 ttl=243 time=168 ms
--- www.example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 168.700/168.700/168.700/0.000 ms

$ wget www.example.com -O -
--2012-01-26 21:12:00--  http://www.example.com/
Resolving www.example.com... 192.0.43.10, 2001:500:88:200::10
Connecting to www.example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response... 302 Found
...

All looks well, so we’ll go ahead and add the host and its services.
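
Nagios performs its checks with small plugin programs, and you can run those by hand too, which is a handy way to see what a check will report before you configure it. The sketch below assumes the Debian plugin packages put them in /usr/lib/nagios/plugins; the warning and critical thresholds are illustrative values, not ones taken from the default configuration, and describe_state and run_check are hypothetical helpers. Plugins report their result through their exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

```shell
#!/bin/sh
# Assumption: Debian's plugin packages install the checks in this directory.
PLUGIN_DIR=/usr/lib/nagios/plugins

# Hypothetical helper: translate a plugin exit status into a state name.
# This sketch treats any status other than 0, 1 or 2 as UNKNOWN.
describe_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Hypothetical helper: run one plugin by hand and print the resulting state.
run_check() {
    plugin="$1"
    shift
    if [ -x "$PLUGIN_DIR/$plugin" ]; then
        "$PLUGIN_DIR/$plugin" "$@"
        status=$?
        echo "$plugin: $(describe_state "$status")"
    else
        echo "$plugin: not found in $PLUGIN_DIR"
    fi
}

# Illustrative thresholds: WARNING at 100 ms round trip or 20% packet loss,
# CRITICAL at 500 ms or 60% packet loss.
run_check check_ping -H www.example.com -w 100.0,20% -c 500.0,60%
run_check check_http -H www.example.com
```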

Defining the remote host

Write a new file in the /etc/nagios3/conf.d directory called www.example.com_nagios2.cfg, with the following contents:

define host {
    use        generic-host
    host_name  www.example.com
    address    www.example.com
}

The first stanza of localhost_nagios2.cfg looks very similar to this; indeed, it uses the same host template, generic-host. All we need to do is define what to call the host, and where to find it.

However, to have Nagios check appropriate services on it, we need to add it to one of the already existing groups. Open up hostgroups_nagios2.cfg, and look for the stanza that includes hostgroup_name http-servers. Add www.example.com to the group’s members, so that the stanza looks like this:

# A list of your web servers
define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,www.example.com
}

With this done, you need to restart the Nagios process:

# service nagios3 restart
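
A typo in a configuration file can stop Nagios from starting, so it can be worth verifying the configuration before restarting. The sketch below relies on the -v option of the Nagios binary, which checks a main configuration file and everything it includes; it assumes the Debian package’s binary is named nagios3 and that its main configuration file is /etc/nagios3/nagios.cfg. check_then_restart is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: the nagios3 package keeps its main configuration file here.
NAGIOS_CFG=/etc/nagios3/nagios.cfg

# Hypothetical helper: restart Nagios only if the configuration verifies.
check_then_restart() {
    if nagios3 -v "$NAGIOS_CFG"; then
        service nagios3 restart
    else
        echo "Configuration errors found; not restarting." >&2
        return 1
    fi
}
```

Run check_then_restart as root after editing; if the verification step complains, the running Nagios process is left alone while you fix the problem.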

If that succeeds, you should see a new host called “www.example.com” under the Hosts and Services sections, being monitored for HTTP. At first, it’ll be PENDING, but when the scheduled check runs, it should come back (hopefully!) as OK.

Example remote Nagios host and service

You can monitor other web hosts the same way, by creating new hosts with appropriate addresses and adding them to the http-servers group.
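
If you end up adding more than a couple of hosts, a tiny shell function can write the stanza for you. This is just a sketch: make_host_cfg is a hypothetical helper, and blog.example.com is a made-up host name used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a host definition that inherits from generic-host.
# Arguments: the host name, then optionally the address to check
# (defaults to the host name, as in the www.example.com definition).
make_host_cfg() {
    name="$1"
    address="${2:-$1}"
    printf 'define host {\n'
    printf '    use        generic-host\n'
    printf '    host_name  %s\n' "$name"
    printf '    address    %s\n' "$address"
    printf '}\n'
}

# Example: print a definition for a second, made-up web host.
make_host_cfg blog.example.com
```

Redirect the output to a new file in /etc/nagios3/conf.d, such as blog.example.com_nagios2.cfg, add the host to the members of http-servers, and restart Nagios as before.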

Email notification

If your server runs a mail daemon such as exim4 or postfix that’s capable of remote email delivery, you can edit contacts_nagios2.cfg and change root@localhost to your own email address. Restart Nagios again, and you should now receive an email alert each time a service goes down or comes back up.
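
If you’d rather make that change from the command line, the following sketch does it with sed. set_contact_email is a hypothetical helper and you@example.com is a placeholder address; it assumes the contacts file still contains the packaged root@localhost address, and it leaves a .bak copy of the original file next to it.

```shell
#!/bin/sh
# Hypothetical helper: point the default contact at your own mailbox.
# Arguments: the contacts file, then the address that should receive alerts.
set_contact_email() {
    file="$1"
    address="$2"
    # Edit in place, keeping the original as "$file.bak".
    sed -i.bak "s|root@localhost|$address|" "$file"
}
```

For the setup in this article, the call would be set_contact_email /etc/nagios3/conf.d/contacts_nagios2.cfg you@example.com, run as root and followed by a restart of Nagios.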

Further suggestions

This primer has barely scratched the surface of what you can do with Nagios. For further exercises, read the configuration files in a bit more depth, and see if you can get these working:

Set up an smtp-servers group, and add your ISP’s mail server to it so that you can monitor whether their SMTP process is up. Define the service for any host that’s in the group.
Add your private workstation to the hosts, but set it up so that it only notifies you if it goes down during working hours in your timezone.
[Advanced] Automatically post notifications to your system’s MOTD.

In a future article, I’ll be describing how to use the Map functionality of Nagios along with parent definitions to set up a very simple network weathermap.

I have now completed my first book, the Nagios Core Administration Cookbook, which takes readers beyond the basics of monitoring with Nagios Core. Give it a look if this sounds interesting!