Nagios is my hero
Our company is growing fast. When I started in March 2003, we were hosting our website with some company in Texas. I assumed the position of "Systems Administrator" (I knew Linux fairly well at the time, but I had no formal training in administration) when we moved from the hosting company to our first dedicated server, colocated at Xmission. Since then, we have moved from a shared rack to our own cabinet, which now houses 10 servers (one is currently powered off, because if we turn it on, we risk blowing our power circuit... yes, we are waiting on a power upgrade).

As the number of servers we're using adds up, so does the stress of having to manage it all. There are a lot of little things to keep track of to make sure everything is running smoothly, and doing so can sometimes be a lot of work, especially when my official title here is "Programmer", and not "Systems Administrator". Earlier this month, I toured another company's data center and learned of something that, as of yesterday, is going to make my life a LOT easier: Nagios.

From their site: "Nagios is an Open Source host, service and network monitoring program." To me, that doesn't quite sum up the capabilities of this awesome system. Here's what we now use it for:
  • Monitoring the RAID setups on every server (we currently use 3ware, LSI, and software RAID setups; each one is monitored separately). If a drive/array goes bad, our admin staff gets an email, and I get a text message.
  • Monitoring disk usage on every server; you can set a warning and a critical threshold - for example, if disk usage goes over 80% you get a warning, if it goes over 90% you get a critical notice.
  • Monitoring system load (thresholds here are also completely customizable)
  • Monitoring MySQL replication status
  • Each server is PINGed periodically to make sure it is still up. If a machine goes down, a notification is sent out
  • Each web server is monitored to make sure it's receiving HTTP connections
  • Each server is monitored to make sure it's receiving SSH connections
  • Monitoring MySQL status (number of connections, slow queries, etc)
  • Each service on the mail server (POP, IMAP, SMTP, the mail queue) is monitored
For every item you monitor, you can specify when, how, and how often you get notified of events (if at all). You can also tell the system you want to be notified when something returns to normal.

Nagios uses plugins to monitor different items. It comes with a bunch of them for commonly used systems and services (like http, ftp, disk usage, etc). There is also a community website where people post plugins they have written: http://www.nagiosexchange.org. If the plugin you need isn't included, and you can't find one on nagiosexchange, plugins are super easy to write in almost any language.
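To give a feel for how little a plugin needs, here is a rough sketch in Python of a disk usage check using the 80%/90% thresholds from the example above. It is not one of the plugins we actually run; the names classify and check_disk are made up for illustration. The only part that comes from Nagios itself is the convention that a plugin prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

```python
import shutil

# Exit codes from the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code (hypothetical helper)."""
    if percent_used > crit:
        return CRITICAL
    if percent_used > warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (status_code, one_line_message) for the filesystem at path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The path is missing or unreadable, so no real answer is possible.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    percent_used = 100.0 * usage.used / usage.total
    status = classify(percent_used, warn, crit)
    return status, f"DISK {LABELS[status]} - {percent_used:.1f}% used on {path}"


status, message = check_disk("/")
print(message)
# A real plugin would end with sys.exit(status) so that Nagios sees the code.
```

Nagios would run a script like this on a schedule, show the printed line in the web interface, and decide whether to send a notification based on the exit code.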

Not only that, Nagios gives you a nifty web interface to see what's going on. Here are some screenshots:

The Status Map



... I mostly included this screenshot because it looks cool. It looks tons cooler when you have hundreds of machines being monitored - if one of the machines has a problem, the green circle around it shows as red.

The Host Overview



As you can see in that last screenshot, we currently have one critical notice. That's a degraded array that we wouldn't have known about without installing Nagios, even though I had written a script to check for exactly that situation (for whatever reason, the script failed to notify us). That array is currently rebuilding thanks to this sweet system.

In conclusion, if you're feeling growing pains at your company when it comes to monitoring your servers, I highly recommend you give this a try. I'll warn you: It's not exactly hard to set up, but it is rather tedious. In my opinion, it's worth the time you'll spend.
Filed under: Linux, Administration
Comments:

From Gary on Dec. 3 @ 12:09 a.m. 2007

I was in a similar situation about a year ago -- our small company was growing fast, and we needed a reliable way to monitor our web application. I knew what Nagios was, and had seen it in use, but had never set it up before.

It didn't take long to set up, though, and now it's monitoring from three different locations -- one of the servers in the cabinet at the colo, a server in our office and the server at my house. Like yours, my Nagios setup watches RAID status, disk space, system load, and availability of HTTPS (on the web servers) and MySQL (on the database servers).

One recommendation: keep your Nagios configuration files in your version control system. (I would say the same about DNS zones, firewall configurations, httpd.conf, and a host of other files that aren't your application code, but this post was just about Nagios, so that might be off topic :) )
From Tristan Rhodes on Dec. 3 @ 8:22 a.m. 2007

Perhaps you picked the wrong hero? If you think that Nagios "Status Map" is cool, you will be blown away by the AJAX node map in Zenoss. You can see how it works by watching the flash video on their website.

Here are two recent articles about Zenoss:

http://www.enterprisenetworkingplanet.com/netos/article.php/3713531

http://www.computerworld.com.au/index.php/id;1396479161;pp;2;fp;4;fpid;78268965

From Tristan Rhodes on Dec. 3 @ 8:23 a.m. 2007

I will add that you can use Nagios plugins within Zenoss, if you find one that you really need.
From Adam Olsen on Dec. 4 @ 8:59 a.m. 2007

Cool, I'll take a look
From Greg Ryman on Dec. 6 @ 10:55 p.m. 2007

You crack head, I told you about Nagios 2 years ago! I mean really... All you need to do is ask.. :)
From Rick Harper on June 11 @ 12:41 p.m. 2009

I am looking for a Nagios developer for a 6-month project in Massachusetts.
Can anyone refer me to someone?

RICK HARPER.
Global American Staffing.
rharper@globalams.us
