From: Kennedy Clark [mailto:hkclark@gmail.com]
Great thread. Thanks to all for their input!
Bryan asks a good question about whether we are looking for a full SNMP tool. Actually, we have some pretty big systems that already handle those functions. For this project, we are just looking for ping-type "is it up or down" information on a "subset of the overall network."
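For the plain "is it up or down" case, a minimal polling loop can be quite small. The Python sketch below assumes Linux iputils ping, where -c is the packet count and -W the reply timeout in seconds (other platforms use different flags). The names ping_once and check_hosts are hypothetical. The probe is passed in as a function, so it can be swapped for another check or faked for testing.

```python
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumes Linux iputils ping: -c = packet count, -W = timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,  # we only care about the exit status
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_hosts(hosts: Iterable[str],
                probe: Callable[[str], bool] = ping_once) -> Dict[str, str]:
    """Map each host to "up" or "down" using the given probe function."""
    return {host: ("up" if probe(host) else "down") for host in hosts}
```

Run from cron or a loop, check_hosts(["192.0.2.10", "192.0.2.11"]) (placeholder addresses) gives a per-host status map; the hard part, as noted next, is deciding when to send mail.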
It looks like Jacob's recommendation for Mon could be spot on. But, that being said, OpenNMS and Zabbix look pretty cool and worth a look (possibly for other projects with more complex needs). Re a script, I was going to do something like that if I couldn't get anything here :-) but I was trying to avoid having to add the "don't send an email every time it checks during a prolonged outage" logic (just send at the beginning and maybe the end).
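The "just send at the beginning and maybe the end" logic amounts to alerting on state changes rather than on every failed check. Here is a minimal Python sketch of that idea. OutageNotifier is a hypothetical name, this is not how Mon implements it internally, and the send callback stands in for whatever actually delivers the email.

```python
from typing import Callable, Dict


class OutageNotifier:
    """Alert once when a host goes down and once when it recovers,
    instead of once per failed check."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send  # hypothetical delivery hook, e.g. a mail function
        self.last_state: Dict[str, str] = {}

    def record(self, host: str, state: str) -> None:
        """Feed one check result ("up" or "down") for a host."""
        previous = self.last_state.get(host, "up")  # assume hosts start up
        if state != previous:
            if state == "down":
                self.send(f"{host} is DOWN")
            else:
                self.send(f"{host} is back UP")
        self.last_state[host] = state
```

A host that fails ten checks in a row produces one DOWN message, followed by one UP message when it comes back.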
I'll second the recommendation of Mon. It is simple and powerful with very few requirements. Mon.cgi will give you web-based access to the status information. It also has the advantage of being written in Perl, so making minor changes to the monitoring or alerting routines is very easy. The logic to only notify once (or twice, or three times) per outage is built-in. You can also change the notification routines based on the time of day and/or day of the week. For instance, I set mine so that it doesn't bother notifying me about scheduled daily reboots of some of the servers.
Bowie
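The two Mon behaviors Bowie mentions, a cap on alerts per outage and time-of-day/day-of-week suppression, can be illustrated with a short Python sketch. This is not Mon's configuration syntax. The names in_quiet_window and should_alert and the reboot window are hypothetical, and windows that cross midnight are not handled.

```python
from datetime import datetime, time
from typing import FrozenSet, Iterable, Tuple

# A quiet window: (weekdays, start, end). Weekday 0 = Monday, as in datetime.weekday().
QuietWindow = Tuple[FrozenSet[int], time, time]


def in_quiet_window(now: datetime, windows: Iterable[QuietWindow]) -> bool:
    """True if `now` falls inside any window (start inclusive, end exclusive)."""
    for days, start, end in windows:
        if now.weekday() in days and start <= now.time() < end:
            return True
    return False


def should_alert(alerts_sent_this_outage: int, max_alerts: int,
                 now: datetime, windows: Iterable[QuietWindow]) -> bool:
    """Alert at most `max_alerts` times per outage, never during a quiet window."""
    if alerts_sent_this_outage >= max_alerts:
        return False
    return not in_quiet_window(now, windows)
```

For a scheduled daily reboot at 03:00, a window such as (frozenset(range(7)), time(3, 0), time(3, 15)) keeps the reboot from paging anyone.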
While we are on the topic of cool tools in the network management arena, I should probably put in a plug for SmokePing: http://people.ee.ethz.ch/~oetiker/webtools/smokeping/
It's a wonderful open source tool from Tobi Oetiker, the same guy who created the excellent MRTG and RRDtool (Tobi has been getting some excellent help on SmokePing from Niko Tyni lately). Note that SmokePing isn't designed to measure stuff like ifInOctets and ifOutOctets (volume of traffic on a port/interface) via SNMP like MRTG. Instead, it's fairly unique in that it focuses on packet loss, latency and jitter (variations in latency). It's the only tool of its kind I know of. It's great for monitoring stuff on the Internet (where SNMP generally isn't an option). It usually works off ICMP pings, but there are a variety of nifty modules that do other measurements (DNS response time, remote "proxy pings" via the Cisco MIB, TCP sockets, etc.).
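To give a feel for what a TCP-socket style measurement involves, here is a minimal Python sketch that times a TCP connect. It is not SmokePing's TCP probe module, just an illustration, and tcp_connect_latency is a hypothetical name. Unlike ICMP it needs no raw-socket privileges, and it also confirms the service port is accepting connections.

```python
import socket
import time
from typing import Optional


def tcp_connect_latency(host: str, port: int, timeout_s: float = 2.0) -> Optional[float]:
    """Return TCP connect time to host:port in milliseconds, or None on failure.

    If `host` is a name, the measured time also includes DNS resolution.
    """
    start = time.perf_counter()
    try:
        # Open and immediately close a connection; we only time the handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:  # refused, unreachable, or timed out
        return None
    return (time.perf_counter() - start) * 1000.0
```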
If you go to the site, the key to reading the graphs is:
* The COLOR of the line shows PACKET LOSS.
* The HEIGHT of the line shows MEDIAN LATENCY.
* The SMOKE around the line shows JITTER (variation in latency).
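Each point on those graphs comes from a round of several pings to the same target. A minimal Python sketch of the three numbers involved might look like this. The name summarize_round is hypothetical, None marks a lost packet, and using max minus min as the spread is just one simple jitter measure, not necessarily how SmokePing draws its smoke.

```python
import statistics
from typing import Dict, List, Optional


def summarize_round(rtts_ms: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one round of pings; None entries are lost packets."""
    if not rtts_ms:
        raise ValueError("a round needs at least one ping")
    received = sorted(r for r in rtts_ms if r is not None)
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        return {"loss_pct": loss_pct, "median_ms": None, "spread_ms": None}
    return {
        "loss_pct": loss_pct,                      # drives the line COLOR
        "median_ms": statistics.median(received),  # drives the line HEIGHT
        "spread_ms": received[-1] - received[0],   # roughly the SMOKE width
    }
```

For example, a round of [10.0, 12.0, None, 11.0, 30.0] gives 20% loss, a median of 11.5 ms, and a 20 ms spread.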
It requires RRDtool, which you can "yum install rrdtool" right from Dag's site. The only tricky part of the installation/setup is getting speedycgi (a cool thing that speeds up CGI scripts without mod_perl complexity) working. I have not been able to get the RPMs at http://www.daemoninc.com/SpeedyCGI/CGI-SpeedyCGI-2.22/binaries/ to work with CentOS. If all else fails, you can bag speedycgi and live with slower CGI rendering by changing the first line of smokeping.cgi from "#!/usr/bin/speedy" to "#!/usr/bin/perl".
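The speedycgi fallback is just a one-line edit to smokeping.cgi. For anyone scripting the install, a small Python sketch follows. The function name use_plain_perl is hypothetical, and the script's location depends on how SmokePing was installed, so the path is left to the caller. The whole first line is replaced, so any SpeedyCGI-specific options on it are dropped as well.

```python
from pathlib import Path


def use_plain_perl(cgi_path: str,
                   old: str = "#!/usr/bin/speedy",
                   new: str = "#!/usr/bin/perl") -> bool:
    """Replace a SpeedyCGI shebang line with plain Perl; return True if changed."""
    path = Path(cgi_path)
    lines = path.read_text().splitlines(keepends=True)
    if not lines or not lines[0].startswith(old):
        return False  # not a speedy shebang, leave the file alone
    lines[0] = new + "\n"  # swap the entire first line
    path.write_text("".join(lines))
    return True
```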
It also has some pretty cool alarm features (can watch packet loss and latency trends over time and alert you when certain "patterns" are found).
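SmokePing has its own way of defining these alert patterns in its configuration. Purely to illustrate the idea of matching a trend rather than a single reading, here is a Python sketch that checks the most recent loss values against a sequence of conditions. The name matches_tail and the example pattern are hypothetical.

```python
from typing import Callable, Sequence

Predicate = Callable[[float], bool]


def matches_tail(history: Sequence[float], pattern: Sequence[Predicate]) -> bool:
    """True if the newest len(pattern) values satisfy the predicates, in order."""
    if not pattern or len(history) < len(pattern):
        return False
    tail = history[-len(pattern):]
    return all(pred(value) for pred, value in zip(pattern, tail))


# Hypothetical pattern: a clean round followed by three rounds with packet loss.
new_sustained_loss = [lambda x: x == 0, lambda x: x > 0, lambda x: x > 0, lambda x: x > 0]
```

Matching on "clean, then lossy three times in a row" alerts once when loss starts and stays, rather than on every isolated dropped packet.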
Thanks, Kennedy