On May 29, 2007, at 4:24 PM, Tony Mountifield wrote:
I have a small number of boxes in different locations, and currently have a fairly crude cron job running on each, which does a ping of one or more of the other boxes, and if the ping fails, it emails me to say the other box might be down. It then emails me again the next time the other box appears to be up.
Of course, this can't distinguish between the remote box really being down and there being a network problem somewhere between the local and remote boxes.
I've been mulling over the idea of a more sophisticated scheme, where a number of boxes send each other messages, indicating not only their presence, but which other boxes they believe to be up. Then if a box goes down, the other boxes all see it has gone and agree that it really is down. However, if there is instead a network outage or routing flap so that a box is reachable from some places but not all, it might be possible to distinguish this case.
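As a rough illustration of the scheme described above, here is a minimal Python sketch. It is not an existing tool: the `classify` function, the `reports` structure, and the host names are all hypothetical, and it assumes each box that is still reachable has recently reported the set of peers it believes are up.

```python
from typing import Dict, Set


def classify(target: str, reports: Dict[str, Set[str]]) -> str:
    """Combine peer reports into a verdict about one box.

    `reports` maps each reporting box to the set of boxes it currently
    believes are up (hypothetical structure, for illustration only).
    """
    # Every reporting box except the target itself gets one vote.
    votes = [target in seen
             for observer, seen in reports.items()
             if observer != target]
    if not votes:
        return "unknown"   # nobody is in a position to say
    if all(votes):
        return "up"        # every observer can reach it
    if not any(votes):
        return "down"      # observers agree it has gone
    return "partial"       # reachable from some places but not all


# Example: box "b" has lost sight of "c", but "a" can still see it.
reports = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "b"},
}
print(classify("c", reports))  # partial
print(classify("d", reports))  # down
```

A "partial" verdict is the case that a single ping from one location cannot detect: it points at a network problem between particular boxes rather than at the remote box itself.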
So my question is: does anyone know of an existing tool that does this sort of thing?
Cheers Tony
Nagios does this, although it can be a bit much to configure. What you're particularly looking for seems to be its "dependency" support, i.e., if your gateway is down, you don't want to be notified that every server you have to connect to through that gateway is also down.
A nice basic tutorial for Nagios I found is at:
http://www2.maxsworld.org/howtos/nagios.html
It doesn't go into dependencies very much, but setting them up shouldn't be that difficult.
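To make the dependency idea concrete, here is a minimal Python sketch of the suppression logic. It is only an illustration, not Nagios configuration syntax or Nagios's actual implementation; the `hosts_to_notify` function, the `parent` map, and the host names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def hosts_to_notify(down: Set[str],
                    parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the down hosts that deserve a notification.

    `parent` maps each host to the host it is reached through, or None
    if it is reached directly (hypothetical structure).
    """
    notify = []
    for host in sorted(down):
        upstream = parent.get(host)
        # If the upstream host is down too, this outage is probably
        # just a consequence of it, so stay quiet about this host.
        if upstream is None or upstream not in down:
            notify.append(host)
    return notify


parent = {"gateway": None, "web1": "gateway", "db1": "gateway"}

# Gateway failure: one notification instead of three.
print(hosts_to_notify({"gateway", "web1", "db1"}, parent))  # ['gateway']

# A single server behind a healthy gateway is still reported.
print(hosts_to_notify({"web1"}, parent))  # ['web1']
```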
dex
---------- Mobile: +63 (917) 5357191, Office: +63 (2) 6312718 i4 Asia Incorporated - http://www.i4asiacorp.com/