[CentOS] Remote system up/down monitoring tool?

Tue May 29 08:24:09 UTC 2007
Tony Mountifield <tony at softins.clara.co.uk>

I have a small number of boxes in different locations, and currently have
a fairly crude cron job running on each, which does a ping of one or more
of the other boxes, and if the ping fails, it emails me to say the other
box might be down. It then emails me again the next time the other box
appears to be up.

Of course, this can't distinguish between the remote box really being down
and there being a network problem somewhere between the local and remote
boxes.
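
For illustration only, the ping-and-email logic described above might be
sketched in Python as follows. This is not the actual cron job: the
function names and message strings are hypothetical, and the ping flags
(-c for count, -W for reply timeout in seconds) assume the Linux iputils
version of ping.

```python
import subprocess


def is_reachable(host, count=3, timeout_s=5):
    """Return True if ping reports at least one reply from host.

    Assumes Linux iputils ping: -c is the packet count and -W is the
    per-reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transition_message(host, was_up, is_up):
    """Return the text to email when the observed state changes, else None.

    Only transitions produce a message, so a box that stays down does not
    generate an email on every cron run.
    """
    if was_up and not is_up:
        return f"{host} might be down"
    if not was_up and is_up:
        return f"{host} appears to be up again"
    return None
```

A real job would also need to remember the previous state between cron
runs (for example in a small file per host) and pass the message to a
mailer; both are left out of this sketch.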

I've been mulling over the idea of a more sophisticated scheme, where
a number of boxes send each other messages, indicating not only their
presence, but which other boxes they believe to be up. Then if a box
goes down, the other boxes all see it has gone and agree that it really
is down. However, if there is instead a network outage or routing flap
so that a box is reachable from some places but not all, it might be
possible to distinguish this case.
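
As a rough sketch of the idea, and not a description of any existing
tool, the decision step could look something like the Python below. It
assumes each box has already collected the latest "who I believe is up"
report from every peer it can hear; the function name, the shape of the
reports dictionary and the result strings are all hypothetical.

```python
def classify(target, reports):
    """Classify target from the peers' reports.

    reports maps each observer's name to the set of boxes that observer
    currently believes to be up. The target's own report, if present,
    is ignored.
    """
    observers = [name for name in reports if name != target]
    if not observers:
        return "unknown"  # nobody else has reported, so no basis to decide

    seen_by = [name for name in observers if target in reports[name]]
    if len(seen_by) == len(observers):
        return "up"
    if not seen_by:
        # Every observer agrees the target is unreachable.
        return "down"
    # Some observers reach the target and some do not, which points to a
    # network outage or routing problem rather than a dead box.
    return "partially reachable"


reports = {
    "london": {"paris", "tokyo"},
    "paris": {"london"},
    "tokyo": {"london", "paris"},
}
print(classify("tokyo", reports))
```

Here "london" can see "tokyo" but "paris" cannot, so the box is reported
as partially reachable instead of down. Note that "down" in this sketch
only means that no observer can reach the box; a failure of the box's own
uplink would look the same from outside.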

So my question is: does anyone know of an existing tool that does this
sort of thing?

Cheers
Tony

-- 
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org