[CentOS] Remote system up/down monitoring tool?

Tue May 29 13:02:59 UTC 2007
Les Mikesell <lesmikesell at gmail.com>

Tony Mountifield wrote:
> I have a small number of boxes in different locations, and currently have
> a fairly crude cron job running on each, which does a ping of one or more
> of the other boxes, and if the ping fails, it emails me to say the other
> box might be down. It then emails me again the next time the other box
> appears to be up.
> 
> Of course, this can't distinguish between the remote box really being down
> and there being a network problem somewhere between the local and remote
> boxes.
> 
> I've been mulling over the idea of a more sophisticated scheme, where
> a number of boxes send each other messages, indicating not only their
> presence, but which other boxes they believe to be up. Then if a box
> goes down, the other boxes all see it has gone and agree that it really
> is down. However, if there is instead a network outage or routing flap
> so that a box is reachable from some places but not all, it might be
> possible to distinguish this case.
> 
> So my question is: does anyone know of an existing tool that does this
> sort of thing?
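
To make that distinction concrete, here is a minimal Python sketch of the
idea, not an existing tool. It assumes each box has already collected the
other boxes' reachability reports into a dict; the names classify and views
are made up for illustration, and how the reports get exchanged is left out.

```python
from typing import Dict


def classify(target: str, views: Dict[str, Dict[str, bool]]) -> str:
    """Combine peer reports about one box into a single verdict.

    views[observer][box] is True if `observer` says it can reach `box`.
    (Hypothetical data layout, assumed for this sketch.)
    """
    # Collect every observer's opinion of the target, ignoring the
    # target's opinion of itself.
    votes = [
        seen[target]
        for observer, seen in views.items()
        if observer != target and target in seen
    ]
    if not votes:
        return "unknown"   # nobody has reported on this box
    if all(votes):
        return "up"        # every observer can reach it
    if not any(votes):
        return "down"      # no observer can reach it: probably really down
    return "partial"       # reachable from some places only: network problem


views = {
    "london": {"paris": True, "tokyo": False},
    "paris": {"london": True, "tokyo": True},
}
print(classify("tokyo", views))
```

One caveat: a box that has lost its own network connection will report
everything else as down, so in practice you would only count reports from
boxes you can currently reach yourself.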

It might be overkill for this case, but OpenNMS (http://www.opennms.org) 
has a concept of "path outage" to limit the notifications for things 
past a network link that is down.  Plus it can maintain graphs of any
values you can obtain via SNMP, like bandwidth and CPU use.
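
As a rough illustration of the path outage idea only (this is plain Python,
not OpenNMS code or configuration): if the link a box sits behind is itself
unreachable, you skip the alert for the box and report just the link. The
function name, the critical_path mapping and the is_reachable callback are
all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def nodes_to_notify(
    nodes: List[str],
    critical_path: Dict[str, Optional[str]],
    is_reachable: Callable[[str], bool],
) -> List[str]:
    """Return the unreachable nodes that deserve a notification.

    critical_path maps a node to the upstream device it sits behind
    (missing or None means there is nothing upstream to blame).
    """
    alerts = []
    for node in nodes:
        if is_reachable(node):
            continue  # node is fine, nothing to report
        upstream = critical_path.get(node)
        if upstream is not None and not is_reachable(upstream):
            # The path to this node is down, so its state is unknown;
            # suppress the alert rather than claim the node is down.
            continue
        alerts.append(node)
    return alerts


# Example: the router is down, so the two boxes behind it are suppressed.
status = {"router": False, "web1": False, "web2": False}
paths = {"web1": "router", "web2": "router"}
print(nodes_to_notify(["router", "web1", "web2"], paths, lambda n: status[n]))
```

With the router back up and only web1 failing, the same call would report
web1 on its own.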

-- 
   Les Mikesell
    lesmikesell at gmail.com