Tony Mountifield wrote:
> I have a small number of boxes in different locations, and currently have
> a fairly crude cron job running on each, which does a ping of one or more
> of the other boxes, and if the ping fails, it emails me to say the other
> box might be down. It then emails me again the next time the other box
> appears to be up.
>
> Of course, this can't distinguish between the remote box really being down
> and there being a network problem somewhere between the local and remote
> boxes.
>
> I've been mulling over the idea of a more sophisticated scheme, where
> a number of boxes send each other messages, indicating not only their
> presence, but which other boxes they believe to be up. Then if a box
> goes down, the other boxes all see it has gone and agree that it really
> is down. However, if there is instead a network outage or routing flap
> so that a box is reachable from some places but not all, it might be
> possible to distinguish this case.
>
> So my question is: does anyone know of an existing tool that does this
> sort of thing?

It might be overkill for this case, but OpenNMS (http://www.opennms.org)
has a concept of "path outage" to limit the notifications for things
past a network link that is down. Plus it can maintain graphs of any
values you can obtain via SNMP, like bandwidth and CPU use.

--
  Les Mikesell
   lesmikesell at gmail.com
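
For what it's worth, the peer-reporting idea in the quoted message can be sketched in a few lines of Python. This is only an illustration of the decision logic, not an existing tool and not how OpenNMS works: the function name `classify`, the `reports` structure, and the status labels are all made up for the example, and it assumes each box has already collected the "who I believe is up" sets from its peers by some other means.

```python
from typing import Dict, Set


def classify(target: str, reports: Dict[str, Set[str]]) -> str:
    """Classify `target` from the views of the other boxes.

    `reports` maps each reporting box to the set of boxes it
    currently believes are up (hypothetical data structure).
    """
    # Only other boxes count as observers; a box's opinion of itself is ignored.
    observers = {box: seen for box, seen in reports.items() if box != target}
    if not observers:
        return "unknown"  # nobody else has reported, so nothing can be concluded

    seen_by = [box for box, seen in observers.items() if target in seen]

    if len(seen_by) == len(observers):
        return "up"       # every observer can reach it
    if not seen_by:
        # No observer can reach it. Most likely the box itself is down,
        # although a failure of its only uplink would look the same.
        return "down"
    # Reachable from some places but not all: points at a network
    # outage or routing flap rather than a dead box.
    return "partial"


if __name__ == "__main__":
    # Example views: box "d" cannot see "c", the others can.
    reports = {
        "a": {"b", "c", "d"},
        "b": {"a", "c", "d"},
        "d": {"a", "b"},
    }
    for box in ("a", "c", "e"):
        print(box, classify(box, reports))
```

A real implementation would also need to age out stale reports and decide how the boxes exchange them, which this sketch deliberately leaves out.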