I have a small number of boxes in different locations. Each currently runs a fairly crude cron job that pings one or more of the other boxes; if the ping fails, it emails me to say the other box might be down, and it emails me again the next time that box appears to be up.
Of course, this can't distinguish between the remote box really being down and there being a network problem somewhere between the local and remote boxes.
I've been mulling over the idea of a more sophisticated scheme, where a number of boxes send each other messages, indicating not only their presence, but which other boxes they believe to be up. Then if a box goes down, the other boxes all see it has gone and agree that it really is down. However, if there is instead a network outage or routing flap so that a box is reachable from some places but not all, it might be possible to distinguish this case.
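To make the idea concrete, here is a rough sketch in Python of the decision step I have in mind. It is only an illustration: the `reports` structure and the `classify` function are names I made up, and it assumes each box has already collected the other boxes' reachability views by some means (the messaging itself is not shown).

```python
from typing import Dict

# Hypothetical data shape: reports[observer][peer] is True if `observer`
# could reach `peer` on its most recent check.
Reports = Dict[str, Dict[str, bool]]


def classify(reports: Reports, target: str) -> str:
    """Combine the observers' views of `target` into a single verdict."""
    # Collect every other box's opinion about the target, ignoring the
    # target's own report and any observer that did not check it.
    votes = [
        view[target]
        for observer, view in reports.items()
        if observer != target and target in view
    ]
    if not votes:
        return "unknown"   # nobody reported on this box
    if all(votes):
        return "up"        # every observer can reach it
    if not any(votes):
        return "down"      # no observer can reach it: probably really down
    return "partial"       # reachable from some places only: likely a network problem


# Example: boxes a and b cannot reach c, but d can.
reports = {
    "a": {"b": True, "c": False},
    "b": {"a": True, "c": False},
    "d": {"c": True},
}
print(classify(reports, "c"))  # partial
```

A real version would also have to cope with stale or missing reports, since a box that is cut off cannot deliver its own view to everyone.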
So my question is: does anyone know of an existing tool that does this sort of thing?
Cheers, Tony