I have a small number of boxes in different locations, and currently have a fairly crude cron job running on each, which does a ping of one or more of the other boxes, and if the ping fails, it emails me to say the other box might be down. It then emails me again the next time the other box appears to be up.
Of course, this can't distinguish between the remote box really being down and there being a network problem somewhere between the local and remote boxes.
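For reference, the existing cron check can be sketched roughly as below. This is an illustration, not Tony's actual script. It assumes Linux iputils `ping` (where `-c` is the packet count and `-W` the reply timeout in seconds). It also assumes a hypothetical state file that remembers each host's last status, so that output is produced only when a host changes state. Cron normally mails any output to the job's owner (or to `MAILTO`), so printing the message is enough to get an email. The state-file path and the function names are placeholders.

```python
#!/usr/bin/env python3
"""Sketch of a cron-driven ping check that reports only on up/down transitions."""
import json
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Hypothetical location for the remembered per-host state.
STATE_FILE = Path("/var/tmp/pingcheck-state.json")


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumes Linux iputils ping: -c is the count, -W the timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transition_message(host: str, was_up: bool, is_up: bool) -> Optional[str]:
    """Return a notification if the host's status changed, otherwise None."""
    if was_up == is_up:
        return None
    if is_up:
        return f"{host} appears to be up again"
    return f"{host} might be down (ping failed)"


def main(hosts):
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for host in hosts:
        is_up = ping_once(host)
        # Treat a never-seen host as previously up, so a first failure still alerts.
        msg = transition_message(host, state.get(host, True), is_up)
        if msg:
            print(msg)  # cron mails this output to the job's owner
        state[host] = is_up
    STATE_FILE.write_text(json.dumps(state))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: pingcheck.py HOST [HOST ...]", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

Run from cron every few minutes with the peer names as arguments, for example `*/5 * * * * /usr/local/bin/pingcheck.py boxb boxc` (the path is hypothetical). A single box's view is all it records, which is exactly the limitation described above.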
I've been mulling over the idea of a more sophisticated scheme, where a number of boxes send each other messages, indicating not only their presence, but which other boxes they believe to be up. Then if a box goes down, the other boxes all see it has gone and agree that it really is down. However, if there is instead a network outage or routing flap so that a box is reachable from some places but not all, it might be possible to distinguish this case.
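The voting step of that scheme might look something like the sketch below. It assumes each box periodically publishes the set of peers it currently sees as up; the transport is left out. A hypothetical `classify` function then combines whichever reports have arrived. A target that no observer can see is probably down. One that every observer can see is up. A split verdict points to a partial outage or routing flap rather than a dead box. A box's report about itself is ignored, and a box that is really down simply sends no report.

```python
from typing import Dict, Set


def classify(reports: Dict[str, Set[str]], target: str) -> str:
    """Classify `target` from the other boxes' views.

    `reports` maps each reporting box to the set of boxes it believes are up.
    Returns "up", "down", "partial" (seen from some places only) or "unknown".
    """
    # One vote per observer other than the target itself.
    votes = [target in seen for observer, seen in reports.items() if observer != target]
    if not votes:
        return "unknown"  # nobody else has reported
    if all(votes):
        return "up"
    if not any(votes):
        return "down"
    return "partial"  # reachable from some observers but not others


def classify_all(reports: Dict[str, Set[str]], boxes: Set[str]) -> Dict[str, str]:
    """Apply `classify` to every known box."""
    return {box: classify(reports, box) for box in sorted(boxes)}
```

For example, if `london` and `paris` both see `nyc` but `nyc` reports seeing only `london`, then `nyc` is classified as "up" while `paris` comes out "partial", which suggests a path problem between `nyc` and `paris` rather than a dead box. The box names are made up for illustration.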
So my question is: does anyone know of an existing tool that does this sort of thing?
Cheers Tony
Tony Mountifield wrote:
So my question is: does anyone know of an existing tool that does this sort of thing?
Tony,
Nagios, maybe.
Not familiar with it, but there has been a lot of talk on the list.
Bob...
Tony Mountifield wrote:
So my question is: does anyone know of an existing tool that does this sort of thing?
It might be overkill for this case, but OpenNMS (http://www.opennms.org) has a concept of "path outage" to limit the notifications for things past a network link that is down. It can also maintain graphs of any values you can obtain via SNMP, such as bandwidth and CPU use.
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Tony Mountifield Sent: Tuesday, May 29, 2007 3:24 AM To: centos@centos.org Subject: [CentOS] Remote system up/down monitoring tool?
So my question is: does anyone know of an existing tool that does this sort of thing?
Check out Hobbit; it supports many platforms (it's the son of Big Brother):
http://sourceforge.net/projects/hobbit/
Frank M. Ramaekers Jr. Systems Programmer; MCP, MCP+I, MCSE & RHCE American Income Life Insurance Company Phone: (254) 761-6649 Fax: (254) 741-5777
---------------------------------------- This message contains information which is privileged and confidential and is solely for the use of the intended recipient. If you are not the intended recipient, be aware that any review, disclosure, copying, distribution, or use of the contents of this message is strictly prohibited. If you have received this in error, please destroy it immediately and notify us at PrivacyAct@ailife.com.
On Tue, 29 May 2007, Frank M. Ramaekers wrote:
Check out Hobbit, supports many platforms (son of Big Brother):
+1 for Hobbit.
Hobbit is Great!!
Regards,
On 5/29/07, Tony Mountifield tony@softins.clara.co.uk wrote:
So my question is: does anyone know of an existing tool that does this sort of thing?
Perhaps running SmokePing on multiple systems?
http://oss.oetiker.ch/smokeping/
On May 29, 2007, at 4:24 PM, Tony Mountifield wrote:
So my question is: does anyone know of an existing tool that does this sort of thing?
Nagios does this, although it can be a bit much to configure. What you're particularly looking for seems to be "dependency" support, i.e. if your gateway is down, you don't want to be notified that every server you reach through that gateway is also down.
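The dependency idea can be illustrated with a small sketch. It is not Nagios code. Nagios models this with the `parents` directive in host definitions, and it reports a host behind a failed parent as UNREACHABLE rather than DOWN. The sketch below makes some simplifying assumptions. Each host has at most one parent. The hypothetical `host_states` function receives the set of hosts whose checks failed. A failed host counts as DOWN only if its parent is still up; otherwise it is UNREACHABLE, and its notification can be suppressed.

```python
from typing import Dict, Optional, Set


def host_states(parents: Dict[str, Optional[str]], failed: Set[str]) -> Dict[str, str]:
    """Label each host "UP", "DOWN" or "UNREACHABLE".

    `parents` maps a host to the host it is reached through, or None if it is
    reached directly from the monitoring box. Assumes one parent per host.
    """
    states = {}
    for host, parent in parents.items():
        if host not in failed:
            states[host] = "UP"
        elif parent is not None and parent in failed:
            # The path to this host is broken further up; don't blame the host.
            states[host] = "UNREACHABLE"
        else:
            states[host] = "DOWN"
    return states
```

With `gateway` as the parent of `web` and `db`, a failure of all three yields a single DOWN (`gateway`) and two UNREACHABLE hosts. Notifying only on DOWN then produces one email instead of three. The host names are made up for illustration.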
A nice basic tutorial for Nagios I found is at:
http://www2.maxsworld.org/howtos/nagios.html
It doesn't go into dependencies much, but they shouldn't be that difficult to set up.
dex
---------- Mobile: +63 (917) 5357191, Office: +63 (2) 6312718 i4 Asia Incorporated - http://www.i4asiacorp.com/
In article f3gnv9$k18$1@softins.clara.co.uk, Tony Mountifield tony@softins.clara.co.uk wrote:
So my question is: does anyone know of an existing tool that does this sort of thing?
Thanks for all the responses. To summarise: I had several recommendations for Nagios, some for Hobbit, and one each for OpenNMS and SmokePing.
Time to go and investigate them!
Cheers Tony