Monitoring services

Kenneth Porter

27 Nov 2011

11:01 p.m.

What's available to remotely monitor services? What I'd like is something that can run scripts for each service to connect to a port and verify that it's up, and then send me an SMS message (phone text) to let me know which, if any, are down.

Also, does a script exist that checks all the services listed by chkconfig and reports those that should be up but are down?

Corey Henderson

27 Nov

11:22 p.m.

On 11/27/2011 4:01 PM, Kenneth Porter wrote:

...

What's available to remotely monitor services? What I'd like is something that can run scripts for each service to connect to a port and verify that it's up, and then send me an SMS message (phone text) to let me know which, if any, are down.

Nagios ( http://www.nagios.org/ ) is one of the many pieces of software that can do this.

...

Also, does a script exist that checks all the services listed by chkconfig and reports those that should be up but are down?

None that I'm aware of. If you're going to write one, keep in mind that some init scripts list as "on" in chkconfig and run on boot but don't actually launch a process.
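A minimal sketch, in Python, of the per-service port check asked about above. The service names, hosts, and ports in the usage note are hypothetical, and this is not how Nagios or any other tool mentioned in this thread implements its checks.

```python
import socket


def port_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def find_down(checks, probe=port_is_up):
    """Given {service_name: (host, port)}, return the names that fail the probe."""
    return sorted(name for name, (host, port) in checks.items()
                  if not probe(host, port))
```

For example, `find_down({"smtp": ("mail.example.com", 25)})` returns the names to put in an alert message. A successful TCP connect only shows that something is listening on the port, not that the service behind it answers correctly.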

Kenneth Porter

11:33 p.m.

--On Sunday, November 27, 2011 4:22 PM -0700 Corey Henderson corman@cormander.com wrote:

...

None that I'm aware of. If you're going to write one, keep in mind that some init scripts list as "on" in chkconfig and run on boot but don't actually launch a process.

True. I was thinking that a script could run "chkconfig --list" to first find the processes that should be running, then run "service $servicename status" on each to look for ones that were down. Alas, I don't think there's a standard for the output, but the oddballs that don't match RHEL's conventions should be few.
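The approach described above could be sketched roughly as follows. It assumes SysV-style `chkconfig --list` lines (a name followed by seven `N:on`/`N:off` fields) and treats a non-zero exit status from `service NAME status` as "down"; as noted in this thread, not every init script follows that convention, and some services that are "on" never leave a process running, so expect false positives.

```python
import os
import subprocess


def enabled_services(chkconfig_output, runlevel=3):
    """Parse `chkconfig --list` text; return services marked on for runlevel."""
    wanted = "%d:on" % runlevel
    names = []
    for line in chkconfig_output.splitlines():
        fields = line.split()
        # SysV entries have a name plus seven "N:on"/"N:off" fields;
        # xinetd entries and headings have a different shape and are skipped.
        if len(fields) == 8 and wanted in fields[1:]:
            names.append(fields[0])
    return names


def service_status(name):
    """Run `service NAME status` quietly and return its exit status."""
    with open(os.devnull, "w") as null:
        return subprocess.call(["service", name, "status"],
                               stdout=null, stderr=null)


def should_be_up_but_down(names, status=service_status):
    """Return services whose status command exits non-zero (assumed: not running)."""
    return [name for name in names if status(name) != 0]
```

On a real host the input would come from something like `subprocess.check_output(["chkconfig", "--list"]).decode()`; the `status` parameter exists only so the logic can be exercised without touching real services.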

Roberto Alvarado

28 Nov

12:11 p.m.

You can try zabbix

www.zabbix.com

On 11/27/2011 08:01 PM, Kenneth Porter wrote:

...

What's available to remotely monitor services? What I'd like is something that can run scripts for each service to connect to a port and verify that it's up, and then send me an SMS message (phone text) to let me know which, if any, are down.

Also, does a script exist that checks all the services listed by chkconfig and reports those that should be up but are down?

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Paul Heinlein

6:35 p.m.

On Sun, 27 Nov 2011, Kenneth Porter wrote:

...

What's available to remotely monitor services? What I'd like is something that can run scripts for each service to connect to a port and verify that it's up, and then send me an SMS message (phone text) to let me know which, if any, are down.

We use Nagios at work, as many others have suggested.

I've never configured Nagios to do SMS directly. We have text-message escalations by e-mail, since several cellular carriers allow e-mails to be sent directly to your phone:

* AT&T: <phone number>@txt.att.net
* T-Mobile: <phone number>@tmomail.net
* Verizon: <phone number>@vtext.com

Where <phone number> is your 10-digit number.
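A small sketch of building such an e-mail-to-SMS alert in Python. The gateway domains are copied from the list above as posted in 2011 and are not verified here; the carrier keys, sender address, and subject line are made up for illustration.

```python
from email.message import EmailMessage

# Gateway domains exactly as listed in the post above (late 2011).
SMS_GATEWAYS = {
    "att": "txt.att.net",
    "tmobile": "tmomail.net",
    "verizon": "vtext.com",
}


def sms_address(carrier, phone_number):
    """Build the e-mail-to-SMS address for a 10-digit phone number."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number, got %r" % phone_number)
    return "%s@%s" % (digits, SMS_GATEWAYS[carrier])


def build_alert(sender, carrier, phone_number, down_services):
    """Return an EmailMessage short enough to read as a text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = sms_address(carrier, phone_number)
    msg["Subject"] = "services down"
    msg.set_content("DOWN: " + ", ".join(down_services))
    return msg
```

Assuming a mail server is listening on the monitoring host itself, the message could then be handed off with `smtplib.SMTP("localhost").send_message(msg)`.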

-- Paul Heinlein <> heinlein@madboa.com <> http://www.madboa.com/

Rajagopal Swaminathan

29 Nov

10:58 a.m.

Greetings,

On Tue, Nov 29, 2011 at 12:05 AM, Paul Heinlein heinlein@madboa.com wrote:

...

On Sun, 27 Nov 2011, Kenneth Porter wrote:

...
What's available to remotely monitor services?

I have successfully deployed Zabbix to remotely monitor 240+ geographically distributed locations connected by ADSL links (IOW, no fixed IP) for the second-largest public transport corporation (next only to Germany's) in India.

Perhaps, you may consider that.

-- Regards, Rajagopal

me@tdiehl.org

3:53 p.m.

On Tue, 29 Nov 2011, Rajagopal Swaminathan wrote:

...

Greetings,

On Tue, Nov 29, 2011 at 12:05 AM, Paul Heinlein heinlein@madboa.com wrote:

...
On Sun, 27 Nov 2011, Kenneth Porter wrote:

...
What's available to remotely monitor services?

I have successfully deployed Zabbix to remotely monitor 240+ geographically distributed locations connected by ADSL links (IOW, no fixed IP) for the second-largest public transport corporation (next only to Germany's) in India.

Perhaps, you may consider that.

Another possibility is http://sourceforge.net/projects/xymon/

Regards,

-- Tom me@tdiehl.org Spamtrap address me123@tdiehl.org

Jon Detert

5:24 p.m.

Did anyone mention https://www.icinga.org/ ? I'm a long-time Nagios user, but just heard about it yesterday. It is a fork of Nagios, has a more modern web interface, and Nagios plugins are compatible with it. It looks/sounds good. Anyone have experience with it?

Alan McKay

13 Dec

3:28 p.m.

I am just trying out Zabbix and I have to say it sure is easy to set up (once you get beyond a few minor quirks). I'm pretty impressed so far with my evaluation.

-- “Don't eat anything you've ever seen advertised on TV” - Michael Pollan, author of "In Defense of Food"

Karanbir Singh

3:43 p.m.

On 12/13/2011 03:28 PM, Alan McKay wrote:

...

I am just trying out Zabbix and I have to say it sure is easy to set up (once you get beyond a few minor quirks). I'm pretty impressed so far with my evaluation.

I've used Zabbix quite extensively over the last 2 odd years (we even use Zabbix to keep an eye on things inside .centos.org machines). It's not bad (hence why I continue to use it). But man, I miss being able to automatically deliver Nagios configs out of a central 'fact' system.
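To illustrate what generating Nagios configs from a central 'fact' source can look like, here is a minimal sketch. The facts dictionary and host names are hypothetical, and the `generic-host` template is assumed to be defined elsewhere in the Nagios configuration; only the `define host { ... }` object syntax is Nagios's own.

```python
def render_host(name, address, template="generic-host"):
    """Render one Nagios 'define host' object as text."""
    return (
        "define host {\n"
        "    use        %s\n"
        "    host_name  %s\n"
        "    address    %s\n"
        "}\n" % (template, name, address)
    )


def render_hosts(facts, template="generic-host"):
    """facts: {host_name: address}. Return a config file body, sorted by name."""
    return "\n".join(render_host(name, facts[name], template)
                     for name in sorted(facts))
```

Because the output is plain text built from data, it can be regenerated and compared against the previous version whenever the facts change, which is the property of static configs being missed here.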

The Zabbix API is getting there, but it's not quite there yet (there is also zabcon, but that won't let you build a config base completely from scratch). Most Zabbix people will tell you that the XML import/export works for hosts and templates - don't buy that. It *does* work, but since you can't have host-level overrides for specific template items, triggers, or actions without getting into macros for the template (and therefore needing to manage it as a macro for every host that inherits that template...), the XML management isn't nearly as nice as being able to do static configs.

But it's easy to get going, it's functional and reliable, and you can tune it to almost any sort of role you might need; proxies are trivial, HA is trivial. And as long as you stick with established (or their delivered) implementation patterns, it's almost trivial to deploy and manage.

- KB

Alan McKay

16 Dec

4:19 p.m.

OK, I've had a Zabbix and a Zenoss server running now for 2 or 3 days and would like to morph this thread into a discussion of what each of these systems can and cannot do.

At the base of what I see so far, Zabbix is only able to monitor devices that have the Zabbix agent on them - is that correct?

On the one hand I like having an agent on the remote device since it allows you to have functionality that is more purpose-driven to what we are trying to do. On the other hand, what about devices that cannot run the agent? e.g. monitoring switches and routers. Though to counter my own concern - those are the sorts of things that are either up or down anyway and I'm not sure that they can be "monitored" per se outside of that. Sure you can graph their traffic and so forth, but is any monitoring software able to actually say "there is a potential problem with your router or switch"? Other than "your device is now down" which is pretty easy to figure out anyway without monitoring software since just about anything connected to it is going to start throwing alarms once it is down.

Zenoss seems to let you monitor anything via SNMP which may not necessarily be as purpose-driven as having an agent, but it does allow you to monitor just about anything under the sun since pretty much everything supports SNMP. On the upside, getting this set up has forced me to do some reading on configuring net-snmp on Linux and I've gotten that working and it could turn out to be useful elsewhere even if I do not choose Zenoss.

As for the dashboard and general web interface, configuring things, viewing things and so on, both of them seem to be pretty easy to set up and use. I find the Zabbix interface a little more useful, with better default graphs and so on.

But I'm still left wondering whether I should fall back to Nagios. One very nice thing about Nagios is that you can do some really fine-grained tests on systems to determine whether or not it is currently working. Like you can log in to an FTP server and test for a specific file or something like that. You are always testing from the outside which may have its downsides too, but it has a lot of upsides because that's how users view the boxes anyway - from the outside.

Do Zabbix or Zenoss allow for this sort of testing that Nagios has?

Incidentally I also looked at OpenNMS which has a live demo online - I don't like the dashboard and basic functionality as much as Zabbix or Zenoss. And since I did not set it up myself nor configure it, I cannot comment on that.

I am also looking at Icinga which is a fork of Nagios but seems to have gone in a very different direction after the fork. They have a live demo on their site as well. I have not dug much into this yet so cannot comment on how I like it.

Thoughts from anyone on any of this?

Les Mikesell

5:51 p.m.

On Fri, Dec 16, 2011 at 10:19 AM, Alan McKay alan.mckay@gmail.com wrote:

...

On the one hand I like having an agent on the remote device since it allows you to have functionality that is more purpose-driven to what we are trying to do. On the other hand, what about devices that cannot run the agent? e.g. monitoring switches and routers. Though to counter my own concern - those are the sorts of things that are either up or down anyway and I'm not sure that they can be "monitored" per se outside of that. Sure you can graph their traffic and so forth, but is any monitoring software able to actually say "there is a potential problem with your router or switch"? Other than "your device is now down" which is pretty easy to figure out anyway without monitoring software since just about anything connected to it is going to start throwing alarms once it is down.

Yes, you can configure most managed devices to send snmp traps and/or syslog messages about problems to your monitoring receiver. And your monitor polling for snmp values can alarm on failures and thresholds exceeded in the values (like bandwidth percentage used, interface errors, interface drops, etc.).
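As a rough illustration of threshold alarms on polled SNMP values, the sketch below turns two samples of an interface's octet and error counters (for example IF-MIB's ifInOctets and ifInErrors) into alarms. It assumes 32-bit counters that wrap at most once between polls; the 80% and zero-error thresholds are arbitrary placeholders, and fetching the values over SNMP is left out.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for one wrap-around."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_seconds, speed_bps):
    """Percentage of link speed used between two octet-counter samples."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_seconds * speed_bps)


def interface_alarms(prev, curr, interval_seconds, speed_bps,
                     max_utilization=80.0, max_new_errors=0):
    """prev/curr are dicts holding 'octets' and 'errors' counter samples."""
    alarms = []
    used = utilization_percent(prev["octets"], curr["octets"],
                               interval_seconds, speed_bps)
    if used > max_utilization:
        alarms.append("utilization %.1f%%" % used)
    new_errors = counter_delta(prev["errors"], curr["errors"])
    if new_errors > max_new_errors:
        alarms.append("%d new errors" % new_errors)
    return alarms
```

This is why a switch or router can be reported as "in trouble" well before it is down: the interface is still up, but its error count is climbing or its link is close to saturation.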

...

Incidentally I also looked at OpenNMS which has a live demo online - I don't like the dashboard and basic functionality as much as Zabbix or Zenoss. And since I did not set it up myself nor configure it, I cannot comment on that.

OpenNMS starts with the assumption that you will be monitoring more things than you can usefully display, so what you see on the home screen will be mostly counts of systems with errors in each category with a drill-down to the actual node entries. This won't mean much if you don't customize the categories for your network. By default it will collect histories of a large number of SNMP values for most common systems/devices and generate events/notifications for failures and thresholds. But, by default, to see a graph you have to go to a particular node and pick one or more things from its 'resource graph' list. If you want to see a group of graphs from different devices (like the bandwidth on several important router interfaces or the CPU load on a farm of servers) you can arrange them on a 'Key SNMP Customized' (KSC) report page. These pages auto-refresh when viewed and often are the best way to watch something. It is fairly easy to install your own system since you can do a yum-based install.

...

Thoughts from anyone on any of this?

Network monitoring is not trivial no matter what tool you use. Pick something that you trust to scale to the proportions you will need so you don't do a lot of work and then hit a wall. And if you have a lot of systems, avoid anything that needs per-system configuration or agent installation.

-- Les Mikesell lesmikesell@gmail.com

Alan McKay

6:02 p.m.

...

...
Thoughts from anyone on any of this?

Network monitoring is not trivial no matter what tool you use. Pick something that you trust to scale to the proportions you will need so you don't do a lot of work and then hit a wall. And if you have a lot of systems, avoid anything that needs per-system configuration or agent installation.

Agreed. I'm definitely not looking for trivial - just trying to make sure I understand the strengths and weaknesses of each system to help me make the right decision. Because once I've made that decision, I have to live with it :-) Our environment is relatively small. About 80 servers that are mostly grouped into 3 compute clusters for the scientists I support. A few switches, and no routers under my direct control (though a few Linux boxes routing between NICs since some of the environment is on our own private LAN behind said Linux box, cut off from the Hospital's network)

cheers, -Alan

Les Mikesell

6:22 p.m.

On Fri, Dec 16, 2011 at 12:02 PM, Alan McKay alan.mckay@gmail.com wrote:

...

...
...
Thoughts from anyone on any of this?

Network monitoring is not trivial no matter what tool you use. Pick something that you trust to scale to the proportions you will need so you don't do a lot of work and then hit a wall. And if you have a lot of systems, avoid anything that needs per-system configuration or agent installation.

Agreed. I'm definitely not looking for trivial - just trying to make sure I understand the strengths and weaknesses of each system to help me make the right decision. Because once I've made that decision, I have to live with it :-) Our environment is relatively small. About 80 servers that are mostly grouped into 3 compute clusters for the scientists I support. A few switches, and no routers under my direct control (though a few Linux boxes routing between NICs since some of the environment is on our own private LAN behind said Linux box, cut off from the Hospital's network)

You may not need 'direct' control of the routers - just read access for snmp to monitor them. And if the switches have snmp you can get per-interface traffic which will obviously match whatever is on the other end of the wire. Does the cluster software have its own close-coupled monitor like ganglia? One thing I haven't found in any of the frameworks I've seen that everybody is likely to need is a good concept of aggregates. That is, you will have some level of redundancy in fail-over sets and some level of group capacity in load-balanced sets. While you may want to be alerted about individual failures, what you really need to track is how close you are to capacity across the working group members - and nothing does that very well.
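One way to picture the aggregate view described above is the sketch below. It is a hypothetical calculation, not a feature of any framework named in this thread, and it assumes that when a member fails its load is spread over the remaining members without changing the total.

```python
def group_headroom(members):
    """members: list of dicts with 'up' (bool), 'load' and 'capacity' (same unit)."""
    working = [m for m in members if m["up"]]
    load = sum(m["load"] for m in working)
    capacity = sum(m["capacity"] for m in working)
    used = 100.0 * load / capacity if capacity else 100.0
    # Could the rest still carry the current load if the largest member failed?
    largest = max((m["capacity"] for m in working), default=0)
    survives = len(working) > 1 and load <= capacity - largest
    return {
        "working": len(working),
        "used_percent": used,
        "survives_one_failure": survives,
    }
```

The interesting alert here is not "node 3 is down" but "the group can no longer absorb another failure", which is a property of the set rather than of any one member.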

-- Les Mikesell lesmikesell@gmail.com

Lucian

9:28 p.m.

On 16 December 2011 16:19, Alan McKay alan.mckay@gmail.com wrote:

...

But I'm still left wondering whether I should fall back to Nagios.

If you're considering that then also have a look at Opsview: http://www.opsview.com/community/compare-opsview

Karanbir Singh

17 Dec

9:30 a.m.

On 12/16/2011 04:19 PM, Alan McKay wrote:

...

At the base of what I see so far, Zabbix is only able to monitor devices that have the Zabbix agent on them - is that correct?

Not at all. Zabbix can do active and passive tests, it can even proxy them via relays, and it can aggregate results based on conditions across multiple resources (including across different resources).

...

graph their traffic and so forth, but is any monitoring software able to actually say "there is a potential problem with your router or switch"?

Of course, that's the most interesting part of monitoring. Knowing about service 'anomalies' is important.

...

Other than "your device is now down" which is pretty easy to figure out anyway without monitoring software since just about anything connected to it is going to start throwing alarms once it is down.

Only if it's very badly set up. With relationships in place, you should only see an alert for the real problem, and not the dozens of fallouts from that problem.
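The effect of such relationships can be sketched as follows: given which nodes are down and a map from each node to the device it depends on, only the failures with no failed device upstream are reported. The node names are hypothetical and this is an illustration of the idea, not the way Zabbix or Nagios implements dependencies.

```python
def has_down_ancestor(node, down, parent_of):
    """True if any device upstream of node is itself down."""
    seen = {node}
    parent = parent_of.get(node)
    while parent is not None and parent not in seen:
        if parent in down:
            return True
        seen.add(parent)  # guards against accidental loops in the map
        parent = parent_of.get(parent)
    return False


def root_causes(down, parent_of):
    """Return the down nodes that are not explained by an upstream failure."""
    return sorted(node for node in down
                  if not has_down_ancestor(node, down, parent_of))
```

With `parent_of = {"web01": "switch1", "db01": "switch1", "switch1": "router1"}` and all three of `web01`, `db01`, and `switch1` down, only `switch1` is reported.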

...

Do Zabbix or Zenoss allow for this sort of testing that Nagios has?

yes.

- KB

Alan McKay

1:25 p.m.

...

...
Do Zabbix or Zenoss allow for this sort of testing that Nagios has?

yes.

OK, thanks. I'll dig more into passive checks

Karanbir Singh

5 p.m.

On 12/17/2011 01:25 PM, Alan McKay wrote:

...

...
...
Do Zabbix or Zenoss allow for this sort of testing that Nagios has?

OK, thanks. I'll dig more into passive checks

I think it's the active tests you are looking for (tests run from the zabbix-server rather than on the agent). An easy one to start with is the web tests, where the zabbix-server can go through a series of mouse-clicks and test for content on each 'step', resulting in a pass/fail.

Quite handy to be able to throw those together in seconds. I remember having to use WWW::Mechanize and webrat etc. to write an HTTP client in order to achieve this on Nagios.
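The idea of a stepwise web test can be sketched in a few lines of Python. This is a simplified stand-in, not Zabbix's web monitoring: it fetches each URL in turn and looks for expected text, without cookies, form posts, or session handling. The URLs in the usage note are placeholders.

```python
from urllib.request import urlopen


def fetch(url, timeout=10.0):
    """Return the body of url as text; raises on HTTP or network errors."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def run_scenario(steps, get=fetch):
    """steps: list of (url, expected_text). Return (passed, failed_step_index)."""
    for index, (url, expected) in enumerate(steps):
        try:
            body = get(url)
        except OSError:
            # urllib's URLError and HTTPError are both OSError subclasses.
            return False, index
        if expected not in body:
            return False, index
    return True, None
```

A call such as `run_scenario([("http://www.example.com/login", "Sign in"), ("http://www.example.com/status", "OK")])` stops at the first step that fails and reports its index, which gives the same pass/fail-per-step result described above.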

- KB

Les Mikesell

5:26 p.m.

On Sat, Dec 17, 2011 at 11:00 AM, Karanbir Singh mail-lists@karan.org wrote:

...

...
...
...
Do Zabbix or Zenoss allow for this sort of testing that Nagios has?

OK, thanks. I'll dig more into passive checks

I think it's the active tests you are looking for (tests run from the zabbix-server rather than on the agent). An easy one to start with is the web tests, where the zabbix-server can go through a series of mouse-clicks and test for content on each 'step', resulting in a pass/fail.

In OpenNMS that would be a 'page sequence monitor': http://www.opennms.org/wiki/Page_Sequence_Monitor_%28PSM%29_Setup

While a lot of the setup can be done in the web interface, this is one where you still have to edit xml config files.

-- Les Mikesell lesmikesell@gmail.com
