[CentOS] Monitoring services

Fri Dec 16 17:51:02 UTC 2011

On Fri, Dec 16, 2011 at 10:19 AM, Alan McKay <alan.mckay at gmail.com> wrote:
>
> On the one hand I like having an agent on the remote device since it allows
> you to have functionality that is more purpose-driven to what we are trying
> to do.   On the other hand, what about devices that cannot run the agent?
>  e.g. monitoring switches and routers.  Though to counter my own concern -
> those are the sorts of things that are either up or down anyway and I'm not
> sure that they can be "monitored" per se outside of that.  Sure you can
> graph their traffic and so forth, but is any monitoring software able to
> actually say "there is a potential problem with your router or switch"?
>  Other than "your device is now down" which is pretty easy to figure out
> anyway without monitoring software since just about anything connected to
> it is going to start throwing alarms once it is down.

Yes, you can configure most managed devices to send SNMP traps and/or
syslog messages about problems to your monitoring receiver.  And your
monitor polling for SNMP values can alarm on failures and on thresholds
exceeded in the values (like bandwidth percentage used, interface
errors, interface drops, etc.).
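
As a rough sketch of the threshold side of polling, assuming you already
have two samples of an interface octet counter (such as ifInOctets) and
the interface speed in bits per second: the function names and the 80%
threshold below are made up for illustration and are not taken from any
particular monitoring package.

```python
def utilization_percent(octets_prev, octets_now, interval_s,
                        if_speed_bps, counter_bits=32):
    """Percent of link capacity used between two octet-counter samples."""
    modulus = 1 << counter_bits
    # Modular subtraction copes with a single counter wrap between polls.
    delta_octets = (octets_now - octets_prev) % modulus
    return delta_octets * 8 * 100.0 / (interval_s * if_speed_bps)


def check_threshold(value, threshold=80.0):
    """Return 'ALARM' when the value reaches the (hypothetical) threshold."""
    return "ALARM" if value >= threshold else "OK"


if __name__ == "__main__":
    # 37,500,000 octets in 300 seconds on a 10 Mbit/s link.
    pct = utilization_percent(1000, 1000 + 37_500_000, 300, 10_000_000)
    print(pct, check_threshold(pct))
```

On fast links a 32-bit counter can wrap more than once between polls,
which this cannot detect; the 64-bit counters are the safer choice when
the device supports them (pass counter_bits=64).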

> Incidentally I also looked at OpenNMS which has a live demo online - I
> don't like the dashboard and basic functionality as much as Zabbix or
> Zenoss.  And since I did not set it up myself nor configure it, I cannot
> comment on that.

OpenNMS starts with the assumption that you will be monitoring more
things than you can usefully display, so what you see on the home
screen will be mostly counts of systems with errors in each category
with a drill-down to the actual node entries.  This won't mean much if
you don't customize the categories for your network.  By default it
will collect histories of a large number of SNMP values for most
common systems/devices and generate events/notifications for failures
and thresholds.  But to view those histories, by default you have to go
to a particular node and pick one or more things from its 'resource
graph' list.  If you
want to see a group of graphs from different devices (like the
bandwidth on several important router interfaces or the CPU load on a
farm of servers) you can arrange them on a 'Key SNMP Customized'
(KSC) report page.  These pages auto-refresh when viewed and often are
the best way to watch something.  It is fairly easy to set up your
own system to try, since you can do a yum-based install.
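
To illustrate the idea of a home screen that shows per-category counts
with a drill-down to the nodes, here is a minimal sketch.  It is not how
OpenNMS implements it; the data layout (name, categories, has_outage)
and the function name are assumptions for the example only.

```python
from collections import defaultdict


def rollup(nodes):
    """Summarize nodes per category.

    nodes: iterable of (name, categories, has_outage) tuples.
    Returns {category: {"total": int, "down": [node names]}}.
    """
    summary = defaultdict(lambda: {"total": 0, "down": []})
    for name, categories, has_outage in nodes:
        for category in categories:
            summary[category]["total"] += 1
            if has_outage:
                # Keep the names so the count can be drilled into.
                summary[category]["down"].append(name)
    return dict(summary)


if __name__ == "__main__":
    inventory = [
        ("core-rtr1", ["Routers"], False),
        ("core-rtr2", ["Routers"], True),
        ("web01", ["Servers", "Production"], True),
        ("web02", ["Servers", "Production"], False),
    ]
    for category, info in sorted(rollup(inventory).items()):
        print(f"{category}: {len(info['down'])} of {info['total']} with outages")
```

The point is that the top level only shows counts, so the categories
have to match how your network is organized before the numbers mean
anything.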

> Thoughts from anyone on any of this?

Network monitoring is not trivial no matter what tool you use.  Pick
something that you trust to scale to the proportions you will need so
you don't do a lot of work and then hit a wall.   And if you have a
lot of systems, avoid anything that needs per-system configuration or
agent installation.
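
One way to picture "no per-system configuration" is to describe the
network as a few address ranges and let the tool work out the targets
itself.  A minimal sketch, assuming ranges are given in CIDR form; the
function name and the example addresses are hypothetical, and a real
tool would still have to probe each address to see what answers.

```python
import ipaddress


def targets(ranges, exclude=()):
    """Yield each usable host address in the given CIDR ranges.

    Addresses listed in `exclude` are skipped.
    """
    skip = {ipaddress.ip_address(addr) for addr in exclude}
    for cidr in ranges:
        for host in ipaddress.ip_network(cidr).hosts():
            if host not in skip:
                yield str(host)


if __name__ == "__main__":
    # Two small documentation ranges stand in for a real network.
    for address in targets(["192.0.2.0/30", "198.51.100.0/30"],
                           exclude=["192.0.2.2"]):
        print(address)
```

Adding a new subnet then means adding one line to the list of ranges,
not one entry per machine.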

-- 
   Les Mikesell
     lesmikesell at gmail.com