[CentOS] Monitoring services

On Fri, Dec 16, 2011 at 12:02 PM, Alan McKay <alan.mckay at gmail.com> wrote:
>> > Thoughts from anyone on any of this?
>>
>> Network monitoring is not trivial no matter what tool you use.  Pick
>> something that you trust to scale to the proportions you will need so
>> you don't do a lot of work and then hit a wall.   And if you have a
>> lot of systems, avoid anything that needs per-system configuration or
>> agent installation.
>>
>
> Agreed.  I'm definitely not looking for trivial - just trying to make sure
> I understand the strengths and weaknesses of each system to help me make
> the right decision.  Because once I've made that decision, I have to live
> with it :-)   Our environment is relatively small.  About 80 servers that
> are mostly grouped into 3 compute clusters for the scientists I support.  A
> few switches, and no routers under my direct control (though a few Linux
> boxes routing between NICs since some of the environment is on our own
> private LAN behind said Linux box, cut off from the Hospital's network)

You may not need 'direct' control of the routers - just read access
for SNMP to monitor them.  And if the switches support SNMP you can get
per-interface traffic, which will obviously match whatever is on the
other end of the wire.  Does the cluster software have its own
close-coupled monitor like Ganglia?   One thing everybody is likely to
need, and that I haven't found in any of the frameworks I've seen, is a
good concept of aggregates.  That is, you will have some level of
redundancy in fail-over sets and some level of group capacity in
load-balanced sets.  While you may want to be alerted about individual
failures, what you really need to track is how close you are to
capacity across the working group members - and nothing does that very
well.
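
To make the aggregate idea concrete, here is a rough sketch in Python.
It is only an illustration: the Member fields, the function names and
the capacity units are made up for the example and do not come from any
monitoring framework.  It treats a load-balanced set as one unit and
asks how full the working members are, and whether the set could lose
one more member and still carry the current load.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # All fields are hypothetical; a real system would fill them from
    # whatever it already collects (SNMP counters, agent data, etc.).
    name: str
    up: bool
    capacity: float  # most this member can serve, in any consistent unit
    load: float      # what it is serving now, in the same unit


def group_utilization(members):
    """Return (total_load, working_capacity, utilization) for one group.

    Only members that are currently up count toward load and capacity.
    """
    working = [m for m in members if m.up]
    total_load = sum(m.load for m in working)
    working_capacity = sum(m.capacity for m in working)
    if working_capacity == 0:
        return total_load, working_capacity, float("inf")
    return total_load, working_capacity, total_load / working_capacity


def survives_one_more_failure(members):
    """True if the group could still carry its current load after
    losing its largest working member (the worst single failure)."""
    working = [m for m in members if m.up]
    if len(working) < 2:
        return False
    total_load = sum(m.load for m in working)
    remaining = sum(m.capacity for m in working) - max(m.capacity for m in working)
    return total_load <= remaining


if __name__ == "__main__":
    web = [
        Member("web1", True, 100.0, 60.0),
        Member("web2", True, 100.0, 70.0),
        Member("web3", False, 100.0, 0.0),  # already failed
    ]
    load, cap, util = group_utilization(web)
    print(f"load={load} capacity={cap} utilization={util:.0%}")
    print("survives one more failure:", survives_one_more_failure(web))
```

The point is that an alert on web3 alone says little; the interesting
number is that the remaining pair could not absorb another failure.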

-- 
   Les Mikesell
    lesmikesell at gmail.com