[CentOS] Re: [OT] What is the best network monitoring tool?

Mon Oct 13 22:38:02 UTC 2008
nate <centos at linuxpowered.net>

Les Mikesell wrote:

> Can you be more specific about how snmp is wrong and what you do to get
> a more accurate value?   Is it just that the snmp value needs to be
> scaled by the number of processors?

Seems like the snmpd included in CentOS 5.x has improved somewhat
over the one in v4.

From the FAQ:

What do the CPU statistics mean - is this the load average?
----------------------------------------------------------

  No.  Unfortunately, the original definition of the various CPU statistics
  was a little vague.  It referred to a "percentage", without specifying
  what period this should be calculated over.  It was therefore
  implemented slightly differently on different architectures.

    Recent releases include "raw counters", which can be used to
  calculate the percentage usage over any desired period.  This is
  the "right" way to handle things in the SNMP model.  The original
  flawed percentage objects should not be used, and will be removed
  in a future release of the agent.

    Note that this is different from the Unix load average, which is
  available via the loadTable, and is supported on all architectures.

---
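The raw-counter approach the FAQ recommends boils down to sampling
the tick counters twice and dividing the deltas. A minimal sketch,
assuming the UCD-SNMP-MIB ssCpuRaw* objects; the helper name and the
sample numbers are mine, not from the FAQ:

```shell
#!/bin/sh
# Derive CPU percentages from two samples of raw tick counters,
# the way the FAQ suggests instead of the old percentage OIDs.
cpu_pct() {
    # args: user1 sys1 idle1 user2 sys2 idle2 -> "USER:n SYS:n"
    du=$(( $4 - $1 )); ds=$(( $5 - $2 )); di=$(( $6 - $3 ))
    total=$(( du + ds + di ))
    echo "USER:$(( 100 * du / total )) SYS:$(( 100 * ds / total ))"
}
# In practice the two samples would come from something like:
#   snmpget -v2c -c public somehost \
#       UCD-SNMP-MIB::ssCpuRawUser.0 \
#       UCD-SNMP-MIB::ssCpuRawSystem.0 \
#       UCD-SNMP-MIB::ssCpuRawIdle.0
# taken some interval apart.
```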

Older versions would basically spit out random values for CPU
usage. For about the past five years I have used scripts, run
from cron, that run sar, parse the output, and write the
results to a file; snmpd is then configured to return the tail
of that file when a particular OID is queried. This has given
me really dependable results over the years.
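A minimal sketch of the cron side of that setup; the stats path is a
placeholder, and sar's column layout varies between sysstat versions,
so the field positions are an assumption:

```shell
#!/bin/sh
# Cron-side collector sketch: sample CPU usage with sar and write a
# one-line summary for snmpd to serve.
parse_sar() {
    # Take sar -u's "Average:" line; fields 3-6 are
    # %user %nice %system %iowait on this assumed layout.
    awk '/^Average/ {
        printf "USER:%s NICE:%s SYS:%s IO:%s\n", $3, $4, $5, $6
    }'
}
# In the cron job this would be something like:
#   sar -u 1 1 | parse_sar > /home/monitor/stats/sar.usage
# and the snmpd side would expose it via the UCD-SNMP exec directive
# in snmpd.conf, e.g.:
#   exec sarstats /usr/bin/tail -n 1 /home/monitor/stats/sar.usage
```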

[root at us-cfe002:/home/monitor/stats]# tail -n 1 *
==> disk.usage <==
DISK_T:60707 DISK_U:9567

==> mem.usage <==
RAM_T:3950 RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0

==> sar.usage <==
USER:0.01 NICE:0.00 SYS:0.01 IO:0.00 FAULT:41.16 TCPSOCK:21

Last I checked, the SNMP daemon also didn't return CPU I/O
wait values, which are pretty handy to have.

Then I have a script that queries that data (along with other
data) and feeds it into cacti as a single set of results
(stored in one RRD file), which really helps cacti scale:

[cacti at dc1-mon002:~/bin]$ ./linux-basics-net.pl us-cfe002 public
USER:0.01 NICE:0.00 SYS:0.02 IO:0.00 FAULT:61.78 TCPSOCK:21 RAM_T:3950
RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0 DISK_T:60707 DISK_U:9567
1MIN:0.00 5MIN:0.00 15MIN:0.00 E0_IN:747203652 E0_OUT:520021358 E1_IN:0
E1_OUT:0
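A rough sketch of how such a poller could stitch the per-file lines
into the single line cacti wants; the helper name and the extOutput
indexes are assumptions, not the actual linux-basics-net.pl:

```shell
#!/bin/sh
# Poller sketch: fetch each exec script's output line over SNMP and
# join them into one "NAME:value NAME:value ..." line for cacti.
join_results() {
    # stdin: one NAME:value group per line
    # stdout: the same groups space-joined on a single line
    tr '\n' ' ' | sed 's/ $//'
}
# In practice, something along the lines of:
#   snmpget -v2c -c "$2" -Oqv "$1" \
#       UCD-SNMP-MIB::extOutput.1 \
#       UCD-SNMP-MIB::extOutput.2 \
#       UCD-SNMP-MIB::extOutput.3 | join_results
# where $1 is the host and $2 the community string.
```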

Unfortunately, with every passing revision sar becomes more
and more difficult to parse. I really miss the version from
the RHEL 3 days; that one was great, with a special
human-readable output option (since removed) that would print
each stat on its own line, making it easy to parse.

nate