[CentOS] Re: [OT] What is the best network monitoring tool?
centos at linuxpowered.net
Mon Oct 13 22:38:02 UTC 2008
Les Mikesell wrote:
> Can you be more specific about how snmp is wrong and what you do to get
> a more accurate value? Is it just that the snmp value needs to be
> scaled by the number of processors?
It seems the SNMPD included in CentOS 5.x has improved somewhat.
From the FAQ:
What do the CPU statistics mean - is this the load average?
No. Unfortunately, the original definition of the various CPU statistics
was a little vague. It referred to a "percentage", without specifying
what period this should be calculated over. It was therefore
implemented slightly differently on different architectures.
Recent releases includes "raw counters", which can be used to
calculate the percentage usage over any desired period. This is
the "right" way to handle things in the SNMP model. The original
flawed percentage objects should not be used, and will be removed
in a future release of the agent.
Note that this is different from the Unix load average, which is
available via the loadTable, and is supported on all architectures.
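To make the raw-counter approach from the FAQ concrete: the percentage for a period is the change in one counter divided by the change in the sum of all counters between two samples. Here is a minimal Python sketch of that arithmetic. The state names and tick values are made up for illustration and are not real agent output, and counter wrap-around is not handled.

```python
def cpu_percentages(prev, curr):
    """Return per-state CPU usage (%) between two raw counter samples.

    prev and curr map a state name to a cumulative tick count.
    The state names are hypothetical; use whatever counters the agent exposes.
    """
    deltas = {state: curr[state] - prev[state] for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No ticks elapsed between the samples: nothing meaningful to report.
        return {state: 0.0 for state in curr}
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


# Two illustrative samples taken some interval apart.
first = {"user": 1000, "nice": 0, "system": 500, "idle": 8500}
second = {"user": 1100, "nice": 0, "system": 550, "idle": 9350}
usage = cpu_percentages(first, second)
```

Because the poller picks the two sample times, the period is whatever you want it to be, which is exactly what the old percentage objects could not offer.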
Older versions would basically spit out random values for CPU
usage. For about the past 5 years I have used scripts, run out
of cron, that run sar, parse the output, and write the results
to a file. I then configure SNMP to tail that file when a
particular OID is queried. This has given me really dependable
results over the years.
[root@us-cfe002:/home/monitor/stats]# tail -n 1 *
==> disk.usage <==
==> mem.usage <==
RAM_T:3950 RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0
==> sar.usage <==
USER:0.01 NICE:0.00 SYS:0.01 IO:0.00 FAULT:41.16 TCPSOCK:21
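For anyone wanting to try the same layout, the file-writing side could look roughly like the Python sketch below. It is not the actual script: the function names and sample values are hypothetical, and the sar parsing itself is left out because the output differs between versions. It only shows the idea of appending one KEY:value line per run so that tail -n 1 always returns the newest sample.

```python
import os
import tempfile


def format_stats(stats):
    """Render a dict as a single 'KEY:value KEY:value' line."""
    return " ".join(f"{key}:{value}" for key, value in stats.items())


def append_stats(path, stats):
    """Append one formatted line so `tail -n 1` returns the newest sample."""
    with open(path, "a") as handle:
        handle.write(format_stats(stats) + "\n")


# Hypothetical values; a real cron job would fill these from parsed sar output.
demo_path = os.path.join(tempfile.mkdtemp(), "sar.usage")
append_stats(demo_path, {"USER": "0.02", "NICE": "0.00", "SYS": "0.01"})
append_stats(demo_path, {"USER": "0.01", "NICE": "0.00", "SYS": "0.01"})

# Equivalent of `tail -n 1 sar.usage`.
with open(demo_path) as handle:
    last_line = handle.read().splitlines()[-1]
```

Values are kept as strings here so the formatting (for example 0.00) survives unchanged.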
Last I checked, the SNMP daemon also didn't return CPU I/O
wait values, which are pretty handy to have.
Then I have a script that queries the data (along with other
data) and feeds it into cacti as a single set of results
(to be stored in 1 RRD file), which really helps cacti scale:
[cacti@dc1-mon002:~/bin]$ ./linux-basics-net.pl us-cfe002 public
USER:0.01 NICE:0.00 SYS:0.02 IO:0.00 FAULT:61.78 TCPSOCK:21 RAM_T:3950
RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0 DISK_T:60707 DISK_U:9567
1MIN:0.00 5MIN:0.00 15MIN:0.00 E0_IN:747203652 E0_OUT:520021358 E1_IN:0
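The output above is a single line of space-separated name:value pairs, which is the form cacti accepts from a script that returns several fields at once. As a rough Python sketch of the merging step only (the real script is Perl and also does the SNMP queries, which are omitted here; the function name is hypothetical):

```python
def combine(*lines):
    """Merge several 'KEY:value ...' lines into one line.

    Field order follows first appearance; if a key repeats,
    the later value replaces the earlier one.
    """
    fields = {}
    for line in lines:
        for field in line.split():
            key, sep, value = field.partition(":")
            if sep:  # ignore anything that is not KEY:value
                fields[key] = value
    return " ".join(f"{key}:{value}" for key, value in fields.items())


# Input lines copied from the stats shown in this message.
sar_line = "USER:0.01 NICE:0.00 SYS:0.02 IO:0.00 FAULT:61.78 TCPSOCK:21"
mem_line = "RAM_T:3950 RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0"
combined = combine(sar_line, mem_line)
```

Returning everything in one call is what lets all the values land in one RRD file instead of one file and one poll per statistic.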
Unfortunately, with every passing revision sar becomes
more and more difficult to parse. I really miss the version
from the RHEL 3 days. That one was great: it had a special
human-readable output option which has since been taken out
(it would spit out each stat on one line making it easy