nate wrote:
Also, last I checked, the SNMP daemon didn't return CPU I/O wait values, which are pretty handy to have.
It must... I haven't waded through the details of how it does it, but a default OpenNMS install will collect and graph a CPU usage chart that stacks user/nice/wait/system/interrupts. It seems accurate, except that the scale is per-CPU (i.e. it will go to 400% on a hyperthreaded dual-CPU box).
It also does a CPU statistics chart: a line graph of the 1/5/15 minute values, with the space under the line color-coded for %CPU utilization.
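To make the per-CPU scaling above concrete, here is a minimal sketch of folding a stacked chart back onto a 0-100% scale by dividing by the number of logical CPUs. The function name and the sample numbers are made up for illustration; this is not how OpenNMS itself computes anything.

```python
def normalize_cpu_percent(stacked, logical_cpus):
    """Scale per-CPU stacked percentages down to a 0-100% whole-box scale.

    `stacked` maps a category name (user, nice, wait, ...) to a percentage
    where each logical CPU contributes up to 100. Hypothetical helper.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return {name: value / logical_cpus for name, value in stacked.items()}


# A hyperthreaded dual-CPU box shows up as 4 logical CPUs, so a fully
# busy machine reads 400% on the stacked chart (sample values, not real data).
busy = {"user": 200.0, "system": 100.0, "wait": 100.0}
normalized = normalize_cpu_percent(busy, logical_cpus=4)
print(normalized)
```

With 4 logical CPUs, the 400% total becomes 100%, which is usually easier to compare across boxes with different CPU counts.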
Then I have a script that queries that data (along with other data) and feeds it into cacti as a single set of results (to be stored in one RRD file), which really helps cacti scale.
[cacti@dc1-mon002:~/bin]$ ./linux-basics-net.pl us-cfe002 public
USER:0.01 NICE:0.00 SYS:0.02 IO:0.00 FAULT:61.78 TCPSOCK:21 RAM_T:3950 RAM_F:2732 RAM_B:58 RAM_C:731 SWAP_T:8189 SWAP_U:0 DISK_T:60707 DISK_U:9567 1MIN:0.00 5MIN:0.00 15MIN:0.00 E0_IN:747203652 E0_OUT:520021358 E1_IN:0 E1_OUT:0
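For anyone who wants to consume that kind of output elsewhere, the space-separated NAME:VALUE format is easy to pick apart. A rough sketch in Python (the original script is Perl and its internals aren't shown here, so `parse_stats` is a hypothetical helper, not part of that script):

```python
def parse_stats(line):
    """Turn 'NAME:VALUE NAME:VALUE ...' into a dict of floats.

    Tokens without a colon or with a non-numeric value are skipped.
    """
    stats = {}
    for token in line.split():
        name, sep, value = token.partition(":")
        if not sep:
            continue  # not a NAME:VALUE pair
        try:
            stats[name] = float(value)
        except ValueError:
            continue  # value is not a number
    return stats


# A few fields copied from the sample output above.
sample = "USER:0.01 NICE:0.00 SYS:0.02 IO:0.00 TCPSOCK:21 RAM_T:3950 RAM_F:2732 1MIN:0.00"
stats = parse_stats(sample)
print(stats["USER"], stats["TCPSOCK"], stats["RAM_T"])
```

Keeping every field on one line means one script run (and one poll) per host rather than one per data source.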
And it does a system memory stats graph, color-coded for used/io_buff/shared/filesystem cache/available/swap/real values.
Unfortunately, with every passing revision of sar the output becomes more and more difficult to parse. I really miss the version from the RHEL 3 days; that one was great. It had a special human-readable output option, which has since been taken out (it would spit out each stat on one line, making it easy to parse).
You might want to look at OpenNMS if you haven't already. Now that they have a yum repo, it is very easy to install.