Les Mikesell wrote:
Scott Silva wrote:
on 6-4-2009 2:14 PM Les Mikesell spake the following:
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
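To make the arithmetic concrete, here is a minimal Python sketch that rescales a top-style figure (100% per core) into a share of the whole machine. The function name normalize_cpu_percent is made up for illustration and is not part of top or gkrellm.

```python
def normalize_cpu_percent(top_percent, n_cores):
    """Rescale a top-style percentage (100% per core) to a 0-100% share
    of the whole machine. Hypothetical helper, for illustration only."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return top_percent / n_cores


if __name__ == "__main__":
    # The figures from this thread: 245% reported on a quad core.
    print(normalize_cpu_percent(245, 4))  # 61.25
```

So 245% on a quad core means roughly 61% of the machine's total CPU capacity is in use.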
Is there any reasonable way to figure out the available CPU capacity from an SNMP monitoring tool? (You want to know if the reported >100% usage is a problem but you don't know anything else about the machine).
That can be difficult, because a machine in I/O wait can be slower than a machine at full CPU utilization. There is nothing technically wrong with a machine at 100% CPU. It just means that the CPU is busy doing useful tasks instead of sitting idle. Where it is more critical is in a system that has occasional peaks of load. If the system is already busy, then these tasks will wait. Unless your system idles down and lowers the CPU frequency to save power, it isn't really saving anything by being idle. As long as the system gets its work done in a timely manner, it isn't overloaded.
SNMP does a reasonable job of reporting user/system/iowait. The question is rather how to find out how many CPUs a machine has, so you can tell whether 400% is all of its capacity, since the percentage isn't scaled against the total for you.
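On the Linux box itself, one way to get the CPU count is to count the per-CPU lines ("cpu0", "cpu1", ...) in /proc/stat. The Python sketch below does that on a text sample. The helper names and the sample data are invented for illustration, and getting the result out over SNMP is left open here.

```python
import re

# Made-up sample of /proc/stat content from a quad-core box (values invented).
SAMPLE_PROC_STAT = """\
cpu  4000 0 2000 32000 2000 0 0 0
cpu0 1000 0 500 8000 500 0 0 0
cpu1 1000 0 500 8000 500 0 0 0
cpu2 1000 0 500 8000 500 0 0 0
cpu3 1000 0 500 8000 500 0 0 0
intr 12345 0 0
ctxt 67890
"""


def count_cpus(proc_stat_text):
    """Count the per-CPU lines (cpu0, cpu1, ...) in /proc/stat content.
    The aggregate "cpu" line has no digit after the name and is skipped."""
    return sum(1 for line in proc_stat_text.splitlines()
               if re.match(r"cpu\d+\s", line))


def capacity_percent(proc_stat_text):
    """Maximum a top-style total can reach: 100% per CPU."""
    return 100 * count_cpus(proc_stat_text)


if __name__ == "__main__":
    # On a real system the text would come from open("/proc/stat").read().
    print(count_cpus(SAMPLE_PROC_STAT), capacity_percent(SAMPLE_PROC_STAT))
```

With the count known, a reported 400% can be checked against the machine's actual ceiling.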
The internal CPU usage stuff that SNMP on Linux provides is worse than worthless, as it provides incorrect data in many cases. From the net-snmp FAQ:
What about multi-processor systems?
----------------------------------
Sorry - the CPU statistics (both original percentages, and the newer raw statistics) both refer to the system as a whole. There is currently no way to access individual statistics for a particular processor (except on Solaris systems - see below).
Note that although the Host Resources table includes a hrProcessorTable, the current implementation suffers from two major flaws. Firstly, it doesn't currently recognise the presence of multiple processors, and simply assumes that all systems have precisely one CPU. Secondly, it doesn't calculate the hrProcessorLoad value correctly, and either returns a dummy value (based on the load average) or nothing at all.
As of net-snmp version 5.1, the Solaris operating system delivers some information about multiple CPU's such as speed and type.
Other than that, to monitor a multi-processor system, you're currently out of luck. We hope to address this in a future release of the agent.
---
I wrote a few scripts that get CPU usage and feed it into SNMP for retrieval for my cacti systems.
My company used to rely on the built-in Linux SNMP stuff for CPU usage (before I was hired) and they complained how it always seemed to max out at 50% (on a dual CPU system).
I've been using my own methods of CPU usage extraction using sar for about 6 years now and it works great. The only downside is that sar keeps being rewritten, and with every revision they make it harder and harder to parse (RHEL 3 was the easiest by far).
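The scripts themselves are not shown in this thread. As a rough illustration of the general idea, and swapping in a different technique (reading the aggregate "cpu" line of /proc/stat twice rather than parsing sar output), a minimal Python sketch might look like this. The function names and the sample lines are invented, and this is not the sar-based method described above.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate "cpu" line of /proc/stat.
    Fields used: user nice system idle iowait irq softirq steal.
    Older kernels expose fewer fields, so missing ones count as 0."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:9]]
    idle = values[3]
    iowait = values[4] if len(values) > 4 else 0
    total = sum(values)
    return total - idle - iowait, total


def busy_percent(before, after):
    """Whole-machine busy share between two samples, on a 0-100% scale
    regardless of how many CPUs the machine has."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


if __name__ == "__main__":
    # Two made-up samples taken some seconds apart.
    first = "cpu  1000 0 500 8000 500 0 0 0"
    second = "cpu  1600 0 650 8200 550 0 0 0"
    print(busy_percent(first, second))  # 75.0
```

Because the aggregate line sums over all CPUs, the result is already scaled against total capacity and does not top out at 50% on a dual CPU box.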
Sample graph - http://portal.aphroland.org/~aphro/cacti-cpu.png
That particular cacti server is collecting roughly 20 million data points daily (14,500/minute). It is *heavily* customized for higher scalability.
nate