I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
top:
  PID USER  PR  NI  VIRT  RES  SWAP  SHR S  %CPU %MEM     TIME+ COMMAND
18037 thba  31  15  304m 242m   62m  44m R 245.3  4.1 148:58.72 ic
Also in the log of some programs I see this strange factor:
CPU Seconds = 2632   Wall Clock Seconds = 1090
These are all single-threaded programs, so it's not that more cores are being used.
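As a quick sanity check on those log figures (just the arithmetic from the numbers quoted above, nothing machine-specific): a single-threaded program should never accumulate more CPU seconds than wall-clock seconds, so a ratio well above 1.0 points at the accounting rather than the workload.

```shell
# CPU-seconds vs wall-clock-seconds ratio from the log figures above.
# For a single-threaded program this should be <= 1.0; a value near
# 2.4 suggests the reported CPU time itself is inflated.
cpu_seconds=2632
wall_seconds=1090
awk -v c="$cpu_seconds" -v w="$wall_seconds" \
    'BEGIN { printf "CPU/wall ratio: %.2f\n", c / w }'
# -> CPU/wall ratio: 2.41
```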
[thba@fazant]$ uname -a
Linux fazant 2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:10:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[thba@fazant]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz
stepping        : 4
cpu MHz         : 1600.000
cache size      : 8192 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips        : 5871.54
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: [8]
Any ideas?
Thanks, Theo
On Thu, Jun 4, 2009 at 1:37 PM, Theo Band theo.band@greenpeak.com wrote:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
Theo, by any chance are you using cumulative mode on top?
Hakan Koseoglu wrote:
On Thu, Jun 4, 2009 at 1:37 PM, Theo Band theo.band@greenpeak.com wrote:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
Theo, by any chance are you using cumulative mode on top?
Not that I was aware of. I did toggle Irix mode (what's that?) and then the CPU % goes down to about 62.2% (instead of 100% as the top line tells me). With cumulative mode on or off there is no difference in the reading.
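For what it's worth, that 62.2% reading is consistent with top's Irix/Solaris-mode arithmetic: with Irix mode off (Solaris mode), top divides a task's %CPU by the number of CPUs. A quick check, assuming the 4 cores of this box:

```shell
# With Irix mode off, top reports %CPU / (number of CPUs):
awk 'BEGIN { printf "245.3 / 4 = %.1f%% per-CPU share\n", 245.3 / 4 }'
# -> 245.3 / 4 = 61.3% per-CPU share
```

61.3% is close to the observed 62.2%, so the toggle behaves as documented; it just divides an already inflated number.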
Theo
On Thu, Jun 4, 2009 at 1:37 PM, Theo Band theo.band@greenpeak.com wrote:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
Press 1 in top to see the per CPU info
Didi wrote:
On Thu, Jun 4, 2009 at 1:37 PM, Theo Band theo.band@greenpeak.com wrote:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
Press 1 in top to see the per CPU info
Tasks: 165 total, 3 running, 162 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.5%us, 0.8%sy, 19.6%ni, 79.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 1.2%sy, 98.6%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 13.0%ni, 87.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 1.1%us, 0.8%sy, 74.5%ni, 23.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 6097924k total, 5837764k used, 260160k free, 126288k buffers
Swap: 4194296k total, 112k used, 4194184k free, 5119488k cached
  PID USER  PR  NI  VIRT  RES  SWAP  SHR S  %CPU %MEM     TIME+ COMMAND
 4742 made  39  15  117m  24m   93m  11m R 262.8  0.4   0:49.35 eldo_64.exe
18037 thba  34  15  340m 277m   63m  45m R 244.9  4.7 275:09.10 ic
This doesn't make a difference for the listed process.
Theo
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Is there any reasonable way to figure out the available CPU capacity from an SNMP monitoring tool? (You want to know if the reported >100% usage is a problem but you don't know anything else about the machine).
on 6-4-2009 2:14 PM Les Mikesell spake the following:
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Is there any reasonable way to figure out the available CPU capacity from an SNMP monitoring tool? (You want to know if the reported >100% usage is a problem but you don't know anything else about the machine).
That can be difficult, because a machine in I/O wait can be slower than a machine at full CPU utilization. There is nothing technically wrong with a machine at 100% CPU. It just means that the CPU is busy doing useful tasks instead of sitting idle doing nothing. Where it is more critical is in a system that has occasional peaks of load: if the system is already busy, then these tasks will wait. Unless your system idles down and lowers the CPU frequency to save power, it isn't really saving anything by being idle. As long as the system gets its work done in a timely manner, it isn't overloaded.
Scott Silva wrote:
on 6-4-2009 2:14 PM Les Mikesell spake the following:
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Is there any reasonable way to figure out the available CPU capacity from an SNMP monitoring tool? (You want to know if the reported >100% usage is a problem but you don't know anything else about the machine).
That can be difficult, because a machine in I/O wait can be slower than a machine at full CPU utilization. There is nothing technically wrong with a machine at 100% CPU. It just means that the CPU is busy doing useful tasks instead of sitting idle doing nothing. Where it is more critical is in a system that has occasional peaks of load: if the system is already busy, then these tasks will wait. Unless your system idles down and lowers the CPU frequency to save power, it isn't really saving anything by being idle. As long as the system gets its work done in a timely manner, it isn't overloaded.
SNMP does a reasonable job of reporting user/system/iowait. That's not so much the question as how to know how many CPUs a given machine has, so you can tell whether 400% is all of your capacity; SNMP doesn't scale the percentage against the total for you.
Les Mikesell wrote:
Scott Silva wrote:
on 6-4-2009 2:14 PM Les Mikesell spake the following:
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 cpu's, 100% total each, a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Is there any reasonable way to figure out the available CPU capacity from an SNMP monitoring tool? (You want to know if the reported >100% usage is a problem but you don't know anything else about the machine).
That can be difficult, because a machine in I/O wait can be slower than a machine at full CPU utilization. There is nothing technically wrong with a machine at 100% CPU. It just means that the CPU is busy doing useful tasks instead of sitting idle doing nothing. Where it is more critical is in a system that has occasional peaks of load: if the system is already busy, then these tasks will wait. Unless your system idles down and lowers the CPU frequency to save power, it isn't really saving anything by being idle. As long as the system gets its work done in a timely manner, it isn't overloaded.
SNMP does a reasonable job of reporting user/system/iowait. That's not so much the question as how to know how many CPUs a given machine has, so you can tell whether 400% is all of your capacity; SNMP doesn't scale the percentage against the total for you.
The internal CPU usage stuff that SNMP on linux provides is worse than worthless as it provides incorrect data in many cases, from the FAQ -
What about multi-processor systems?
----------------------------------
Sorry - the CPU statistics (both original percentages, and the newer raw statistics) both refer to the system as a whole. There is currently no way to access individual statistics for a particular processor (except on Solaris systems - see below).
Note that although the Host Resources table includes a hrProcessorTable, the current implementation suffers from two major flaws. Firstly, it doesn't currently recognise the presence of multiple processors, and simply assumes that all systems have precisely one CPU. Secondly, it doesn't calculate the hrProcessorLoad value correctly, and either returns a dummy value (based on the load average) or nothing at all.
As of net-snmp version 5.1, the Solaris operating system delivers some information about multiple CPU's such as speed and type.
Other than that, to monitor a multi-processor system, you're currently out of luck. We hope to address this in a future release of the agent.
---
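If an agent does populate hrProcessorTable, counting its rows is one hedged way to get a CPU count. The walk output below is a hypothetical sample (host, community, and load values are made up), and per the FAQ above a Linux agent of this era may return a single bogus row.

```shell
# Hypothetical captured output of something like:
#   snmpwalk -v2c -c public somehost HOST-RESOURCES-MIB::hrProcessorLoad
sample='HOST-RESOURCES-MIB::hrProcessorLoad.768 = INTEGER: 97
HOST-RESOURCES-MIB::hrProcessorLoad.769 = INTEGER: 12
HOST-RESOURCES-MIB::hrProcessorLoad.770 = INTEGER: 3
HOST-RESOURCES-MIB::hrProcessorLoad.771 = INTEGER: 40'

# One row per processor the agent knows about:
ncpu=$(printf '%s\n' "$sample" | grep -c 'hrProcessorLoad')
echo "CPUs reported by agent: $ncpu"
# -> CPUs reported by agent: 4
```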
I wrote a few scripts that get CPU usage and feed it into SNMP for retrieval by my cacti systems.
My company used to rely on the built-in Linux SNMP stuff for CPU usage (before I was hired), and they complained how it always seemed to max out at 50% (on a dual-CPU system).
I've been using my own method of CPU usage extraction using sar for about 6 years now and it works great; the only downside is that sar keeps being rewritten, and with every revision they make it harder and harder to parse (RHEL 3 was the easiest by far).
Sample graph - http://portal.aphroland.org/~aphro/cacti-cpu.png
That particular cacti server is collecting roughly 20 million data points daily (14,500/minute). *Heavily* customized for higher scalability.
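A minimal sketch of that kind of sar scraping (the sample output and its column layout are assumptions based on a RHEL5-era sysstat; as noted above, the layout shifts between versions):

```shell
# Captured `sar -u` output (hypothetical sample; layout varies by
# sysstat version, so the field positions here are an assumption):
sample='12:00:01 AM       CPU     %user     %nice   %system   %iowait     %idle
12:10:01 AM       all      4.15     50.12      1.23      0.50     44.00
Average:          all      4.15     50.12      1.23      0.50     44.00'

# Take %idle from the Average line and report overall busy percentage:
idle=$(printf '%s\n' "$sample" | awk '/^Average:/ { print $NF }')
awk -v i="$idle" 'BEGIN { printf "CPU busy: %.2f%%\n", 100 - i }'
# -> CPU busy: 56.00%
```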
nate
nate wrote:
I wrote a few scripts that get CPU usage and feed it into SNMP for retrieval by my cacti systems.
My company used to rely on the built-in Linux SNMP stuff for CPU usage (before I was hired), and they complained how it always seemed to max out at 50% (on a dual-CPU system).
I've been using my own method of CPU usage extraction using sar for about 6 years now and it works great; the only downside is that sar keeps being rewritten, and with every revision they make it harder and harder to parse (RHEL 3 was the easiest by far).
Sample graph - http://portal.aphroland.org/~aphro/cacti-cpu.png
That particular cacti server is collecting roughly 20 million data points daily (14,500/minute). *Heavily* customized for higher scalability.
Have you looked at OpenNMS for this? It's java with a postgresql backend for some data and jrobin (equivalent to rrd) for some. It needs a lot of RAM and has the same i/o bottleneck as anything else updating large numbers of rrd files but otherwise is pretty scalable and includes a lot more features than cacti.
Les Mikesell wrote:
nate wrote:
I wrote a few scripts that get CPU usage and feed it into SNMP for retrieval by my cacti systems.
My company used to rely on the built-in Linux SNMP stuff for CPU usage (before I was hired), and they complained how it always seemed to max out at 50% (on a dual-CPU system).
I've been using my own method of CPU usage extraction using sar for about 6 years now and it works great; the only downside is that sar keeps being rewritten, and with every revision they make it harder and harder to parse (RHEL 3 was the easiest by far).
Sample graph - http://portal.aphroland.org/~aphro/cacti-cpu.png
That particular cacti server is collecting roughly 20 million data points daily (14,500/minute). *Heavily* customized for higher scalability.
Have you looked at OpenNMS for this? It's java with a postgresql backend for some data and jrobin (equivalent to rrd) for some. It needs a lot of RAM and has the same i/o bottleneck as anything else updating large numbers of rrd files but otherwise is pretty scalable and includes a lot more features than cacti.
Not recently; the main issue, as you mention, is the I/O bottleneck. I've modified my cacti stuff so much that it minimizes the amount of I/O required. I average 9.2 data points per RRD; a lot of other systems (including one I wrote several years ago) typically put 1 data point per RRD, which makes for horrible scaling. The downside is that the amount of management overhead required to add a new system to cacti is obscene, but everything has its trade-offs I guess.
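The arithmetic behind that batching, using the figures above (illustrative only): packing several data sources into each RRD cuts the number of files touched per polling cycle by the same factor.

```shell
# Files touched per cycle at 14,500 data points/minute:
awk 'BEGIN {
  points = 14500; per_rrd = 9.2
  printf "one DS per RRD: %d files; %.1f DS per RRD: %d files\n",
         points, per_rrd, points / per_rrd
}'
# -> one DS per RRD: 14500 files; 9.2 DS per RRD: 1576 files
```

Since each RRD update is a small read-modify-write, the file count is a decent proxy for the random I/O per cycle.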
For monitoring our storage array I went even farther in that the only thing I'm using cacti for is to display the data, all data collection and RRD updates occur outside of cacti. Mainly because cacti's architecture wouldn't be able to scale gracefully to gather stats from our array, which would be represented by a single host, but have more than 6,000 points of data to collect per minute. With cacti's spine it distributes the load on a per-host basis, and a host can't span more than one thread. Also my system detects new things as they are added to the array automatically and creates/updates RRDs for them(though data isn't visible in cacti until they are manually added to the UI).
Even with 14,500 data point updates per minute, the amount of I/O required is trivial, takes about 20 seconds to do each run(much of that time is data collection). I used to host it on NFS, though the NFS cluster software wasn't optimized for the type of I/O rrdtool does, so it was quite a bit slower. Once I moved to iSCSI(same back end storage as the NFS cluster), performance went up 6x, and I/O wait went almost to 0.
At some point I'll get some time to check out other solutions again; for now, at least for my needs, cacti sucks the least (not denying that it does suck), and their road map doesn't inspire confidence long term. But as long as things are in RRDs they are portable.
I just wish that there were an easier way to provide a UI to rrdtool directly. I used rrdcgi several years ago, though many people are spoiled by the cacti UI, so that's one reason I've gone to it. I'm not a programmer, so my own ability to provide a UI is really limited, but I can make a pretty scalable back-end system without much trouble (I've been using RRD for 6 years now).
nate
nate wrote:
That particular cacti server is collecting roughly 20 million data points daily (14,500/minute). *Heavily* customized for higher scalability.
Have you looked at OpenNMS for this? It's java with a postgresql backend for some data and jrobin (equivalent to rrd) for some. It needs a lot of RAM and has the same i/o bottleneck as anything else updating large numbers of rrd files but otherwise is pretty scalable and includes a lot more features than cacti.
Not recently; the main issue, as you mention, is the I/O bottleneck. I've modified my cacti stuff so much that it minimizes the amount of I/O required. I average 9.2 data points per RRD; a lot of other systems (including one I wrote several years ago) typically put 1 data point per RRD, which makes for horrible scaling.
OpenNMS has a 'store-by-group' option that is supposed to help but I haven't tried it because you have to start over with the history. It would be as tunable as anything else as far as what is stored and how often.
The downside is that the amount of management overhead required to add a new system to cacti is obscene, but everything has its trade-offs I guess.
That's one of the beauties of opennms - it will autodiscover ranges and pretty much take care of itself except for grouping related machines for graph pages.
For monitoring our storage array I went even farther in that the only thing I'm using cacti for is to display the data, all data collection and RRD updates occur outside of cacti. Mainly because cacti's architecture wouldn't be able to scale gracefully to gather stats from our array, which would be represented by a single host, but have more than 6,000 points of data to collect per minute. With cacti's spine it distributes the load on a per-host basis, and a host can't span more than one thread. Also my system detects new things as they are added to the array automatically and creates/updates RRDs for them(though data isn't visible in cacti until they are manually added to the UI).
I think opennms defaults to re-probing for new things daily, but that would be tunable.
At some point I'll get some time to check out other solutions again; for now, at least for my needs, cacti sucks the least (not denying that it does suck), and their road map doesn't inspire confidence long term. But as long as things are in RRDs they are portable.
Opennms has some fairly serious ongoing work.
I just wish that there were an easier way to provide a UI to rrdtool directly. I used rrdcgi several years ago, though many people are spoiled by the cacti UI, so that's one reason I've gone to it. I'm not a programmer, so my own ability to provide a UI is really limited, but I can make a pretty scalable back-end system without much trouble (I've been using RRD for 6 years now).
It has the option of using rrdtool or jrobin, with the trade-off that jrobin is in-process and uses a file format portable to java, while rrdtool is an external process with non-portable files, but ones that other tools know something about. There's not much difference in their capabilities or output. The web UI is somewhat separate, running as jsp pages either under tomcat or embedded jetty, though part of the ongoing development is aimed at an API around it to make it easier to customise the UI. Plus it can collect WMI, JMX, and some other stuff that cacti can't, and it integrates with other things like RT and rancid.
Scott Silva wrote:
on 6-4-2009 5:37 AM Theo Band spake the following:
I have a quad core CPU running Centos5.
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%.
This one is easy. 4 CPUs at 100% each gives a maximum of 400%.
Since one core is at 100%, the other 145% is spread across the other 3 cores.
Not quite. If I run 4 processes (4 times cpuburn-in) I see this:
Cpu(s): 50.2%us, 0.9%sy, 48.9%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

  PID USER  PR  NI  VIRT  RES  SWAP  SHR S  %CPU %MEM     TIME+ COMMAND
14696 thba  25   0  2064  984  1080  308 R 244.9  0.0   0:40.57 cpuburn-in
14695 thba  25   0  2064  984  1080  308 R 243.2  0.0   0:43.21 cpuburn-in
14698 thba  25   0  2064  984  1080  308 R 242.9  0.0   0:34.47 cpuburn-in
14697 thba  25   0  2068  988  1080  308 R 162.0  0.0   0:25.86 cpuburn-in
14402 made  31  15  117m  24m   93m  11m R  40.9  0.4   1:11.56 eldo_64.exe
13746 kedo  39  15  696m 611m   85m  23m R  40.3 10.3  34:29.50 common_shell_ex
So the first line totals 100%, while the per-process percentages sum to 244.9+243.2+242.9+162.0+40.9+40.3 = 974.2%. One of the cores runs three processes that likewise total (162.0+40.9+40.3) = 243.2%. To me it looks like all values are just multiplied by about 2.43 (400% x 2.43 = 972%).
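Redoing that sum (an illustrative check only, using the %CPU figures quoted above):

```shell
awk 'BEGIN {
  total = 244.9 + 243.2 + 242.9 + 162.0 + 40.9 + 40.3
  printf "sum of %%CPU = %.1f (vs a 400%% quad-core ceiling)\n", total
  printf "implied inflation factor = %.2f\n", total / 400
}'
```

The factor comes out around 2.43-2.44, consistent with the single-process readings of ~243-245% elsewhere in the thread.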
I did disable hyperthreading in the BIOS (the machine would otherwise show up with 8 CPUs). Hyperthreading does not benefit my application.
Theo
Hi,
On Thu, Jun 4, 2009 at 08:37, Theo Band theo.band@greenpeak.com wrote:
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%. These are all single-threaded programs, so it's not that more cores are being used.
Are you sure?
You can type "H" in top to show separate threads; that way it would show up if you have more than one thread running in one of those programs.
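An alternative cross-check that doesn't involve top at all, reading the same /proc data top uses (demonstrated here against the current shell, which should be single-threaded):

```shell
# Report the kernel's thread count for a PID; a truly single-threaded
# program reports 1.
pid=$$
threads=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status")
echo "PID $pid runs $threads thread(s)"
```

Substitute the PID of the suspect process (e.g. 18037 from the top output) for `$$` to check it directly.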
HTH, Filipe
Filipe Brandenburger wrote:
Hi,
On Thu, Jun 4, 2009 at 08:37, Theo Band theo.band@greenpeak.com wrote:
When I use top, I see that running processes use 245% instead of 100%. If I use gkrellm, I just see one core being used 100%. These are all single-threaded programs, so it's not that more cores are being used.
Are you sure?
You can type "H" in top to show separate threads; that way it would show up if you have more than one thread running in one of those programs.
Yes, I'm quite sure. For instance, cpuburn on two machines where the only difference is the hardware (two versus four cores). The H option does not show more threads:
Machine a (dual core CentOS 5 64 bit)
Intel(R) Core(TM)2 Duo CPU E8400
2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:10:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

top - 09:26:00 up 62 days, 21:43, 1 user, load average: 0.30, 0.16, 0.17
Tasks: 120 total, 3 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 50.0%us, 0.2%sy, 0.0%ni, 49.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4050728k total, 2448800k used, 1601928k free, 405860k buffers
Swap: 3538936k total, 22172k used, 3516764k free, 1762448k cached

  PID USER  PR  NI  VIRT  RES  SWAP  SHR S  %CPU %MEM     TIME+ COMMAND
16916 thba  25   0  2068  988  1080  308 R 100.2  0.0   0:11.48 cpuburn-in

Machine b (quad core CentOS 5 64 bit)
Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz
2.6.18-128.1.6.el5 #1 SMP Wed Apr 1 09:10:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

top - 09:28:24 up 25 days, 40 min, 2 users, load average: 1.44, 1.83, 1.83
Tasks: 165 total, 3 running, 162 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.1%us, 0.5%sy, 25.0%ni, 49.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 6097924k total, 4366540k used, 1731384k free, 152248k buffers
Swap: 4194296k total, 112k used, 4194184k free, 3322344k cached

  PID USER  PR  NI  VIRT  RES  SWAP  SHR S  %CPU %MEM     TIME+ COMMAND
13873 thba  25   0  2068  988  1080  308 R 243.8  0.0   0:26.97 cpuburn-in
The total CPU reported is about correct (for the second machine two jobs ran: one cpuburn-in at 25% and one other at nice 15, also 25%). It's just the individual process figure on this quad-core machine that's way off. When I built the machine a couple of months ago, I did benchmarks and used top as well, and it showed "normal" results, most of the time 100% for a process and sometimes a little more. So I guess an update in the meantime has changed something.
Theo