I'm running a server recently installed with CentOS 4.2.
It's running the kernel in the subject line, on a PIII 866MHz with 512MB of RAM.
The system is running basically two processes:
1) ssh to remote system, receiving a stream of bytes
piped into
2) gzip the stream, write to disk file
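That pipeline is roughly the following (hostname and paths are placeholders; the gzip level is the one I use later in this thread):

  # receive the byte stream over ssh, compress it, write it to disk
  ssh user@remotehost 'cat /path/to/source' | gzip -3 -c > /data/stream.gz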
The system is relatively slow (866MHz CPU) and the network is fast (gigabit), so the limiting factor should be the CPU.
And it is.
However, at times, the system gets into a 'weird' state where instead of using about 85% user/ 15% system, it goes to 50% user and 50% system.
Now 50% system time for this load is ridiculous, and as I said before, most of the time it is 85/15 and only occasionally gets 'stuck' at 50/50.
So I got oprofile running to find out what part of the kernel it is stuck in, and here is the output of opreport -l vmlinux (top scores only):
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
21117    36.5340  default_idle
 3850     6.6608  __copy_from_user_ll
 3108     5.3771  find_get_page
 1707     2.9532  __copy_user_intel
 1580     2.7335  __might_sleep
 1356     2.3460  handle_IRQ_event
 1323     2.2889  __copy_to_user_ll
 1168     2.0207  __find_get_block_slow
 1134     1.9619  __find_get_block
 1093     1.8910  finish_task_switch
  963     1.6661  bh_lru_install
  638     1.1038  __wake_up
  605     1.0467  __do_softirq
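For reference, the profile was gathered roughly like this (the vmlinux path is just a placeholder for wherever the uncompressed kernel image lives on your box):

  opcontrol --vmlinux=/path/to/vmlinux   # point OProfile at the kernel image
  opcontrol --start                      # start sampling (timer mode; no perf counters on this box)
  # ... let the ssh | gzip workload run for a while ...
  opcontrol --stop
  opreport -l /path/to/vmlinux           # per-symbol report, shown above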
This is CRAZY! How can default_idle be sucking away cycles when the system is (or should be) CPU-bound?
Can anyone explain this?
David
David Mansfield wrote:
I'm running a server recently installed with CentOS 4.2.
It's running the kernel in the subject line, on a PIII 866MHz with 512MB of RAM.
The system is running basically two processes:
- ssh to remote system, receiving a stream of bytes
piped into
- gzip the stream, write to disk file
The system is relatively slow (866MHz CPU) and the network is fast (gigabit), so the limiting factor should be the CPU.
David,
I am not a kernel hacker, so I'm hoping you will end up with a more informed opinion, but it seems to me the real limiting factor would be I/O. Disks are slow, and you say you're writing to disk. IMHO, an 800MHz system should be able to handle ssh and gzip far more easily than the I/O.
--Bill
William (Bill) Triest wrote:
I am not a kernel hacker, so I'm hoping you will end up with a more informed opinion, but it seems to me the real limiting factor would be I/O. Disks are slow, and you say you're writing to disk. IMHO, an 800MHz system should be able to handle ssh and gzip far more easily than the I/O.
I concur -- gzip is a beast on the I/O; that's most likely where your perceived lag is. Install the "sysstat" RPM on your machine (and make sure it's started), and let the 'sa' processes gather data for you.
Then use iostat and 'sar -P ALL' to examine the relative I/O alongside the CPU usage, and you'll have a much clearer picture. It's possible you're running out of RAM during the gzip (hey, who knows!) and are hitting swap, which could be a killer.
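Something along these lines should give you the picture (interval and count are just examples):

  iostat -x 5        # extended per-device I/O stats every 5 seconds
  sar -P ALL 5 12    # per-CPU utilization, twelve 5-second samples
  sar -W 5 12        # swapping activity, in case you really are hitting swap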
-te
On Wed, 2006-03-08 at 09:14 -0800, Troy Engel wrote:
William (Bill) Triest wrote:
I am not a kernel hacker, so I'm hoping you will end up with a more informed opinion, but it seems to me the real limiting factor would be I/O. Disks are slow, and you say you're writing to disk. IMHO, an 800MHz system should be able to handle ssh and gzip far more easily than the I/O.
I concur -- gzip is a beast on the I/O; that's most likely where your perceived lag is. Install the "sysstat" RPM on your machine (and make sure it's started), and let the 'sa' processes gather data for you.
I think you're mistaking decompression for compression. Compression is usually CPU-bound on most machines, and this is no exception. The volume of data being written is small (it's compressed), and the data being read comes from the network, not from disk. For fun, create a large tar file WITHOUT compression, then gzip it and see whether vmstat shows 100% CPU utilization. I bet it will.
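i.e. something like this (the paths and the source directory are just an example):

  tar cf /tmp/big.tar /usr     # large tar file, no compression
  time gzip -3 /tmp/big.tar    # now compress it...
  # ...while watching 'vmstat 1' in another terminal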
The machine is writing only approx. 1MB/s and can sustain over 15MB/s of raw I/O throughput (when unloaded). The reason it is writing so little is that the kernel is stealing CPU cycles AWAY from gzip (and ssh).
A quick test on another machine with the same CPU shows that raw gzip speed on that CPU (with similar data and the same gzip options) should be about 5MB/s. Take 50% of the CPU away because the kernel is stealing it, then take another 20% away for ssh (according to 'top'; it's decrypting all the data), and you're left with 30%, which comes quite close to my 1MB/s estimate above.
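Those figures came from quick-and-dirty tests along these lines (file names and sizes are arbitrary):

  # raw write throughput on an otherwise idle box
  time sh -c 'dd if=/dev/zero of=/tmp/ddtest bs=1M count=256; sync'
  # raw gzip throughput on a file of known size, with no disk writes
  time gzip -3 -c /tmp/sample.dat > /dev/null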
Another datapoint: vmstat is reporting 100% CPU utilization (split 50/50 between user and system), 0% idle, 0% iowait.
The concern I have is that instead of burning CPU in user space (running gzip), it's burning it in kernel space (and even worse, the kernel function burning my CPU is default_idle!).
Good suggestion with sysstat though, just to gather some more information about the problem. Maybe something will turn up.
David
On Wednesday 08 March 2006 11:18, David Mansfield wrote:
The system is relatively slow (866MHz CPU) and the network is fast (gigabit), so the limiting factor should be the CPU.
What is the actual throughput on the GigE? What KIND of GigE? Which driver?
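ethtool should answer the last two, e.g.:

  ethtool eth0      # negotiated link speed and duplex
  ethtool -i eth0   # driver name and version bound to the NIC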
However, at times, the system gets into a 'weird' state where instead of using about 85% user/ 15% system, it goes to 50% user and 50% system.
Now 50% system time for this load is ridiculous, and as I said before, most of the time it is 85/15 and only occasionally gets 'stuck' at 50/50.
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
21117    36.5340  default_idle
This is CRAZY! How can default_idle be sucking away cycles when the system is (or should be) CPU-bound?
Can anyone explain this?
GigE DMA, perhaps?
DMA time would show as cpu idle, I would think.
Bus contention between the GigE and the disk? (Servers have split buses for a reason...and PCIe is a switched architecture for a reason)
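lspci will at least show whether the NIC and the disk controller are hanging off the same bus:

  lspci -tv    # PCI topology as a tree; look for the e1000 and the disk controller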
On Thu, 2006-03-09 at 13:24 -0500, Lamar Owen wrote:
On Wednesday 08 March 2006 11:18, David Mansfield wrote:
The system is relatively slow (866MHz CPU) and the network is fast (gigabit), so the limiting factor should be the CPU.
What is the actual throughput on the GigE? What KIND of GigE? Which driver?
Appears to be in excess of 10 MBytes (not bits)/s. e1000 driver. Not sure why so slow (I think the switch is crap), but suffice it to say it's 10x the throughput I'm getting. I used 'rsh' instead of 'ssh' to eliminate CPU loading. Full throttle used about 15% system time (i.e. plenty of idle time remaining). Copying 160MB used 1.6 seconds of CPU time over about 14 seconds of wall-clock time.
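That test was roughly this (the remote path is just whatever large file I had handy):

  # pull ~160MB over rsh with no compression, timed
  time rsh remotehost 'cat /path/to/disc1.iso' > /tmp/disc1.iso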
Piping the same exact data through 'gzip -3 -c' and then to /dev/null reduces throughput to 3.5MB/s (rough sketch of the test after the list below). Strangely, the system time increases to a whopping 7.5 seconds even though:
- identical amount of data arriving over network (first ISO from RHEL4 ;-)
- less data being written to /dev/null (it's compressed); besides, it's /dev/null for chrissake.
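That test was basically:

  # same data, but compressed and thrown away instead of written to disk
  time rsh remotehost 'cat /path/to/disc1.iso' | gzip -3 -c > /dev/null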
However, at times, the system gets into a 'weird' state where instead of using about 85% user/ 15% system, it goes to 50% user and 50% system.
Now 50% system time for this load is ridiculous, and as I said before, most of the time it is 85/15 and only occasionally gets 'stuck' at 50/50.
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
21117    36.5340  default_idle
This is CRAZY! How can default_idle be sucking away cycles when the system is (or should be) CPU-bound?
Can anyone explain this?
GigE DMA, perhaps?
DMA time would show as cpu idle, I would think.
There are two kinds of 'idle'. In my case, vmstat is showing NO idle time, yet the top culprit in the oprofile output is a kernel function named default_idle.
Bus contention between the GigE and the disk? (Servers have split buses for a reason...and PCIe is a switched architecture for a reason)
Sounds like a possibility. Is there any way to measure bus contention? Is there a way to 'tune' the bus characteristics to alleviate bus contention?
David