[CentOS] too much cpu system time with kernel 2.6.9-22.0.2.EL

Wed Mar 8 16:18:36 UTC 2006
David Mansfield <centos at dm.cobite.com>

I'm running a server recently installed with CentOS 4.2.

It's running the kernel in the subject line, on a PIII 866 MHz with
512 MB RAM.

The system is running basically two processes:

1) ssh to remote system, receiving a stream of bytes

piped into

2) gzip the stream, write to disk file
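
For anyone who wants to reproduce the load, the gzip stage can be
approximated in Python. This is only a rough sketch: the function name
compress_stream and the 64 KB chunk size are made up for illustration,
and it uses Python's gzip module instead of the gzip binary, so the
user/system split will not match the real pipeline exactly.

```python
import gzip
import io

CHUNK_SIZE = 64 * 1024  # assumed read size, not taken from the real setup


def compress_stream(src, dst_path, chunk_size=CHUNK_SIZE):
    """Read bytes from src until EOF and write them gzip-compressed to dst_path.

    src is any binary file-like object (a pipe, a socket file, BytesIO).
    Returns the number of uncompressed bytes consumed.
    """
    total = 0
    with gzip.open(dst_path, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty read means end of stream
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

To feed it from ssh, src could be the stdout pipe of a
subprocess.Popen running ssh against a placeholder host; the host and
remote command are not given here.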

The CPU is relatively slow (866 MHz) and the network is fast
(gigabit), so the limiting factor should be the CPU.

And it is.

However, at times, the system gets into a 'weird' state where instead of
using about 85% user / 15% system, it goes to 50% user and 50% system.

Now 50% system time for this load is ridiculous, and as I said before,
most of the time it is 85/15 and only occasionally gets 'stuck' in
50/50.
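
The user/system split can also be measured directly from the counters
in /proc/stat. What follows is a minimal sketch, not the tool used for
the numbers above. The helper names are invented, and folding irq and
softirq into "system" is a choice made here; other tools may group
those fields differently.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into 8 tick counters.

    Field order: user nice system idle iowait irq softirq steal.
    Older kernels print fewer fields, so missing ones are padded with 0.
    Later fields (guest time) are dropped to avoid double counting.
    """
    vals = [int(x) for x in line.split()[1:9]]
    vals += [0] * (8 - len(vals))
    return vals


def read_cpu_times(path="/proc/stat"):
    """Return the current aggregate CPU counters (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise RuntimeError("no aggregate cpu line found in " + path)


def cpu_split(before, after):
    """Percentage of elapsed ticks spent in user, system, idle and iowait."""
    d = [b - a for a, b in zip(before, after)]
    total = sum(d)
    if total == 0:
        return {"user": 0.0, "system": 0.0, "idle": 0.0, "iowait": 0.0}
    return {
        "user": 100.0 * (d[0] + d[1]) / total,           # user + nice
        "system": 100.0 * (d[2] + d[5] + d[6]) / total,  # system + irq + softirq
        "idle": 100.0 * d[3] / total,
        "iowait": 100.0 * d[4] / total,
    }
```

Calling read_cpu_times() twice a few seconds apart and passing both
results to cpu_split() gives the split over that interval.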

So I got oprofile running to find out what part of the kernel it is
stuck in, and here is the output of "opreport -l vmlinux" (top entries
only):

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %        symbol name
21117    36.5340  default_idle
3850      6.6608  __copy_from_user_ll
3108      5.3771  find_get_page
1707      2.9532  __copy_user_intel
1580      2.7335  __might_sleep
1356      2.3460  handle_IRQ_event
1323      2.2889  __copy_to_user_ll
1168      2.0207  __find_get_block_slow
1134      1.9619  __find_get_block
1093      1.8910  finish_task_switch
963       1.6661  bh_lru_install
638       1.1038  __wake_up
605       1.0467  __do_softirq
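
To see where the non-idle kernel time goes, the table above can be
re-normalized with default_idle left out. This is a small sketch with
invented helper names. It assumes the three-column layout shown above
and relies on the % column, so it still works on a truncated listing.

```python
def parse_opreport(text):
    """Parse 'samples  %  symbol' rows from opreport -l style output.

    Header lines and anything not starting with an integer sample count
    are skipped. Returns a list of (samples, percent, symbol) tuples.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        rows.append((int(parts[0]), float(parts[1]), parts[2]))
    return rows


def busy_shares(rows, idle_symbol="default_idle"):
    """Express each non-idle symbol as a percentage of non-idle samples."""
    idle_pct = sum(pct for _, pct, name in rows if name == idle_symbol)
    busy_pct = 100.0 - idle_pct
    if busy_pct <= 0:
        return {}
    return {
        name: 100.0 * pct / busy_pct
        for _, pct, name in rows
        if name != idle_symbol
    }
```

Feeding the listing above through parse_opreport() and busy_shares()
shows each symbol's share of the kernel samples that were not spent in
default_idle.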


This is CRAZY! How can default_idle be sucking away cycles when the
system is (or should be) CPU-bound?

Can anyone explain this?

David