[CentOS] too much cpu system time with kernel 2.6.9-22.0.2.EL

Thu Mar 9 19:36:22 UTC 2006
David Mansfield <centos at dm.cobite.com>

On Thu, 2006-03-09 at 13:24 -0500, Lamar Owen wrote:
> On Wednesday 08 March 2006 11:18, David Mansfield wrote:
> > The system is relatively slow (866MHz cpu) and the network is fast
> > (gigabit) so the limiting factor should be CPU.
> 
> What is the actual throughput on the GigE?  What KIND of GigE?  Which driver?
> 

Appears to be in excess of 10 MBytes/s (bytes, not bits).  e1000 driver.
Not sure why it's so slow (I think the switch is crap), but suffice it to
say it's 10x the throughput I'm getting.  I used 'rsh' instead of 'ssh' to
eliminate the encryption cpu load.  Full throttle used about 15% system
time (i.e. plenty of idle time remaining).  Copying 160 megs uses 1.6
seconds of cpu time over about 14 seconds of wall-clock time.
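
A quick sanity check of those figures, as a rough sketch only.  It assumes
"160 megs" means 160 MBytes and that the 14 seconds is wall-clock time; the
function and variable names are just illustrative.

```python
# Back-of-the-envelope check of the rsh copy figures quoted above.
# Assumption: "160 megs" is 160 MBytes; names here are illustrative only.

def throughput_mb_per_s(size_mb, wall_seconds):
    """Average throughput over the whole transfer."""
    return size_mb / wall_seconds

def cpu_fraction(cpu_seconds, wall_seconds):
    """Share of wall-clock time that was spent on the cpu."""
    return cpu_seconds / wall_seconds

size_mb = 160.0   # amount copied
wall_s = 14.0     # elapsed wall-clock time
cpu_s = 1.6       # cpu time reported for the copy

print("throughput: %.1f MBytes/s" % throughput_mb_per_s(size_mb, wall_s))
print("cpu share:  %.0f%%" % (100 * cpu_fraction(cpu_s, wall_s)))
```

That works out to roughly 11.4 MBytes/s and about 11% cpu, which is in the
same ballpark as the "in excess of 10 MBytes/s" and "about 15%" above.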

Piping the exact same data through 'gzip -3 -c' and then to /dev/null
reduces throughput to 3.5 MBytes/s.  Strangely, the system time increases
to a whopping 7.5 seconds even though:

- identical amount of data arriving over network (first ISO from
RHEL4 ;-)

- less data is being written to /dev/null (it's compressed), and besides,
it's /dev/null for chrissake.
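
To put the two runs side by side, here is a rough sketch of the system time
spent per MByte received.  It assumes both runs move the same 160 MBytes and
that the 1.6 seconds from the plain copy is mostly system time (it was
reported only as "cpu time", so that part is a guess).  The names are
illustrative.

```python
# Compare kernel cost per MByte for the plain copy and the gzip pipeline.
# Assumptions: same 160 MBytes in both runs; the 1.6 s figure is treated
# as system time even though it was reported only as "cpu time".

def system_seconds_per_mb(system_seconds, size_mb):
    """System (kernel) cpu seconds spent per MByte transferred."""
    return system_seconds / size_mb

size_mb = 160.0
plain_cost = system_seconds_per_mb(1.6, size_mb)   # copy straight through
gzip_cost = system_seconds_per_mb(7.5, size_mb)    # piped through gzip -3

# Estimated wall-clock time of the gzip run at the observed 3.5 MBytes/s.
gzip_wall_s = size_mb / 3.5

print("plain: %.3f s/MByte" % plain_cost)
print("gzip:  %.3f s/MByte" % gzip_cost)
print("ratio: %.1fx" % (gzip_cost / plain_cost))
print("gzip system share of wall time: %.0f%%" % (100 * 7.5 / gzip_wall_s))
```

Under those assumptions the kernel does about 4.7x as much work per MByte in
the gzip run (0.047 vs 0.010 s/MByte), even though the bytes arriving over
the network are identical.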

> > However, at times, the system gets into a 'weird' state where instead of
> > using about 85% user/ 15% system, it goes to 50% user and 50% system.
> 
> > Now 50% system time for this load is ridiculous, and as I said before,
> > most of the time it is 85/15 and only occasionally gets 'stuck' in
> > 50/50.
> 
> > CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> > Profiling through timer interrupt
> > samples  %        symbol name
> > 21117    36.5340  default_idle
> 
> > This is CRAZY! How can default_idle be sucking away cycles when the
> > system is (should be) cpu-bound?
> 
> > Can anyone explain this?
> 
> GigE DMA, perhaps?
> 
> DMA time would show as cpu idle, I would think.
> 

There are two kinds of 'idle'.  In my case, vmstat is showing NO idle
time; however, the top culprit in the oprofile output is a kernel function
named default_idle.
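
For reference, the vmstat side of that comparison can be reproduced by hand
from the aggregate "cpu" line in /proc/stat, which as far as I know is where
vmstat gets its us/sy/id numbers.  The sketch below is only an illustration:
the two sample lines are made-up numbers chosen to give the 85/15 split, and
all the function names are my own.

```python
# Turn two snapshots of the aggregate "cpu" line from /proc/stat into
# percentages.  Counter order on 2.6 kernels: user nice system idle
# iowait irq softirq (any later columns are ignored here).

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad older, shorter lines
    return dict(zip(FIELDS, values))

def cpu_percentages(before, after):
    """Percentage of ticks spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between snapshots")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}

def read_cpu_line(path="/proc/stat"):
    """Read the live aggregate line (Linux only; not called in this demo)."""
    with open(path) as stat_file:
        for line in stat_file:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate cpu line found")

# Made-up snapshots for illustration, NOT measurements from my system.
BEFORE = "cpu  1000 0 500 8000 100 10 20"
AFTER = "cpu  1850 0 650 8000 100 10 20"

pct = cpu_percentages(parse_cpu_line(BEFORE), parse_cpu_line(AFTER))
print("user %.0f%%  system %.0f%%  idle %.0f%%"
      % (pct["user"], pct["system"], pct["idle"]))
```

For live numbers, call read_cpu_line() twice a few seconds apart and feed
both results through parse_cpu_line() and cpu_percentages().  This only
covers the accounting view of 'idle'; it says nothing about where oprofile's
timer samples land.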

> Bus contention between the GigE and the disk?  (Servers have split buses for a 
> reason...and PCIe is a switched architecture for a reason)

Sounds like a possibility.  Is there any way to measure bus contention?
Is there a way to 'tune' the bus characteristics to alleviate bus
contention?


David