[CentOS] too much cpu system time with kernel 2.6.9-22.0.2.EL

Thu Mar 9 14:32:13 UTC 2006
David Mansfield <centos at dm.cobite.com>

On Wed, 2006-03-08 at 09:14 -0800, Troy Engel wrote:
> William (Bill) Triest wrote:
> > 
> > I am not a kernel hacker, so I'm hoping you will end up with a more 
> > informed opinion, but it seems to me the real limiting factor would be 
> > IO.  Disks are slow, and you claim you're writing to the disk.  IMHO, an 
> > 800 MHz system should be able to handle ssh and gzip far easier than the 
> > IO.
> 
> I concur -- gzip is a beast on the IO, this is most likely where your 
> perceived lag is. Install the "sysstat" RPM on your machine (and make 
> sure it's started) and let the 'sa' processes gather data for you.

I think you mistake uncompressing for compressing.  Compressing is
CPU bound on most machines, and this one is no exception.  The
volume of data being written is small (it's compressed) and the data
being read is read from the network, not from disk.  For fun, create a
large tar file WITHOUT compression.  Next, gzip it, and see whether
vmstat shows 100% cpu utilization.  Bet it will.
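
For anyone who would rather script that experiment than eyeball vmstat,
here is a rough sketch in Python.  It is an approximation of the test,
not the test itself: it compresses an in-memory buffer with zlib (the
same deflate algorithm gzip uses) instead of gzipping a real tar file,
and the function name and sample data are made up for illustration.  If
compression is CPU bound, the CPU time should come out close to the
wall-clock time.

```python
import time
import zlib


def measure_compression(data: bytes, level: int = 6) -> dict:
    """Compress data in memory and report wall-clock time vs. CPU time.

    A cpu_fraction near 1.0 means the process spent nearly all of its
    elapsed time on the CPU, i.e. the work was CPU bound, not I/O bound.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return {
        "input_bytes": len(data),
        "output_bytes": len(compressed),
        "wall_seconds": wall_used,
        "cpu_seconds": cpu_used,
        "cpu_fraction": cpu_used / wall_used if wall_used > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for an uncompressed tar file: repetitive text.
    sample = b"".join(
        b"line %d of some log-like text\n" % i for i in range(200_000)
    )
    result = measure_compression(sample)
    for key, value in result.items():
        print(f"{key}: {value}")
```

Because the input is already in memory, no disk is involved at all, so
this only shows the CPU cost of compression; it says nothing about how
the kernel schedules that work.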

The machine is writing only approx. 1MB/s and can sustain over 15MB/s
raw I/O throughput (when unloaded).  The reason it is writing so little
is that the kernel is stealing cpu cycles AWAY from gzip (and ssh).

A quick test on another machine with the same cpu shows that raw gzip
speed on that cpu (with similar data and the same gzip options) should
be about 5MB/s.  Take 50% cpu away because the kernel is stealing it,
and take 20% away for ssh (according to 'top'; it's doing decryption of
all the data), and you're left with 30%, or roughly 1.5MB/s, which comes
quite close to my 1MB/s estimate above.
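
Spelled out as arithmetic, the estimate looks like this.  The function
name is made up for illustration; the numbers are the ones quoted above,
and the model (gzip gets whatever cpu share is left over) is a
back-of-the-envelope assumption, not a measurement.

```python
def expected_throughput(raw_mb_per_s: float,
                        kernel_share: float,
                        ssh_share: float) -> float:
    """Scale raw gzip speed by the cpu share left over for gzip."""
    gzip_share = 1.0 - kernel_share - ssh_share
    if gzip_share < 0:
        raise ValueError("cpu shares add up to more than 100%")
    return raw_mb_per_s * gzip_share


if __name__ == "__main__":
    # 5 MB/s raw gzip speed, 50% lost to the kernel, 20% used by ssh.
    estimate = expected_throughput(5.0, 0.50, 0.20)
    print(f"expected gzip throughput: {estimate:.1f} MB/s")
```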

Another datapoint: vmstat is reporting 100% cpu utilization (split 50/50
between user and system), 0% idle, 0% iowait.
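
For reference, vmstat gets these figures from the counters in
/proc/stat, and the same kind of split can be computed by hand from two
snapshots of the aggregate "cpu" line.  The sketch below assumes the
2.6-era field order (user, nice, system, idle, iowait, irq, softirq);
the function names and the sample counter values are invented for
illustration and are not taken from the machine in question.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer fields; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_split(before: str, after: str) -> dict:
    """Return each counter's share (in percent) of the elapsed ticks."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: second[name] - first[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the two snapshots")
    return {name: 100.0 * ticks / total for name, ticks in delta.items()}


if __name__ == "__main__":
    # Made-up snapshots: 500 user ticks and 500 system ticks elapsed.
    snap1 = "cpu 1000 0 1000 5000 0 0 0"
    snap2 = "cpu 1500 0 1500 5000 0 0 0"
    for name, pct in cpu_split(snap1, snap2).items():
        print(f"{name}: {pct:.0f}%")
```

On a live system the two snapshots would be the first line of
/proc/stat read a few seconds apart.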

The concern I have is that instead of burning cpu in user space (running
gzip) it's burning it in kernel space (and even worse, the kernel
function burning my cpu is default_idle!).

Good suggestion with sysstat though, just to gather some more
information about the problem.  Maybe something will turn up.

David