[CentOS] 3Ware 9550SX and latency/system responsiveness

Tue Oct 2 16:41:53 UTC 2007
Ross S. W. Walker <rwalker at medallion.com>

Simon Banton wrote:
> 
> >What is the recurring performance problem you are seeing?
> 
> Pretty much exactly the symptoms described in 
> http://bugzilla.kernel.org/show_bug.cgi?id=7372 relating to read 
> starvation under heavy write IO causing sluggish system response.
> 
> I recently graphed the blocks in/blocks out from vmstat 1 for the 
> same test using each of the four IO schedulers (see the PDF attached 
> to the article below):
> 
> http://community.novacaster.com/showarticle.pl?id=7492
> 
> The test was:
> 
> dd if=/dev/sda of=/dev/null bs=1M count=4096 & sleep 5; dd 
> if=/dev/zero of=./4G bs=1M count=4096 &
> 
> Despite appearances, interactive responsiveness subjectively felt 
> better using deadline than cfq - but this is obviously an atypical 
> workload and so now I'm focusing on finishing building the machine 
> completely so I can try profiling the more typical patterns of 
> activity that it'll experience when in use.
> 
> I find myself wondering whether the fact that the array looks like a 
> single SCSI disk to the OS means that cfq is able to perform better 
> in terms of interleaving reads and writes to the card but that some 
> side effect of its work is causing the responsiveness issue at the 
> same time. Pure speculation on my part - this is way outside my 
> experience.
> 
> I'm also looking into trying an Areca card instead (avoiding LSI 
> because they're cited as having the same issue in the bugzilla 
> mentioned above).

If the performance issue is identical to the kernel bug mentioned
in the posting, then the only real fixes mentioned there were to
switch from 64-bit to 32-bit or to down-rev your kernel, which on
CentOS means going down from 5.0 to 4.5.

I'm trying to get confirmation that the culprit has been isolated,
but I have a suspicion that it lies in process scheduling on x86_64
and not in the io scheduler.

And while, yes, the hardware RAID appears as a single disk to the I/O
scheduler, CFQ makes certain assumptions about a disk's performance
characteristics that are single-disk minded.

CFQ is meant to favor reads over writes, which matters more for a
single-user workstation than for a multi-user server. A server should
handle both fairly while preventing total starvation of either, which
is what deadline was designed to do.

So for a server I would use 'deadline' and for a workstation I would
use 'cfq'.
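
If you want to try that on your own box, here is a minimal sketch of
checking and switching the scheduler through sysfs. The helper names
(active_sched, set_sched) are my own, and 'sda' in the usage comment is
only an assumption about how the array shows up; adjust to taste. The
kernel marks the active scheduler with [brackets] in the scheduler
file.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not from any package.

# Print the active scheduler from a sysfs scheduler file.
# The kernel lists all choices and brackets the active one,
# e.g. "noop anticipatory deadline [cfq]" -> "cfq".
active_sched() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Write a scheduler name into a sysfs scheduler file.
# Needs root on a real device and does not persist across reboots.
set_sched() {
    echo "$2" > "$1"
}

# Read-only report of what every block device is using right now.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    echo "$f: $(active_sched "$f")"
done

# Example (as root, assuming the array is sda):
#   set_sched /sys/block/sda/queue/scheduler deadline
```

As far as I know the default can also be set for all devices at boot
with elevator=deadline on the kernel command line, but the sysfs route
lets you flip it per device without a reboot while you test.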

I myself am thinking of down-revving to CentOS 4.5 to avoid the x86_64
scheduling issue, but I keep holding out hope that the issue will be
uncovered upstream in time for 5.1...

-Ross
