Simon Banton wrote:
What is the recurring performance problem you are seeing?
Pretty much exactly the symptoms described in http://bugzilla.kernel.org/show_bug.cgi?id=7372 relating to read starvation under heavy write IO causing sluggish system response.
I recently graphed the blocks in/blocks out from vmstat 1 for the same test using each of the four IO schedulers (see the PDF attached to the article below):
http://community.novacaster.com/showarticle.pl?id=7492
The test was:
dd if=/dev/sda of=/dev/null bs=1M count=4096 & sleep 5; dd if=/dev/zero of=./4G bs=1M count=4096 &
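For anyone wanting to reproduce the graphing step, one way to capture just the blocks-in/blocks-out numbers while the dd test runs is something like the following (a sketch, not my exact method; it assumes the default vmstat layout where bi and bo are the 9th and 10th columns):

```shell
# Sample vmstat once per second for 60 samples, keeping only the
# bi (blocks in) and bo (blocks out) columns for later plotting.
# NR > 2 skips the two header lines vmstat prints.
vmstat 1 60 | awk 'NR > 2 { print $9, $10; fflush() }' > io.log
```

The resulting two-column io.log can then be fed straight into gnuplot or a spreadsheet.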
Despite appearances, interactive responsiveness subjectively felt better using deadline than cfq - but this is obviously an atypical workload and so now I'm focusing on finishing building the machine completely so I can try profiling the more typical patterns of activity that it'll experience when in use.
I find myself wondering whether, because the array looks like a single SCSI disk to the OS, cfq is able to do a better job of interleaving reads and writes to the card, but some side effect of that work is causing the responsiveness issue at the same time. Pure speculation on my part - this is way outside my experience.
I'm also looking into trying an Areca card instead (avoiding LSI because they're cited as having the same issue in the bugzilla mentioned above).
If the performance issue is identical to the kernel bug mentioned in the posting, then the only real fixes mentioned were to switch from 64-bit to 32-bit or to down-rev your kernel, which on CentOS means dropping from 5.0 to 4.5.
I'm trying to get confirmation that the culprit has been isolated, but I have a suspicion that it lies in process scheduling on x86_64 and not in the IO scheduler.
And while, yes, the hardware RAID appears as a single disk to the IO scheduler, CFQ makes certain assumptions about a disk's performance characteristics that are single-disk minded.
CFQ is meant to favor reads over writes, which matters more for a single-user workstation than for a multi-user server; a server should handle both fairly while preventing total starvation of either, which is what deadline was designed to do.
So for a server I would use 'deadline', and for a workstation 'cfq'.
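For reference, the scheduler can be inspected and switched per-device at runtime through sysfs, with no reboot needed (a sketch; it assumes the array shows up as sda, and writing to sysfs needs root):

```shell
# List the available schedulers for the device; the active one
# is shown in [brackets]
cat /sys/block/sda/queue/scheduler

# Extract just the active scheduler name from those brackets
sed 's/.*\[\(.*\)\].*/\1/' /sys/block/sda/queue/scheduler

# Switch to deadline at runtime (needs root; takes effect
# immediately, but is not persistent across reboots)
echo deadline > /sys/block/sda/queue/scheduler
```

To make the choice stick across reboots, the usual approach is to add elevator=deadline to the kernel line in the boot loader config.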
I myself am thinking of down-revving to CentOS 4.5 to avoid the x86_64 scheduling issue, but I keep holding out hope that the issue will be uncovered upstream in time for 5.1...
-Ross