[CentOS] 3Ware 9550SX and latency/system responsiveness

Tue Oct 2 13:39:09 UTC 2007
Ross S. W. Walker <rwalker at medallion.com>

Simon Banton wrote:
> 
> At 12:30 +0200 2/10/07, matthias platzer wrote:
> >
> >What I did to work around them was basically switching to XFS for 
> >everything except / (3ware say their cards are fast, but only on 
> >XFS) AND using very low nr_requests for every blockdev on the 3ware 
> >card.
> 
> Hi Matthias,
> 
> Thanks for this. In my CentOS 5 tests the nr_requests turned out by 
> default to be 128, rather than the 8192 of CentOS 4.5. I'll have a go 
> at reducing it still further.

Yes, nr_requests should be a realistic reflection of what the card
itself can handle. If it is set too high you will see io_waits stack
up.

64 or 128 are good numbers; I have rarely seen a card that can handle
a queue depth larger than 128 (some older SCSI cards did 256, I think).
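
For what it's worth, you can check and change it per block device
through sysfs (substitute your own device for sda here, and note the
setting doesn't survive a reboot unless you put it somewhere like
rc.local):

  cat /sys/block/sda/queue/nr_requests
  echo 64 > /sys/block/sda/queue/nr_requests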

> >If you can, you could also try _not_ putting the system disks on the 
> >3ware card, because additionally the 3ware driver/card gives writes 
> >priority.
> 
> I've noticed that kicking off a simultaneous pair of dd reads and 
> writes from/to the RAID 1 array indicates that very clearly - only 
> with cfq as the elevator did reads get any kind of look-in. Sadly, 
> I'm not able to separate the system disks off as there's no on-board 
> SATA on the mboard nor any room for inboard disks, the original 
> intention was to provide the resilience of hardware RAID 1 for the 
> entire machine.

CFQ will give reads first-in-line priority, but this can cause all
sorts of negative side effects in a RAID setup: a workload can issue
overlapping io where a read depends on a write completing first, and
you can see the problem that creates. If reads are getting starved
under your workload you can try 'anticipatory', but if I remember
correctly you have the BBU write-back cache enabled, and that should
really limit the impact.
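
If you want to experiment, on the CentOS 5 kernel the elevator can be
switched on the fly per device through sysfs (sda is just a
placeholder again):

  cat /sys/block/sda/queue/scheduler
  echo anticipatory > /sys/block/sda/queue/scheduler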

You will always see some impact, though; that is just the nature of it.

Writes will beat reads, random will beat sequential; it's the
rock-paper-scissors game that all storage systems must play.
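
If you want to see it for yourself, a couple of concurrent dd runs
against the array show it plainly (the file names and sizes below are
just placeholders):

  dd if=/dev/zero of=/array/write_test bs=1M count=4096 &
  dd if=/array/some_large_file of=/dev/null bs=1M &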

> >People suggested the unresponsive system behaviour is because the 
> >cpu is stuck in iowait on the writes, and reading the system 
> >binaries won't happen till the writes are done, so the binaries 
> >should be on another io path.
> 
> Yup, that certainly seems to be what's happening. Wish I had 
> another io path...

You can have another io path: just add more disks to the 3ware,
create another RAID array, and locate your application data there.
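
Once the new unit shows up to the OS (say as /dev/sdb -- the actual
name will depend on your setup), it's just a filesystem and a mount
point away, e.g. with the XFS route Matthias mentioned:

  mkdir /data
  mkfs.xfs /dev/sdb    # or partition it first if you prefer
  mount /dev/sdb /data

and don't forget an fstab entry plus the same nr_requests tweak for
the new device.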

> >All this seems to be a symptom of a very complex issue consisting of 
> >kernel bugs/bad drivers/... and it seems to be worst on an AMD/3ware 
> >combination.
> >here is another link:
> >http://bugzilla.kernel.org/show_bug.cgi?id=7372
> 
> Ouch - thanks for that link :-( Looks like I'm screwed big time.

There is always a way out of any mess (without scrapping the whole
project).

-Ross
