[CentOS] 3Ware 9550SX and latency/system responsiveness

Tue Oct 2 13:19:39 UTC 2007
Simon Banton <centos at web.org.uk>

At 12:30 +0200 2/10/07, matthias platzer wrote:
>
>What I did to work around them was basically switching to XFS for 
>everything except / (3ware say their cards are fast, but only on 
>XFS) AND using very low nr_requests for every blockdev on the 3ware 
>card.

Hi Matthias,

Thanks for this. In my CentOS 5 tests nr_requests defaulted to 128, 
rather than the 8192 of CentOS 4.5. I'll have a go at reducing it 
still further.
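On 2.6 kernels the per-device request queue depth is exposed as /sys/block/<dev>/queue/nr_requests. The following is a minimal Python sketch for reading and lowering it. It assumes the 3ware units appear as ordinary sd block devices. The helper names, the device name "sda" and the value 16 are hypothetical, not figures from this thread. Writing requires root, the kernel may adjust the value it accepts, and the setting does not survive a reboot.

```python
import os

SYSFS_BLOCK = "/sys/block"


def nr_requests_path(dev, root=SYSFS_BLOCK):
    """Return the sysfs file holding the request queue depth for `dev`."""
    return os.path.join(root, dev, "queue", "nr_requests")


def get_nr_requests(dev, root=SYSFS_BLOCK):
    """Read the current nr_requests value for a device such as "sda"."""
    with open(nr_requests_path(dev, root)) as f:
        return int(f.read().strip())


def set_nr_requests(dev, value, root=SYSFS_BLOCK):
    """Write a new queue depth and return what the file reports afterwards.

    The value is read back because the kernel may not store exactly what
    was written. Needs root when pointed at the real /sys.
    """
    if value < 1:
        raise ValueError("nr_requests must be a positive integer")
    with open(nr_requests_path(dev, root), "w") as f:
        f.write(str(value))
    return get_nr_requests(dev, root)
```

On a live box this would be called as set_nr_requests("sda", 16) for each 3ware unit. The `root` parameter exists only so the helpers can be exercised against a scratch directory instead of the real sysfs.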

>If you can, you could also try _not_ putting the system disks on the 
>3ware card, because additionally the 3ware driver/card gives writes 
>priority.

I've noticed that kicking off a simultaneous pair of dd reads and 
writes from/to the RAID 1 array shows that very clearly: only with 
cfq as the elevator did reads get any kind of look-in. Sadly, I'm 
not able to separate the system disks off, as there's no on-board 
SATA on the motherboard nor any room for internal disks. The original 
intention was to provide the resilience of hardware RAID 1 for the 
entire machine.
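On 2.6 kernels the elevator can be switched per device at runtime through /sys/block/<dev>/queue/scheduler, or set globally at boot with the elevator= kernel parameter. Reading that file lists every registered elevator and brackets the active one, e.g. "noop anticipatory deadline [cfq]". The Python sketch below parses that line and selects a scheduler by name. The helper names are hypothetical, and writing to the real file requires root.

```python
import os

SYSFS_BLOCK = "/sys/block"


def parse_scheduler(text):
    """Split queue/scheduler contents into (active, available).

    The active elevator is the one wrapped in square brackets.
    """
    active = None
    available = []
    for token in text.split():
        name = token.strip("[]")
        if token.startswith("["):
            active = name
        available.append(name)
    return active, available


def set_scheduler(dev, name, root=SYSFS_BLOCK):
    """Select elevator `name` for `dev` if the kernel offers it."""
    path = os.path.join(root, dev, "queue", "scheduler")
    with open(path) as f:
        _, available = parse_scheduler(f.read())
    if name not in available:
        raise ValueError("%s not in %s" % (name, available))
    with open(path, "w") as f:
        f.write(name)
```

Checking parse_scheduler() before and after a dd run is a quick way to confirm which elevator was actually in effect for a given test.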

>People suggested the system becomes unresponsive because the CPU 
>hangs in iowait on the writes, and reading the system binaries 
>won't happen until the writes are done, so the binaries should be 
>on another I/O path.

Yup, that certainly seems to be what's happening. Wish I had another io path...
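One way to put a number on the iowait stalls is to sample the aggregate "cpu" line of /proc/stat twice and compare the iowait column (the fifth value, after user, nice, system and idle) with the total time that elapsed. The following is a minimal Python sketch under that assumption. Only the first eight columns (user through steal) are summed, because later guest columns are already included in user and nice. The function names are hypothetical.

```python
import time


def cpu_times(stat_text):
    """Return the jiffy counters from the aggregate "cpu" line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_fraction(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return deltas[4] / float(total)


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Measure the iowait share over `interval` seconds on a live system."""
    with open(path) as f:
        before = cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_times(f.read())
    return iowait_fraction(before, after)
```

Running sample_iowait() while the dd writes are in progress should show whether the stalls line up with a large iowait share.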

>All of this seems to be symptomatic of a very complex issue involving 
>kernel bugs/bad drivers/... and it seems to be worst on an AMD/3ware 
>combination.
>Here is another link:
>http://bugzilla.kernel.org/show_bug.cgi?id=7372

Ouch - thanks for that link :-( Looks like I'm screwed big time.

S.