At 12:30 +0200 2/10/07, matthias platzer wrote:
> What I did to work around them was basically switching to XFS for everything except / (3ware say their cards are fast, but only on XFS) AND using very low nr_requests for every blockdev on the 3ware card.
Hi Matthias,
Thanks for this. In my CentOS 5 tests, nr_requests defaulted to 128 rather than the 8192 of CentOS 4.5. I'll have a go at reducing it still further.
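For anyone else following along, this is roughly what I'm doing to check and lower it (sdb is just a stand-in here for whichever device sits on the 3ware card):

  # check the current queue depth for the device
  cat /sys/block/sdb/queue/nr_requests

  # drop it to something small, e.g. 64
  echo 64 > /sys/block/sdb/queue/nr_requests

The setting doesn't survive a reboot, so it has to be reapplied from rc.local or similar.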
> If you can, you could also try _not_ putting the system disks on the 3ware card, because the 3ware driver/card additionally gives writes priority.
I've noticed that kicking off a simultaneous pair of dd reads and writes from/to the RAID 1 array shows that very clearly: only with cfq as the elevator did reads get any kind of look-in. Sadly, I'm not able to separate the system disks off, as there's no on-board SATA on the motherboard nor any room for internal disks; the original intention was to provide the resilience of hardware RAID 1 for the entire machine.
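For the record, the test was nothing fancier than this (paths and sizes are illustrative, and sdb is again a stand-in for the array's device):

  # writer: stream 4GB of zeros onto the array, in the background
  dd if=/dev/zero of=/data/ddtest bs=1M count=4096 &

  # reader: pull a previously written large file back off the same array
  dd if=/data/bigfile of=/dev/null bs=1M &

  # switch elevators on the fly between runs and compare
  echo cfq > /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/queue/scheduler

Watching the reader's throughput while flipping schedulers is what showed cfq as the only one giving reads a look-in.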
> People suggested the unresponsive system behaviour is because the CPU hangs in iowait on the writes, so reads of the system binaries can't happen until the writes are done; hence the binaries should be on another I/O path.
Yup, that certainly seems to be what's happening. Wish I had another I/O path...
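Incidentally, the easiest way I know to watch that happening is just (iostat assumes sysstat is installed):

  # the 'wa' column is the percentage of CPU time spent in iowait
  vmstat 1

  # per-device queue sizes and utilisation, one-second samples
  iostat -x 1

When the dd writes are running, wa climbs and everything that needs to page a binary in off the same array just sits there.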
> All this seems to be symptomatic of a very complex issue consisting of kernel bugs/bad drivers/... and it seems to be worst on an AMD/3ware combination. Here is another link: http://bugzilla.kernel.org/show_bug.cgi?id=7372
Ouch - thanks for that link :-( Looks like I'm screwed big time.
S.