[CentOS] 3Ware 9550SX and latency/system responsiveness

Fri Sep 21 18:12:47 UTC 2007
Simon Banton <centos at web.org.uk>

>>At 17:34 +0800 14/9/07, Feizhou wrote:
>>>.oh....do you have a BBU for your write cache on your 3ware board?
>>
>>Not installed, but the machine's on a UPS.
>
>Ugh. The 3ware code will not give OK then until the stuff has hit disk.

I've now installed BBUs, but it's made no difference to the underlying 
responsiveness problem, I'm afraid.

With ports 2 and 3 now configured as RAID 0, with ext3 filesystem and 
mounted on /mnt/raidtest, running this bonnie++ command:

bonnie++ -m RA-256_NR-8192 -n 0 -u 0 -r 4096 -s 20480 -f -b -d /mnt/raidtest

(RA- and NR- relate to kernel params for readahead and nr_requests 
respectively - the values above are CentOS post-installation defaults)
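
For anyone wanting to check the same two settings on their own box, 
something along these lines should do it. It's only a sketch: the 
device name sda is an assumption (substitute whatever the 3ware unit 
appears as), and make_label is a helper name made up here to build 
the -m tag.

```shell
#!/bin/sh
# Assumed device name; substitute the block device of the 3ware unit.
DEV=${DEV:-sda}

# Build a bonnie++ -m label such as RA-256_NR-8192 from the two values.
make_label() {
    echo "RA-${1}_NR-${2}"
}

# Readahead in 512-byte sectors (usually needs root); "unknown" if unreadable.
ra=$(blockdev --getra "/dev/$DEV" 2>/dev/null || echo unknown)

# Request queue depth from sysfs; "unknown" if the file is missing.
nr=$(cat "/sys/block/$DEV/queue/nr_requests" 2>/dev/null || echo unknown)

make_label "$ra" "$nr"
```

The label is just the two values glued together, so results from runs 
with different settings are easy to tell apart.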

...causes load to climb:

16:36:12 up 13 min,  2 users,  load average: 8.77, 4.78, 1.98

... and uninterruptible processes:

  ps ax | grep D
   PID TTY      STAT   TIME COMMAND
    59 ?        D      0:03 [kswapd0]
  2159 ?        D      0:01 [kjournald]
  2923 ?        Ds     0:00 syslogd -m 0
  4155 ?        D      0:00 [pdflush]
  4175 ?        D      0:00 [pdflush]
  4192 ?        D      0:00 [pdflush]
  4193 ?        D      0:00 [pdflush]
  4197 ?        D      0:00 [pdflush]
  4199 ?        D      0:00 [pdflush]
  4201 pts/1    R+     0:00 grep D
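
As the listing shows, a plain grep D also picks up the header line 
(COMMAND contains a D) and the grep itself. A slightly tighter filter 
is sketched below, matching on the STAT column only. dstate_only is a 
made-up helper name, and the column order is my own choice rather than 
what was used above.

```shell
#!/bin/sh
# Keep the header line plus any row whose STAT column (2nd field)
# starts with D, i.e. uninterruptible sleep.
dstate_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# pid, state, cpu time and full command line for every process.
ps axo pid,stat,time,args | dstate_only
```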

... plus an Out of Memory kill of sshd. Second time around (logged in 
on the console rather than over ssh), it's just the same except it's 
hald that happens to get clobbered instead.

Now that the presence or otherwise of a BBU has been ruled out, along 
with OS, 3ware-recommended kernel param tweaks, RAID level, LVM, slot 
speed, and different but identical-spec hardware (both machine and 
card), what's left to try?

I see there's a new firmware version out today (3ware codeset 9.4.1.3 
- driver's still at 2.26.05.007 but the fw's updated from 
3.08.02.005 to 3.08.02.007), so I guess I'll update it and push the 
whole thing back up the hill for another go.

If there's anyone out there with a 9550SX and a two-disk RAID 1 or 
RAID 0 config on CentOS 4.5 who can give the above bonnie++ benchmark 
a go (params adjusted for their own installed RAM - I'm benchmarking 
using 5x my installed amount) and let me know if they also have the 
same responsiveness problem or not, I'd seriously appreciate it.
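
To save some arithmetic, here's a rough sketch of how the -r and -s 
values could be worked out for a given machine. It assumes the same 5x 
multiplier and the same /mnt/raidtest mount point as my run. 
bonnie_sizes is a made-up helper name. MemTotal reads a little under 
the installed amount, so round up to the nominal figure if you want a 
like-for-like comparison.

```shell
#!/bin/sh
# Print the -r and -s arguments (both in MiB) for a given amount of RAM,
# using a test file size of 5x RAM.
bonnie_sizes() {
    echo "-r $1 -s $(($1 * 5))"
}

# RAM in MiB as the kernel reports it (MemTotal is given in kB).
mem_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo 2>/dev/null)

# Only print the command line; running it is left to the reader.
if [ -n "$mem_mb" ]; then
    echo "bonnie++ -n 0 -u 0 $(bonnie_sizes "$mem_mb") -f -b -d /mnt/raidtest"
fi
```

For my 4096 MB that gives -r 4096 -s 20480, matching the command above.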

S.