[CentOS] 3Ware 9550SX and latency/system responsiveness
centos at web.org.uk
Thu Sep 13 12:17:01 UTC 2007
I thought I'd just share my experiences with this 3Ware card, and see
if anyone might have any suggestions.
System: Supermicro H8DA8 with 2 x Opteron 250 2.4GHz and 4GB RAM
installed. 9550SX-8LP hosting 4x Seagate ST3250820SV 250GB in a RAID
1 plus 2 hot spare config. The array is properly initialized, write
cache is on, as is queueing (and supported by the drives). StoreSave
set to Protection.
OS is CentOS 4.5 i386, minimal install, default partitioning as
suggested by the installer (ext3, small /boot on /dev/sda1, remainder
as / on LVM VolGroup with 2GB swap).
Firmware from 3Ware codeset 188.8.131.52 in use, firmware/driver details:
//serv1> /c0 show all
/c0 Driver Version = 2.26.05.007
/c0 Model = 9550SX-8LP
/c0 Memory Installed = 112MB
/c0 Firmware Version = FE9X 3.08.02.005
/c0 Bios Version = BE9X 3.08.00.002
/c0 Monitor Version = BL9X 3.01.00.006
I first noticed something odd while installing CentOS 4.4: writing
the inode tables took much longer than I expected (I thought the
installer had frozen), and the system felt sluggish overall during
its first yum update, certainly more sluggish than I'd expect from a
comparatively powerful machine with hardware RAID 1.
I tried a few simple benchmarks (bonnie++, iozone, dd) and noticed up
to 8 pdflush commands hanging about in uninterruptible sleep when
writing to disk, along with kjournald and kswapd from time to time.
The load average during writes climbed considerably (above 12), with 'ls'
taking up to 30 seconds to give any output. I've tried CentOS 4.4,
4.5, RHEL AS 4 update 5 (just in case) and openSUSE 10.2 and they all
show the same symptoms.
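
A minimal sketch for listing the processes stuck in uninterruptible
sleep while a benchmark runs, assuming the procps version of ps. The
function name d_state is made up for illustration; it only filters
text, so it can also be run over saved ps output.

```shell
# Print "PID COMMAND" for every process whose state starts with D
# (uninterruptible sleep). Expects "STATE PID COMMAND" lines on stdin;
# the ps header line starts with "S", so it is skipped automatically.
d_state() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}
```

Typical use would be "ps -eo state,pid,comm | d_state", repeated every
second or so during a write test.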
Googling around makes me think that this may be related to queue
depth, nr_requests and possibly VM params (the latter from
https://bugzilla.redhat.com/show_bug.cgi?id=121434#c275). These are
the default settings:
/sys/block/sda/device/queue_depth = 254
/sys/block/sda/queue/nr_requests = 8192
/proc/sys/vm/dirty_expire_centisecs = 3000
/proc/sys/vm/dirty_ratio = 30
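
The values above can be collected in one go with something like the
following sketch. The function name show_tunables and its optional
second argument (a path prefix, for reading a saved copy of /sys and
/proc) are illustrative assumptions, not part of any tool.

```shell
# Print the four tunables listed above for one block device.
# $1 = device name (default: sda)
# $2 = optional path prefix (hypothetical convenience for reading a
#      copied tree instead of the live /sys and /proc)
show_tunables() {
    dev=${1:-sda}
    prefix=${2:-}
    for f in \
        "/sys/block/$dev/device/queue_depth" \
        "/sys/block/$dev/queue/nr_requests" \
        "/proc/sys/vm/dirty_expire_centisecs" \
        "/proc/sys/vm/dirty_ratio"
    do
        if [ -r "$prefix$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$prefix$f")"
        else
            printf '%s = (not readable)\n' "$f"
        fi
    done
}
```

Calling "show_tunables sda" before and after each change makes it
easier to keep track of which settings a given benchmark run used.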
3Ware mentions elevator=deadline, blockdev --setra 16384 along with
nr_requests=512 in their performance tuning doc - these alone seem to
make no difference to the latency problem.
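
For reference, a sketch of how those three settings might be applied
to one device. The function name tuning_commands is hypothetical, and
it only prints the commands so they can be reviewed before being piped
to sh as root. elevator=deadline is a kernel boot parameter, so it is
shown as a reminder rather than a command.

```shell
# Print (do not run) the commands for 3Ware's suggested values.
# $1 = device name (default: sda)
tuning_commands() {
    dev=${1:-sda}
    # Boot-time setting: goes on the kernel line of the bootloader config.
    echo "# boot parameter, add to the kernel line: elevator=deadline"
    # Read-ahead, in 512-byte sectors.
    echo "blockdev --setra 16384 /dev/$dev"
    # Block-layer request queue size.
    echo "echo 512 > /sys/block/$dev/queue/nr_requests"
}
```

Neither the read-ahead nor nr_requests survives a reboot when set this
way, so they would need to go in something like rc.local to persist.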
Setting dirty_expire_centisecs = 1000 and dirty_ratio = 5 does indeed
reduce the number of processes in 'b' state as reported by vmstat 1
during an iozone benchmark (./iozone -s 20480m -r 64 -i 0 -i 1 -t 1
-b filename.xls as per 3Ware's own tuning doc) but the problem is
obviously still there, just mitigated somewhat. The comparison graphs
are in a PDF here:
Incidentally, the vmstat 1 output was directed to an NFS-mounted disk
to avoid writing it to the array during the actual testing.
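
A rough sketch of how two runs can be compared numerically, assuming a
saved "vmstat 1" log in the usual layout where the second column is
'b'. The function name blocked_summary is made up for illustration.
The two sysctl lines are the values tried above and are left commented
out because they need root.

```shell
# Values from the test above; run as root, not persistent across reboots.
# sysctl -w vm.dirty_expire_centisecs=1000
# sysctl -w vm.dirty_ratio=5

# Summarise the 'b' column (second field) of a "vmstat 1" log on stdin.
# Header lines are skipped because their second field is not a number.
blocked_summary() {
    awk '$2 ~ /^[0-9]+$/ { n++; sum += $2; if ($2 > max) max = $2 }
         END { if (n) printf "samples=%d max_b=%d avg_b=%.1f\n", n, max + 0, sum / n }'
}
```

Running "blocked_summary < logfile" on the log from each run gives a
quick before/after figure without having to eyeball the graphs.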
I've tried eliminating LVM from the equation, switching from ext3 to
ext2, and booting single-processor, all to no useful effect. I've
also tried benchmarking with different blocksizes from 512B to 1M in
powers of 2 and the problem remains - many processes in
uninterruptible sleep blocking other IO. I'm about to start
downloading CentOS 5 to give it a go, and after that I might have to
resort to seeing if WinXP has the same issue.
My only real question is "where do I go from here?" I don't have
enough specific tuning knowledge to know what else to look at.
Thanks for any pointers.