On Tue, Mar 16, 2010 at 08:43:26AM +0100, Lennert Buytenhek wrote:
On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander, with one machine having 16 1T WD disks hooked up to it, and the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T disks.
When there's a disk I/O stall, reads and writes to every disk behind the SAS controller/expander hang for a while (typically for almost exactly eight seconds), not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests while the SAS I/O is stalled.
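For anyone wanting to check for the same symptom, here is a rough sketch of how such stalls could be spotted from /proc/diskstats samples. It is not what was used to diagnose the machines above, and the function names (parse_diskstats, find_stalls, collect_samples) are made up for illustration. It assumes the usual 14-field per-disk layout (field 4 = reads completed, field 8 = writes completed, field 12 = I/Os in flight) and treats an interval as stalled only when requests are outstanding but none of the watched disks complete anything.

```python
import time


def parse_diskstats(text):
    """Map device name -> (completed reads + writes, I/Os in flight)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Skip short lines (e.g. the 4-counter partition format of older kernels).
            continue
        completed = int(fields[3]) + int(fields[7])
        in_flight = int(fields[11])
        stats[fields[2]] = (completed, in_flight)
    return stats


def find_stalls(samples, devices, min_seconds=5.0):
    """Return (start, end) timestamps of stalls lasting at least min_seconds.

    samples is a time-ordered list of (timestamp, parse_diskstats() result).
    An interval counts as stalled when no watched device completed any I/O
    while at least one of them still had requests in flight.
    """
    stalls = []
    start = None
    last_quiet = None
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        progressed = any(
            s1[d][0] != s0[d][0] for d in devices if d in s0 and d in s1
        )
        pending = any(s1[d][1] > 0 for d in devices if d in s1)
        if pending and not progressed:
            if start is None:
                start = t0
            last_quiet = t1
        else:
            if start is not None and last_quiet - start >= min_seconds:
                stalls.append((start, last_quiet))
            start = None
    if start is not None and last_quiet - start >= min_seconds:
        stalls.append((start, last_quiet))
    return stalls


def collect_samples(count, interval=1.0, path="/proc/diskstats"):
    """Read the stats file `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        with open(path) as f:
            samples.append((time.monotonic(), parse_diskstats(f.read())))
        time.sleep(interval)
    return samples
```

Something like find_stalls(collect_samples(600), ["sdc", "sdd"]) would then watch two disks for ten minutes (the device names here are placeholders). Running it once with the disks behind the SAS expander and once with the on-board SATA disks should show whether only the former stall. The reported intervals are lower bounds, since a stall can begin or end between two samples.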
FWIW, on the first machine mentioned above, I upgraded the system BIOS, mptsas controller option ROM, and kernel (to the CentOS 5.5 kernel) all in one go (in an attempt to minimise downtime), and the problem has not resurfaced so far (after ~1 hour of I/O).
Since this is a Supermicro i7 board and the second machine mentioned above has a totally different board, I suspect that the system BIOS upgrade will not have made a difference. I'll try to upgrade the second machine to the CentOS 5.5 kernel soonish and see if that by itself makes the problem go away -- if not, I'll try upgrading the option ROM on that machine's mptsas controller as well.
(I tried upgrading the SAS controller's firmware as well, but the LSI mpt tool refuses to do that: it complains that the Product ID on the controller doesn't match "SAS3442E", which is apparently what it expected to see. This is a Supermicro AOC-USAS-L8i, and the firmware update files came straight from Supermicro's FTP site.)