[CentOS-devel] disk i/o stalls with mptsas since upgrade to centos 5.4
Lennert Buytenhek
buytenh at wantstofly.org
Thu May 27 17:42:54 UTC 2010
On Tue, Mar 16, 2010 at 10:12:52AM -0400, Ross Walker wrote:
> > On two different machines, I've been experiencing disk I/O stalls after
> > upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E
> > MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS
> > expander, with one machine having 16 1T WD disks hooked up to it, and
> > the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T
> > disks.
> >
> > When there's a disk I/O stall, all reads and writes to any disk behind
> > the SAS controller/expander just hang for a while (typically for almost
> > exactly eight seconds), so not just the I/O to one particular disk or a
> > subset of the disks. The disks on other (on-board SATA) controllers
> > still pass I/O requests when the SAS I/O stalls.
> >
> > I hacked up the attached (dirty) perl script to demonstrate this effect
> > -- it will read /proc/diskstats in a tight loop, and keep track of
> > which request entered the request queue when, and when it completed, and
> > it will WTF if a request took more than a second. (The same thing can
> > probably be done with blktrace, but I was lazy.) New requests get
> > submitted, but the pending ones fail to complete for a while, and then
> > they all complete at once.
> >
> > This happens on kernel-2.6.18-164.11.1.el5, while reverting to the
> > latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go
> > away again, i.e. no more stalls.
> >
> > It doesn't seem to matter whether the I/O load is high or not -- the
> > stalls happen even under almost no load at all.
> >
> > Before I dig into this further, has anyone experienced anything similar?
> > A quick google search didn't come up with much.
>
> I would use iostat -x and see if there is a disk or group of disks
> that show abnormal service times and/or utilization.

I/O to all 16 disks stalls simultaneously, for 8 seconds at a time,
and 'iostat -k 1' shows zero kB/s read and written to each of the
disks (sdb - sdq) for the entire interval.
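For anyone who wants to check for the same symptom, here is a rough sketch of a /proc/diskstats poller. It is written in Python and is not the attached perl script: instead of tracking individual requests it only flags devices that have requests in flight but whose completed-I/O counters have not moved between two samples, which is a coarser test. The function names, the 0.1 s poll interval and the 1 s threshold are assumptions made for illustration. The field positions assume the whole-disk line layout described in the kernel's Documentation/iostats.txt; shorter partition lines, as printed by older kernels, are skipped.

```python
import time


def parse_diskstats(text):
    """Return {device: (completed_ios, in_flight)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Whole-disk lines have major, minor, name and at least 11 counters.
        # Shorter lines (partitions on older kernels) are skipped.
        if len(fields) < 14:
            continue
        name = fields[2]
        reads_done = int(fields[3])    # reads completed
        writes_done = int(fields[7])   # writes completed
        in_flight = int(fields[11])    # I/Os currently in progress
        stats[name] = (reads_done + writes_done, in_flight)
    return stats


def stalled_devices(prev, curr):
    """Devices with requests in flight and no completions since prev."""
    stalled = []
    for name, (done, in_flight) in curr.items():
        if name in prev and in_flight > 0 and done == prev[name][0]:
            stalled.append(name)
    return sorted(stalled)


def watch(interval=0.1, threshold=1.0, path="/proc/diskstats"):
    """Poll forever, reporting devices stalled for at least threshold seconds."""
    with open(path) as f:
        prev = parse_diskstats(f.read())
    stalled_since = {}
    while True:
        time.sleep(interval)
        with open(path) as f:
            curr = parse_diskstats(f.read())
        now = time.monotonic()
        stalled = stalled_devices(prev, curr)
        # Forget devices that completed something or drained their queue.
        for name in list(stalled_since):
            if name not in stalled:
                del stalled_since[name]
        for name in stalled:
            start = stalled_since.setdefault(name, now)
            if now - start >= threshold:
                print("%s: no completions for %.1fs, %d in flight"
                      % (name, now - start, curr[name][1]))
        prev = curr
```

Calling watch() from an interactive session starts the polling loop; it runs until interrupted. During a stall like the one described here, every disk behind the SAS expander would be expected to show up at once, while disks on the on-board SATA controllers would not.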
> Are there any errors in the logs?

Nope.

> How are the disks configured? Software raid?

Yes, two 8-disk RAID6 sets -- but that doesn't seem relevant.

> Is the adapter's firmware at the latest revision?

Not sure. I tried upgrading it but the vendor's firmware updater
won't let me (see other email for details).

> Was .128 kernel running stock drivers?

Yes.

> Is .164 kernel running stock drivers?

Yes.

> (maybe weak-updates from .128 kernel?)

Nope.

> What IO scheduler is this? Default CFQ?

Yes.
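The active scheduler can be confirmed per disk from sysfs, where /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets. A minimal sketch, with hypothetical helper names, assuming that bracketed format:

```python
import re


def active_scheduler(text):
    """Return the bracketed (active) scheduler name, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def scheduler_for(dev):
    """Read the active I/O scheduler for a block device such as 'sdb'."""
    with open("/sys/block/%s/queue/scheduler" % dev) as f:
        return active_scheduler(f.read())
```

For example, scheduler_for("sdb") reads the file for sdb and returns the name inside the brackets.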