Hi!
On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander, with one machine having 16 1T WD disks hooked up to it, and the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T disks.
When there's a disk I/O stall, all reads and writes to any disk behind the SAS controller/expander just hang for a while (typically for almost exactly eight seconds), so not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests when the SAS I/O stalls.
I hacked up the attached (dirty) Perl script to demonstrate the effect -- it reads /proc/diskstats in a tight loop, keeps track of when each request entered the request queue and when it completed, and prints a "WTF" line whenever a request takes more than a second to complete. (The same thing can probably be done with blktrace, but I was lazy.) New requests get submitted, but the pending ones fail to complete for a while, and then they all complete at once.
This happens on kernel-2.6.18-164.11.1.el5, while reverting to the latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go away again, i.e. no more stalls.
It doesn't seem to matter whether the I/O load is high or not -- the stalls happen even under almost no load at all.
Before I dig into this further, has anyone experienced anything similar? A quick google search didn't come up with much.
thanks, Lennert
#!/usr/bin/perl -w
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
# Return { device => [ completed, completed + in-flight ] } for each sd[a-z]
# disk; /proc/diskstats fields 3/7 are reads/writes completed, 11 is in-flight.
sub read_stats {
    my %stats;
    open my $fh, '<', '/proc/diskstats' or die "can't read /proc/diskstats: $!";
    while (<$fh>) {
        my @f = split;
        next unless defined $f[2] && $f[2] =~ /^sd[a-z]$/;
        my $head = $f[3] + $f[7];
        $stats{$f[2]} = [ $head, $head + $f[11] ];
    }
    close $fh;
    return \%stats;
}

my (%heads, %queues);
while (1) {
    my $now = clock_gettime(CLOCK_MONOTONIC);
    my $stats = read_stats();
    foreach my $disk (sort keys %$stats) {
        next if $disk eq 'sda';
        my ($head, $tail) = @{$stats->{$disk}};
        if (!defined $heads{$disk}) {
            print "new disk $disk\n";
            ($heads{$disk}, $queues{$disk}) = ($head, []);
        }
        # New requests entered the queue: remember when we first saw them.
        while ($heads{$disk} + @{$queues{$disk}} < $tail) {
            push @{$queues{$disk}}, $now;
            print "$now: $disk add -> " . scalar(@{$queues{$disk}}) . "\n";
        }
        # Requests completed: report how long each one was outstanding.
        while ($heads{$disk} < $head) {
            $heads{$disk}++;
            my $ms = int(1000 * ($now - shift @{$queues{$disk}}));
            print "WTF on " . localtime(time()) . " for $ms ms\n" if $ms > 1000;
            print "$now: $disk remove after $ms ms -> " . scalar(@{$queues{$disk}}) . "\n";
        }
    }
}
On Mar 16, 2010, at 3:43 AM, Lennert Buytenhek <buytenh@wantstofly.org> wrote:
> Hi!
> On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander, with one machine having 16 1T WD disks hooked up to it, and the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T disks.
> When there's a disk I/O stall, all reads and writes to any disk behind the SAS controller/expander just hang for a while (typically for almost exactly eight seconds), so not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests when the SAS I/O stalls.
> I hacked up the attached (dirty) Perl script to demonstrate the effect -- it reads /proc/diskstats in a tight loop, keeps track of when each request entered the request queue and when it completed, and prints a "WTF" line whenever a request takes more than a second to complete. (The same thing can probably be done with blktrace, but I was lazy.) New requests get submitted, but the pending ones fail to complete for a while, and then they all complete at once.
> This happens on kernel-2.6.18-164.11.1.el5, while reverting to the latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go away again, i.e. no more stalls.
> It doesn't seem to matter whether the I/O load is high or not -- the stalls happen even under almost no load at all.
> Before I dig into this further, has anyone experienced anything similar? A quick google search didn't come up with much.
I would use iostat -x and see if there is a disk or group of disks that show abnormal service times and/or utilization.
Are there any errors in the logs?
How are the disks configured? Software raid?
Is the adapter's firmware at the latest revision?
Was the .128 kernel running stock drivers? Is the .164 kernel running stock drivers? (Maybe weak-updates from the .128 kernel?)
What IO scheduler is this? Default CFQ?
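For a quick one-shot look at the scheduler and per-disk service times, something along these lines should do (untested sketch, assuming the usual /proc/diskstats and /sys/block layout on EL5); an outlier disk or an unexpected elevator setting should stand out:

#!/usr/bin/perl -w
# Rough sketch: per-disk active elevator and since-boot average I/O wait.
# diskstats fields 3/7 = reads/writes completed, 6/10 = ms spent reading/writing;
# the bracketed entry in /sys/block/<dev>/queue/scheduler is the active one.
use strict;

open my $ds, '<', '/proc/diskstats' or die "can't read /proc/diskstats: $!";
while (<$ds>) {
    my @f = split;
    next unless defined $f[2] && $f[2] =~ /^sd[a-z]$/;
    my $ios = $f[3] + $f[7];
    my $avg_wait = $ios ? ($f[6] + $f[10]) / $ios : 0;
    my $sched = 'unknown';
    if (open my $sq, '<', "/sys/block/$f[2]/queue/scheduler") {
        my $line = <$sq>;
        $sched = $1 if defined $line && $line =~ /\[(\w+)\]/;
        close $sq;
    }
    printf "%s: scheduler=%s ios=%d avg_wait=%.1f ms\n", $f[2], $sched, $ios, $avg_wait;
}
close $ds;

(iostat -x with an interval gives per-interval numbers; this just shows since-boot averages, which is usually enough to spot a disk that is consistently slow.)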
I would move this discussion to 'CentOS Users' as that is the more appropriate list for this.
-Ross
On Tue, 16 Mar 2010, Ross Walker wrote:
> I would move this discussion to 'CentOS Users' as that is the more appropriate list for this.
I would think that bugzilla.redhat.com would be the best place for that information to be sent.
On Sunday, March 28, 2010, Charlie Brady <charlieb-centos-devel@budge.apana.org.au> wrote:
> On Tue, 16 Mar 2010, Ross Walker wrote:
> > I would move this discussion to 'CentOS Users' as that is the more appropriate list for this.
> I would think that bugzilla.redhat.com would be the best place for that information to be sent.
Why? It hasn't been determined to be a bug yet.
Once it's confirmed as a bug, then bugzilla; until then, the users list can suggest configurations to look at, and possibly even a workaround if it does turn out to be a bug -- which might be a firmware bug rather than a kernel driver bug.
-Ross
On Mon, Mar 29, 2010 at 7:36 AM, Ross Walker <rswwalker@gmail.com> wrote:
> On Sunday, March 28, 2010, Charlie Brady <charlieb-centos-devel@budge.apana.org.au> wrote:
> > On Tue, 16 Mar 2010, Ross Walker wrote:
> > > I would move this discussion to 'CentOS Users' as that is the more appropriate list for this.
> > I would think that bugzilla.redhat.com would be the best place for that information to be sent.
> Why? It hasn't been determined to be a bug yet.
Because it's another venue, which may get additional eyes looking at it. Plus, it's not just a "bug tracking system", it's a "gather information pertinent to a problem in order to ascertain whether the problem is a bug" system. :-)
jerry
On Tue, Mar 16, 2010 at 10:12:52AM -0400, Ross Walker wrote:
> > On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander, with one machine having 16 1T WD disks hooked up to it, and the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T disks.
> > When there's a disk I/O stall, all reads and writes to any disk behind the SAS controller/expander just hang for a while (typically for almost exactly eight seconds), so not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests when the SAS I/O stalls.
> > I hacked up the attached (dirty) Perl script to demonstrate the effect -- it reads /proc/diskstats in a tight loop, keeps track of when each request entered the request queue and when it completed, and prints a "WTF" line whenever a request takes more than a second to complete. (The same thing can probably be done with blktrace, but I was lazy.) New requests get submitted, but the pending ones fail to complete for a while, and then they all complete at once.
> > This happens on kernel-2.6.18-164.11.1.el5, while reverting to the latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go away again, i.e. no more stalls.
> > It doesn't seem to matter whether the I/O load is high or not -- the stalls happen even under almost no load at all.
> > Before I dig into this further, has anyone experienced anything similar? A quick google search didn't come up with much.
> I would use iostat -x and see if there is a disk or group of disks that show abnormal service times and/or utilization.
I/O to all 16 disks stalls simultaneously, for 8 seconds at a time, and 'iostat -k 1' shows zero kb/s read and written to each of the disks (sdb - sdq) for the entire interval.
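(For completeness: a cruder hack along the same lines as the attached script, just printing per-second read/write sector deltas per disk -- field numbers assumed as in Documentation/iostats.txt -- shows the same pattern, with every disk behind the expander reporting 0/0 for roughly eight consecutive samples:)

#!/usr/bin/perl -w
# Print per-second read/write sector deltas for each sd[a-z] disk
# (diskstats fields 5/9 = sectors read/written).
use strict;

sub sectors {
    my %s;
    open my $fh, '<', '/proc/diskstats' or die "can't read /proc/diskstats: $!";
    while (<$fh>) {
        my @f = split;
        $s{$f[2]} = [ $f[5], $f[9] ] if defined $f[2] && $f[2] =~ /^sd[a-z]$/;
    }
    close $fh;
    return \%s;
}

my $prev = sectors();
while (1) {
    sleep 1;
    my $cur = sectors();
    my @deltas;
    foreach my $disk (sort keys %$cur) {
        next unless exists $prev->{$disk};
        push @deltas, sprintf "%s:%d/%d", $disk,
            $cur->{$disk}[0] - $prev->{$disk}[0],
            $cur->{$disk}[1] - $prev->{$disk}[1];
    }
    print scalar(localtime()), "  ", join(" ", @deltas), "\n";
    $prev = $cur;
}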
> Are there any errors in the logs?
Nope.
> How are the disks configured? Software raid?
Yes, two 8-disk RAID6 sets -- but that doesn't seem relevant.
> Is the adapter's firmware at the latest revision?
Not sure. I tried upgrading it but the vendor's firmware updater won't let me (see other email for details).
> Was the .128 kernel running stock drivers?
Yes.
> Is the .164 kernel running stock drivers?
Yes.
> (Maybe weak-updates from the .128 kernel?)
Nope.
> What IO scheduler is this? Default CFQ?
Yes.
On Tue, Mar 16, 2010 at 08:43:26AM +0100, Lennert Buytenhek wrote:
> On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander, with one machine having 16 1T WD disks hooked up to it, and the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T disks.
> When there's a disk I/O stall, all reads and writes to any disk behind the SAS controller/expander just hang for a while (typically for almost exactly eight seconds), so not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests when the SAS I/O stalls.
FWIW, on the first machine mentioned above, I upgraded the system BIOS, the mptsas controller option ROM, and the kernel (to the CentOS 5.5 kernel) all in one go (to minimise downtime), and so far (after ~1 hour of I/O) the problem has not resurfaced.
Since this is a Supermicro i7 board and the second machine mentioned above has a totally different board, I suspect that the system BIOS upgrade will not have made a difference. I'll try to upgrade the second machine to the CentOS 5.5 kernel soonish and see if that by itself makes the problem go away -- if not, I'll try upgrading the option ROM on that machine's mptsas controller as well.
(I tried upgrading the SAS controller's firmware as well, but the LSI mpt tool refuses to, complaining that the Product ID on the controller doesn't match the "SAS3442E" it apparently expects to see. The controller is a Supermicro AOC-USAS-L8i, and the firmware update files came straight from Supermicro's FTP site.)