New subject: disk i/o stalls with mptsas since upgrade to centos 5.4

16 Mar 2010


      Hi!
On two different machines, I've been experiencing disk I/O stalls after
upgrading to the CentOS 5.4 kernel.  Both machines have an LSI 1068E
MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS
expander, with one machine having 16 1T WD disks hooked up to it, and
the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T
disks.
When there's a disk I/O stall, all reads and writes to any disk behind
the SAS controller/expander just hang for a while (typically for almost
exactly eight seconds), so not just the I/O to one particular disk or a
subset of the disks.  The disks on other (on-board SATA) controllers
still pass I/O requests when the SAS I/O stalls.
I hacked up the attached (dirty) perl script to demonstrate this effect
-- it will read /proc/diskstats in a tight loop, and keep track of
which request entered the request queue when, and when it completed, and
it will WTF if a request took more than a second.  (The same thing can
probably be done with blktrace, but I was lazy.)  New requests get
submitted, but the pending ones fail to complete for a while, and then
they all complete at once.
This happens on kernel-2.6.18-164.11.1.el5, while reverting to the
latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go
away again, i.e. no more stalls.
It doesn't seem to matter whether the I/O load is high or not -- the
stalls happen even under almost no load at all.
Before I dig into this further, has anyone experienced anything similar?
A quick google search didn't come up with much.
thanks,
Lennert
#!/usr/bin/perl -w
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);
sub read_stats {
    local *STATS;
    my %stats;
%stats = ();
open STATS, "< /proc/diskstats";
    while (<STATS>) {
    	my @fields;
chomp;
    	@fields = split;
    	if ($fields[2] =~ /^sd[a-z]$/) {
    		my $head = $fields[3] + $fields[7];
    		my $tail = $head + $fields[11];
$stats{$fields[2]} = [ $head, $tail ];
    	}
    }
    close STATS;
return %stats;
}
my %heads;
my %queues;
%heads = ();
%queues = ();
while (1) {
    my $now;
    my %stats;
    my $disk;
$now = clock_gettime(CLOCK_MONOTONIC);
%stats = %{read_stats()};
    foreach $disk (sort keys %stats) {
    	my $head;
    	my $tail;
next if ($disk eq "sda");
$head = $stats{$disk}->[0];
    	$tail = $stats{$disk}->[1];
if (not defined $heads{$disk}) {
    		print "new disk $disk\n";
    		$heads{$disk} = $head;
    		$queues{$disk} = [];
    	}
while ($heads{$disk} + scalar(@{$queues{$disk}}) < $tail) {
    		push @{$queues{$disk}}, $now;
    		print "$now: $disk add -> " . scalar(@{$queues{$disk}}) . "\n";
    	}
while ($heads{$disk} < $head) {
    		my $s;
$heads{$disk}++;
    		$s = int(1000 * ($now - (shift @{$queues{$disk}})));
if ($s > 1000) {
    			print "WTF on " . localtime(time()) . " for $s ms\n";
    		}
    		print "$now: $disk remove after $s ms -> " . scalar(@{$queues{$disk}}) . "\n";
    	}
    }
}