Hi!
On two different machines, I've been experiencing disk I/O stalls after upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS expander. One machine has sixteen 1 TB WD disks hooked up to the expander, and the other has a mix of about 20 WD/Seagate/Samsung/Hitachi 1 TB and 2 TB disks.
When there's a disk I/O stall, all reads and writes to any disk behind the SAS controller/expander just hang for a while (typically for almost exactly eight seconds), so not just the I/O to one particular disk or a subset of the disks. The disks on other (on-board SATA) controllers still pass I/O requests when the SAS I/O stalls.
I hacked up the attached (dirty) Perl script to demonstrate this effect -- it reads /proc/diskstats in a tight loop, keeps track of when each request entered the request queue and when it completed, and prints a "WTF" line if a request took more than a second. (The same thing can probably be done with blktrace, but I was lazy.) During a stall, new requests still get submitted, but the pending ones fail to complete for a while, and then they all complete at once.
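For reference, here is a minimal sketch of how one /proc/diskstats line maps onto the two counters the script tracks, assuming the usual 2.6 field layout (after major, minor and device name: reads completed is the 1st statistic, writes completed the 5th, I/Os currently in progress the 9th). The sample line is made up for illustration, and parse_diskstats_line is a hypothetical helper name, not part of the attached script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line in /proc/diskstats format (values are made up).
my $line = "   8      16 sdb 1200 30 96000 4500 800 12 64000 9100 3 5200 13900";

# Returns (name, completed, issued) for one diskstats line.
sub parse_diskstats_line {
    my ($l) = @_;
    my @f = split ' ', $l;      # split ' ' also skips the leading whitespace

    # $f[3] = reads completed, $f[7] = writes completed,
    # $f[11] = I/Os currently in flight
    my $completed = $f[3] + $f[7];
    my $issued    = $completed + $f[11];

    return ($f[2], $completed, $issued);
}

my ($name, $completed, $issued) = parse_diskstats_line($line);
print "$name: completed=$completed issued=$issued in_flight=",
      $issued - $completed, "\n";
```

"completed" only ever grows, and "issued" is "completed" plus whatever is still in flight, so the difference between two successive samples says how many requests were queued and how many finished in between.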
This happens on kernel-2.6.18-164.11.1.el5, while reverting to the latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go away again, i.e. no more stalls.
It doesn't seem to matter whether the I/O load is high or not -- the stalls happen even under almost no load at all.
Before I dig into this further, has anyone experienced anything similar? A quick Google search didn't come up with much.
thanks, Lennert
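#!/usr/bin/perl -w

use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

# Read /proc/diskstats and return a reference to a hash that maps each
# sdX disk to [ head, tail ]: head is the number of completed requests
# (reads plus writes), tail is head plus the number still in flight.
sub read_stats {
    local *STATS;
    my %stats;

    %stats = ();

    open STATS, "< /proc/diskstats" or die "can't open /proc/diskstats: $!";
    while (<STATS>) {
        my @fields;

        chomp;
        @fields = split;
        if ($fields[2] =~ /^sd[a-z]$/) {
            my $head = $fields[3] + $fields[7];
            my $tail = $head + $fields[11];

            $stats{$fields[2]} = [ $head, $tail ];
        }
    }
    close STATS;

    # Return a reference; the caller dereferences it with %{...}.
    return \%stats;
}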

my %heads;
my %queues;

%heads = ();
%queues = ();

while (1) {
    my $now;
    my %stats;
    my $disk;

    $now = clock_gettime(CLOCK_MONOTONIC);

    %stats = %{read_stats()};
    foreach $disk (sort keys %stats) {
        my $head;
        my $tail;

        # sda is deliberately ignored.
        next if ($disk eq "sda");

        $head = $stats{$disk}->[0];
        $tail = $stats{$disk}->[1];

        if (not defined $heads{$disk}) {
            print "new disk $disk\n";
            $heads{$disk} = $head;
            $queues{$disk} = [];
        }

        # Requests issued since the last sample: stamp them with $now.
        while ($heads{$disk} + scalar(@{$queues{$disk}}) < $tail) {
            push @{$queues{$disk}}, $now;
            print "$now: $disk add -> " . scalar(@{$queues{$disk}}) . "\n";
        }

        # Requests completed since the last sample: retire oldest first.
        while ($heads{$disk} < $head) {
            my $s;

            $heads{$disk}++;
            $s = int(1000 * ($now - (shift @{$queues{$disk}})));

            if ($s > 1000) {
                print "WTF on " . localtime(time()) . " for $s ms\n";
            }
            print "$now: $disk remove after $s ms -> " . scalar(@{$queues{$disk}}) . "\n";
        }
    }
}
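
The queue bookkeeping in the script above can also be exercised offline. The following is only a sketch under stated assumptions: replay is a hypothetical helper, and the (time, completed, issued) samples are synthetic values chosen to mimic an eight-second stall, not measurements from either machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replays [time, completed, issued] samples for a single disk and returns
# the latency in ms of each completed request, in completion order.
sub replay {
    my @samples = @_;
    my ($head, @queue, @latencies);

    for my $s (@samples) {
        my ($now, $completed, $issued) = @$s;

        # The first sample only sets the baseline.
        $head = $completed unless defined $head;

        # Requests issued since the last sample get the current timestamp.
        push @queue, $now while $head + @queue < $issued;

        # Requests completed since the last sample are retired oldest first.
        while ($head < $completed) {
            $head++;
            push @latencies, int(1000 * ($now - shift @queue));
        }
    }
    return @latencies;
}

# Synthetic samples: two requests are queued at t=0.5 and only complete
# at t=8.5; a third is queued at t=8.5 and completes at t=8.75.
my @lat = replay([0, 100, 100], [0.5, 100, 102],
                 [8.5, 102, 103], [8.75, 103, 103]);
print "latencies (ms): @lat\n";
print "stalled: ", scalar(grep { $_ > 1000 } @lat), "\n";
```

As in the script itself, a request is timestamped when the sampler first notices it, so the reported latencies are only as precise as the polling interval.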