Hi!
On two different machines, I've been seeing disk I/O stalls after
upgrading to the CentOS 5.4 kernel. Both machines have an LSI 1068E
MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS
expander. One machine has 16 1TB WD disks attached to it, and the
other has a mix of about 20 WD/Seagate/Samsung/Hitachi 1TB and 2TB
disks.
When a stall occurs, all reads and writes to every disk behind the SAS
controller/expander hang for a while (typically for almost exactly
eight seconds). It is not just the I/O to one particular disk or a
subset of the disks that stalls. Disks on the other (on-board SATA)
controllers keep completing I/O requests while the SAS I/O is stalled.
I hacked up the attached (dirty) Perl script to demonstrate this
effect. It reads /proc/diskstats in a tight loop, keeps track of when
each request entered the request queue and when it completed, and
prints a "WTF" line if a request took more than a second. Since
/proc/diskstats only exposes counters, the script infers this from the
completed and in-flight request counts and assumes that requests
complete in FIFO order. (The same thing can probably be done with
blktrace, but I was lazy.) The output shows that new requests keep
getting submitted, but the pending ones fail to complete for a while,
and then they all complete at once.
The stalls happen on kernel-2.6.18-164.11.1.el5, while reverting to the
latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes them go away
again.
It doesn't seem to matter whether the I/O load is high or not: the
stalls happen even when the machine is almost idle.
Before I dig into this further, has anyone experienced anything similar?
A quick Google search didn't turn up much.
thanks,
Lennert
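#!/usr/bin/perl -w
#
# Poll /proc/diskstats in a tight loop, track when each request is first
# seen in flight and when it completes, and complain about any request
# that takes longer than one second.
#
use strict;
use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);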
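# Return a hashref mapping each whole sdX disk to [head, tail], where
#   head = reads completed + writes completed, and
#   tail = head + I/Os currently in flight.
sub read_stats {
    my %stats;

    open(my $fh, '<', '/proc/diskstats')
        or die "cannot open /proc/diskstats: $!\n";
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        # Whole disks only; sd[a-z]+ also matches sdaa and beyond.
        next unless $fields[2] =~ /^sd[a-z]+$/;
        my $head = $fields[3] + $fields[7];
        my $tail = $head + $fields[11];
        $stats{$fields[2]} = [ $head, $tail ];
    }
    close($fh);

    return \%stats;
}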
# Per disk: number of completions seen so far ("head"), and a FIFO queue
# of the times at which each still-pending request was first seen.
my %heads;
my %queues;

while (1) {
    my $now = clock_gettime(CLOCK_MONOTONIC);
    my %stats = %{read_stats()};

    foreach my $disk (sort keys %stats) {
        # Skip sda (presumably the system disk on the on-board controller).
        next if ($disk eq "sda");
        my ($head, $tail) = @{$stats{$disk}};

        if (not defined $heads{$disk}) {
            print "new disk $disk\n";
            $heads{$disk} = $head;
            $queues{$disk} = [];
        }

        # New requests went in flight since the last poll: timestamp each.
        while ($heads{$disk} + scalar(@{$queues{$disk}}) < $tail) {
            push @{$queues{$disk}}, $now;
            print "$now: $disk add -> " . scalar(@{$queues{$disk}}) . "\n";
        }

        # Requests completed since the last poll: assume FIFO order and
        # measure how long the oldest pending request was outstanding.
        while ($heads{$disk} < $head) {
            $heads{$disk}++;
            my $s = int(1000 * ($now - (shift @{$queues{$disk}})));
            if ($s > 1000) {
                print "WTF on " . localtime(time()) . " for $s ms\n";
            }
            print "$now: $disk remove after $s ms -> " . scalar(@{$queues{$disk}}) . "\n";
        }
    }
}