[CentOS-devel] disk i/o stalls with mptsas since upgrade to centos 5.4

Tue Mar 16 07:43:27 UTC 2010
Lennert Buytenhek <buytenh at wantstofly.org>

Hi!

On two different machines, I've been experiencing disk I/O stalls after
upgrading to the CentOS 5.4 kernel.  Both machines have an LSI 1068E
MPT SAS (mptsas) controller connected to a Chenbro CK13601 36-port SAS
expander, with one machine having 16 1T WD disks hooked up to it, and
the other having a mix of about 20 WD/Seagate/Samsung/Hitachi 1T and 2T
disks.

When there's a disk I/O stall, all reads and writes to any disk behind
the SAS controller/expander just hang for a while (typically for almost
exactly eight seconds), so not just the I/O to one particular disk or a
subset of the disks.  The disks on other (on-board SATA) controllers
still pass I/O requests when the SAS I/O stalls.

I hacked up the attached (dirty) perl script to demonstrate this effect
-- it will read /proc/diskstats in a tight loop, and keep track of
which request entered the request queue when, and when it completed, and
it will WTF if a request took more than a second.  (The same thing can
probably be done with blktrace, but I was lazy.)  New requests get
submitted, but the pending ones fail to complete for a while, and then
they all complete at once.

This happens on kernel-2.6.18-164.11.1.el5, while reverting to the
latest CentOS 5.3 kernel (kernel-2.6.18-128.7.1.el5) makes the issue go
away again, i.e. no more stalls.

It doesn't seem to matter whether the I/O load is high or not -- the
stalls happen even under almost no load at all.

Before I dig into this further, has anyone experienced anything similar?
A quick google search didn't come up with much.


thanks,
Lennert



#!/usr/bin/perl -w

use Time::HiRes qw(clock_gettime CLOCK_MONOTONIC);

sub read_stats {
	local *STATS;
	my %stats;

	%stats = ();

	open STATS, "< /proc/diskstats";
	while (<STATS>) {
		my @fields;

		chomp;
		@fields = split;
		if ($fields[2] =~ /^sd[a-z]$/) {
			my $head = $fields[3] + $fields[7];
			my $tail = $head + $fields[11];

			$stats{$fields[2]} = [ $head, $tail ];
		}
	}
	close STATS;

	return \%stats;
}


my %heads;
my %queues;

%heads = ();
%queues = ();

while (1) {
	my $now;
	my %stats;
	my $disk;

	$now = clock_gettime(CLOCK_MONOTONIC);

	%stats = %{read_stats()};
	foreach $disk (sort keys %stats) {
		my $head;
		my $tail;

		next if ($disk eq "sda");

		$head = $stats{$disk}->[0];
		$tail = $stats{$disk}->[1];

		if (not defined $heads{$disk}) {
			print "new disk $disk\n";
			$heads{$disk} = $head;
			$queues{$disk} = [];
		}

		while ($heads{$disk} + scalar(@{$queues{$disk}}) < $tail) {
			push @{$queues{$disk}}, $now;
			print "$now: $disk add -> " . scalar(@{$queues{$disk}}) . "\n";
		}

		while ($heads{$disk} < $head) {
			my $s;

			$heads{$disk}++;
			$s = int(1000 * ($now - (shift @{$queues{$disk}})));

			if ($s > 1000) {
				print "WTF on " . localtime(time()) . " for $s ms\n";
			}
			print "$now: $disk remove after $s ms -> " . scalar(@{$queues{$disk}}) . "\n";
		}
	}
}