[CentOS] Hard I/O lockup with EL6

Mon Sep 26 21:42:18 UTC 2011
Devin Reade <gdr at gno.org>

--On Monday, September 26, 2011 12:11:47 PM -0700 Benjamin Smith
<lists at benjamindsmith.com> wrote:

> I'm trying to figure out why 2 machines have a "hard I/O lock" on the HDD
> when  running EL6. 

I _won't_ chime in with a "check your <whatever>".  Instead here's a
potentially useless datapoint:

I have an older but still usable 32-bit 686-class machine that was formerly
a production machine running Fedora Core 6.  Its services were migrated
off a while back and I decided I'd use it as a test of CentOS 6. For
this test I needed a few disks in RAID6 and the motherboard only had
two SATA ports so I added a multiport PCI SATA card (a model that 
has served me well in the past).

Short version:  Although the install went fine, trying to run CentOS 6
on this with a four disk RAID6 (with the first 200MB of each disk in 
RAID1 for /boot, the remainder as RAID6 with LVM on top) resulted in 
an unstable system.  After some unpredictable amount of time (anywhere
from 15 minutes to days) the system would lock up hard.  Unfortunately 
I don't recall if the error messages were identical to yours, but it
seems eerily familiar.

I tried the usual tricks: swapping out drive controllers and disks,
using different combinations of onboard vs. add-on SATA, running memtest86,
increasing power supply capacity, etc.  No dice.

I eventually ended up getting new hardware for the task (an HP
MicroServer) and so far the new machine seems to be stable enough
running CentOS 6 in the RAID1 /boot + RAID6 LVM configuration.  I've
not had the chance yet to go back and experiment with the old 
machine under C6.

Unfortunately in trying to use C6 on the old machine I wound up with
far too many changed variables to figure out where the problem was.
Despite that, my gut tells me that it's not a hardware problem.

Devin