On Sep 26, 2011, at 3:11 PM, Benjamin Smith <lists@benjamindsmith.com> wrote:
I'm trying to figure out why 2 machines have a "hard I/O lock" on the HDD when running EL6.
I have 4 identical machines, all of which were stable with EL5. 2 work great with EL6, 2 do not. I've checked motherboard BIOS versions and settings, and SAS controller BIOS versions and settings; they are the same between the working and non-working systems.
When booting a non-working system, it boots straight up to the login prompt (runlevel 3) without issue, and everything works fine. When the machine sits idle for a period of time (anywhere from 15 minutes or so and up), the HDD becomes unreadable/unwritable and the system is useless for any purpose. It must be hard restarted with a full power cycle; it won't even shut down.
Since nothing is logged, I've had precious little information to diagnose with. After several attempts to find out what's going on, I find the following emitted to the screen:
mpt2sas0: diag reset: FAILED
mpt2sas0: diag reset: FAILED
mpt2sas0: diag reset: FAILED
end_request: I/O error, dev sda, sector 226972349
Buffer I/O error, device sda5, logical block 2719747
sd 0:0:0:0rejecting I/O to offline device
sd 0:0:0:0rejecting I/O to offline device
sd 0:0:0:0rejecting I/O to offline device
This is NOT due to a faulty HDD: I've tried new hard disks, both SATA and SAS. I've also swapped hard disks with an identical working unit and verified that the working unit remains working and the failing unit continues to fail. I've reformatted and re-installed EL6 numerous times with consistent results.
Googling this error returned very little useful information: where should I go now? Below, please find outputs of dmesg and lspci. I've compared outputs of dmesg between working and non-working systems; the output of anything with "mpt" at the beginning is identical except for different IRQ ports.
Tried upgrading BIOS?
Errors during idle periods might point to C-State or P-State compatibility issues.
You could try disabling the power management (SpeedStep) in the BIOS and see if that makes a difference.
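
Before and after changing the BIOS setting, it may help to see which idle states the kernel is actually using. The following is only a rough sketch: it assumes the kernel exposes the cpuidle sysfs tree under /sys/devices/system/cpu/cpu0/cpuidle (it may not be present on every kernel or with every idle driver), and the helper name list_cstates is made up for illustration. It avoids newer Python features so it should also run on the older Python that ships with EL6.

```python
import os


def list_cstates(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return a list of (state_dir, name, usage) for one CPU's idle states.

    Assumes the cpuidle sysfs layout (cpuidle/stateN/name and
    cpuidle/stateN/usage). Returns an empty list if it is not present.
    """
    cpuidle = os.path.join(cpu_dir, "cpuidle")
    if not os.path.isdir(cpuidle):
        return []

    # Keep only stateN directories and sort them numerically,
    # so that state10 comes after state2.
    entries = [e for e in os.listdir(cpuidle)
               if e.startswith("state") and e[5:].isdigit()]
    entries.sort(key=lambda e: int(e[5:]))

    states = []
    for entry in entries:
        with open(os.path.join(cpuidle, entry, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(cpuidle, entry, "usage")) as f:
            usage = int(f.read().strip())  # times this state was entered
        states.append((entry, name, usage))
    return states


if __name__ == "__main__":
    found = list_cstates()
    if not found:
        print("no cpuidle information exposed for cpu0")
    for state, name, usage in found:
        print("{0}: {1} entered {2} times".format(state, name, usage))
```

If the deeper states show large usage counts on the failing machines shortly before a hang, that would be consistent with the idle theory, though it would not prove it. If the BIOS option is not available, limiting C-states from the kernel command line (for example with processor.max_cstate=1) may be another way to test the same idea.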
-Ross