[CentOS] Hard I/O lockup with EL6

Mon Sep 26 20:09:04 UTC 2011
Benjamin Smith <lists at benjamindsmith.com>

On Monday, September 26, 2011 12:36:19 PM m.roth at 5-cent.us wrote:
> a) have you checked
> /var/log/messages for memory or drive errors? 

I looked through the logs, and there's *nothing* I can find that's out of sorts. When 
the I/O problem happens, nothing can be written, so nothing reaches the logs either. 

> Maybe memtest86? 

I swapped all the RAM between working and non-working machines. In several cases 
where replacing the RAM resolved the issue, memtest didn't indicate any problems, 
so I'm not inclined to trust it. 

> b) diffed
> dmesg between working and dying machines?

Other than the IRQ difference I noted earlier, a visual scan revealed no differences 
involving mpt2. 

> 
> One more thing: should we assume you were trying to do things, when they
> die, from the console? I ask because I note that you're using the e1000e
> driver, which was just the subject of a thread here.

I'm familiar with the stale EL6 e1000e driver. I've been using the one from the 
elrepo yum repository. I download that RPM manually so that Ethernet works before 
running yum -y update. I've been assuming this is unrelated. 
