On Monday, September 26, 2011 12:36:19 PM m.roth@5-cent.us wrote:
a) have you checked /var/log/messages for memory or drive errors?
Looked through the logs; there's *nothing* I can find that's out of sorts. Then again, when the IO problem happens, nothing can be written.
Maybe memtest86?
I swapped all the RAM between the working and non-working machines. In several cases where replacing RAM resolved the issue, memtest didn't indicate any problems, so I'm not inclined to trust it.
b) diffed dmesg between working and dying machines?
Other than the IRQ difference noted earlier, visual scan revealed no differences involving mpt2.
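For anyone who wants something less error-prone than a visual scan, here is a minimal sketch of a dmesg comparison that strips the boot timestamps first, so that only real differences show up. It assumes the dmesg output from each machine has been saved to a file and that lines carry the usual "[   12.345678]" prefix. The function names and the file names in the usage note are made up for illustration.

```python
import difflib
import re

# Matches the leading "[   12.345678] " kernel timestamp on a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(lines):
    """Drop the timestamp prefix and trailing newline from each line."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_dmesg(good_lines, bad_lines, keyword=None):
    """Return only the added/removed lines between two dmesg dumps.

    If keyword is given, both dumps are first filtered down to lines
    containing it (for example a driver name).
    """
    good = strip_timestamps(good_lines)
    bad = strip_timestamps(bad_lines)
    if keyword is not None:
        good = [line for line in good if keyword in line]
        bad = [line for line in bad if keyword in line]
    diff = list(difflib.unified_diff(good, bad, "good", "bad", lineterm="", n=0))
    # Skip the two "---"/"+++" header lines, then keep only +/- lines.
    return [line for line in diff[2:] if line[0] in "+-"]
```

Usage would be along the lines of diff_dmesg(open("dmesg.good").readlines(), open("dmesg.bad").readlines(), keyword="mpt2"), with dmesg.good and dmesg.bad being whatever files the output was saved to. Lines prefixed with "-" appear only on the working machine, "+" only on the dying one.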
One more thing: should we assume you were trying to do things, when they die, from the console? I ask because I note that you're using the e1000e driver, which was just the subject of a thread here.
I'm familiar with the stale EL6 e1000e driver. I've been using the one installed via yum from elrepo. I download the RPM manually so that ethernet works before doing a yum -y update. I've been assuming this was unrelated.