On Mon, Oct 13, 2008 at 3:38 PM, Jeff <jpotter-centos@codepuppy.com> wrote:
We had the following error thrown on the console of a PowerEdge server running CentOS 5 (64-bit). Googling around didn't yield any particular insights. The server crashed a few minutes after this message. Running memtester, just to check, didn't find anything, and the box had been running for months before this without issue. I'm wondering if anyone has run across this before and, if so, whether it was software-related (CentOS) or hardware-related (PowerEdge / PowerVault).

Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x4
Oct 8 12:19:35 someServer kernel: EDAC i5000 MC0: >Tmid Thermal event with intelligent throttling disabled
Oct 8 12:19:35 someServer kernel: EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=0 RDWR=Write RAS=11802 CAS=0 FATAL Err=0x4)
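IIRC, "EDAC i5000" is the kernel's error detection and correction (EDAC) driver for the server's Intel 5000-series memory controller. The last line reports an uncorrectable error ("UE") on row 1, so it looks like something went wrong with a DIMM, and that is probably why the box crashed. So it looks like you may have an (intermittent) hardware issue.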
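If you want to narrow it down, the UE line names the row and channels involved, and the EDAC sysfs counters show whether errors keep recurring. Below is a minimal Python sketch of both, not a tested tool. It assumes the log format quoted above and the `mcX/csrowY/ue_count` / `ce_count` layout under `/sys/devices/system/edac/mc` described in the kernel's EDAC documentation (paths can differ between kernel versions). The function names are hypothetical.

```python
import os
import re

# Pattern for the EDAC uncorrectable-error ("UE") line quoted in the report, e.g.
# 'EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 ...)'
UE_LINE = re.compile(
    r'EDAC MC(?P<mc>\d+): UE row (?P<row>\d+), '
    r'channel-a= ?(?P<chan_a>\d+) channel-b= ?(?P<chan_b>\d+) '
    r'labels "(?P<labels>[^"]*)": \((?P<details>[^)]*)\)'
)

# KEY=VALUE pairs inside the parentheses; keys may contain '-' or a space
# ("DRAM-Bank", "FATAL Err").
DETAIL_PAIR = re.compile(r'([A-Za-z][\w -]*?)=(\S+)')


def parse_ue_line(line):
    """Return the fields of an EDAC 'UE row' line as a dict of strings, or None."""
    m = UE_LINE.search(line)
    if m is None:
        return None
    fields = {k: m.group(k) for k in ("mc", "row", "chan_a", "chan_b", "labels")}
    for key, value in DETAIL_PAIR.findall(m.group("details")):
        fields[key] = value
    return fields


def set_bits(reg):
    """List the bit positions set in an error-register value (0x4 -> [2])."""
    return [bit for bit in range(reg.bit_length()) if (reg >> bit) & 1]


def read_csrow_counts(mc, row, root="/sys/devices/system/edac/mc"):
    """Read the uncorrectable/correctable error counters for one csrow.

    Assumes the mcX/csrowY/{ue_count,ce_count} sysfs layout; other kernels
    may lay this out differently.
    """
    base = os.path.join(root, "mc%d" % mc, "csrow%d" % row)
    counts = {}
    for name in ("ue_count", "ce_count"):
        with open(os.path.join(base, name)) as f:
            counts[name] = int(f.read().strip())
    return counts


if __name__ == "__main__":
    log_line = ('Oct 8 12:19:35 someServer kernel: EDAC MC0: UE row 1, '
                'channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=0 '
                'RDWR=Write RAS=11802 CAS=0 FATAL Err=0x4)')
    fields = parse_ue_line(log_line)
    print(fields)
    print("bits set in FATAL Err:", set_bits(int(fields["FATAL Err"], 16)))
    # Only meaningful on the affected server with the EDAC driver loaded.
    try:
        print(read_csrow_counts(int(fields["mc"]), int(fields["row"])))
    except OSError as exc:
        print("EDAC sysfs counters not available here:", exc)
```

How row/channel numbers map to physical DIMM slots is board-specific, so check them against the server's documentation before pulling a module.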
Regards, Tim