On 04/29/13 04:17, Peter Peltonen wrote:
I started to receive messages like this a few days ago on one of my servers:
Message from syslogd@ at Mon Apr 29 08:02:55 2013 ... server1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=0 CAS=0, UE Err=0x2 (Aliased Uncorrectable Non-Mirrored Demand Data ECC))
I've never had ECC memory fail on me before, so now I am wondering the following:
- The server is running CentOS 5.7 and is acting as Xen dom0. Is there any
possibility this could be a kernel issue and upgrading would help, or would upgrading at this point just cause more trouble?
Not in my experience.
- Is there now a possibility that my data can become corrupted: should I
shut down the server as soon as possible, or can I keep running until I replace the memory?
Maybe - I'm just not sure. You need to replace the memory asap; order it, and schedule a maintenance window with all your users *now*.
- This server has been running for several years in a datacenter without
problems: in your experience, is this kind of problem more likely caused by a failing motherboard or by the memory?
DIMM went bad. No big thing. Your only problem may be to identify which one, he says, about to go into work to do just that.
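For narrowing down which one, a rough Python sketch along these lines might help. It pulls the controller, row, and channel numbers out of a UE line formatted like the one quoted above, and reads the per-csrow uncorrectable-error counters that the kernel's EDAC sysfs interface normally exposes. The function names and the regular expression are my own assumptions, not part of any tool; other EDAC drivers word their messages differently, and mapping a row/channel pair to a physical slot still depends on the motherboard manual.

```python
import glob
import os
import re

# Assumed pattern for the UE line quoted in this thread; other EDAC
# drivers format their messages differently.
UE_PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): UE row (?P<row>\d+), "
    r"channel-a=\s*(?P<chan_a>\d+) channel-b=\s*(?P<chan_b>\d+)"
)


def parse_edac_ue(line):
    """Return controller, row and channel numbers from a UE line, or None."""
    match = UE_PATTERN.search(line)
    if match is None:
        return None
    return dict((key, int(value)) for key, value in match.groupdict().items())


def read_ue_counts(base="/sys/devices/system/edac/mc"):
    """Map each csrow directory to its uncorrectable-error count.

    Returns an empty dict when the EDAC sysfs tree is not present.
    """
    counts = {}
    pattern = os.path.join(base, "mc*", "csrow*", "ue_count")
    for path in glob.glob(pattern):
        with open(path) as handle:
            counts[os.path.dirname(path)] = int(handle.read().strip())
    return counts


if __name__ == "__main__":
    sample = 'server1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-"'
    print(parse_edac_ue(sample))
    print(read_ue_counts())
```

A csrow whose ue_count keeps climbing is the first place I would look; swapping that pair of DIMMs with another pair and watching whether the errors follow them is the usual way to confirm.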
mark