[CentOS] DIMM problem

Wed Apr 24 18:09:23 UTC 2013
me at tdiehl.org <me at tdiehl.org>

On Wed, 24 Apr 2013, m.roth at 5-cent.us wrote:

> Hey, folks,
>   I've got an HP Proliant DL580 G5 throwing ECC errors. This is annoying,
> since a) it's all new as of a few months ago, and b) it's *fully*
> populated. The two things I need to figure out are a) *which* DIMM it
> is, and b) is it mirrored; if so, which *other* DIMM needs to come out
> until we get replacements from the OEM.
> Here's one of many, all identical, from dmesg:
> EDAC MC0: CE row 12, channel 1, label "": Corrected error (Branch=0,
> Channel 1),  DRAM-Bank=2 RD RAS=8218 CAS=500, CE Err=0x10000,
> Syndrome=0x6cad8e02(Correctable Patrol Data ECC))
> I see the Bank=2, so I assume that's the first riser board on the left;
> but I can't identify which of the four (?) DIMMs on it is the problem.
> I've been googling, and skimming useless manuals, and have just been
> trying to look under /sys/devices/system/edac/mc/mc0/. I see ce_count
> there showing thousands; but all of the ce_count files under csrow[0-7]
> show zero.
> Clues, anyone?
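
For comparing the counters under /sys/devices/system/edac/mc/mc0/ mentioned above, a rough Python sketch follows. It assumes the legacy EDAC sysfs layout (csrowN/chM_ce_count files plus a controller-level ce_noinfo_count); whether the driver on this box exposes all of those is an assumption, so unreadable files are reported as None. The names read_int and edac_ce_counts are made up for illustration.

```python
import glob
import os
import re


def read_int(path):
    """Return the integer stored in a sysfs-style file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def edac_ce_counts(mc_dir="/sys/devices/system/edac/mc/mc0"):
    """Collect corrected-error (CE) counters for one memory controller.

    Assumes the legacy layout: mc_dir/ce_count, mc_dir/ce_noinfo_count,
    and mc_dir/csrowN/chM_ce_count.
    """
    report = {
        "total": read_int(os.path.join(mc_dir, "ce_count")),
        "noinfo": read_int(os.path.join(mc_dir, "ce_noinfo_count")),
        "channels": {},
    }
    pattern = os.path.join(mc_dir, "csrow*", "ch*_ce_count")
    for path in sorted(glob.glob(pattern)):
        csrow = os.path.basename(os.path.dirname(path))
        channel = re.match(r"(ch\d+)_ce_count$", os.path.basename(path)).group(1)
        report["channels"][(csrow, channel)] = read_int(path)
    return report


if __name__ == "__main__":
    report = edac_ce_counts()
    print("mc total CE:", report["total"])
    print("CE with no slot info:", report["noinfo"])
    for (csrow, channel), count in report["channels"].items():
        if count:  # only show channels that actually logged errors
            print(csrow, channel, count)
```

If the controller total is in the thousands while every per-channel count is zero, a large ce_noinfo_count would suggest the driver is logging errors it cannot attribute to a row, which would fit the symptoms described.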

Is there anything in the IML on iLO? Also, did you try just re-seating
the memory or moving it into other slots to see if you can track it down that
way?


