[CentOS] DIMM problem

Wed Apr 24 17:34:37 UTC 2013
m.roth at 5-cent.us <m.roth at 5-cent.us>

Hey, folks,

   I've got an HP ProLiant DL580 G5 throwing ECC errors. This is annoying,
since a) it's all new as of a few months ago, and b) it's *fully*
populated. The two things I need to figure out are 1) *which* DIMM it
is, and 2) whether it's mirrored and, if so, which *other* DIMM needs to
come out until we get replacements from the OEM.

Here's one of many, all identical, from dmesg:
EDAC MC0: CE row 12, channel 1, label "": Corrected error (Branch=0,
Channel 1),  DRAM-Bank=2 RD RAS=8218 CAS=500, CE Err=0x10000,
Syndrome=0x6cad8e02(Correctable Patrol Data ECC))
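
To tally these by location rather than eyeball them, the fields can be
pulled out with a small script. This is only a rough sketch: the regex is
written against the single sample line above, the function name
parse_edac_ce is made up for illustration, and other EDAC drivers may
format their messages differently.

```python
import re

# Pattern built from the one sample message above (an assumption, not a
# general EDAC log grammar). Named groups capture the numeric fields.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE row (?P<row>\d+), channel (?P<channel>\d+)"
    r".*?Branch=(?P<branch>\d+)"
    r".*?DRAM-Bank=(?P<bank>\d+)"
    r".*?RAS=(?P<ras>\d+) CAS=(?P<cas>\d+)"
)


def parse_edac_ce(text):
    """Return the numeric fields of one EDAC CE message as a dict, or None."""
    # Pasted dmesg output is often wrapped, so collapse all whitespace first.
    flat = " ".join(text.split())
    m = EDAC_CE.search(flat)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}


sample = '''EDAC MC0: CE row 12, channel 1, label "": Corrected error (Branch=0,
Channel 1),  DRAM-Bank=2 RD RAS=8218 CAS=500, CE Err=0x10000,
Syndrome=0x6cad8e02(Correctable Patrol Data ECC))'''

if __name__ == "__main__":
    print(parse_edac_ce(sample))
```

Feeding every matching dmesg line through this and counting the
(row, channel, branch) tuples would show whether it really is one
location every time.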

I see DRAM-Bank=2, so I assume that's the first riser board on the left,
but I can't identify which of the four (?) DIMMs on it is the problem.

I've been googling and skimming useless manuals, and I've been looking
under /sys/devices/system/edac/mc/mc0/. The ce_count there shows
thousands, but all of the ce_count files under csrow[0-7] show zero.
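
For reference, the counters can be dumped in one go with something like
the following. It is a sketch under assumptions: the per-channel file
names (chN_ce_count, chN_dimm_label) and ce_noinfo_count are taken from
the csrow-style EDAC sysfs layout as I understand it and may not all be
present on this kernel, so missing files are simply reported as None.
The helper names are made up.

```python
import glob
import os


def read_text(path):
    """Return the stripped file contents, or None if missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def edac_counts(mc_dir="/sys/devices/system/edac/mc/mc0"):
    """Collect controller-level and per-csrow/per-channel CE counters."""
    report = {
        "ce_count": read_text(os.path.join(mc_dir, "ce_count")),
        # Assumed attribute: CEs the driver could not pin to a csrow.
        "ce_noinfo_count": read_text(os.path.join(mc_dir, "ce_noinfo_count")),
        "csrows": {},
    }
    for row_dir in sorted(glob.glob(os.path.join(mc_dir, "csrow*"))):
        row = {
            "ce_count": read_text(os.path.join(row_dir, "ce_count")),
            "channels": {},
        }
        # Assumed per-channel files: ch0_ce_count, ch0_dimm_label, ...
        for ch_file in sorted(glob.glob(os.path.join(row_dir, "ch*_ce_count"))):
            ch = os.path.basename(ch_file).split("_")[0]
            row["channels"][ch] = {
                "ce_count": read_text(ch_file),
                "label": read_text(os.path.join(row_dir, ch + "_dimm_label")),
            }
        report["csrows"][os.path.basename(row_dir)] = row
    return report


if __name__ == "__main__":
    counts = edac_counts()
    print("total CE:", counts["ce_count"], "noinfo CE:", counts["ce_noinfo_count"])
    for name, row in counts["csrows"].items():
        print(name, row["ce_count"], row["channels"])
```

If ce_noinfo_count exists and roughly matches the total, that would at
least be consistent with the per-csrow files staying at zero.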

Clues, anyone?

         mark