Hey, folks,
I've got an HP ProLiant DL580 G5 throwing ECC errors. This is annoying,
since a) it's all new as of a few months ago, and b) it's *fully*
populated. The two things I need to figure out are a) *which* DIMM it
is, and b) whether it's mirrored and, if so, which *other* DIMM needs to
come out until we get replacements from the OEM.
Here's one of many, all identical, from dmesg:
EDAC MC0: CE row 12, channel 1, label "": Corrected error (Branch=0,
Channel 1), DRAM-Bank=2 RD RAS=8218 CAS=500, CE Err=0x10000,
Syndrome=0x6cad8e02(Correctable Patrol Data ECC))
I see the Bank=2, so I assume that's the first riser board on the left;
but I can't identify which of the four (?) DIMMs on it is the problem.
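To check that the errors really are all on the same row and channel, a
rough sketch like the following could tally them. The regex is only an
assumption based on the one line quoted above, and CE_RE, SAMPLE and
tally_ce are made-up names for illustration:

```python
import re
from collections import Counter

# Assumed message shape: copied from the single dmesg line quoted above.
# Other EDAC drivers or kernel versions may word this differently.
CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE row (?P<row>\d+), channel (?P<channel>\d+)"
    r".*?DRAM-Bank=(?P<bank>\d+)"
)

# The quoted message, rejoined into the single line dmesg would print.
SAMPLE = (
    'EDAC MC0: CE row 12, channel 1, label "": Corrected error (Branch=0, '
    "Channel 1), DRAM-Bank=2 RD RAS=8218 CAS=500, CE Err=0x10000, "
    "Syndrome=0x6cad8e02(Correctable Patrol Data ECC))"
)


def tally_ce(lines):
    """Count corrected-error lines per (mc, row, channel, bank)."""
    counts = Counter()
    for line in lines:
        m = CE_RE.search(line)
        if m:
            key = tuple(int(m.group(k)) for k in ("mc", "row", "channel", "bank"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (mc, row, channel, bank), n in tally_ce([SAMPLE]).most_common():
        print("mc%d row %d channel %d bank %d: %d" % (mc, row, channel, bank, n))
```

To run it over the real log, pass sys.stdin to tally_ce() instead of
[SAMPLE] and pipe dmesg into the script.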
I've been googling and skimming useless manuals, and have been looking
under /sys/devices/system/edac/mc/mc0/. The ce_count there shows
thousands, but all of the ce_count files under csrow[0-7] show zero.
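For comparing those counters side by side, here is a minimal sketch that
walks the EDAC tree and prints every count it finds. It assumes the
legacy csrow layout described in the kernel's EDAC documentation
(chN_ce_count and chN_dimm_label files inside each csrowN directory);
whether this particular driver populates them is not something I know.
read_text and collect_counts are hypothetical helper names:

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def collect_counts(root="/sys/devices/system/edac/mc"):
    """Return (mc, csrow, channel, label, ce_count) tuples found under root.

    The controller-wide total is reported with csrow, channel and label
    set to None. Per-channel file names are an assumption (legacy layout).
    """
    rows = []
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        mc = os.path.basename(mc_dir)
        rows.append((mc, None, None, None, read_text(os.path.join(mc_dir, "ce_count"))))
        for csrow_dir in sorted(glob.glob(os.path.join(mc_dir, "csrow[0-9]*"))):
            csrow = os.path.basename(csrow_dir)
            pattern = os.path.join(csrow_dir, "ch[0-9]*_ce_count")
            for count_file in sorted(glob.glob(pattern)):
                channel = os.path.basename(count_file).split("_")[0]
                label = read_text(os.path.join(csrow_dir, channel + "_dimm_label"))
                rows.append((mc, csrow, channel, label, read_text(count_file)))
    return rows


if __name__ == "__main__":
    for mc, csrow, channel, label, count in collect_counts():
        where = mc if csrow is None else "%s/%s/%s" % (mc, csrow, channel)
        print("%-20s label=%r ce_count=%s" % (where, label, count))
```

The sort is plain string order, so csrow10 would list before csrow2; that
only affects the order of the printout.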
Clues, anyone?
mark