[CentOS] Intel SE7210TP1-E giving memory errors

Mon Dec 5 08:17:44 UTC 2011
Rob Kampen <rkampen at kampensonline.com>

Rob Kampen wrote:
> Hi List,
> I've been getting the following EDAC memory errors:
> EDAC MC0: CE page 0xeb0dd, offset 0x0, grain 4096, syndrome 0x45, row 3, channel 0, label "": i82875p CE
> From this I can see that the errors have been corrected.
> Checking cat /sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count gives
> me a count of 4,
> so I now know that csrow3, ch0 is the problem.
> My question is: how does this map to the on-board labels?
> Am I correct in assuming csrow 3 is DIMM 2B?
I swapped the memory between DIMM 2A and DIMM 2B and still get the fault
in row 3, channel 0, so it apparently did not move with the RAM??
On the next reboot I'll try swapping 1A and 1B.
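
To compare all rows and channels after each swap, rather than reading one counter file at a time, something like the following could be used. It is only a sketch: it assumes the legacy EDAC sysfs layout (csrowN directories containing chM_ce_count and chM_dimm_label, as in the path quoted above), and read_edac_counts is a hypothetical helper name, not part of any EDAC tool.

```python
import glob
import os
import re

# Path taken from the message above; the per-channel file names are an
# assumption based on the legacy EDAC csrow layout.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def _read(path):
    """Return the stripped contents of a sysfs file, or "" if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return ""


def read_edac_counts(root=EDAC_ROOT):
    """Collect the corrected-error count and label for every csrow/channel."""
    rows = []
    pattern = os.path.join(root, "mc*", "csrow*", "ch*_ce_count")
    for path in sorted(glob.glob(pattern)):
        match = re.match(r"ch(\d+)_ce_count$", os.path.basename(path))
        if not match:
            continue
        channel = int(match.group(1))
        csrow_dir = os.path.dirname(path)
        rows.append({
            "mc": os.path.basename(os.path.dirname(csrow_dir)),
            "csrow": os.path.basename(csrow_dir),
            "channel": channel,
            "ce_count": int(_read(path) or 0),
            # Empty unless a label has been set, like the label "" in the log.
            "label": _read(os.path.join(csrow_dir, "ch%d_dimm_label" % channel)),
        })
    return rows


def report(rows):
    """Print one line per csrow/channel pair."""
    for row in rows:
        print("%(mc)s %(csrow)s ch%(channel)d ce_count=%(ce_count)d label=%(label)r" % row)
```

Calling report(read_edac_counts()) before and after each swap should show whether the count keeps growing on csrow3/ch0 or starts growing somewhere else.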
> Also, I have just discovered that both OS drives, sda and sdb, show a
> huge number of errors in their SMART records.
> - Can this be related to the memory errors??
> - I am really surprised to see two drives show an almost identical
> number of errors at the same time, yet no apparent data errors.
> Drives are ATA ST380013AS 74.53 GB
Just to be safe, I replaced /dev/sda with a new, slightly larger drive,
did the sfdisk foo, and added it to the md RAID arrays.
This brand-new drive immediately shows a high raw read error rate and
hardware ECC recovered count, both in the tens of millions. I think this
is not a drive issue but something related to the ECC memory errors??
Does anyone have experience with this?
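
One way to check whether those SMART numbers indicate a failing drive is to look at the normalized VALUE and THRESH columns of smartctl -A rather than the raw column, since raw values are vendor-specific. The sketch below assumes the usual ten-column attribute table printed by smartctl -A; the function names are hypothetical, and treating "VALUE at or below a non-zero THRESH" as the failure condition is an assumption about how the drive reports health.

```python
import subprocess


def parse_smart_attributes(text):
    """Parse the attribute table from `smartctl -A` output into dicts."""
    attrs = []
    for line in text.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank, or non-table line
        try:
            value, worst, thresh = (int(fields[i]) for i in (3, 4, 5))
        except ValueError:
            continue
        attrs.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": value,
            "worst": worst,
            "thresh": thresh,
            "raw": fields[9].strip(),
        })
    return attrs


def failing(attrs):
    """Attributes whose normalized value has reached a non-zero threshold."""
    return [a for a in attrs if a["thresh"] > 0 and a["value"] <= a["thresh"]]


def read_smartctl(device):
    """Run `smartctl -A` on a device (usually needs root) and parse it."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)
```

If failing(read_smartctl("/dev/sda")) comes back empty for both the old and the new drive, the large raw counts alone may not mean the drives are at fault.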
> TIA for your insightful comments