[CentOS] ECC RAM Error

Centos centos at unixplanet.biz
Thu Oct 11 14:57:53 UTC 2007


The interesting thing is that I only see these ECC errors when I am writing
data to this box; no errors show up when I am reading data from it. If the
memory or the controller were faulty, I would expect the errors to show up
on reads as well.

Am I missing something here?


Peter Arremann wrote:
> On Thursday 11 October 2007, Centos wrote:
>   
>> The ECC errors only happen when I am transferring data from other
>> storage to this box.
>> They only occur while data is being written to it.
>>     
>
> ECC errors can happen anywhere. It can be that the data is corrupted while it 
> is transmitted to the storage device. Or the data can degrade while stored. 
> And of course, on the transmission from the storage you have another chance 
> to screw it up.
>
> Problem is, in almost all cases, you won't see those errors until you read the 
> data. The memory controller will then perform the ECC checksum and see that 
> the data that was returned is bad. What happens then depends on what type of 
> memory and memory controller you have. 
>
> Simple (old) x86 setups will correct single-bit errors and report double-bit
> errors as uncorrectable. If you happen to have 3 bits that changed in the
> same data word, ECC will actually screw you up worse: it can mistake that for
> a single-bit error and "correct" the wrong bit. That way you get corrupt data
> that is reported as a corrected (soft) error.
>
> Newer, more complex x86 configs and most proprietary Unix boxes protect
> against that by using fancier ECC algorithms, memory RAID and things like
> that.
>
> Anyway - ECC errors to me mean that I need to trigger a failover and get off
> the box ASAP. There is no ECC algorithm and hardware setup out there that
> does the right thing every single time. If you don't have a failover, see if
> you can take the system down now, remove the offending DIMM/bank and run with
> the remaining RAM until you get replacements.
>
> Peter.
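
To make the three-bit case above concrete, here is a minimal sketch of a toy
SECDED code (an extended Hamming(8,4) code protecting 4 data bits). It is an
illustration only: real memory controllers usually protect much wider words
and their exact codes vary, and the names encode, decode and flip are made up
for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes the total parity even
    return code


def decode(code):
    """Return (nibble, status) the way a simple SECDED decoder would."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_bad = sum(code) % 2 == 1
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad:
        # Odd number of flips: assumed to be exactly one, at 'syndrome'
        # (syndrome 0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected only.
        status = "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    for bad_bits in [(), (5,), (3, 5), (3, 5, 7)]:
        nibble, status = decode(flip(word, *bad_bits))
        print(f"flipped {bad_bits}: data={nibble:04b} status={status}")
```

With three flipped bits the decoder reports "corrected" but hands back the
wrong data, which is the failure mode described above.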
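
To check whether the errors really only appear during writes, one option is to
read the kernel's error counters before and after a large write, and again
after a large read. The sketch below assumes the EDAC driver for this chipset
is loaded and exposes per-controller ce_count and ue_count files under
/sys/devices/system/edac/mc. The function name read_edac_counts is made up for
this example, and on a box without EDAC support it simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs.

    'root' is a parameter so the function can be pointed at a copy of the
    tree; the default is the usual EDAC location (an assumption here).
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable; skip this controller
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

If the corrected count only climbs during writes, that still does not clear
the memory: as noted above, an error is normally only noticed when the bad
word is read back, and a heavy write to disk reads a lot of buffered data out
of RAM.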



