The interesting thing is that I only see these ECC errors when I am writing data to this box; no errors show up when I am reading data from it. If the memory or the controller were bad, shouldn't those errors show up when I am reading as well? Am I missing something here?

Peter Arremann wrote:
> On Thursday 11 October 2007, Centos wrote:
>
>> The ECC errors only happen when I am transferring data from other
>> storage to this one.
>> It only happens when it is writing data to it.
>>
>
> ECC errors can happen anywhere. The data can be corrupted while it is
> transmitted to the storage device, or it can degrade while stored. And of
> course, the transmission back from the storage gives you another chance
> to screw it up.
>
> The problem is that, in almost all cases, you won't see those errors until
> you read the data. The memory controller then performs the ECC check and
> sees that the data that came back is bad. What happens next depends on what
> type of memory and memory controller you have.
>
> Simple (old) x86 setups will correct single-bit errors and report double-bit
> errors as uncorrectable. If three bits happen to flip in the same data word,
> ECC can actually make things worse: it may treat it as a single-bit error
> and "correct" the wrong bit. That way you get corrupt data and only a soft
> error.
>
> Newer, more complex x86 configurations and most proprietary Unix boxes
> protect against that with fancier ECC algorithms, memory RAID, and the like.
>
> Anyway, to me ECC errors mean that I need to trigger a failover and get off
> the box ASAP. There is no ECC algorithm and hardware setup out there that
> does the right thing every single time. If you don't have a failover, see if
> you can take the system down now, remove the offending DIMM/bank, and run
> with the remaining RAM until you get replacements.
>
> Peter.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
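To make Peter's points about when errors are detected and how triple-bit errors get miscorrected more concrete, here is a toy sketch. It assumes a textbook extended Hamming (8,4) SECDED code over 4-bit words; real ECC DIMMs protect wider words (typically 8 check bits per 64 data bits), and controllers may use other schemes. The names `encode`, `decode`, and `flip` are hypothetical. Note that `encode`, standing in for the write path, never reports anything: a stored word can be damaged silently, and the check only runs in `decode`, standing in for the read path.

```python
"""Toy extended Hamming (8,4) SECDED code, for illustration only.

Assumption: 4 data bits per word; real ECC memory protects wider words.
"""


def encode(nibble: int) -> list[int]:
    """Write path: build an 8-bit codeword. No error checking happens here."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, 1..7 = Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = d  # data bits at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2  # overall parity makes the total weight even
    return code


def decode(code: list[int]) -> tuple[str, int]:
    """Read path: check the codeword and return (status, data)."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd_parity = sum(c) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Assume exactly one bit flipped and flip it back. With three flips
        # this assumption is wrong and the "fix" lands on the wrong bit.
        c[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        # Even number of flips (two): detectable but not correctable.
        status = "uncorrectable"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, data


def flip(code: list[int], *positions: int) -> list[int]:
    """Simulate bits going bad in storage or in transit."""
    c = list(code)
    for p in positions:
        c[p] ^= 1
    return c


if __name__ == "__main__":
    stored = encode(0b1011)
    print(decode(stored))                 # no errors
    print(decode(flip(stored, 6)))        # single-bit error
    print(decode(flip(stored, 3, 6)))     # double-bit error
    print(decode(flip(stored, 1, 2, 5)))  # triple-bit error, miscorrected
```

In the triple-bit case the decoder reports `corrected` yet returns 13 rather than the 11 that was written, which is the silent corruption Peter describes.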