[CentOS] Re: ECC RAM Error

Mon Oct 15 15:33:20 UTC 2007
Scott Silva <ssilva at sgvwater.com>

on 10/15/2007 5:16 AM Centos spake the following:
> Thanks, everyone, for the help and responses.
> 
> I just noticed that these errors might be soft errors, because they only
> happen when I overload the storage by copying large files simultaneously
> on the same port and SCSI controller, so I was thinking the cause could
> be the speed of the ECC parity calculation or a RAM shortage.
> 
> The hardware is supposed to take care of ECC errors, and the device should
> panic or hang when it sees these errors, but it just keeps going.
> 
> What do you think?
> 
I have had systems so overloaded that I couldn't log in over an SSH session, but
when the load cleared, there weren't any ECC errors. I still think you have a
hardware problem, and just because it takes a high load to trigger it now
doesn't mean that it is OK. A faulty timing capacitor on the motherboard can
cause all sorts of corruption in memory, and it will probably deteriorate over
time. You need to test the memory methodically: run memory tests, then move the
RAM to different slots and test again. Or replace the hardware if it is
mission critical.
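
To keep track of whether the errors follow a module as you move the RAM around,
something like the rough sketch below may help. It assumes a Linux kernel with
an EDAC driver loaded for your chipset and the csrow-style counters exposed
under /sys/devices/system/edac/mc; the function names are my own and purely
illustrative, and your system may lay the counters out differently. Take a
snapshot before a heavy copy, another one after, and compare.

```python
from pathlib import Path

# Assumed location of the Linux EDAC counters; only present if an EDAC
# driver for the memory controller is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root=EDAC_ROOT):
    """Return {"mc0/csrow0": (corrected, uncorrected), ...} per chip-select row."""
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        try:
            ce = int((csrow / "ce_count").read_text().strip())
            ue = int((csrow / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip rows without readable counters
        counts[f"{csrow.parent.name}/{csrow.name}"] = (ce, ue)
    return counts


def diff_counts(before, after):
    """Return only the rows whose error counters rose between two snapshots."""
    changed = {}
    for row, (ce, ue) in after.items():
        old_ce, old_ue = before.get(row, (0, 0))
        if ce > old_ce or ue > old_ue:
            changed[row] = (ce - old_ce, ue - old_ue)
    return changed


if __name__ == "__main__":
    for row, (ce, ue) in read_ecc_counts().items():
        print(f"{row}: corrected={ce} uncorrected={ue}")
```

If the same csrow keeps counting up after you swap modules, suspect the slot or
the board; if the count moves with the module, suspect the RAM.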

-- 
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!