[CentOS] Re: ECC RAM Error
ssilva at sgvwater.com
Mon Oct 15 15:33:20 UTC 2007
on 10/15/2007 5:16 AM Centos spake the following:
> Thanks, everyone, for the help and responses.
> I just noticed that these errors might be soft errors, because they only
> happen when I overload the
> storage by copying large files simultaneously on the same port and
> SCSI controller, so I was thinking
> the cause could be the speed of the ECC parity calculation or a RAM shortage.
> The hardware is supposed to take care of ECC errors, and the device should
> panic or hang on seeing these errors, but it just keeps going.
> What do you think?
I have had systems so overloaded that I couldn't log in on an ssh session, but
when the load cleared, there weren't any ECC errors. I still think you have a
hardware problem, and just because it takes a high load now doesn't mean that
it is OK. A faulty timing capacitor on the motherboard can cause all sorts of
corruption in memory, and it will probably deteriorate over time. You need to
methodically test the memory by running memory tests, and then moving RAM and
testing again. Or replace the hardware if it is mission critical.
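To make the "test, move RAM, test again" comparison less of a guess, it can help to record the error counters before and after each run. What follows is only a rough sketch, assuming a Linux kernel with an EDAC driver loaded for the memory controller, in which case the corrected and uncorrected error counts appear under /sys/devices/system/edac/mc/. Whether EDAC supports this particular chipset is an assumption, and the helper names read_edac_counts and changed are made up for illustration.

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters.

    A counter that is missing or unreadable is reported as None.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as fh:
                    values.append(int(fh.read().strip()))
            except (OSError, ValueError):
                values.append(None)
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts


def changed(before, after):
    """Return controllers whose counters differ between two snapshots."""
    return {
        mc: (before.get(mc), now)
        for mc, now in after.items()
        if before.get(mc) != now
    }


if __name__ == "__main__":
    # Print one line per memory controller; prints nothing if EDAC is absent.
    for mc, (corrected, uncorrected) in read_edac_counts().items():
        print(f"{mc}: corrected={corrected} uncorrected={uncorrected}")
```

Take one snapshot before the heavy copy and one after, and pass both to changed(). If the counters still climb after the modules have been moved, that points away from a single bad stick and toward the board or controller.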
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!