[CentOS] ECC RAM Error

Thu Oct 11 14:40:33 UTC 2007
Dan Halbert <halbert at everyzing.com>

> do you think replacing ram will solve our problem ?
> how can I make sure it is the ram ?
This is almost certainly a hardware problem. It could be the RAM, a 
particular motherboard DIMM slot, or maybe the RAM is just not seated 
quite right in the memory slot. I have seen all three of these problems.

Try running the standalone memory tester. First run:
# yum install memtest86+

This adds a boot menu option for booting into memtest86+ instead of 
CentOS. See if you can reproduce the error with memtest86+; that may 
save some time.
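
If the memtest86+ entry does not show up in the boot menu, it may help to 
check whether the GRUB configuration mentions it at all. A minimal sketch, 
assuming the GRUB config lives at /boot/grub/grub.conf (adjust the path if 
yours differs); the function name has_memtest_entry is made up for 
illustration:

```shell
# Hypothetical helper: succeed if the given GRUB config mentions memtest.
# The default path is an assumption; pass another path as $1 if needed.
has_memtest_entry() {
    grep -qi 'memtest' "${1:-/boot/grub/grub.conf}" 2>/dev/null
}

if has_memtest_entry; then
    echo "memtest entry found"
else
    echo "no memtest entry found (or config not readable)"
fi
```

This only looks for the word "memtest" in the file. It does not prove 
that the entry boots correctly.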

When you have reproduced the error, try reseating all the DIMMs: pop 
them out and push them back in firmly. Also try blowing out any dust 
that may be in the memory slots.

If it still fails, pull the DIMMs out one at a time (or in pairs, if 
your board requires that), rerunning the test after each removal until 
it no longer fails. The DIMM you removed last is the suspect: try it by 
itself in a different slot and see if it fails there.

By doing this kind of divide and conquer, you will be able to determine 
whether it is the DIMM or the motherboard.
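
The last step of that reasoning can be written down as a tiny helper. This 
is only a sketch of the decision, not a diagnostic tool: the function name 
interpret_retest and its "fail"/"pass" argument are made up for 
illustration, and you supply the result yourself after running memtest86+ 
on the suspect DIMM alone in a different slot.

```shell
# Hypothetical helper: interpret the retest of the suspect DIMM, run by
# itself in a different slot. Argument: "fail" or "pass".
interpret_retest() {
    case "$1" in
        fail) echo "suspect the DIMM: it fails in another slot too" ;;
        pass) echo "suspect the original slot or the seating" ;;
        *)    echo "usage: interpret_retest fail|pass" >&2
              return 2 ;;
    esac
}

interpret_retest fail
interpret_retest pass
```

A single passing run is weak evidence, so it is probably worth letting the 
test run for several passes before you trust a "pass" result.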

Dan