On 12/07/11 12:55 AM, John Hodrien wrote:
In my limited experience, if you can disable ECC in your BIOS, memtest is just as good at spotting errors on ECC as non-ECC. With ECC enabled, you'll need seriously messed up ECC before it'll be detected.
Except that with ECC disabled, the extra 8 ECC bits per 64-bit memory word aren't touched at all.
I'd leave ECC on and skip running memtest entirely: just run real OS workloads and let the ECC do the memory test on the fly, as it's meant to.
Does Linux have an ECC scrubber process? 'Real' Unix servers (Solaris, AIX, etc.) generally have a background process, sometimes as part of the idle process, that does a read/write of every memory location when the machine is otherwise idle. This catches and fixes soft ECC errors in otherwise idle memory, and each fix gets logged. Solaris (on Sun SPARC hardware at least) keeps track of which locations have had bad memory, and will stop using a memory page entirely (with a logged alert) if there are too many soft ECC errors in the same area.
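
To make the page-retirement idea concrete, here is a rough Python sketch of the bookkeeping involved. It is only an illustration of the policy described above, not how Solaris actually implements it: the class name, the page size, and the threshold are all made up for the example, and a real kernel would do this from the memory error handler rather than in user space.

```python
from collections import Counter

PAGE_SIZE = 8192        # assumed page size in bytes (illustrative only)
RETIRE_THRESHOLD = 3    # assumed number of soft errors tolerated per page


class SoftErrorTracker:
    """Hypothetical tracker: counts corrected ECC errors per page and
    retires a page once it exceeds the threshold."""

    def __init__(self, page_size=PAGE_SIZE, threshold=RETIRE_THRESHOLD):
        self.page_size = page_size
        self.threshold = threshold
        self.counts = Counter()   # page number -> corrected errors seen
        self.retired = set()      # pages taken out of use
        self.log = []             # stand-in for the system log

    def record_soft_error(self, address):
        """Record one corrected error at a physical address.

        Returns True if this call caused the page to be retired.
        """
        page = address // self.page_size
        if page in self.retired:
            # Page is already out of use; nothing more to track.
            return False
        self.counts[page] += 1
        self.log.append(f"corrected ECC error at {address:#x} (page {page})")
        if self.counts[page] > self.threshold:
            self.retired.add(page)
            self.log.append(
                f"ALERT: page {page} retired after "
                f"{self.counts[page]} soft errors"
            )
            return True
        return False
```

A scrubber pass would call `record_soft_error()` each time the hardware reports a corrected error, so a page that keeps producing soft errors eventually lands in `retired` while an isolated one-off error elsewhere only gets logged.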