On Fri, Mar 8, 2013 at 2:12 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
I will try your suggestion of using a separate set of banks on the off chance that those slots are faulty.
I had one a few years ago where it took about 3 days for memtest to catch the bad RAM, but even after fixing that there were random crashes. It turned out that the bad RAM had caused some disk corruption which was partly hidden by RAID1 mirroring. Once in a while a program block read would hit the bad copy, but when you look for it everything looks OK...
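A toy illustration of why RAID1 can hide this: reads may be served from either mirror, so a block that is bad on only one copy shows up intermittently. The sketch below compares two mirror images block by block and models read balancing as simple alternation between copies. The 4096-byte block size, the alternation policy, and the helper names `mismatched_blocks` and `read_block` are hypothetical and only meant to show the idea. On a real Linux md array the comparable check is started by writing `check` to /sys/block/mdX/md/sync_action (mdX being your array) and reading /sys/block/mdX/md/mismatch_cnt when it finishes.

```python
from typing import List

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def mismatched_blocks(mirror_a: bytes, mirror_b: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of blocks whose contents differ between two RAID1 copies."""
    if len(mirror_a) != len(mirror_b):
        raise ValueError("mirrors must be the same size")
    bad = []
    for offset in range(0, len(mirror_a), block_size):
        if mirror_a[offset:offset + block_size] != mirror_b[offset:offset + block_size]:
            bad.append(offset // block_size)
    return bad


def read_block(mirrors: List[bytes], index: int, request_no: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Toy read balancing (assumption): successive requests alternate between mirrors."""
    mirror = mirrors[request_no % len(mirrors)]
    start = index * block_size
    return mirror[start:start + block_size]
```

With one corrupted byte in block 1 of the second copy, `mismatched_blocks` reports `[1]`, while `read_block` returns good data for that block on even-numbered requests and bad data on odd-numbered ones, which is the "sometimes it crashes, but everything looks OK" pattern.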
I'm running on the second bank now. However, I ran into a snag running mcelogd (the processor might not be supported). It appears that the CPU is not supported even after enabling CONFIG_EDAC_MCE and CONFIG_EDAC_AMD64 in /boot/config-xxx. The error sometimes takes a few hours to occur, so I will use this system through the night to try to catch the failure.
    Starting mcelog daemon [FAILED]
    AMD Processor family 21: Please load edac_mce_amd module.
    CPU is unsupported
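The log suggests that on this CPU (AMD family 21, i.e. 0x15) mcelog declines to decode machine checks itself and points at the kernel's edac_mce_amd module instead, which is normally loaded with `modprobe edac_mce_amd`. A minimal sketch for checking that situation from /proc/cpuinfo and /proc/modules text follows. The helper names are hypothetical, and the rule "AMD family 21 without edac_mce_amd loaded" is an assumption taken only from the message above, not from mcelog's source.

```python
from typing import Optional, Tuple


def cpu_vendor_and_family(cpuinfo: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (vendor_id, cpu family) from the first processor entry in /proc/cpuinfo text."""
    vendor, family = None, None
    for line in cpuinfo.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value
        elif key == "cpu family" and family is None:
            family = int(value)
        if vendor is not None and family is not None:
            break
    return vendor, family


def module_loaded(proc_modules: str, name: str) -> bool:
    """True if `name` is the first field of some line in /proc/modules text."""
    return any(line.split()[0] == name for line in proc_modules.splitlines() if line.strip())


def needs_edac_mce_amd(cpuinfo: str, proc_modules: str) -> bool:
    """Assumed rule: AMD family 21 (0x15) CPU and edac_mce_amd not yet loaded."""
    vendor, family = cpu_vendor_and_family(cpuinfo)
    return vendor == "AuthenticAMD" and family == 21 and not module_loaded(proc_modules, "edac_mce_amd")
```

On a live system the inputs would be the contents of /proc/cpuinfo and /proc/modules read as text; keeping the helpers string-based makes them easy to try against pasted output from another machine.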