On Fri, Mar 8, 2013 at 2:12 PM, Les Mikesell <lesmikesell at gmail.com> wrote:
>> I will try your suggestion of trying a separate set of banks in the
>> off chance that those slots are faulty.
>
> I had one a few years ago where it took about 3 days for memtest to
> catch the bad RAM but even after fixing that there were random
> crashes. Turned out that the bad RAM had caused some disk corruption
> which was partly hidden by raid1 mirroring. Once in a while a program
> block read would hit the bad copy, but when you look for it everything
> looks OK...

I'm running on the second bank now.

I ran into a snag running the mcelog daemon, however: it reports that the
CPU is unsupported, even after I enabled CONFIG_EDAC_MCE and
CONFIG_EDAC_AMD64 in /boot/config-xxx.

The error sometimes takes a few hours to occur, so I will use this system
throughout the night to try to catch the failure.

Starting mcelog daemon                                     [FAILED]
AMD Processor family 21: Please load edac_mce_amd module.
CPU is unsupported
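
Since the message above asks for the edac_mce_amd module, it may help to check what the kernel was built with and which modules are loaded. Below is a rough Python sketch of that check, not a definitive diagnostic. It assumes the usual text formats: "CONFIG_FOO=y" lines in the /boot config file, and one module per line with the name as the first field in /proc/modules. The helper names are made up for illustration. Note that the file under /boot only records how the kernel was built, so editing it does not change the running kernel.

```python
def parse_kernel_config(text):
    """Return {option: value} for every CONFIG_* option that is set.

    Unset options appear as comments ("# CONFIG_FOO is not set")
    and are skipped.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith("CONFIG_"):
            options[name] = value
    return options


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def report(config_text, proc_modules_text):
    """Summarise the two EDAC options and the module mcelog asked for."""
    opts = parse_kernel_config(config_text)
    mods = loaded_modules(proc_modules_text)
    return {
        "CONFIG_EDAC_MCE": opts.get("CONFIG_EDAC_MCE", "not set"),
        "CONFIG_EDAC_AMD64": opts.get("CONFIG_EDAC_AMD64", "not set"),
        "edac_mce_amd loaded": "edac_mce_amd" in mods,
    }
```

To try it, pass in the contents of the config file for the running kernel and of /proc/modules, e.g. report(open(config_path).read(), open("/proc/modules").read()), where config_path is wherever your distribution keeps that file.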