On Thu, Aug 21, 2014 at 6:17 PM, John R Pierce <pierce@hogranch.com> wrote:
Yes, but try a software RAID when you have intermittently bad RAM. I've been there. Mirrored disks that were almost, but not quite, mirrors.
Try any file system when you've got flaky RAM. Data that's not quite what you wanted, oh boy.
Yes, but if you fix the RAM, fsck the disk, and rewrite the data, you sort of expect it to work again. In this case, with the mirrors randomly mismatching but both marked as good, fsck would read the good copy in some spots while checking, but later the system would read the bad one. In hindsight the reason is obvious, but it took me a while to see why the box still crashed every few weeks.
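To make that failure mode concrete, here is a toy sketch of a two-way mirror with no checksums. The `PlainMirror` class and its strict round-robin read policy are assumptions made up for illustration; real software RAID picks a side using its own heuristics. The point is only that when both halves are marked good, which copy a given read returns depends on which side happens to serve it.

```python
import itertools


class PlainMirror:
    """Hypothetical two-way mirror that trusts both sides blindly."""

    def __init__(self, side_a, side_b):
        # Each side is a list of blocks (bytes); both are assumed "good".
        self.sides = [side_a, side_b]
        # Simplified read balancing: alternate between the two sides.
        self._next_side = itertools.cycle([0, 1])

    def read(self, block_no):
        # No checksum exists, so whatever the chosen side holds is returned.
        side = next(self._next_side)
        return self.sides[side][block_no]


# Block 0 was written while RAM was flaky, so the two sides differ.
mirror = PlainMirror([b"good"], [b"g0od"])
first = mirror.read(0)   # served from side A (what a check might see)
second = mirror.read(0)  # served from side B (what a later read might see)
```

With this layout a check pass can see the intact copy and report success, and a later read of the same block can still return the corrupted one.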
Which, btw, is why I insist on ECC for servers, and really prefer ZFS, where each block of each part of a RAID is checksummed and timestamped, so when a scrub finds mismatching blocks, it knows which one is correct.
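A rough sketch of why a stored checksum settles the question, assuming the checksum is kept apart from the data it covers. This is not ZFS code: `checksum` and `scrub_block` are hypothetical names, and SHA-256 is used only because it is handy in the Python standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum recorded at write time, stored separately from the block."""
    return hashlib.sha256(data).hexdigest()


def scrub_block(copies, expected):
    """Verify every mirror copy of one block against its stored checksum.

    copies   -- mutable list of bytes, one entry per mirror side
    expected -- checksum recorded when the block was written
    Returns (good_data, repaired_indices). Raises IOError if no copy matches.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("no copy matches the stored checksum")
    repaired = []
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good  # rewrite the bad side from the verified copy
            repaired.append(i)
    return good, repaired


original = b"important data"
stored = checksum(original)
sides = [original, b"important dbta"]  # second side has a corrupted byte
data, fixed = scrub_block(sides, stored)
```

Without the stored checksum, a scrub could only report that the two sides differ. With it, the side that fails verification is identified and rewritten from the one that passes.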
I thought this was supposed to be ECC with 1-bit correction - and I thought that was supposed to mean that if it couldn't correct an error it would just stop, but it didn't. It took about 3 days of a memtest86 run to hit the problem and show that it was RAM - and the box has run for many years since I swapped all of it. But the only reason that box is still around is that it is an enormous tower case and the only thing I had with enough drive bays for what I was doing then.