On Wed, Oct 19, 2011 at 2:33 PM, Lamar Owen lowen@pari.edu wrote:
> On Tuesday, October 18, 2011 01:07:02 PM Les Mikesell wrote:
>> I don't think anything is immune to failure. Another fun case is a randomly-bad memory bit causing different things to be written to software raid mirrors. I had one that took 3+ days of running memtest86 to catch.
>
> ECC RAM?
The server said it was one-bit-correcting or something like that. I thought it was supposed to stop if it had errors it couldn't correct. I swapped the whole set out at once without digging much more into the details.