On Nov 18, 2008, at 6:05 PM, Les Mikesell <lesmikesell@gmail.com> wrote:
nate wrote:
Les Mikesell wrote:
Yes, apparently RAM errors can be subtle and only appear when certain adjacent bit patterns are stored, or when the moon is in a certain phase or something.
Don't forget cosmic rays: http://adsabs.harvard.edu/abs/1978ITNS...25.1166P
Yeah, but those don't stop when you replace the faulty RAM. Mine did, but the errors already committed to disk kept reappearing at random afterwards, as reads alternated between the two halves of the RAID1 mirror.
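A toy model (not md's actual code) of why errors on a RAID1 can come and go: if the two mirror halves end up holding different data for the same block, a read returns whichever half happens to serve it. Two simplifications are assumed here. Reads strictly alternate between members, whereas real md read balancing is more involved. The halves diverge because the in-memory buffer changed between the two member writes, for example through a bit flip in faulty RAM. `ToyMirror`, `write_diverged` and `flip_bit` are hypothetical names used only for illustration.

```python
class ToyMirror:
    """Toy two-way RAID1 mirror (illustration only, not md's implementation).

    Writes go to both members; reads are served from alternating members,
    and nothing checks that the two members agree.
    """

    def __init__(self):
        self.members = [{}, {}]  # one dict per disk: block number -> bytes
        self._next = 0           # index of the member that serves the next read

    def write(self, block, data):
        # Normal case: the same data lands on both members.
        for member in self.members:
            member[block] = data

    def write_diverged(self, block, data_a, data_b):
        # Assumed failure mode: the buffer changed (e.g. a RAM bit flip)
        # between the write to member 0 and the write to member 1.
        self.members[0][block] = data_a
        self.members[1][block] = data_b

    def read(self, block):
        # Simplified read balancing: strict round-robin, no integrity check.
        data = self.members[self._next][block]
        self._next = (self._next + 1) % len(self.members)
        return data


def flip_bit(data, bit):
    """Return a copy of `data` with a single bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


mirror = ToyMirror()
good = b"hello"
mirror.write_diverged(7, good, flip_bit(good, 3))
reads = [mirror.read(7) for _ in range(4)]
print([r == good for r in reads])  # prints [True, False, True, False]
```

The same block reads back correctly, then corrupted, then correctly again, even though the RAM that caused the damage is long gone.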
Ah, memory-mapped files: another very good reason to use ECC on large-memory machines.
Also, if you identify bad memory and use software RAID1, it's better to break the mirror, fsck and fix, then rebuild the mirror, since RAID1 has no data integrity check: a read may be served from either copy, and nothing verifies on read that the two copies agree.
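The break / fsck / rebuild sequence could be sketched as a dry-run helper that only prints the commands for a Linux md RAID1. This sketch makes several assumptions. `/dev/md0` and `/dev/sdb1` are hypothetical device names. The filesystem is ext2/ext3 (hence `e2fsck -f`) and is already unmounted. Which member to drop is arbitrary. The `mdadm` options used (`--fail`, `--remove`, `--zero-superblock`, `--add`) are standard. Zeroing the removed member's superblock is an extra step added here so that md treats it as a new disk and copies every block from the repaired half, rather than attempting a partial re-add. `repair_plan` is a hypothetical helper name.

```python
def repair_plan(array, member, fsck_cmd="e2fsck -f"):
    """Return (but do not run) the commands to break, check and rebuild a
    two-disk md RAID1. Device names are hypothetical; unmount first."""
    return [
        # 1. Break the mirror: fail and remove one member, so every read
        #    now comes from the single remaining copy.
        f"mdadm {array} --fail {member}",
        f"mdadm {array} --remove {member}",
        # Wipe the removed member's md metadata so it is later rebuilt
        # in full instead of being re-added as a stale mirror half.
        f"mdadm --zero-superblock {member}",
        # 2. Check and fix the filesystem on the degraded array.
        f"{fsck_cmd} {array}",
        # 3. Rebuild: md resyncs the added member from the repaired copy.
        f"mdadm {array} --add {member}",
    ]


for cmd in repair_plan("/dev/md0", "/dev/sdb1"):
    print(cmd)
```

Printing the plan instead of executing it keeps a human in the loop, which matters here: zeroing the wrong device's superblock or running fsck on a mounted filesystem does real damage.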
-Ross