Christopher Chan wrote:
>>> Funny you should mention software RAID1... I've seen two instances of that
>>> getting silently out-of-sync and royally screwing things up beyond all
>>> repair.
>>>
>>> Maybe this thread has gone on long enough now?
>>>
>> Not yet :)
>>
>> Please tell more about your hardware and software. What distro? What
>> kernel? What disk controller? What disks?
>>
>> I'm interested in this because I have never seen Linux software MD RAID1
>> failures like this, but some people keep telling they happen frequently..
>
> It could be like Les said - bad RAM. I certainly have not encountered
> this sort of error on a md raid1 array.
>
>> I'm just wondering why I'm not seeing these failures, or if I've just
>> been lucky so far..
>>
>
> Yeah, lucky you've not got bad RAM that passed POSTing and at the same
> time did not bring your system down on you right from the start or
> rendered it unstable.

On the machine where I had the problem, I had to run memtest86 for more
than a day to finally catch it. Then, after replacing the RAM and
fsck'ing the volume, I still had mysterious problems about once a month
until I realized that reads alternate between the two disks, so a single
fsck pass didn't see (or fix) everything on both of them. I forget the
commands to compare and fix the mirroring, but they worked - and I think
the CentOS 5.4 update now does that periodically as a cron job.

The other worry is that when one drive dies, you might have unreadable
spots in normally unused areas of the mirror, and those will keep a
rebuild from working - but the cron job should detect those too, if you
pay attention to its results.

-- 
  Les Mikesell
   lesmikesell at gmail.com
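
P.S. For anyone looking for the compare-and-fix step: the commands referred to above are not named in this message, but on Linux md they are most likely the sysfs scrub interface, where you write "check" or "repair" to the array's sync_action file and then read mismatch_cnt. Below is a minimal Python sketch of that idea, not a tested admin tool. The array name "md0", the helper names, and the sysfs_root parameter (there only so the code can be tried against a scratch directory) are all assumptions for illustration; on a real system the writes need root.

```python
from pathlib import Path

# Actions the md driver accepts in sync_action that matter for scrubbing.
VALID_ACTIONS = ("check", "repair", "idle")


def md_dir(array, sysfs_root="/sys/block"):
    """Return the md control directory, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / array / "md"


def request_sync_action(array, action, sysfs_root="/sys/block"):
    """Ask the md driver to start a 'check' or 'repair' pass, or go 'idle'.

    'check' only compares the mirror halves; 'repair' also rewrites
    blocks that differ. Both run in the background after the write.
    """
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported sync action: %r" % (action,))
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


def read_mismatch_count(array, sysfs_root="/sys/block"):
    """Return the mismatch counter left behind by the last check/repair."""
    text = (md_dir(array, sysfs_root) / "mismatch_cnt").read_text()
    return int(text.strip())


if __name__ == "__main__":
    # Hypothetical usage on a real array named md0 (requires root).
    request_sync_action("md0", "check")
    print("check requested; when /proc/mdstat shows it has finished,")
    print("call read_mismatch_count('md0') to see the result")
```

A nonzero count after a check is what would justify a follow-up "repair" pass. Note that on RAID1 a small nonzero count is not always a sign of damage (swap on the array is a commonly cited cause), so treat the number as something to investigate rather than as proof of corruption.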