Kay Diederichs wrote:
> The fact is that with CentOS-5 kernels (but not with CentOS-4, since this functionality only became available in kernel 2.6.17) you can (or rather _should_, regularly) run
>
>     echo check > /sys/block/mdX/md/sync_action
>
> to check that the two (or more) copies agree. When the check finishes, /sys/block/mdX/md/mismatch_cnt shows the number of mismatches. You can fix them with
>
>     echo repair > /sys/block/mdX/md/sync_action
>
> This applies to at least RAID1 and RAID5. Which raises the question: how does the "repair" job know which copy is the correct one? I have no answer to that.
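The check/repair cycle described in the quoted text can be wrapped in a few shell functions so it is easy to run from cron. This is a minimal sketch, not a tested tool. It assumes it runs as root on a kernel that has the sync_action interface, and that sync_action reads back "idle" once no check or resync is running. The function names (md_start, md_wait_idle, md_mismatches), the SYSFS override and the device name md0 are all made up for illustration.

```shell
#!/bin/sh
# Sketch: drive the md "check"/"repair" actions through sysfs.
# Assumptions: run as root; function names are hypothetical; SYSFS can be
# overridden only so the functions can be exercised against a fake tree.
SYSFS="${SYSFS:-/sys}"

# Start an action on array $1: "check" only reads and counts mismatches,
# "repair" also rewrites inconsistent blocks.
md_start() {
    case "$2" in
        check|repair) ;;
        *) echo "md_start: action must be check or repair" >&2; return 2 ;;
    esac
    echo "$2" > "$SYSFS/block/$1/md/sync_action"
}

# Block until array $1 reports that no check/resync is running.
md_wait_idle() {
    while [ "$(cat "$SYSFS/block/$1/md/sync_action")" != "idle" ]; do
        sleep 10
    done
}

# Print the mismatch count left by the last check/repair on array $1
# (the kernel md documentation describes this as a count of sectors).
md_mismatches() {
    cat "$SYSFS/block/$1/md/mismatch_cnt"
}

# Example usage (md0 is a hypothetical array name):
#   md_start md0 check && md_wait_idle md0
#   echo "md0 mismatches: $(md_mismatches md0)"
```

Running "repair" this way still does not tell you which copy was the good one; it only makes the copies agree again.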
Thanks for posting this. I have a machine that had periodic filesystem errors on a RAID1 volume, which I eventually traced to bad RAM, but even after replacing the RAM I would still see filesystem problems reappear every few weeks. It turned out that there were quite a few mismatched blocks between the mirrors; the fsck passes must sometimes have seen the good copy, while later reads used the still-bad one. Now I've done a repair and an fsck, and so far everything seems stable, though it's hard to tell with problems that only happen once or twice a month. I suppose some files on there have corrupt contents, but they are backups that will expire anyway as newer ones are saved.
> BTW, there is, even with current kernels, no speed gain in using RAID1.
I don't think I believe that: you can see the reads alternating between the drives by watching the lights.
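Rather than judging by the drive lights, you could compare how many reads each member disk completes while the array is being read. A minimal sketch, assuming the RAID1 members are whole disks (sda and sdb below are hypothetical names) and that the first field of /sys/block/DISK/stat is that disk's count of completed reads. The function names and the SYSFS override are invented for this example.

```shell
#!/bin/sh
# Sketch: measure how reads on an md RAID1 array are spread over its members.
# Assumptions: members are whole disks; field 1 of /sys/block/DISK/stat is
# completed reads; SYSFS is overridable only for testing against a fake tree.
SYSFS="${SYSFS:-/sys}"

# Print the completed-read counter of disk $1.
reads_of() {
    read -r r rest < "$SYSFS/block/$1/stat"
    echo "$r"
}

# Print one "disk reads" line for each disk named on the command line.
snapshot() {
    for d in "$@"; do
        echo "$d $(reads_of "$d")"
    done
}

# Given two snapshot files (same disks, same order), print the reads each
# disk completed in between.
read_deltas() {
    paste "$1" "$2" | while read -r d before d2 after; do
        echo "$d $((after - before))"
    done
}

# Example usage (hypothetical disk and array names):
#   snapshot sda sdb > /tmp/before
#   dd if=/dev/md0 of=/dev/null bs=1M count=1024   # or any read workload
#   snapshot sda sdb > /tmp/after
#   read_deltas /tmp/before /tmp/after
```

The split may depend on the workload (for example one sequential reader versus several concurrent ones), so it is worth measuring on an otherwise idle system with the read load you actually care about.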