Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Under normal operation, each read request goes to one drive or the other; this can double read throughput, as both drives can be servicing different read requests at the same time.
Some RAID does a scrub, where in the background, when the disks are otherwise idle, it gradually reads all the RAID stripes and validates them. I honestly don't know whether Linux's built-in RAID does this or not. Of course, with RAID-1, if the two blocks disagree, there's no way of knowing which one is correct, only that there is a potential problem.
Some RAID (Sun's ZFS, for instance) stores a checksum with every block so it can detect corruption immediately. I also know ZFS does this scrubbing.
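On Solaris that scrub is a single command (pool name assumed for the example):

# walk every allocated block in the pool and verify its checksum
zpool scrub mypool

It runs in the background, and the results show up in zpool status.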
On Sun, 2008-09-21 at 11:12 -0700, John R Pierce wrote:
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Under normal operation, each read request goes to one drive or the other; this can double read throughput, as both drives can be servicing different read requests at the same time.
Some RAID does a scrub, where in the background, when the disks are otherwise idle, it gradually reads all the RAID stripes and validates them. I honestly don't know whether Linux's built-in RAID does this or not. Of course, with RAID-1, if the two blocks disagree, there's no way of knowing which one is correct, only that there is a potential problem.
Some RAID (Sun's ZFS, for instance) stores a checksum with every block so it can detect corruption immediately. I also know ZFS does this scrubbing.
Thank you, John. I'm pretty sure that RAID 5 or 6 would be safe, since there is parity checking; however, it sounds like this may not be the case for RAID 1.
Over the years, one of the things I've noticed is that whenever I've seen a SCSI drive fail, there have always been hardware exceptions from the drive. With less expensive ATA drives (I can't say whether this includes current-generation SATA drives), I have seen quite a few drives fail in such a way that they simply returned unreliable data (you could read the same data twice and get two different results) without raising any hardware exceptions.
For this reason, I have been reluctant to use inexpensive SATA drives unless there is an easy way to know that the drive is returning accurate data.
As my data has grown, I've been more challenged to come up with affordable backup solutions that I feel confident in. Recently I started backing up to a pair of USB/eSATA terabyte drives with two 0.5-terabyte partitions on each drive, running software RAID 1 across the two partitions.
Note that I am mostly talking about backups here. My primary data is mostly on RAID5 or RAID10 U350 SCSI arrays.
Nataraj
Nataraj wrote:
Thank you, John. I'm pretty sure that RAID 5 or 6 would be safe, since there is parity checking; however, it sounds like this may not be the case for RAID 1.
The parity on RAID 5/6 is only checked if you run some sort of scrub; it's used to regenerate a failed drive onto a spare or replacement. In normal operation, reads will be distributed to the individual drives that contain the blocks in question.
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
I've been thinking about this as well.
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
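For concreteness, a full pass on a hypothetical array md0 (substitute your own device) looks like this:

# kick off an online consistency check of md0
echo check > /sys/block/md0/md/sync_action

# the check runs in the background; watch its progress here
cat /proc/mdstat

# when it finishes, the number of mismatches found
cat /sys/block/md0/md/mismatch_cnt

# if that is non-zero, rewrite the copies so they agree again
echo repair > /sys/block/md0/md/sync_action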
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
HTH a bit,
Kay
Kay Diederichs wrote:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
Except that's wrong. I unwrapped a recent kernel source tarball from kernel.org and found...
static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
        struct mirror *m = get_default_mirror(ms);

        do {
                if (likely(!atomic_read(&m->error_count)))
                        return m;

                if (m-- == ms->mirror)
                        m += ms->nr_mirrors;
        } while (m != get_default_mirror(ms));

        return NULL;
}
So it appears it's a round robin...
On Sun, 2008-09-21 at 12:53 -0700, John R Pierce wrote:
Kay Diederichs wrote:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
Except that's wrong. I unwrapped a recent kernel source tarball from kernel.org and found...
static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
        struct mirror *m = get_default_mirror(ms);

        do {
                if (likely(!atomic_read(&m->error_count)))
                        return m;

                if (m-- == ms->mirror)
                        m += ms->nr_mirrors;
        } while (m != get_default_mirror(ms));

        return NULL;
}
So it appears it's a round robin...
This makes sense. I'm pretty sure that tests I've run in the past using bonnie++ or iozone showed faster reads with RAID 1 than with a single drive. I would think that if the drives are on separate controllers (and depending upon the performance/capacity of the drives and controllers), there could be notable improvements.
Nataraj
This makes sense. I'm pretty sure that tests I've run in the past using bonnie++ or iozone showed faster reads with RAID 1 than with a single drive. I would think that if the drives are on separate controllers (and depending upon the performance/capacity of the drives and controllers), there could be notable improvements.
With SATA or SAS, of course, every drive is on its own channel. Even with PATA, at 100 or 133 Mbyte/sec, only the fastest newer drives would saturate the bus doing two transfers concurrently.
Now, after I wrote what I did above, I dug up the kernel.org 2.6.18 kernel that RHEL/CentOS 5 is based on, and it still had the older code sequence shown in that 'to do' list entry... but I didn't run the RHEL patch sequences against it; it's quite possible RHEL retrofitted this patch to it.
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
I've been thinking about this as well.
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
BTW, there is - even with current kernels - no speed gain in using RAID1
HTH a bit,
Kay
Hi Kay,
From reading the following URL:
http://linux-raid.osdl.org/index.php/RAID_Administration
My understanding is that if repair detects a read error on one of the drives and successfully reads the corresponding data from the other drive, it will attempt to rewrite those blocks on the drive that got the read error. It looks like it may do this even if you only run check. I don't think it can repair a data discrepancy when the hardware doesn't return an error. This is primarily what brings up my concern over SATA drives, because I think their hardware error detection is inferior to that of SCSI or SAS drives.
Nataraj
Kay Diederichs wrote:
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
Thanks for posting this. I have a machine that periodically had filesystem errors on a RAID1 volume, which I eventually found were caused by bad RAM, but even after replacing it I'd still see filesystem problems reappear every few weeks. It turned out that there were quite a few mismatched blocks between the mirrors; the fsck passes must have sometimes seen the good copy while the still-bad alternate would subsequently be used. Now I've done a repair and fsck, and so far everything seems stable. It's hard to tell with problems that only happen once or twice a month, though. I suppose I have some files with corrupt contents on there, but they are backups that will expire as more current ones are saved anyway.
BTW, there is - even with current kernels - no speed gain in using RAID1
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
2008/9/30 Les Mikesell lesmikesell@gmail.com:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
Indeed, there is a patch, linux-2.6-dm-mirroring.patch, in the CentOS 5.2 kernel sources which implements a proper body for the choose_mirror() function.
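If you want to look at it yourself, install the kernel source RPM and grep the patch (the kernel version is assumed here for 5.2, and /usr/src/redhat is the default build root on CentOS):

# as root; installs the sources and patches under /usr/src/redhat
rpm -ihv kernel-2.6.18-92.el5.src.rpm

# show the new choose_mirror() body carried by the patch
grep -n -A 20 choose_mirror /usr/src/redhat/SOURCES/linux-2.6-dm-mirroring.patch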
Alexander Georgiev wrote:
2008/9/30 Les Mikesell lesmikesell@gmail.com:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
Indeed, there is a patch, linux-2.6-dm-mirroring.patch, in the CentOS 5.2 kernel sources which implements a proper body for the choose_mirror() function.
Which also explains why, once my mirror was corrupt, a new problem would show up every few weeks even after the cause (bad RAM) was fixed.
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
Interesting. I'll give this a go on my own desktop system which is running RAID 1.
You said above, "When this finishes...", but how do you know the check has completed? I saw this in /var/log/messages:
Oct 1 11:02:47 ranbir kernel: md: data-check of RAID array md0
Oct 1 11:02:47 ranbir kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 1 11:02:47 ranbir kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Oct 1 11:02:47 ranbir kernel: md: using 128k window, over a total of 104320 blocks.
Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.
Oct 1 11:02:48 ranbir kernel: RAID1 conf printout:
Oct 1 11:02:48 ranbir kernel: --- wd:2 rd:2
Oct 1 11:02:48 ranbir kernel: disk 0, wo:0, o:1, dev:sda1
Oct 1 11:02:48 ranbir kernel: disk 1, wo:0, o:1, dev:sdb1
There was nothing else after the last line. I don't know exactly what the "disk" lines mean.
Regards,
Ranbir
On Wed, 2008-10-01 at 12:09 -0400, Toby Bluhm wrote:
cat /proc/mdstat gives progress
cat /sys/block/md0/md/sync_action gives current mode
Of course! I guess when I ran the check on md0, it finished before I had the opportunity to watch the progress, so I wasn't sure what to check.
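Next time I'll keep an eye on it while it runs; something like this should do (md0 assumed):

# refresh progress and current mode every 5 seconds
watch -n 5 'cat /proc/mdstat /sys/block/md0/md/sync_action'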
Also, I just noticed this:
Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.
Whoops! It was right there in the log, and I completely missed it.
Regards,
Ranbir
On Sun, Sep 21, 2008 at 10:26 AM, Nataraj incoming-centos@rjl.com wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in,
No. Reads are distributed over disks to increase performance.
is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
It depends on the type of error. However, the sad thing is that if you use 3 disks for RAID 1, the kernel does not do the right thing. Let me explain.
Say you have 3 disks in a RAID 1 array. If there is a mismatch, the smart thing to do would be to take a vote among the 3 disks: 2 out of 3 wins (assuming they are not all different), and the odd man out should be corrected (if possible). But what actually happens is that the highest-numbered disk is copied to the others.
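Just to illustrate the idea, a 2-out-of-3 vote over one block from each disk could be as simple as the sketch below. This is not actual kernel code; the function name and buffer-based interface are made up for the example.

#include <string.h> /* memcmp */

/*
 * 2-out-of-3 vote over one block read from each of three mirrors.
 * Returns the index of a copy that at least one other disk agrees
 * with, or -1 if all three copies differ.
 */
static int vote_3way(const unsigned char *b0, const unsigned char *b1,
                     const unsigned char *b2, size_t len)
{
        int eq01 = (memcmp(b0, b1, len) == 0);
        int eq02 = (memcmp(b0, b2, len) == 0);
        int eq12 = (memcmp(b1, b2, len) == 0);

        if (eq01 || eq02)
                return 0;  /* copy 0 agrees with at least one other disk */
        if (eq12)
                return 1;  /* copies 1 and 2 outvote copy 0 */
        return -1;         /* no majority: flag it, don't guess */
}

The winning copy would then be rewritten over the odd man out, instead of blindly copying one fixed disk over the others.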
I haven't looked at the latest kernel code, but if http://linas.org/linux/raid.html is correct, then I think the kernel maintainers should address this issue. I don't think it would be hard to implement.