Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Under normal operation, each read request goes to one drive or the other; this can double read throughput, as both drives can be servicing different read requests at the same time.
Some RAID does a scrub, where in the background, when the disks are otherwise idle, it gradually reads all the RAID stripes and validates them. I honestly don't know whether Linux's built-in RAID does this or not. Of course, with RAID-1, if the two blocks disagree, there's no way of knowing which one is correct, only that there is a potential problem.
Some RAID (Sun's ZFS, for instance) stores a checksum with every block so it can detect corruption immediately. I also know ZFS does this scrubbing.
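On Solaris that scrub is a single command (pool name assumed for the example):

# walk every allocated block in the pool and verify its checksum
zpool scrub mypool

It runs in the background, and the results show up in zpool status.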
On Sun, 2008-09-21 at 11:12 -0700, John R Pierce wrote:
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Under normal operation, each read request goes to one drive or the other; this can double read throughput, as both drives can be servicing different read requests at the same time.
Some RAID does a scrub, where in the background, when the disks are otherwise idle, it gradually reads all the RAID stripes and validates them. I honestly don't know whether Linux's built-in RAID does this or not. Of course, with RAID-1, if the two blocks disagree, there's no way of knowing which one is correct, only that there is a potential problem.
Some RAID (Sun's ZFS, for instance) stores a checksum with every block so it can detect corruption immediately. I also know ZFS does this scrubbing.
Thank you, John. I'm pretty sure that RAID 5 or 6 would be safe, since there is parity checking; however, it sounds like this may not be the case for RAID 1.
Over the years, one of the things I've noticed is that whenever I've seen a SCSI drive fail, there have always been hardware exceptions from the drive. With less expensive ATA drives (I can't say whether this includes current-generation SATA drives), I have seen quite a few drives fail in such a way that they simply returned unreliable data (you could read the same data twice and get two different results) without raising any hardware exceptions.
For this reason, I have been reluctant to use inexpensive SATA drives unless there is an easy way to know that the drive is returning accurate data.
As my data has grown, I've been more challenged to come up with affordable backup solutions that I feel confident in. Recently I started backing up to a pair of USB/eSATA terabyte drives with two 0.5-terabyte partitions on each drive, running software RAID 1 across the two partitions.
Note that I am mostly talking about backups here. My primary data is mostly on RAID5 or RAID10 U350 SCSI arrays.
Nataraj
Nataraj wrote:
Thank you, John. I'm pretty sure that RAID 5 or 6 would be safe, since there is parity checking; however, it sounds like this may not be the case for RAID 1.
The parity on RAID 5/6 is only checked if you run some sort of scrub; it's used to regenerate a failed drive onto a spare or replacement. In normal operation, reads will be distributed to the individual drives that contain the blocks in question.
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
I've been thinking about this as well.
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
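For concreteness, a full pass on a hypothetical array md0 (substitute your own device) looks like this:

# kick off an online consistency check of md0
echo check > /sys/block/md0/md/sync_action

# the check runs in the background; watch its progress here
cat /proc/mdstat

# when it finishes, the number of mismatches found
cat /sys/block/md0/md/mismatch_cnt

# if that is non-zero, rewrite the copies so they agree again
echo repair > /sys/block/md0/md/sync_action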
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
HTH a bit,
Kay
Kay Diederichs wrote:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
Except that's wrong. I unwrapped a recent kernel source tarball from kernel.org and found...
static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
        struct mirror *m = get_default_mirror(ms);

        do {
                if (likely(!atomic_read(&m->error_count)))
                        return m;

                if (m-- == ms->mirror)
                        m += ms->nr_mirrors;
        } while (m != get_default_mirror(ms));

        return NULL;
}
So it appears it's a round robin...
On Sun, 2008-09-21 at 12:53 -0700, John R Pierce wrote:
Kay Diederichs wrote:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
Except that's wrong. I unwrapped a recent kernel source tarball from kernel.org and found...
static struct mirror *choose_mirror(struct mirror_set *ms, sector_t sector)
{
        struct mirror *m = get_default_mirror(ms);

        do {
                if (likely(!atomic_read(&m->error_count)))
                        return m;

                if (m-- == ms->mirror)
                        m += ms->nr_mirrors;
        } while (m != get_default_mirror(ms));

        return NULL;
}
So it appears it's a round robin...
This makes sense. I'm pretty sure that tests I've run in the past using bonnie++ or iozone showed faster reads with RAID 1 than with a single drive. I would think that if the drives are on separate controllers (and depending upon the performance/capacity of the drives and controllers), there could be notable improvements.
Nataraj
This makes sense. I'm pretty sure that tests I've run in the past using bonnie++ or iozone showed faster reads with RAID 1 than with a single drive. I would think that if the drives are on separate controllers (and depending upon the performance/capacity of the drives and controllers), there could be notable improvements.
With SATA or SAS, of course, every drive is on its own channel. Even with PATA, at 100 or 133 Mbyte/sec, only the fastest newer drives would saturate the bus doing two transfers concurrently.
Now, after I wrote what I did above, I dug up the kernel.org 2.6.18 kernel that RHEL/CentOS 5 is based on, and it still had the older code sequence shown in that 'to do' list entry... but I didn't run the RHEL patch sequences against it; it's quite possible RHEL retrofitted this patch to it.
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
Nataraj wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
Thanks, Nataraj
I've been thinking about this as well.
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
BTW, there is - even with current kernels - no speed gain in using RAID1
HTH a bit,
Kay
Hi Kay,
From reading the following URL:
http://linux-raid.osdl.org/index.php/RAID_Administration
My understanding is that if repair detects a read error on one of the drives and successfully reads the corresponding data from the other drive, it will attempt to rewrite those blocks on the drive that got the read error. It looks like it may do this even if you only run check. I don't think it can repair a data discrepancy when the hardware doesn't return an error. This is primarily what brings up my concern over SATA drives, because I think their hardware error detection is inferior to that of SCSI or SAS drives.
Nataraj
Kay Diederichs wrote:
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
This applies to at least RAID1 and RAID5. At this point the question arises: how does the "repair job" know which copy is the correct one? I have no answer to this question.
Thanks for posting this. I have a machine that periodically had filesystem errors on a RAID1 volume, which I eventually found were caused by bad RAM, but even after replacing it I'd still see filesystem problems reappear every few weeks. It turned out that there were quite a few mismatched blocks between the mirrors; the fsck passes must have sometimes seen the good copy while the still-bad alternate would subsequently be used. Now I've done a repair and fsck, and so far everything seems stable. It's hard to tell with problems that only happen once or twice a month, though. I suppose I have some files with corrupt contents on there, but they are backups that will expire as more current ones are saved anyway.
BTW, there is - even with current kernels - no speed gain in using RAID1
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
2008/9/30 Les Mikesell lesmikesell@gmail.com:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
Indeed, there is a patch, linux-2.6-dm-mirroring.patch, in the CentOS 5.2 kernel sources which implements a proper body for the choose_mirror() function.
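If you want to look at it yourself, install the kernel source RPM and grep the patch (the kernel version is assumed here for 5.2, and /usr/src/redhat is the default build root on CentOS):

# as root; installs the sources and patches under /usr/src/redhat
rpm -ihv kernel-2.6.18-92.el5.src.rpm

# show the new choose_mirror() body carried by the patch
grep -n -A 20 choose_mirror /usr/src/redhat/SOURCES/linux-2.6-dm-mirroring.patch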
Alexander Georgiev wrote:
2008/9/30 Les Mikesell lesmikesell@gmail.com:
BTW, there is - even with current kernels - no speed gain in using RAID1 - see http://kernelnewbies.org/KernelProjects/Raid1ReadBalancing .
I don't think I believe that - you can see the reads alternating between drives by watching the lights.
Indeed, there is a patch, linux-2.6-dm-mirroring.patch, in the CentOS 5.2 kernel sources which implements a proper body for the choose_mirror() function.
Which also explains why, once my mirror was corrupt, a new problem would show up every few weeks even after the cause (bad RAM) was fixed.
On Sun, 2008-09-21 at 21:01 +0200, Kay Diederichs wrote:
The fact is that with CentOS-5 kernels (but not with CentOS-4, as this functionality became available in kernel 2.6.17) you can (or rather _should_, regularly) run "echo check > /sys/block/mdX/md/sync_action" to check agreement between the two (or more) copies. When this finishes, /sys/block/mdX/md/mismatch_cnt shows you the number of mismatches. You can fix these by running "echo repair > /sys/block/mdX/md/sync_action".
Interesting. I'll give this a go on my own desktop system which is running RAID 1.
You said above, "When this finishes...", but how do you know the check has completed? I saw this in /var/log/messages:
Oct 1 11:02:47 ranbir kernel: md: data-check of RAID array md0
Oct 1 11:02:47 ranbir kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 1 11:02:47 ranbir kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Oct 1 11:02:47 ranbir kernel: md: using 128k window, over a total of 104320 blocks.
Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.
Oct 1 11:02:48 ranbir kernel: RAID1 conf printout:
Oct 1 11:02:48 ranbir kernel: --- wd:2 rd:2
Oct 1 11:02:48 ranbir kernel: disk 0, wo:0, o:1, dev:sda1
Oct 1 11:02:48 ranbir kernel: disk 1, wo:0, o:1, dev:sdb1
There was nothing else after the last line. I don't know exactly what the "disk" lines mean.
Regards,
Ranbir
On Wed, 2008-10-01 at 12:09 -0400, Toby Bluhm wrote:
cat /proc/mdstat gives progress
cat /sys/block/md0/md/sync_action gives current mode
Of course! I guess when I ran the check on md0, it finished before I had the opportunity to watch the progress, so I wasn't sure what to check.
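Next time I'll keep an eye on it while it runs; something like this should do (md0 assumed):

# refresh progress and current mode every 5 seconds
watch -n 5 'cat /proc/mdstat /sys/block/md0/md/sync_action'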
Also, I just noticed this:
Oct 1 11:02:48 ranbir kernel: md: md0: data-check done.
Whoops! It was right there in the log, and I completely missed it.
Regards,
Ranbir
On Sun, Sep 21, 2008 at 10:26 AM, Nataraj incoming-centos@rjl.com wrote:
Does software RAID 1 compare checksums or otherwise verify that the same bits are coming from both disks during reads? What I'm interested in,
No. Reads are distributed over disks to increase performance.
is whether bit errors that were somehow undetected by the hardware would be detected by the RAID 1 software.
It depends on the type of error. However, the sad thing is that if you use 3 disks for RAID 1, the kernel does not do the right thing. Let me explain.
Say you have 3 disks in a RAID 1 array. If there is a mismatch, the smart thing to do would be to take a vote among the 3 disks: 2 out of 3 wins (assuming they are not all different), and the odd man out should be corrected (if possible). But what actually happens is that the highest-numbered disk is copied to the others.
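Just to illustrate the idea, a 2-out-of-3 vote over one block from each disk could be as simple as the sketch below. This is not actual kernel code; the function name and buffer-based interface are made up for the example.

#include <string.h> /* memcmp */

/*
 * 2-out-of-3 vote over one block read from each of three mirrors.
 * Returns the index of a copy that at least one other disk agrees
 * with, or -1 if all three copies differ.
 */
static int vote_3way(const unsigned char *b0, const unsigned char *b1,
                     const unsigned char *b2, size_t len)
{
        int eq01 = (memcmp(b0, b1, len) == 0);
        int eq02 = (memcmp(b0, b2, len) == 0);
        int eq12 = (memcmp(b1, b2, len) == 0);

        if (eq01 || eq02)
                return 0;  /* copy 0 agrees with at least one other disk */
        if (eq12)
                return 1;  /* copies 1 and 2 outvote copy 0 */
        return -1;         /* no majority: flag it, don't guess */
}

The winning copy would then be rewritten over the odd man out, instead of blindly copying one fixed disk over the others.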
I haven't looked at the latest kernel code, but if http://linas.org/linux/raid.html is correct, then I think the kernel maintainers should address this issue. I don't think it would be hard to implement.