[CentOS] OT: What's wrong with RAID5

Thu Oct 1 03:57:02 UTC 2009

Slightly OT.......

OpenSolaris has just had triple-parity RAID (raidz3) added to ZFS:

http://blogs.sun.com/ahl/entry/triple_parity_raid_z

Pity we can't get an in-kernel version of ZFS for Linux.

On Thu, Oct 1, 2009 at 12:41 PM, Stephen Harris <lists at spuddy.org> wrote:

> On Wed, Sep 30, 2009 at 08:52:08PM -0500, Johnny Hughes wrote:
> > On 09/24/2009 07:35 AM, Rainer Duffner wrote:
>
> > > Well, it depends on the disk-size:
> > >
> > > http://www.enterprisestorageforum.com/technology/features/article.php/3839636
> >
> > This info is VERY relevant ... you will almost ALWAYS have a failure on
> > rebuild with very large RAID 5 arrays.  Since that is a fault in a
> > second drive, that failure will cause the loss of all the data.  I would
> > not recommend RAID 5 right now ... it is not worth the risk.
>
> "Almost always" is very dependent on the disks and size of the array.
>
> Let's take a 20TiByte array as an example.
>
> Now, the "hard error rate" is an expectation.  That means that with
> an error rate of 1 in 1E14 you'd expect to see 1 error for every 1E14
> bits read.  If we make the simplifying assumption that any read is
> equally likely to fail, then any single bit read has a 1/1E14 chance of
> being wrong.  (See the end of this email for more thoughts on this.)
>
> Now to rebuild a 20TiByte array you would need to read 20TiBytes
> of data.  The chance of this happening without error is:
>    (1-1/1E14)^(8*20*2^40) = 0.172
> i.e. only a 17% chance of rebuilding a 20TiByte array without error!
> That's pretty bad.  In fact it's downright awful.  Do not build
> 20TiByte arrays with consumer disks!
>
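The figure above can be reproduced with a short Python sketch. The function
name rebuild_success and its parameters are labels chosen here for
illustration, not anything from the original post. It makes the same
simplifying assumption as the post (independent bit errors at a uniform
rate) and uses log1p so the tiny per-bit probability is not lost to
floating-point rounding.

```python
import math

def rebuild_success(tib_read, bits_per_error):
    """Chance of reading tib_read TiB without a single hard read error.

    Assumes each bit read fails independently with probability
    1/bits_per_error (the simplifying assumption made in the post).
    """
    bits = 8 * tib_read * 2**40
    # exp(n * log1p(-p)) equals (1 - p)**n but is numerically safer for tiny p
    return math.exp(bits * math.log1p(-1.0 / bits_per_error))

# 20 TiB read from disks rated at 1 error per 1E14 bits
print(f"{rebuild_success(20, 1e14):.3f}")  # prints 0.172
```
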
> Note that this doesn't care about the size of the disks nor the number
> of disks; it depends only on the total amount of data read and the
> probability of a read error.
>
> Now an "enterprise" class disk with an error rate of 1E15 looks better:
>    (1-1/1E15)^(8*20*2^40) = 0.839
> or 84% chance of successful rebuild.   Better.  But probably not good
> enough.
>
> How about an Enterprise SAS disk at 1E16?
>    (1-1/1E16)^(8*20*2^40) = 0.983 or 98%
> Not "five nines", but pretty good.
>
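To compare the three disk classes side by side for the same 20TiByte
rebuild, a minimal sketch along these lines can be used. The labels and
the results dictionary are illustrative names only, and the model is the
same uniform, independent bit-error assumption used in the post.

```python
import math

ARRAY_TIB = 20
bits_to_read = 8 * ARRAY_TIB * 2**40

# (label, bits read per expected hard error) as quoted in the post
disk_classes = [
    ("consumer", 1e14),
    ("enterprise", 1e15),
    ("enterprise SAS", 1e16),
]

results = {}
for label, bits_per_error in disk_classes:
    # Probability that every bit is read without a hard error
    results[label] = math.exp(bits_to_read * math.log1p(-1.0 / bits_per_error))
    print(f"{label:15s} 1 in {bits_per_error:.0e}: {results[label]:.1%} success")
```

Each tenfold improvement in the rated error rate divides the exponent by
ten, which is why the result climbs from roughly 17% to 84% to 98%.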
> Of course you're never going to get 100%.  Technology just doesn't work
> that way.
>
> So, if you buy Enterprise SAS disks then you do stand a good chance
> of rebuilding a 20TiByte RAID 5.  That still leaves a 2% chance of a
> double failure.  Do you want to risk your company on that?
>
> RAID6 makes things better; you need a triple failure to cause data loss.
> It's possible, but the numbers are a lot lower.
>
> Of course the error rate and other disk characteristics are actually WAGs
> based on some statistical analysis.  There are no actual measurements to
> show this.
>
> Real life numbers appear to show that disks far outlive their expected
> values.  Error rates are much lower than manufacturer claims (excluding
> bad batches and bad manufacturing, of course!)
>
> This is just a rough "off the top of my head" analysis.  I'm not totally
> convinced it's correct (my understanding of error rate could be wrong; the
> assumption of even failure distribution is likely to be wrong because
> errors on a disk cluster: a sector is bad, a track is bad, etc.).  But the
> analysis _feels_ right... which means nothing :-)
>
> I currently have 5*1Tbyte consumer disks in a RAID5.  That, theoretically,
> gives me a 27% chance of failure during a rebuild.  As it happens I've had
> 2 bad disks, but they went bad a month apart (I think it is a bad batch!).
> Each time the array has rebuilt without detectable error.
>
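The 27% figure for the 5-disk array can be checked with a small sketch.
The function name raid5_rebuild_failure is hypothetical, and two
assumptions are made that the post does not state: "1Tbyte" is taken as
10^12 bytes, and a rebuild reads each of the four surviving disks in full.

```python
import math

def raid5_rebuild_failure(n_disks, disk_bytes, bits_per_error):
    """Chance of at least one hard read error while rebuilding one disk.

    Assumes independent bit errors at a uniform rate, and that the
    rebuild reads every surviving disk completely.
    """
    bits = 8 * disk_bytes * (n_disks - 1)
    return 1.0 - math.exp(bits * math.log1p(-1.0 / bits_per_error))

# 5 consumer disks of 1 TB (10**12 bytes) each, 1 error per 1E14 bits
risk = raid5_rebuild_failure(5, 1e12, 1e14)
print(f"{risk:.0%}")  # prints 27%
```

Treating each disk as 1 TiB (2^40 bytes) instead would push the estimate
to about 30%.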
> Let's not even talk about Petabyte arrays.  If you're doing that then
> you'd better have multiple redundancy in place, and **** the expense!
> Google is a great example of this.
>
> --
>
> rgds
> Stephen
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>