[CentOS] OT: What's wrong with RAID5

Thu Oct 1 02:41:42 UTC 2009
Stephen Harris <lists at spuddy.org>

On Wed, Sep 30, 2009 at 08:52:08PM -0500, Johnny Hughes wrote:
> On 09/24/2009 07:35 AM, Rainer Duffner wrote:

> > Well, it depends on the disk-size:
> > http://www.enterprisestorageforum.com/technology/features/article.php/3839636
> 
> This info is VERY relevant ... you will almost ALWAYS have a failure on
> rebuild with very large RAID 5 arrays.  Since that is a fault in a
> second drive, that failure will cause the loss of all the data.  I would
> not recommend RAID 5 right now ... it is not worth the risk.

"Almost always" is very dependent on the disks and size of the array.

Let's take a 20TiByte array as an example.

Now, the "hard error rate" is an expectation.  That means that with
an error rate of 1E14 then you'd expect to see 1 error for every 1E14
bits read.  If we make the simplifying assumption of any read being
equally likely to fail then any single bit read has a 1/1E14 chance of
being wrong.  (see end of email for more thoughts on this).

Now to rebuild a 20TiByte array you would need to read 20TiBytes
of data.  The chance of doing that without a single error is:
    (1-1/1E14)^(8*20*2^40) = 0.172
i.e. only a 17% chance of rebuilding a 20TiByte array without error!
That's pretty bad.  In fact it's downright awful.  Do not build
20TiByte arrays with consumer disks!

Note that this doesn't depend on the size of the individual disks or
how many of them there are; it depends purely on the total amount of
data read and the per-bit probability of a read error.
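
If you want to play with these numbers yourself, here's a rough Python
sketch of the same per-bit model (the function name and the
independent, evenly-distributed failure assumption are mine, not
anything the drive vendors publish):

    import math

    # Back-of-the-envelope sketch: chance of reading `tib` TiB without
    # a single unrecoverable read error, assuming one error per
    # `error_rate` bits read and independent per-bit failures.
    def rebuild_success(tib, error_rate):
        bits = tib * 2**40 * 8                   # TiB -> bits
        # (1 - 1/error_rate)**bits, via log1p to keep precision
        return math.exp(bits * math.log1p(-1.0 / error_rate))

    print(rebuild_success(20, 1e14))             # consumer disk: ~0.17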

Now an "enterprise" class disk with an error rate of 1E15 looks better:
    (1-1/1E15)^(8*10*2^40) = 0.838
or 84% chance of successful rebuild.   Better.  But probably not good
enough.

How about an Enterprise SAS disk at 1 in 1E16?
    (1-1/1E16)^(8*20*2^40) = 0.983, or 98%
Not "five nines", but pretty good.

Of course you're never going to get 100%.  Technology just doesn't work
that way.

So, if you buy Enterprise SAS disks then you do stand a good chance
of rebuilding a 20TiByte RAID 5.  But that still leaves a 2% chance of
hitting a second failure during the rebuild.  Do you want to risk your
company on that?

RAID6 makes things better; you need a triple failure to cause data
loss.  That's still possible, but the odds are a lot lower.

Of course the quoted error rate and other disk characteristics are
actually WAGs based on some statistical analysis.  There are no actual
measurements to back this up.

Real-life numbers appear to show that disks far outlive their rated
expectations.  Error rates are much lower than the manufacturers claim
(excluding bad batches and manufacturing defects, of course!).

This is just a rough "off the top of my head" analysis.  I'm not
totally convinced it's correct (my understanding of the error rate
could be wrong; the assumption of an even failure distribution is
likely to be wrong because errors on a disk tend to cluster - a sector
goes bad, a track goes bad, etc).  But the analysis _feels_ right...
which means nothing :-)

I currently have 5*1Tbyte consumer disks in a RAID5.  Rebuilding means
reading the 4Tbytes on the surviving disks, which theoretically gives
me a 27% chance of hitting an error during a rebuild.  As it happens
I've had 2 bad disks, but they went bad a month apart (I think it was
a bad batch!).  Each time the array has rebuilt without detectable
error.
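
That 27% comes from the same back-of-the-envelope model - a rebuild
has to read the 4 surviving 1Tbyte disks - so, with all the caveats
above:

    import math

    # 4 surviving 1Tbyte (1E12 byte) disks read during a rebuild,
    # with a consumer-class 1-in-1E14 error rate.
    print(1 - math.exp(4e12 * 8 * math.log1p(-1e-14)))   # ~0.27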

Let's not even talk about Petabyte arrays.  If you're doing that then
you'd better have multiple redundancy in place, and **** the expense!
Google is a great example of this.

-- 

rgds
Stephen