Slightly OT...<br><br>OpenSolaris has just had triple-parity RAID (raidz3) added to ZFS:<br><br><br><a href="http://blogs.sun.com/ahl/entry/triple_parity_raid_z">http://blogs.sun.com/ahl/entry/triple_parity_raid_z</a><br>

<br><br>Pity we can't get an in-kernel version of ZFS for Linux.<br><br><br><br><br><div class="gmail_quote">On Thu, Oct 1, 2009 at 12:41 PM, Stephen Harris <span dir="ltr">&lt;<a href="mailto:lists@spuddy.org">lists@spuddy.org</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">On Wed, Sep 30, 2009 at 08:52:08PM -0500, Johnny Hughes wrote:<br>

> On 09/24/2009 07:35 AM, Rainer Duffner wrote:<br>

<br>

</div><div class="im">> > Well, it depends on the disk-size:<br>

> > <a href="http://www.enterprisestorageforum.com/technology/features/article.php/3839636" target="_blank">http://www.enterprisestorageforum.com/technology/features/article.php/3839636</a><br>

><br>

> This info is VERY relevant ... you will almost ALWAYS have a failure on<br>

> rebuild with very large RAID 5 arrays.  Since that is a fault in a<br>

> second drive, that failure will cause the loss of all the data.  I would<br>

> not recommend RAID 5 right now ... it is not worth the risk.<br>

<br>

</div>"Almost always" is very dependent on the disks and size of the array.<br>

<br>

Let's take a 20TiByte array as an example.<br>

<br>

Now, the "hard error rate" is an expectation.  That means that with<br>

an error rate of 1 in 1E14 you'd expect to see 1 error for every 1E14<br>

bits read.  If we make the simplifying assumption that every bit read is<br>

equally likely to fail, then any single bit read has a 1/1E14 chance of<br>

being wrong (see the end of this email for more thoughts on this).<br>

<br>

Now to rebuild a 20TiByte array you would need to read 20TiBytes<br>

of data.  The chance of this happening without error is:<br>

    (1-1/1E14)^(8*20*2^40) = 0.172<br>

i.e. only a 17% chance of successfully rebuilding a 20TiByte array!  That's pretty bad.  In<br>

fact it's downright awful.  Do not build 20TiByte arrays with consumer<br>

disks!<br>
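<br>
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above.  The function name and the use of log1p are my own choices, not part of the original calculation; it keeps the same simplifying assumption that every bit read fails independently with the same probability.<br>

```python
import math

TIB = 2 ** 40  # bytes in one TiByte


def rebuild_success_probability(bytes_read, bits_per_error):
    """Chance of reading bytes_read bytes without a single hard error.

    Assumes (as in the text) that each bit fails independently with
    probability 1/bits_per_error.
    """
    bits = 8 * bytes_read
    # log1p avoids the rounding error of computing (1 - 1e-14) directly
    return math.exp(bits * math.log1p(-1.0 / bits_per_error))


# 20 TiByte array on consumer disks rated at 1 error per 1E14 bits
p_consumer = rebuild_success_probability(20 * TIB, 1e14)
print(f"{p_consumer:.3f}")
```

Going through log1p matters here: 1-1/1E16 cannot be represented exactly in double precision, so raising it directly to a huge power gives a slightly skewed answer.<br>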

<br>

Note that this doesn't care about the size of the disks nor the number<br>

of disks; it's purely based on probability of read error.<br>

<br>

Now an "enterprise" class disk with an error rate of 1E15 looks better:<br>

    (1-1/1E15)^(8*20*2^40) = 0.839<br>

or 84% chance of successful rebuild.   Better.  But probably not good<br>

enough.<br>

<br>

How about an Enterprise SAS disk at 1E16?<br>

    (1-1/1E16)^(8*20*2^40) = 0.983 or 98%<br>

Not "five nines", but pretty good.<br>
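<br>
To put "five nines" in perspective, the same formula can be turned around to ask how much data could be read at a given success probability.  This is only a sketch under the same independent-bit-error assumption; the function name and the 0.99999 target are illustrative, not from any vendor spec.<br>

```python
import math

TIB = 2 ** 40  # bytes in one TiByte


def max_bytes_for_target(target_success, bits_per_error):
    """Largest read (in bytes) that still succeeds with target_success.

    Solves (1 - 1/bits_per_error)^bits = target_success for bits.
    """
    bits = math.log(target_success) / math.log1p(-1.0 / bits_per_error)
    return bits / 8


# How big a rebuild can each (assumed) disk class survive at "five nines"?
for rate in (1e14, 1e15, 1e16):
    size_tib = max_bytes_for_target(0.99999, rate) / TIB
    print(f"1 error per {rate:.0e} bits: {size_tib:.5f} TiByte")
```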

<br>

Of course you're never going to get 100%.  Technology just doesn't work<br>

that way.<br>

<br>

So, if you buy Enterprise SAS disks then you do stand a good chance<br>

of rebuilding a 20TiByte RAID 5.  That still leaves a 2% chance of a double failure.<br>

Do you want to risk your company on that?<br>

<br>

RAID6 makes things better; you need a triple failure to cause data loss.<br>

It's possible, but the numbers are a lot lower.<br>

<br>

Of course the error rate and other disk characteristics are actually WAGs<br>

based on some statistical analysis.  There are no actual measurements to<br>

back them up.<br>

<br>

Real life numbers appear to show that disks far outlive their expected<br>

values.  Error rates are much lower than manufacturer claims (excluding<br>

bad batches and bad manufacturing, of course!)<br>

<br>

This is just a rough "off the top of my head" analysis.  I'm not totally convinced<br>

it's correct (my understanding of error rate could be wrong; the<br>

assumption of even failure distribution is likely to be wrong because<br>

errors on a disk tend to cluster: a sector is bad, a track is bad, etc.).  But the<br>

analysis _feels_ right... which means nothing :-)<br>

<br>

I currently have 5*1Tbyte consumer disks in a RAID5.  That, theoretically,<br>

gives me a 27% chance of failure during a rebuild.  As it happens I've had<br>

2 bad disks, but they went bad a month apart (I think it is a bad batch!).<br>

Each time the array has rebuilt without detectable error.<br>
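<br>
The 27% figure can be reproduced with a short Python sketch.  The assumptions here are mine: a rebuild reads the four surviving disks in full, a "1Tbyte" disk is 10^12 bytes, and the disks are rated at 1 error per 1E14 bits.  The function name is illustrative only.<br>

```python
import math


def raid5_rebuild_failure_probability(disk_count, disk_bytes, bits_per_error):
    """Chance of hitting at least one hard error during a RAID 5 rebuild.

    Assumes every surviving disk is read in full and that each bit
    fails independently with probability 1/bits_per_error.
    """
    bits = 8 * (disk_count - 1) * disk_bytes
    # -expm1(x) is 1 - exp(x), computed without losing precision
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


# 5 x 1 Tbyte (decimal) consumer disks at an assumed 1 error per 1E14 bits
p_fail = raid5_rebuild_failure_probability(5, 10 ** 12, 1e14)
print(f"{p_fail:.0%}")
```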

<br>

Let's not even talk about Petabyte arrays.  If you're doing that then<br>

you'd better have multiple redundancy in place, and **** the expense!<br>

Google is a great example of this.<br>

<br>

--<br>

<br>

rgds<br>

<font color="#888888">Stephen<br>

</font><div><div></div><div class="h5">_______________________________________________<br>

CentOS mailing list<br>

<a href="mailto:CentOS@centos.org">CentOS@centos.org</a><br>

<a href="http://lists.centos.org/mailman/listinfo/centos" target="_blank">http://lists.centos.org/mailman/listinfo/centos</a><br>

</div></div></blockquote></div><br>