[CentOS] 40TB File System Recommendations

Thu Apr 14 15:26:41 UTC 2011
Ross Walker <rswwalker at gmail.com>

2011/4/14 Peter Kjellström <cap at nsc.liu.se>:
> On Tuesday, April 12, 2011 02:56:54 PM rainer at ultra-secure.de wrote:
> ...
>> > Steve,
>> > I've been managing machines with 30TB of storage for more than two
>> > years, and with good reporting and reaction we have never had to
>> > run fsck.
>>
>> That's not the issue.
>> The issue is rebuild time.
>> The longer it takes, the more likely another failure in the array
>> becomes.
>> With RAID6 this does not instantly kill your RAID, as with RAID5 - but
>> I assume it will further decrease overall performance and the rebuild
>> time will go up significantly - adding to the risk.
>
> While I do concede the obvious point regarding rebuild time (raid6 takes
> from long to very long to rebuild), I'd like to point out:
>
>  * If you do the math for a 12-drive raid10 vs raid6 (using actual
> data from ~500 1T drives on HP cciss controllers over two years),
> raid10 is ~3x more likely to cause hard data loss than raid6.
>
>  * MTBF is not everything; there's also the thing called unrecoverable
> read errors. If you hit one while rebuilding your raid10 you're toast,
> while in the raid6 case you'll use your 2nd parity and continue the
> rebuild.

You mean if the other side of the mirror fails while rebuilding it.
Yes, this is true. Of course, if this happens with RAID6 it will
rebuild from parity only IF there is a second hot spare available,
because remember the first failure wasn't cleared before the second
failure occurred. Now your RAID6 is in a severely degraded state, and
one more failure before either of these disks is rebuilt will mean
toast for the array. At this point the performance of the array is
practically unusable and the load on the disks is high while it does a
full recalculation rebuild; if the disks are large, the load will stay
high for a very long time. If any other disk in the very large RAID6
array is near failure, or has a bad sector, this taxing load could
very well push it over the edge, and the risk of such an event
occurring increases with the size of the array and the size of the
disk surface.
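
To put a rough number on the unrecoverable-read-error point, here is a
back-of-the-envelope sketch (my assumptions, not Peter's data: a
consumer-class URE spec of 1 error per 10^14 bits read, and that the
whole surviving disk has to be read to rebuild a mirror):

  import math

  URE_PER_BIT = 1e-14   # assumed consumer-drive spec: 1 URE per 10^14 bits
  BITS_PER_TB = 8e12    # 1 TB = 10^12 bytes = 8 * 10^12 bits

  def p_ure_during_rebuild(tb_read):
      """Chance of hitting at least one URE while reading tb_read TB."""
      # 1 - (1 - p)^n, computed via log1p/expm1 for numerical stability
      return -math.expm1(tb_read * BITS_PER_TB * math.log1p(-URE_PER_BIT))

  for tb in (1, 2, 4):
      print(f"{tb} TB mirror rebuild: ~{p_ure_during_rebuild(tb):.0%} URE risk")

That prints roughly 8%, 15% and 27% for 1, 2 and 4 TB disks, which is
why a URE during a raid10 rebuild is not a rare event at these sizes,
while raid6 can still fall back on its second parity.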

I think this is where mdraid raid10 shines, because it can keep three
copies (or more) of the data instead of just two, at three times (or
more) the cost. It also allows an uneven number of disks, since it
just saves copies on different spindles rather than strict "mirrors".
This, I think, provides the best protection against failure and the
best performance, but at the worst cost. With 2TB and 4TB disks coming
out it may very well be worth it, though, as the cost per GB drives
lower and lower and one can get 12TB of raw storage out of only 4
spindles. Imagine 12 spindles: I wouldn't mind getting 16TB out of
48TB of raw if it costs me less than what 16TB of raw cost me just 2
years ago, especially if it means I get both performance and
reliability.
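
As a sanity check on that arithmetic (my assumption: twelve 4TB
spindles, with mdraid raid10 keeping three copies, which mdadm
expresses as a near layout, e.g. --level=10 --layout=n3):

  # usable capacity of an mdraid raid10 with k copies is raw / k
  disks, size_tb = 12, 4       # assumed: twelve 4TB spindles
  raw_tb = disks * size_tb     # 48 TB raw
  for copies in (2, 3):
      print(f"{copies} copies: {raw_tb // copies} TB usable of {raw_tb} TB raw")

Three copies across twelve 4TB spindles is exactly the 16TB out of
48TB figure above, and since every block lives on three distinct
spindles, any two disks can fail without data loss.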

> /Peter (who runs many 12 drive raid6 systems just fine)
>
>> Thus, it's generally advisable to just use RAID10 (in this case, a
>> thin-striped array of RAID1 arrays).

It is not advisable to recommend any one level of RAID as a blanket
rule.

The right RAID level is determined by weighing the needs of the
application against the risks of the RAID level and the risks of the
underlying storage technology.
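
As a sketch of how I'd frame that trade-off (assumptions: n identical
disks, raid10 with k copies, and "survives" meaning the worst-case
number of failures the layout is guaranteed to tolerate):

  def raid_tradeoff(n, size_tb):
      """Usable TB and guaranteed failure tolerance per RAID level."""
      return {
          "raid5":         ((n - 1) * size_tb, 1),
          "raid6":         ((n - 2) * size_tb, 2),
          "raid10 2-copy": (n * size_tb // 2, 1),  # worst case: one whole mirror
          "raid10 3-copy": (n * size_tb // 3, 2),
      }

  for level, (usable, tol) in raid_tradeoff(12, 4).items():
      print(f"{level:14} {usable:2} TB usable, survives any {tol} failure(s)")

Whether 40TB with double parity or 16TB with triple copies is the
better deal then depends entirely on what the application needs.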

-Ross