On Thursday, April 14, 2011 05:26:41 PM Ross Walker wrote:
> 2011/4/14 Peter Kjellström <cap at nsc.liu.se>:
...
> > While I do concede the obvious point regarding rebuild time (raid6 takes
> > from long to very long to rebuild) I'd like to point out:
> >
> > * If you do the math for a 12 drive raid10 vs raid6 then (using actual
> >   data from ~500 1T drives on HP cciss controllers during two years)
> >   raid10 is ~3x more likely to cause hard data loss than raid6.
> >
> > * mtbf is not everything, there's also the thing called unrecoverable
> >   read errors. If you hit one while rebuilding your raid10 you're toast,
> >   while in the raid6 case you'll use your 2nd parity and continue the
> >   rebuild.
>
> You mean if the other side of the mirror fails while rebuilding it.

No, a drive (unrecoverably) failing to read a sector is not the same thing
as a drive failure. Drive failure frequency expressed as MTBF is around 1M
hours (though including predictive failures we see more like 250K hours).
The unrecoverable read error rate was until quite recently on the order of
one error per 1x to 10x the drive size read (a drive I looked up just now
was spec'ed a lot higher, at roughly 1000x the drive size).

If we assume a raid10 rebuild time of 12h and an unrecoverable read error
once every 10x of drive size read, then the effective mean time between
read errors during rebuilds is 120h (two to ten thousand times worse than
the drive MTBF). Admittedly these numbers are hard to get and equally hard
to trust (or double check).

What it all comes down to is that raid10 (assuming double, not triple,
copies) stores your data with just one extra copy, and in a single drive
failure scenario you have zero redundancy left on that part of the array.
That is, you depend on each and every bit of the degraded part being read
correctly. This means you very much want both:

 1) Very fast rebuilds (=> you need a hot-spare)
 2) An unrecoverable read error rate much larger than your drive size

or, as you suggest below:

 3) Triple copies

> Yes this is true, of course if this happens with RAID6 it will rebuild
> from parity IF there is a second hotspare available,

This is wrong; hot-spares are not that necessary when using raid6. This has
to do with the fact that rebuild times (the time from when you start being
vulnerable until the rebuild completes) are already long. An added 12h for
a tech to swap in the spare only marginally increases your risk.

> cause remember
> the first failure wasn't cleared before the second failure occurred.
> Now your RAID6 is in severe degraded state, one more failure before
> either of these disks is rebuilt will mean toast for the array.

All of this was taken into account in my original example above. In the end
(with my data) raid10 was around 3x more likely to cause ultimate data loss
than raid6.

> Now
> the performance of the array is practically unusable and the load on
> the disks is high as it does a full recalculation rebuild, and if they
> are large it will be high for a very long time, now if any other disk
> in the very large RAID6 array is near failure, or has a bad sector,
> this taxing load could very well push it over the edge

In my example a 12 drive raid6 rebuild takes 6-7 days, which works out to
< 5 MB/s of sequential reads per drive. This added load is not very
noticeable in our environment (taking into account normal patrol reads and
user data traffic). Either way, the general problem of "[rebuild stress]
pushing drives over the edge" is a larger threat to raid10 than to raid6
(it being fatal in the first case...).
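As a sanity check on the rebuild-window arithmetic above, here is a minimal
Python sketch; all figures are the assumed round numbers from the text
(spec MTBF, 12h rebuild, one unrecoverable read error per 10x drive size
read), not measurements:

    # Assumed round numbers from the discussion above, not measurements.
    drive_mtbf_h   = 1_000_000   # spec'ed drive MTBF in hours
    field_mtbf_h   = 250_000     # rough MTBF including predictive failures
    rebuild_time_h = 12          # assumed raid10 rebuild time
    reads_per_ure  = 10          # drive-sized reads per unrecoverable read error

    # A raid10 rebuild reads the surviving copy once, so on average one
    # rebuild in reads_per_ure hits a read error.  The effective mean time
    # between read errors while rebuilding is therefore:
    mtb_read_error_h = rebuild_time_h * reads_per_ure   # 120 h

    print(mtb_read_error_h)                    # 120
    print(drive_mtbf_h / mtb_read_error_h)     # ~8300x worse than spec MTBF
    print(field_mtbf_h / mtb_read_error_h)     # ~2100x worse than field MTBF

That is where the "two to ten thousand times worse" range above comes from.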
> and the risk of
> such an event occurring increases with the size of the array and the
> size of the disk surface.
>
> I think this is where the mdraid raid10 shines because it can have 3
> copies (or more) of the data instead of just two,

I think we've now moved into what most people would call unreasonable.
Let's see what we have for a 12 drive box (quite a common 2U size):

 raid6:  12x raid6, no hot spare (see argument above)  => 10 data drives
 raid10: 11x triple-copy raid10, one spare             => 3.66 data drives

or (if your raid is not odd-drive capable):

 raid10: 9x triple-copy raid10, one to three spares    => 3 data drives
         (ok, yes, you could get 4 data drives out of it if you skipped
         the hot-spare)

That is almost a 2.7x-3.3x difference! My users sure care if their X $
buys 1/3 the space (or, if you prefer, 3x the cost for the same space).
On top of this, most raid10 implementations lack triple-copy
functionality.

Also note that a raid10 layout that allows an odd number of drives is more
vulnerable to a 2nd drive failure, making the raid6 advantage even larger
than 3x (compared to a double-copy, odd-drive-capable raid10).

/Peter

> of course a three
> times (or more) the cost. It also allows for uneven number of disks as
> it just saves copies on different spindles rather then "mirrors". This
> I think provides the best protection against failure and the best
> performance, but at the worst cost, but with 2TB and 4TB disks coming
> out ...
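And a quick check of the capacity arithmetic in the 12-slot example above,
again as a rough Python sketch using the drive counts assumed in the text:

    slots = 12

    raid6_data         = slots - 2        # 12x raid6, no hot spare    -> 10 data drives
    raid10_triple_data = (slots - 1) / 3  # 11x triple-copy + 1 spare  -> ~3.66 data drives
    raid10_triple_even = 9 / 3            # 9x triple-copy, 1-3 spares -> 3 data drives

    print(raid6_data / raid10_triple_data)  # ~2.7x more usable space with raid6
    print(raid6_data / raid10_triple_even)  # ~3.3x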