[CentOS] RAID6 in production?

Thu Oct 20 00:00:38 UTC 2005
Joshua Baker-LePain <jlb17 at duke.edu>

Is anyone using RAID6 in production?  In moving from hardware RAID on my dual
3ware 7500-8 based systems to md, I decided I'd like to go with RAID6
(since md is less tolerant of marginal drives than 3ware is).  I did some
benchmarking and was getting decent speeds with a 128KiB chunk size.

So the next step was failure testing.  First, I fired off memtest.sh as 
found at <http://people.redhat.com/dledford/memtest.html>.  Then, I did 
'mdadm /dev/md0 -f /dev/sdo1', and it started to rebuild as it should.  I 
cranked up /proc/sys/dev/raid/speed_limit_min to 15000 so that it would 
reconstruct in a decent amount of time (the default of 1000 was leading to 
a 53 hour estimate for the recovery).
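
For a rough idea of what the speed floor does to rebuild time, here is a
minimal sketch.  It assumes the resync runs at exactly the
speed_limit_min value (in KiB/s) and that a "160GB" member is 160 * 10**9
bytes; neither is guaranteed.  As far as I know, the estimate md reports
is based on the speed it actually measures, so it need not match these
numbers.  The function name is just for illustration.

```python
# Rough rebuild-time estimate for a single md array member.
# Assumptions (not measured): the resync runs at a constant speed equal
# to the speed_limit_min floor, and "160GB" means 160 * 10**9 bytes.

KIB = 1024  # bytes per KiB


def rebuild_hours(member_bytes, speed_kib_per_s):
    """Hours needed to rewrite one member at a constant speed in KiB/s."""
    seconds = member_bytes / (speed_kib_per_s * KIB)
    return seconds / 3600


member_bytes = 160 * 10**9
for speed in (1000, 15000):  # default floor vs. the raised floor
    hours = rebuild_hours(member_bytes, speed)
    print(f"{speed:>6} KiB/s -> {hours:5.1f} hours")
```

Under those assumptions the default floor works out to well over 40
hours per member, and 15000 KiB/s brings it down to roughly 3 hours.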

But memtest.sh started kicking out errors (non-matching diffs).  And then 
I got this:

EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted  journal
Remounting filesystem read-only
attempt to access beyond end of device
md0: rw=0, want=28987566088, limit=4595422208
attempt to access beyond end of device
md0: rw=0, want=28987566088, limit=4595422208
attempt to access beyond end of device
md0: rw=0, want=28987566088, limit=4595422208
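
To put those want/limit values in perspective, here is a small sketch
that converts them to sizes.  It assumes both numbers are counts of
512-byte sectors, which is my reading of the message and not something
the log itself states.

```python
# Convert the want/limit values from the kernel message into sizes.
# Assumption: both values are counts of 512-byte sectors.

SECTOR_BYTES = 512
TIB = 2**40  # bytes per TiB


def sectors_to_tib(sectors):
    """Size in TiB of a span of 512-byte sectors."""
    return sectors * SECTOR_BYTES / TIB


want = 28987566088   # sector the request tried to reach
limit = 4595422208   # size of md0 in sectors

print(f"limit: {sectors_to_tib(limit):.2f} TiB")
print(f"want:  {sectors_to_tib(want):.2f} TiB")
print(f"want/limit: {want / limit:.2f}")
```

If the sector assumption holds, md0 is a bit over 2 TiB and the request
was aimed more than six times past the end of the device, which looks
more like a corrupted block address than a simple off-by-a-few overrun.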

Needless to say, it's not giving me that warm fuzzy feeling.  The one 
caveat is that not all the members of my array were the same size -- one 
disk is 180GB while all the rest are 160GB.  I'm going to test overnight 
with identically sized RAID members, but I also wanted to see if anyone 
else is using RAID6.

Thanks.


-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University