[CentOS] Recommendations for a "real RAID" 1 card on Centos box

Mon Mar 10 04:01:05 UTC 2008
John R Pierce <pierce at hogranch.com>

Therese Trudeau wrote:
> Ah that makes total sense now, thanks.  Do the 3ware and the Areca cards 
> allow you to remove battery/cache/disk and install into similar motherboard? Also
> when you say remove battery and cache, do you mean remove the entire RAID
> card with battery attached to it as complete assembly with accompanying drive
> and slap them all onto a new motherboard?
>
>   

I /think/ with the 3ware you remove and swap the whole card, along with 
the drives.

On many server grade systems, such as the HP DL380 series with onboard 
SmartArray, the cache RAM module and battery are separate detachable 
components.  In the DL380 they are actually two pieces with a cord 
between them.  You unclip and remove the battery from the chassis 
without messing with the wire, then you pull the cache module out of its 
special slot.  These can then be installed in another HP SmartArray, 
along with the drives from the original system.  When that new DL380 
powers up, the RAID controller will verify the drives and flush its 
cache, ensuring data integrity, then boot up your environment.


>>> Again pardon my ignorance, what is a hot spare?  A blank drive connected
>>> in the RAID 5 setup that can be written to in case one of the other 3 drives fail?
>>>
>>>   
>>>       
>> exactly.  a hot spare sits unused until one of the RAID members fails, 
>> then it's used to replace the failed drive by remirroring or by 
>> rebuilding from parity.  once this is finished and the original failed 
>> drive is replaced, the replacement can become the new hot spare.
>>     
>
> So if I understand correctly, RAID 5 is three active drives and one blank drive connected to a RAID 5 card, 
> and if one of the three active drives fails, the fourth empty drive is automatically written to?  If correct, what happens if the drive that fails loses all its data before the
> blank drive has a chance to grab it?
>   


with a 3 drive raid 5, you write two drives worth of data across the 
3...  every third 'block' is a 'parity block' calculated by bit-wise 
exclusive or (XOR) of the other two blocks.    on a 3 drive RAID-5, this 
parity block alternates across all three drives....

drive:      0    1    2
            ===========
data        0    1   0x1
blocks     2x3   2    3
            4   4x5   5
            6    7   6x7
           8x9   8    9
             .....

each of those 'blocks' is something like 32K bytes, i.e. 64 x 512 byte 
sectors (this is the stripe unit, also called the chunk size, configured 
when you create the raid).   the ones that are just numbers are your data 
blocks, while the 0x1 is (block_0 XOR block_1), i.e. the parity block for 
that stripe.

if any one drive /dies/ abruptly with no warning, you can still read 
all the data from the remaining drives.   each missing block is the XOR 
of the other blocks in its stripe, so the controller can reconstruct it 
on the fly, and you will continue operating in a degraded performance mode.
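
to make the XOR trick concrete, here is a minimal python sketch of one 
stripe on a 3 drive raid 5.   the function name xor_blocks and the 8 byte 
block contents are made up for illustration (real blocks would be more 
like 32K), and a real controller does this in hardware or firmware.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bit-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# one stripe: two data blocks plus the parity block computed from them
block_0 = b"hello wo"
block_1 = b"rld 1234"
parity = xor_blocks(block_0, block_1)

# pretend the drive holding block_1 died: XOR what is left to get it back
recovered = xor_blocks(block_0, parity)
print(recovered == block_1)
```

the same call recovers block_0 from block_1 and the parity, which is why 
it does not matter which single drive is lost.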

if you have a spare drive, or when you replace the failed drive, the 
raid controller begins a rebuild where it reads ALL the blocks of the 
working drives, XORs them together, and writes the result to the 
spare/new drive.   when it's done, things revert to normal full 
performance and redundant operation.   raid controllers can do this 
while the logical volume is still in use and online, and many let you 
set the priority of the rebuild to lower its performance impact.
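
a rebuild is just that same XOR repeated for every stripe.   the sketch 
below is a toy model, assuming each drive is a python list of 4 byte 
blocks laid out like the first two rows of the 3 drive diagram above.   
the names rebuild and xor_blocks are hypothetical, not any controller's API.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bit-wise XOR of a sequence of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(drives, failed):
    """Recompute the contents of drive `failed`, one stripe at a time."""
    survivors = [d for i, d in enumerate(drives) if i != failed]
    # zip(*survivors) walks the surviving drives stripe by stripe
    return [xor_blocks(stripe) for stripe in zip(*survivors)]

b0, b1, b2, b3 = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
drives = [
    [b0, xor_blocks([b2, b3])],   # drive 0: block 0, parity 2x3
    [b1, b2],                     # drive 1: block 1, block 2
    [xor_blocks([b0, b1]), b3],   # drive 2: parity 0x1, block 3
]

spare = rebuild(drives, failed=1)   # what gets written to the spare/new drive
print(spare == drives[1])
```

note that the rebuild does not care whether the lost block was data or 
parity, the XOR of the survivors gives the right answer either way.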

you can extend this with a reasonable number of drives, for instance, 5 
drives might look like...


drive:      0    1    2    3    4
            =====================
data        0    1    2    3    P
blocks      P    4    5    6    7
            8    P    9   10   11
               ....

where the P's are XORs of /all/ the other blocks on the same line.   
p0 = b0 X b1 X b2 X b3, p1 = b4 X b5 X b6 X b7, etc.
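
if it helps, the rotation in the two diagrams can be generated with a 
few lines of python.   this is only a sketch that mimics the pictures 
above (parity starts on the last drive and moves one drive to the right, 
wrapping around, on each stripe).   actual controllers and linux md 
support several different parity layouts, so treat the formula and the 
function name layout as assumptions.

```python
def layout(n_drives, n_stripes):
    """Return rows of block labels, 'P' marking the parity block."""
    rows = []
    block = 0
    for stripe in range(n_stripes):
        # parity sits on the last drive for stripe 0, then rotates
        parity_drive = (n_drives - 1 + stripe) % n_drives
        row = []
        for drive in range(n_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(str(block))
                block += 1
        rows.append(row)
    return rows

for row in layout(5, 3):
    print(" ".join(f"{cell:>4}" for cell in row))
```

calling layout(3, 5) gives the same pattern as the 3 drive diagram, with 
P standing in for the 0x1, 2x3, ... entries.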

there's tons of material online explaining this stuff far better than a 
centos list can.
http://en.wikipedia.org/wiki/RAID