Therese Trudeau wrote:
Ah that makes total sense now, thanks. Do the 3ware and the Areca cards allow you to remove the battery/cache/disks and install them into a similar motherboard? Also, when you say remove battery and cache, do you mean remove the entire RAID card with the battery attached to it, as a complete assembly with the accompanying drives, and slap them all onto a new motherboard?
I /think/ with the 3ware you remove and swap the whole card, along with the drives.
On many server-grade systems, such as the HP DL380 series with onboard SmartArray, the cache RAM module and battery are separate, detachable components. In the DL380 they are actually two pieces with a cord between them. You unclip and remove the battery from the chassis without messing with the wire, then you pull the cache module out of its special slot. These can then be installed in another HP SmartArray, along with the drives from the original system. When that new DL380 powers up, the RAID controller will verify the drives and flush its cache, ensuring data integrity, then boot up your environment.
Again pardon my ignorance, what is a hot spare? A blank drive connected in the RAID 5 setup that can be written to in case one of the other 3 drives fails?
exactly. a hot spare sits unused until one of the RAID members fails, then it's used to replace the failed drive by remirroring or restriping from the parity. once this is finished and the original failed drive has been replaced, the replacement can become the new hot spare.
So if I understand correctly, RAID 5 is three active drives and one blank drive connected to a RAID 5 card, and if one of the three active drives fails, the fourth empty drive is automatically written to? If correct, what happens if the drive that fails loses all its data before the blank drive has a chance to grab it?
with a 3 drive raid 5, you write two drives' worth of data across the 3... every third 'block' is a 'parity block' calculated by bit-wise exclusive or (XOR) of the other two blocks in that stripe. on a 3 drive RAID-5, this parity block alternates across all three drives....
drive:     0     1     2
==========================
data       0     1    0x1
blocks    2x3    2     3
           4    4x5    5
           6     7    6x7
          8x9    8     9
          .....
each of those 'blocks' is something like 32K bytes, i.e. 64 x 512 byte sectors (this is the stride of the raid, configured when you create the raid). the ones that are just numbers are your data blocks, while the 0x1 is (block_0 XOR block_1), i.e. the parity block for that stripe.
if any one drive /dies/ abruptly with no warning, you can still read all the data from the remaining drives. each block on the missing drive is the XOR of the corresponding blocks on the other drives, so the controller can reconstruct it on the fly, and you will continue operating in a degraded performance mode.
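to make the XOR trick concrete, here's a minimal python sketch of one stripe on a 3 drive raid 5. the `xor_blocks` helper and the 8 byte toy blocks are made up for illustration (a real controller does this in hardware or firmware on much larger blocks):

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bit-wise XOR of equal-length byte blocks (hypothetical helper)."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must be the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# one stripe: two data blocks plus their parity block
block_0 = b"hello wo"
block_1 = b"rld!1234"
parity = xor_blocks(block_0, block_1)

# the drive holding block_1 dies: XOR what is left to get it back
recovered = xor_blocks(block_0, parity)
print(recovered == block_1)
```

the same call recovers block_0 if that drive dies instead, which is why it doesn't matter which single drive fails.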
if you have a spare drive, or when you replace the failed drive, the raid controller begins a rebuild where it reads ALL the blocks of the working drives, XOR's them together stripe by stripe, and writes the result to the spare/new drive. when it's done, things revert to normal full performance and redundant operation. raid controllers can do this while the logical volume is still in use and online, and many let you set the priority of the rebuild to lower its performance impact.
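a rebuild is just that same XOR repeated for every stripe. here's a rough python sketch, assuming three toy drives modelled as lists of 4 byte blocks; `xor_blocks` and `rebuild_drive` are made-up names for illustration, not any controller's API:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_drive(surviving_drives):
    """Recompute every block of the missing drive, one stripe at a time."""
    return [xor_blocks(stripe) for stripe in zip(*surviving_drives)]


# three toy drives, two stripes each (4 byte blocks instead of 32K)
drive_0 = [b"aaaa", xor_blocks([b"cccc", b"dddd"])]  # data, then parity
drive_1 = [b"bbbb", b"cccc"]                         # data, data
drive_2 = [xor_blocks([b"aaaa", b"bbbb"]), b"dddd"]  # parity, then data

# drive_1 fails; the controller fills the spare from the two survivors
spare = rebuild_drive([drive_0, drive_2])
print(spare == drive_1)
```

note that the rebuild doesn't care whether a given block held data or parity, it simply XORs whatever the surviving drives have in that stripe.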
you can extend this to any reasonable number of drives. for instance, 5 drives might look like...
drive:     0     1     2     3     4
======================================
data       0     1     2     3     P
blocks     P     4     5     6     7
           8     P     9    10    11
          ....
where the P's are XOR's of /all/ the other blocks on the same line. p0 = b0 XOR b1 XOR b2 XOR b3, p1 = b4 XOR b5 XOR b6 XOR b7, etc.
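if you want to play with other drive counts, here's a small python sketch that prints this kind of layout. the rotation rule (parity starts on the last drive and moves one drive to the right each stripe, wrapping around) is only inferred from the two diagrams in this mail; real controllers use a variety of layouts, so treat `raid5_layout` as a hypothetical illustration:

```python
def raid5_layout(num_drives, num_stripes):
    """Return one row per stripe; "P" marks the parity block (assumed rotation)."""
    rows = []
    next_block = 0
    for stripe in range(num_stripes):
        # assumption: parity starts on the last drive and shifts right each stripe
        parity_drive = (num_drives - 1 + stripe) % num_drives
        row = []
        for drive in range(num_drives):
            if drive == parity_drive:
                row.append("P")
            else:
                row.append(str(next_block))
                next_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(5, 3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

calling `raid5_layout(3, 5)` gives the 3 drive pattern from earlier, with "P" standing in for 0x1, 2x3 and so on.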
there's tons of material online explaining this stuff far better than a centos list can. http://en.wikipedia.org/wiki/RAID