[CentOS] [OT/HW] hardware raid -- comment/experience with 3Ware

Wed Mar 13 19:51:01 UTC 2013
Arun Khan <knura9 at gmail.com>

On Wed, Mar 13, 2013 at 7:40 AM, Keith Keller wrote:
> On 2013-03-12, SilverTip257 wrote:
>>
>> I've not had any MegaRAID controllers fail, so I can only say they've been
>> reliable thus far!
>
> I think that this is not a helpful comment for the OP.  He wants to
> know, in the event the controller does fail, can he replace it with a
> similar-but-possibly-not-identical controller and have it recognize the
> original RAID containers.  Just because you have not seen any failures
> so far does not mean the OP never will.
>

+1.  Nothing is guaranteed in life.  However, when the HBA fails, is it
possible to replace it with the same model and firmware (assuming a
spare card is in stock), or with a later model from the same OEM, and
recover the RAID array?  (This assumes that none of the disks in the
original array has failed.)

Has this happened to anyone here, and if so, were you able to recover
the array without losing any data?

>> You start by failing/removing the drive via mdadm.  Then hot remove the
>> disk from the subsystem (ex: SCSI [0]) and finally physically remove it.
>> Then work in the opposite direction ... hot add (SCSI [1]), clone the
>> partition layout from one drive to the new with sfdisk, and finally add the
>> new disk/partitions to your softraid array with mdadm.
>>
>> You must hot remove the disk from the SCSI subsystem or the block device
>> (ex: /dev/sdc) name is occupied and unavailable for the new disk you put in
>> the system.  I've used the above procedure many times to repair softraid
>> arrays while keeping systems online.
>
> This is basically the same procedure for replacing a failed drive in a
> hardware RAID array, except that there is no need to worry about drive
> names (since individual drives don't get assigned a name in the kernel).
> But the point is that replacing a failed drive is the same amount of
> on-site work in either scenario, so that should not deter the OP from
> choosing software RAID.  (There may be other factors, such as the
> aforementioned write cache on many RAID cards.)
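
For reference, the softraid procedure quoted above could be sketched
roughly as follows.  This is only a dry-run helper that builds the
command list and executes nothing.  The device names (sdc, sda,
/dev/md0, host0) are hypothetical, and the two sysfs paths are my
assumption of what the SCSI [0] and [1] links describe.  The mdadm and
sfdisk invocations are the standard ones, to be run as root.

```python
import shlex


def drive_replacement_plan(disk, healthy_disk, members, scsi_host="host0"):
    """Return, in order, the shell commands for swapping one softraid disk.

    disk         -- kernel name of the failing disk, e.g. "sdc" (hypothetical)
    healthy_disk -- surviving disk whose partition table is cloned, e.g. "sda"
    members      -- dict of md array -> partition on `disk`,
                    e.g. {"/dev/md0": "/dev/sdc1"}
    scsi_host    -- SCSI host to rescan after inserting the new disk (assumed)
    """
    q = shlex.quote
    plan = []

    # 1. Mark each partition as failed, then remove it from its array.
    for array, part in members.items():
        plan.append(f"mdadm --manage {q(array)} --fail {q(part)}")
        plan.append(f"mdadm --manage {q(array)} --remove {q(part)}")

    # 2. Hot remove the disk from the SCSI subsystem so its name is freed.
    plan.append(f"echo 1 > /sys/block/{q(disk)}/device/delete")

    # 3. After physically swapping the disk, rescan the host (hot add).
    plan.append(f"echo '- - -' > /sys/class/scsi_host/{q(scsi_host)}/scan")

    # 4. Clone the partition layout from the healthy disk to the new one.
    #    Assumes the new disk comes back under the same kernel name.
    plan.append(f"sfdisk -d /dev/{q(healthy_disk)} | sfdisk /dev/{q(disk)}")

    # 5. Add the new partitions back so the arrays start rebuilding.
    for array, part in members.items():
        plan.append(f"mdadm --manage {q(array)} --add {q(part)}")

    return plan


if __name__ == "__main__":
    for cmd in drive_replacement_plan("sdc", "sda", {"/dev/md0": "/dev/sdc1"}):
        print(cmd)
```

Printing the plan first and running each step by hand seems safer than
scripting it end to end, since the new disk is not guaranteed to come
back under the same name; check dmesg after the rescan before cloning
the partition table.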

Going slightly OT: how do NAS boxes handle a hard disk failure?

-- Arun Khan