[CentOS] Replacing failed software RAID drive

Les Mikesell lesmikesell at gmail.com
Mon Oct 8 01:52:59 UTC 2007


Hugh E Cruickshank wrote:

>> Normally with software mirroring you would mirror partitions, not 
>> drives.  What does "cat /proc/mdstat" say about them?
> 
> You are correct. I keep falling back to thinking the "MegaRAID" way
> where I have the drives mirrored at the controller level and then
> partitioned at the software level. The /proc/mdstat reports:
> 
> Personalities : [raid0] [raid1]
> md1 : active raid1 sde2[1] sda2[2](F)
>       8193024 blocks [2/1] [_U]
> 
> md2 : active raid1 sde3[1] sda3[2](F)
>       2048192 blocks [2/1] [_U]
> 
> md3 : active raid1 sde5[1] sda5[2](F)
>       25085376 blocks [2/1] [_U]
> 
> md4 : active raid1 sdf1[1] sdb1[0]
>       35840896 blocks [2/2] [UU]
> 
> md5 : active raid1 sdg1[1] sdc1[0]
>       35840896 blocks [2/2] [UU]
> 
> md6 : active raid1 sdh1[1] sdd1[0]
>       35840896 blocks [2/2] [UU]
> 
> md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
>       213261312 blocks 256k chunks
> 
> md0 : active raid1 sde1[1] sda1[2](F)
>       513984 blocks [2/1] [_U]

OK, you just have to replace the drive, use fdisk to create matching 
partitions on it ("fdisk -l /dev/sde" will show the sizes you need), then run
mdadm --add /dev/md? /dev/sda?
for each array to add the missing partition back (for example, /dev/sda2 
goes back into /dev/md1).  Then reinstall grub on the new drive.
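
As a rough sketch of the re-add step: the helper below reads 
mdstat-format text and prints, without running anything, one mdadm --add 
line per member flagged (F).  The function names are made up for 
illustration, and it assumes the replacement drive comes up under the 
same name as the failed one.

```shell
# List "mdN partition" pairs for members flagged (F) in mdstat-format
# text read from stdin.
failed_members() {
    awk '
        $2 == ":" && $3 == "active" {
            md = $1
            for (i = 5; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    part = $i
                    sub(/\[[0-9]+\]\(F\)$/, "", part)   # sda2[2](F) -> sda2
                    print md, part
                }
            }
        }
    '
}

# Print (do not run) the re-add command for each failed member.
# Assumes the new drive has the same device name as the failed one.
print_add_commands() {
    failed_members | while read -r md part; do
        echo "mdadm --add /dev/$md /dev/$part"
    done
}
```

Running "print_add_commands < /proc/mdstat" before the swap gives a list 
to review; run the printed commands by hand once the new drive is 
partitioned.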

>> You have an odd combination of drives... Normally you would want to 
>> mirror the partitions on the first 2 disks and install grub on both, in 
>> which case the system would still boot.  Some of the more sophisticated 
>> controllers can boot from more than the first 2, though.  Anyway, you 
>> should be able to boot from your install CD with 'linux rescue' at the 
>> boot prompt and get to a point where you can fix things.
>>
> 
> The odd combination of drives was actually intentional on my part. The
> idea was to provide "separation" between the mirrors. While I did not
> have separate controllers I thought that using the separate channels 
> on the common controller might provide a shade more resiliency. It was
> my first attempt at setting up mirrored pairs on a non-MegaRAID SCSI
> controller. Live and learn!

The controller might let you boot from the 2nd channel.  If that's the 
case, you could install grub on /dev/sde before shutting down, adjust 
the controller BIOS, and still be able to boot.  The catch is that you 
won't know whether it works until after you shut down.
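
A sketch of that grub install, assuming GRUB legacy and assuming /boot 
is the first partition on /dev/sde (sde1, the md0 member); neither is 
confirmed in this thread.  The function name is hypothetical.  It only 
prints the grub shell commands, mapping the disk as (hd0) so that the 
boot loader looks for its files on whichever disk the BIOS boots first.

```shell
# Print GRUB legacy shell commands that install the boot loader on one disk.
# $1: disk device (e.g. /dev/sde)
# $2: zero-based index of the partition holding /boot (assumed, not known)
grub_setup_commands() {
    disk=$1
    boot_part=$2
    printf 'device (hd0) %s\n' "$disk"      # treat this disk as the boot disk
    printf 'root (hd0,%s)\n' "$boot_part"   # partition that holds /boot
    printf 'setup (hd0)\n'                  # write grub to the disk's MBR
    printf 'quit\n'
}
```

After checking the output, it could be fed to the grub shell as root with 
"grub_setup_commands /dev/sde 0 | grub --batch".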

> I will read up on the "linux rescue" so, if I have to fallback on this
> method, I will be able to have a firm plan in place before I start the
> work.

The only tricky part is what happens to the drive names if you boot with 
/dev/sda broken (depending on the failure mode) or missing.  If the 
controller doesn't see it, all of the other drive names will shift up 
(/dev/sdb becomes /dev/sda, and so on).  This normally won't affect md 
device detection, but you may have a non-md device mentioned in 
/etc/fstab, especially for swap devices.
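
One way to spot such entries ahead of time is a small filter like the 
one below (the function name is made up for illustration).  It reads 
fstab-format text and prints the device, mount point, and type of any 
entry that names a raw /dev/sdX device instead of an md device or a 
label.

```shell
# Print fstab entries whose device field is a raw /dev/sdX name.
# Reads fstab-format text on stdin; comment lines and md, LABEL= or
# UUID= entries do not match the pattern and are skipped.
raw_sd_entries() {
    awk '$1 ~ /^\/dev\/sd[a-z]+[0-9]*$/ { print $1, $2, $3 }'
}
```

Run it as "raw_sd_entries < /etc/fstab"; anything it lists is a line 
that would point at the wrong drive if the names shift.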

> This particular system is our primary development system and does not
> get all the "fancy" hardware that our production systems do. I have
> configured the production systems using only the MegaRAID controllers
> and there it is a "no brainer" to replace failed drives - just swap
> the drive and away you go.

It isn't that complicated to fdisk a partition and mdadm --add it, and 
with software raid1 you gain the ability to plug any remaining single 
drive into any vendor's SCSI controller and access the data.

-- 
   Les Mikesell
    lesmikesell at gmail.com



