Hugh E Cruickshank wrote:
Normally with software mirroring you would mirror partitions, not drives. What does "cat /proc/mdstat" say about them?
You are correct. I keep falling back into thinking the "MegaRAID" way, where I have the drives mirrored at the controller level and then partitioned at the software level. /proc/mdstat reports:
Personalities : [raid0] [raid1]
md1 : active raid1 sde2[1] sda2[2](F)
      8193024 blocks [2/1] [_U]
md2 : active raid1 sde3[1] sda3[2](F)
      2048192 blocks [2/1] [_U]
md3 : active raid1 sde5[1] sda5[2](F)
      25085376 blocks [2/1] [_U]
md4 : active raid1 sdf1[1] sdb1[0]
      35840896 blocks [2/2] [UU]
md5 : active raid1 sdg1[1] sdc1[0]
      35840896 blocks [2/2] [UU]
md6 : active raid1 sdh1[1] sdd1[0]
      35840896 blocks [2/2] [UU]
md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
      213261312 blocks 256k chunks
md0 : active raid1 sde1[1] sda1[2](F)
      513984 blocks [2/1] [_U]
OK, you just have to replace the drive, fdisk matching partitions onto it ("fdisk -l /dev/sde" will show the sizes you need), then use "mdadm --add /dev/md? /dev/sda?" for each array to add the missing partition back. Then reinstall grub on the new drive.
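In rough outline, something like this (a sketch only, assuming the replacement drive comes back as /dev/sda and you clone the layout from the surviving /dev/sde; sfdisk can copy the partition table in one shot if you prefer that to recreating the partitions by hand in fdisk):

  # clone the partition table from the good half onto the new drive
  sfdisk -d /dev/sde | sfdisk /dev/sda

  # re-add each partition to its mirror (pairings taken from the
  # mdstat output above)
  mdadm --add /dev/md0 /dev/sda1
  mdadm --add /dev/md1 /dev/sda2
  mdadm --add /dev/md2 /dev/sda3
  mdadm --add /dev/md3 /dev/sda5

  # watch the resync
  cat /proc/mdstat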
You have an odd combination of drives... Normally you would want to mirror the partitions on the first 2 disks and install grub on both, in which case the system would still boot. Some of the more sophisticated controllers can boot from more than the first 2, though. Anyway, you should be able to boot from your install CD with 'linux rescue' at the boot prompt and get to a point where you can fix things.
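Roughly like this once you are in (a sketch; this assumes a Red Hat-style rescue image, which mounts the installed system under /mnt/sysimage once it finds it):

  # boot the install CD, type "linux rescue" at the boot: prompt,
  # let it locate and mount the installed system, then:
  chroot /mnt/sysimage
  grub-install /dev/sda    # or whichever drive the BIOS will boot from
  exit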
The odd combination of drives was actually intentional on my part. The idea was to provide "separation" between the mirrors. While I did not have separate controllers, I thought that using the separate channels on the common controller might provide a shade more resiliency. It was my first attempt at setting up mirrored pairs on a non-MegaRAID SCSI controller. Live and learn!
The controller might let you boot from the 2nd channel - and if that's the case you could install grub on /dev/sde before shutting down, adjust the controller BIOS, and still be able to boot. The catch is that you won't know whether it will work until after you shut down.
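From the grub shell it would look something like this (a sketch; it assumes /boot is the first partition on the drive, which matches md0 being sde1 in the mdstat output, and tells grub to treat sde as the boot drive):

  grub
  grub> device (hd0) /dev/sde
  grub> root (hd0,0)
  grub> setup (hd0)
  grub> quit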
I will read up on "linux rescue" so that, if I have to fall back on this method, I will have a firm plan in place before I start the work.
The only tricky part is what happens to the drive names if you boot with /dev/sda broken (depending on the failure mode) or missing. If the controller doesn't see it, all of the other drive names will shift up. This normally won't affect md device detection, but you may have a non-md device mentioned in /etc/fstab, especially for swap devices.
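For illustration, an fstab along these lines is the problem case (hypothetical entries; the md arrays are assembled from their superblocks, so they survive the renumbering, but a raw device name does not):

  /dev/md1     /        ext3    defaults    1 1
  /dev/md0     /boot    ext3    defaults    1 2
  /dev/sdb2    swap     swap    defaults    0 0   # raw name - breaks when the drive names shift up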
This particular system is our primary development system and does not get all the "fancy" hardware that our production systems do. I have configured the production systems using only the MegaRAID controllers, and there it is a "no-brainer" to replace failed drives - just swap the drive and away you go.
It isn't that complicated to fdisk a partition and mdadm --add it, and with software RAID1 you gain the ability to plug any remaining single drive into any vendor's SCSI controller and access the data.
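For instance (a sketch; the device name on the other box is hypothetical, and --run tells mdadm to start the array degraded with only one member present):

  # on the other machine, assemble one half of the mirror by itself
  mdadm --assemble --run /dev/md0 /dev/sdb1
  mount /dev/md0 /mnt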