From: Les Mikesell Sent: October 7, 2007 16:57

Hi Les. Thanks for your response.

> Hugh E Cruickshank wrote:
> >
> > I now find myself in the situation where I have a failed drive on a
> > non-MegaRAID controller, specifically an Adaptec 29160 SCSI
> > controller.
> > The system is an Acer G700 with 8 internal hot-swappable SCSI drives
> > arranged in two banks of 4 drives. Each bank is connected to a
> > separate channel on the 29160 controller. When I installed CentOS 4
> > I enabled software mirroring between the two banks so that I ended up
> > with 4 pairs of mirrored drives (sda/sde, sdb/sdf, sdc/sdg, sdd/sdh).
>
> Normally with software mirroring you would mirror partitions, not
> drives. What does "cat /proc/mdstat" say about them?

You are correct. I keep falling back to thinking the "MegaRAID" way,
where I have the drives mirrored at the controller level and then
partitioned at the software level. /proc/mdstat reports:

Personalities : [raid0] [raid1]
md1 : active raid1 sde2[1] sda2[2](F)
      8193024 blocks [2/1] [_U]

md2 : active raid1 sde3[1] sda3[2](F)
      2048192 blocks [2/1] [_U]

md3 : active raid1 sde5[1] sda5[2](F)
      25085376 blocks [2/1] [_U]

md4 : active raid1 sdf1[1] sdb1[0]
      35840896 blocks [2/2] [UU]

md5 : active raid1 sdg1[1] sdc1[0]
      35840896 blocks [2/2] [UU]

md6 : active raid1 sdh1[1] sdd1[0]
      35840896 blocks [2/2] [UU]

md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
      213261312 blocks 256k chunks

md0 : active raid1 sde1[1] sda1[2](F)
      513984 blocks [2/1] [_U]

unused devices: <none>

In this configuration sda-sdh are the drives attached to the 29160,
while sdi-sdn are hardware-mirrored drive pairs attached to a MegaRAID
controller.

> > The problem I have now is that it is sda (the boot drive) that has
> > failed. I have not encountered this problem before and therefore I
> > need to make sure that I understand what I need to do before I start
> > mucking around with things and dig myself into a deeper hole.
> >
> > I have spent much time attempting to research the problem but have
> > not been able to come up with any definite information to help. As
> > far as I can see I have two options...
> >
> > Option 1: Leave the system running and replace the drive. Then either
> > the RAID software will re-sync the drives or I can manually sync them
> > with mdadm. I have not seen anything that will support this option
> > but I am hoping that it is a valid option.
>
> This should work, but you'll probably have to tell the controller that
> you are removing and adding disks. This used to be done by writing
> something to /proc/scsi/scsi, but it may have changed and also may be
> controller specific so I'll let someone else point out the
> documentation for that.

I am glad to hear that. I thought it might be the case, but I did not
feel up to trying it by yanking out my boot drive while the system was
up and running. That sounded like a recipe for disaster unless I had
some valid reasoning behind the move. I will wait to see if anyone else
weighs in on the subject with some pointers to actual documentation.

> > Option 2: Create a boot disk (floppy or CD) that I can boot from but
> > that points to sde (the boot mirror). Shutdown the system and replace
> > the failed sda drive. Boot from the new boot disk. Format, partition
> > and re-sync the new sda from sde. Shutdown, remove the boot disk, and
> > reboot from the new sda.
>
> You have an odd combination of drives... Normally you would want to
> mirror the partitions on the first 2 disks and install grub on both, in
> which case the system would still boot. Some of the more sophisticated
> controllers can boot from more than the first 2, though. Anyway, you
> should be able to boot from your install CD with 'linux rescue' at the
> boot prompt and get to a point where you can fix things.

The odd combination of drives was actually intentional on my part.
The idea was to provide "separation" between the mirrors. While I did
not have separate controllers, I thought that using the separate
channels on the common controller might provide a shade more
resiliency. It was my first attempt at setting up mirrored pairs on a
non-MegaRAID SCSI controller. Live and learn!

I will read up on "linux rescue" so that, if I have to fall back on
this method, I will have a firm plan in place before I start the work.

This particular system is our primary development system and does not
get all the "fancy" hardware that our production systems do. I have
configured the production systems using only the MegaRAID controllers
and there it is a "no brainer" to replace failed drives - just swap
the drive and away you go.

Thanks again for your comments. They are greatly appreciated.

Regards, Hugh

-- 
Hugh E Cruickshank, Forward Software, www.forward-software.com
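
For anyone reading the mdstat listing earlier in this message, here is
a minimal Python sketch of how it can be interpreted mechanically. It
is only an illustration, not part of mdadm or any existing tool: the
function name degraded_arrays and the regular expressions are my own
assumptions, and the sample is an abbreviated copy of the output above
in the usual two-lines-per-array layout. A member followed by "(F)" is
marked faulty, and a status such as "[2/1]" means 2 devices expected
with 1 active.

```python
import re

# Abbreviated copy of the /proc/mdstat output quoted earlier (assumed layout:
# one "mdN : ..." line followed by an indented "blocks" line).
SAMPLE = """\
Personalities : [raid0] [raid1]
md1 : active raid1 sde2[1] sda2[2](F)
      8193024 blocks [2/1] [_U]
md4 : active raid1 sdf1[1] sdb1[0]
      35840896 blocks [2/2] [UU]
md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
      213261312 blocks 256k chunks
md0 : active raid1 sde1[1] sda1[2](F)
      513984 blocks [2/1] [_U]
unused devices: <none>
"""

# Hypothetical patterns written for this sketch only.
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")   # name, state, level, members
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\(F\))?")          # e.g. sda2[2](F)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]")               # e.g. [2/1]


def degraded_arrays(mdstat_text):
    """Map each array with fewer active than expected devices to its faulty members."""
    result = {}
    current = None
    failed = []
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line.strip())
        if header:
            current = header.group(1)
            # Keep only members carrying the (F) flag.
            failed = [name for name, flag in MEMBER_RE.findall(header.group(4)) if flag]
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                result[current] = failed
            current = None
    return result


if __name__ == "__main__":
    for array, members in degraded_arrays(SAMPLE).items():
        print(array, "is degraded; faulty members:", ", ".join(members))
```

Run against the sample, this reports md1 and md0 as degraded with sda2
and sda1 as the faulty members, which matches the failed sda described
above. The raid0 array (md7) has no "[n/m]" status and is skipped.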