[CentOS] raid1 custom initrd and yum

Wed Apr 2 05:07:16 UTC 2008
Les Mikesell <lesmikesell at gmail.com>

Sam Beam wrote:
>
>>> I hope...
>> If you are booting a kernel that can't find your root partition, the
>> initrd might be the problem.   There are several other things that also
>> have to be right.  You should be able to boot your install cd/dvd with
>> "linux rescue" at the boot prompt to fix any of them, so don't panic yet.
> 
> OK but can I panic if this system has a max of two IDE devices, no floppy, one 
> PCI slot, and I don't have a PCI CD, and it won't boot from USB even?

Not yet.
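
If the initrd does turn out to be the problem (no raid1 support, wrong
root device), the rescue environment is enough to rebuild it.  A rough
sketch, assuming the rescue boot mounted your installed system under
/mnt/sysimage and with your real kernel version filled in:

    # chroot /mnt/sysimage
    # mkinitrd -f --with=raid1 /boot/initrd-<version>.img <version>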

> http://www.tyan.com/archive/products/html/gs10b2094_spec.html
> 
> It has a hardware Promise RAID controller but I have it on good authority that 
> I don't want to mess with that, software RAID is better, etc.

Promise makes a bunch of different stuff.  I'd expect that you can run 
the drives separately - probably even hang a CD on it.

> I installed CentOS onto disk1 using the 10-year-old PC from the basement. 
> Then I mirrored in the second disk. All went pretty well until the root 
> partition was mirrored, fstab and /proc/mdstat and grub all agreed, and I 
> rebooted. 
> 
> At this point Grub seems to work OK and the 3 RAID partitions 
> (/boot, /home, /) assemble correctly and show UU. It gets all the way into 
> INIT - set hostname, check for LVM, and then "Checking filesystems" and then 
> tells me that /dev/md1 (should be /home) has a bad superblock:
> 
>   The superblock could not be read or does not describe a correct ext2
>   filesystem.  If the device is valid and it really contains an ext2
>   filesystem (and not swap or ufs or something else), then the superblock
>   is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
> 
> Then I am dropped to a recovery shell.
> 
> But $ mdadm -E /dev/hd[ad]2 both show nice superblocks that look OK to me.
> 
> In dmesg I see raidautorun output and then device-mapper starting up as the 
> last two entries.
> 
> But, /dev/md1 doesn't exist. /dev/md0 and /dev/md2 are there and seem normal. 
> 
> /proc/mdstat contains this:
> 
>    md1: active raid1 hdd2[1] hda2[0]
>         106012864 blocks [2/2] [UU]
> 
> so imagine my surprise when I tried this:
> 
>    # mdadm -Q /dev/md1
>    mdadm: cannot open /dev/md1: No such file or directory

If the device node doesn't exist, that makes sense.
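
For comparison, the nodes that did get created will show you the
major/minor pattern - the md driver is block major 9 and the minor
matches the array number:

    # ls -l /dev/md0 /dev/md2

so md1 ought to be block major 9, minor 1.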

>    # mdadm -Q /dev/hdd2
>    /dev/hdd2: is not an md array
>    /dev/hdd2: device 1 in 2 device active raid1 /dev/.tmp.md1.  Use mdadm --examine for more detail.
> 
> But /dev/.tmp.md1 does not exist either, and therefore I cannot stop this 
> mystery array or fsck /dev/hd[ad]2 because they are "busy" being part of this 
> non-existent device.
> 
> Arrgh. I be stumped.
> 
> Any help would be a big help ;)

First cut - in your recovery shell, comment out /home from /etc/fstab 
and see if you can come up without it (log in as root, of course).  That 
will at least give you a fairly normal environment to try to figure out 
why the md1 array is getting assembled but the /dev/md1 node isn't 
created for it.  I'm not sure what would happen if you created the node 
with the obvious major/minor numbers yourself.  I'd try it, but don't 
blame me if it explodes.  If you can't get /dev/md1 to appear, use fdisk 
to change the partition type from fd (raid autodetect) to 83 (plain 
Linux) so they won't even try to assemble.  Then you can at least mount 
one of the underlying partitions to get to the data.
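
If you do want to try the node by hand, it would be something like this 
(untested on a box in your state, so treat it as a guess):

    # mknod /dev/md1 b 9 1
    # mdadm -Q /dev/md1

and if md1 looks sane after that, fsck and mount it by hand before 
putting it back in fstab.  The type-83 change is just fdisk's 't' 
command on partition 2 of each disk, type 83, then 'w' to write it out.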


Since you are this far along you probably don't need to pursue an 
alternate boot, but if you did, you could pull one of the drives and 
hang a CD or DVD on the IDE interface.  RAID1 will work just fine with 
one drive missing and you could sync the other one back later.
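
Putting the pulled drive back into the mirrors afterwards is just a 
hot-add - along these lines, with your real partitions substituted 
(hdd is only an example here):

    # mdadm /dev/md0 --add /dev/hdd1
    # mdadm /dev/md2 --add /dev/hdd3

and then watch /proc/mdstat while it resyncs.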


-- 
   Les Mikesell
    lesmikesell at gmail.com