[CentOS] CentOS 4 Software Raid1 questions

Mon May 2 13:53:04 UTC 2005
Aleksandar Milivojevic <amilivojevic at pbl.ca>

Les Mikesell wrote:

> Thanks, now what happens when you move a set built on one machine
> to a different machine, or if you move drives intending to reformat
> but end up with mismatched members that are detected at bootup.
> I remember having a disaster years ago when I tried to minimize
> downtime by building a raid1 set on a different machine and
> pre-loading files, then shutting down just long enough to swap
> the drive in place.  I think they were either paired with the
> wrong mates or the md? devices were detected in the wrong order
> as the machine rebooted.  Maybe this has been fixed in the newer
> kernel versions but since then I've gone out of my way to avoid
> repeating the situation - sometimes as far as low-leveling on a
> non-production machine before moving a drive that may have been
> part of a raid or even one that might have a conflicting partition
> label. 

My guess is that the detection order was different.  This is a big
problem on Linux that exists even if you haven't used RAID.  For
example, SCSI device names get renumbered and moved around each time
you add or remove devices.  Udev has some nice workarounds for this
that work perfectly for many types of devices, but I'm not sure if they
would work for the boot disc, and they probably wouldn't work at all
for software RAID pseudo devices ("udevinfo -a -p /sys/block/md0"
doesn't show any promising output; if it at least included a randomly
generated ID, it could have been useful).
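
To illustrate the renumbering problem, here is a minimal Python sketch.
It is a simplified model, not how the kernel actually does it: it
assumes names are handed out strictly in detection order, and the serial
numbers and the assign_names() helper are made up for the example.

```python
import string

def assign_names(detection_order):
    """Name discs sda, sdb, ... purely by the order they were detected.

    detection_order is a list of (hypothetical) disc serial numbers.
    """
    return {"sd" + string.ascii_lowercase[i]: serial
            for i, serial in enumerate(detection_order)}

# Hypothetical serial numbers, for illustration only.
before = assign_names(["DISK-A", "DISK-B"])

# A new disc that happens to be detected first shifts every other name.
after = assign_names(["DISK-NEW", "DISK-A", "DISK-B"])

print(before["sda"])  # DISK-A
print(after["sda"])   # DISK-NEW
print(after["sdb"])   # DISK-A
```

Anything that refers to "sda" by name now points at a different disc,
which is why a stable per-device ID would be more useful than the name.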

I had a similar problem to the one you described with LVM, when the
kernel read LVM info from the first disc it detected (and of course, it
was the wrong disc, with old, outdated LVM information from one of the
previous installations).  It sounds logical to me to try searching for
LVM info on the boot disc first, and then look around (and fail if there
are conflicting volume groups).  But not for Linux developers.  Yeah, I
know, I have the source...  Too bad it is too much source, too little
time...
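
The "boot disc first, then fail on conflicts" policy could be sketched
roughly like this in Python.  This is only an illustration of the idea,
not what LVM actually does: the input format, the volume group IDs, and
the select_volume_groups() helper are all assumptions made up for the
example.

```python
class ConflictingVolumeGroups(Exception):
    """Two discs claim the same volume group name with different IDs."""

def select_volume_groups(discs, boot_disc):
    """Pick volume groups, trusting the boot disc before any other disc.

    discs: dict mapping a disc name to a list of (vg_name, vg_id) pairs
    found on that disc (hypothetical format, for illustration only).
    Returns a dict mapping vg_name to (vg_id, disc where first seen).
    """
    chosen = {}
    # Look at the boot disc first, then the rest in detection order.
    order = [boot_disc] + [d for d in discs if d != boot_disc]
    for disc in order:
        for vg_name, vg_id in discs.get(disc, []):
            if vg_name not in chosen:
                chosen[vg_name] = (vg_id, disc)
            elif chosen[vg_name][0] != vg_id:
                # Same name, different ID: refuse to guess.
                raise ConflictingVolumeGroups(
                    "%s on %s conflicts with %s"
                    % (vg_name, disc, chosen[vg_name][1]))
    return chosen

# A stale disc detected first no longer wins silently; it is an error.
stale = {"sda": [("vg0", "old-id")], "sdb": [("vg0", "new-id")]}
try:
    select_volume_groups(stale, boot_disc="sdb")
except ConflictingVolumeGroups as err:
    print("refusing to continue:", err)
```

The same volume group ID showing up on several discs is accepted, since
a volume group can legitimately span more than one disc.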

File system labels also can be a bitch, given that in most distributions
they are assigned in the most moronic way ("LABEL=/" might look logical
and works if you are never going to replace discs or move them around;
LABEL="something-random" on the other hand would work most of the time,
as long as your boot loader is going to load the correct kernel and pass
it the correct root flag, but then the fstab file looks kinda ugly, and
good luck guessing what the correct label name is at the lilo or grub
prompts).
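
Before moving a disc into another machine, it may help to check whether
its labels would collide with the ones already there.  A minimal Python
sketch, assuming you have already collected a device-to-label mapping by
hand (the device names, labels, and the find_duplicate_labels() helper
are hypothetical):

```python
from collections import defaultdict

def find_duplicate_labels(labels_by_device):
    """Return the labels that appear on more than one device.

    labels_by_device: dict mapping a device path to its file system label.
    Returns a dict mapping each duplicated label to a sorted device list.
    """
    devices_by_label = defaultdict(list)
    for device, label in labels_by_device.items():
        devices_by_label[label].append(device)
    return {label: sorted(devices)
            for label, devices in devices_by_label.items()
            if len(devices) > 1}

# Hypothetical layout: an old system disc moved into a running machine.
labels = {
    "/dev/sda1": "/",
    "/dev/sda2": "/home",
    "/dev/sdb1": "/",   # old root from a previous installation
}
print(find_duplicate_labels(labels))
# {'/': ['/dev/sda1', '/dev/sdb1']}
```

With "LABEL=/" on both discs, which one gets mounted as root depends on
which one is found first.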

The only solution to these problems on Linux is to boot into single user
mode and wipe out any and all information that the Linux kernel is not
supposed to see.  And then when you have those things sorted out,
attempt a normal boot.  No way around it.  Frankly, what is the kernel
supposed to do when it reads conflicting information from the disc?
It's like when you ask for directions and one person tells you to go
right, and the other to go left.  Obviously, you can follow directions
from either the first or the second person, or sit in the middle of the
intersection.

-- 
Aleksandar Milivojevic <amilivojevic at pbl.ca>    Pollard Banknote Limited
Systems Administrator                           1499 Buffalo Place
Tel: (204) 474-2323 ext 276                     Winnipeg, MB  R3T 1L7