on 10-21-2010 9:13 AM fred smith spake the following:
On Thu, Oct 21, 2010 at 08:59:13AM -0700, Nataraj wrote:
fred smith wrote:
On Tue, Oct 19, 2010 at 07:34:19PM -0700, Nataraj wrote:
I've seen this kind of thing happen when the autodetection stuff misbehaves. I'm not sure why it does this or how to prevent it. Anyway, to recover, I would use something like:
mdadm --stop /dev/md125
mdadm --stop /dev/md126
If for some reason the above commands fail, check and make sure it has not automounted the file systems from md125 and md126. Hopefully this won't happen.
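One way to run the mount check described above is to look for the stray arrays in the kernel's mount table. This is only a sketch: the helper name is_mounted is made up for illustration, and it assumes the file systems were mounted directly from /dev/md125 or /dev/md126 (it would not notice a mount that goes through LVM or another layer on top of the array).

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: succeeds if DEVICE appears as a mount source in
# MOUNTS_FILE (defaults to /proc/mounts, where field 1 is the source device).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (as root), using the device names from this thread:
#   for md in /dev/md125 /dev/md126; do
#       if is_mounted "$md"; then umount "$md"; fi
#   done
```

If either array turns out to be mounted, unmount it before retrying mdadm --stop.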
Then use:
mdadm /dev/md0 -a /dev/sdXX
to add back the drive which belongs in md0, and similar for md1. In general, it won't let you add the wrong drive, but if you want to check use:
mdadm --examine /dev/sda1 | grep UUID
and so forth for all your drives and find the ones with the same UUID.
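The UUID check above can be wrapped in a small loop so that every partition is listed next to its array UUID. Treat this as a sketch, not a tested recipe: array_uuid and list_uuids are made-up helper names, the partition names are placeholders, and it assumes mdadm --examine prints either a "UUID :" line (0.90 metadata) or an "Array UUID :" line (1.x metadata).

```shell
#!/bin/sh
# array_uuid: hypothetical helper. Reads `mdadm --examine` output on stdin
# and prints the array UUID. Accepts "UUID :" (0.90 metadata) and
# "Array UUID :" (1.x metadata); "Device UUID :" lines are ignored.
array_uuid() {
    sed -n -e 's/^[[:space:]]*Array UUID : //p' \
           -e 's/^[[:space:]]*UUID : //p' | head -n 1
}

# list_uuids PARTITION...
# Prints each partition beside its array UUID. Partitions that share a
# UUID are members of the same array.
list_uuids() {
    for part in "$@"; do
        printf '%s %s\n' "$part" "$(mdadm --examine "$part" 2>/dev/null | array_uuid)"
    done
}

# Example (as root; partition names are placeholders):
#   list_uuids /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2
```

Partitions that print the same UUID belong together; a partition that prints nothing after its name has no md superblock that mdadm could read.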
Well, I've already tried to use --fail and --remove on md125 and md126 but I'm told the members are still active.
mdadm /dev/md126 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md125 --fail /dev/sdb2 --remove /dev/sdb2
You want to use --stop on md125 and md126. Those are the raid devices that are not correct. Once they are stopped, you can take the drives from them and return them to md0 and md1 where they belong.
You will need to add the correct drive that was originally paired in each raid set, but as I mentioned, it won't let you add the wrong drives, so just try adding sdb1 to md0, then if that doesn't work, add it to md1. You can't fail out drives from arrays that only have one drive.
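Put together, the sequence described above might look like the sketch below. The device names come from this thread and may not match another system, and the pairing of sdb2 with md1 is an assumption (only sdb1 and md0 are named above). The run wrapper, the recover function and the DRY_RUN variable are made up for illustration. By default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical wrapper that either echoes or executes a command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover() {
    # Stop the stray arrays so their member partitions are released.
    run mdadm --stop /dev/md125
    run mdadm --stop /dev/md126
    # Return each partition to its original array.
    # ASSUMPTION: sdb1 belongs to md0 and sdb2 to md1; verify with
    # `mdadm --examine` before running for real.
    run mdadm /dev/md0 -a /dev/sdb1
    run mdadm /dev/md1 -a /dev/sdb2
}

# Example: recover              (prints the four commands)
#          DRY_RUN=0 recover    (executes them)
```

After the partitions are added back, cat /proc/mdstat shows the resync progress for md0 and md1.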
Thanks for the additional information.
I'll try backing up everything this weekend then will take a stab at it.
Someone said earlier that the differing raid superblocks were probably the cause of the misassignment in the first place, but I have no clue how the superblocks could have become messed up. Can any of you comment on that? Will I need to hack at that issue, too, before I can succeed?
thanks again!
If the system lost power or otherwise went off before all superblock data was flushed, that could have corrupted the data. I would assume that the oddball devices were the corrupt ones, but unless you have something to compare to, it is hard to be sure.