On Thu, Oct 21, 2010 at 08:59:13AM -0700, Nataraj wrote:
fred smith wrote:
On Tue, Oct 19, 2010 at 07:34:19PM -0700, Nataraj wrote:
I've seen this kind of thing happen when the autodetection stuff misbehaves. I'm not sure why it does this or how to prevent it. Anyway, to recover, I would use something like:
mdadm --stop /dev/md125
mdadm --stop /dev/md126
If for some reason the above commands fail, check and make sure it has not automounted the file systems from md125 and md126. Hopefully this won't happen.
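If you want to check before stopping them, something along these lines will show whether anything from the bogus arrays is mounted (the md125/md126 names are just the ones from your report):

cat /proc/mdstat                # which arrays are currently assembled
mount | grep -E 'md12[56]'      # anything mounted from md125/md126?
umount /dev/md125               # only needed if the grep above finds a mount
umount /dev/md126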
Then, to add back the drive which belongs in md0 (and similarly for md1), use:

mdadm /dev/md0 -a /dev/sdXX

In general it won't let you add the wrong drive, but if you want to check, use:

mdadm --examine /dev/sda1 | grep UUID

and so forth for all your drives, and find the ones with the same UUID.
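If you want to compare all the members at once, a small loop like this works (the partition names are just assumed from your layout, adjust as needed):

for d in /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2; do
    echo "== $d =="
    mdadm --examine $d | grep UUID
done

Partitions that report the same UUID belong in the same array.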
Well, I've already tried to use --fail and --remove on md125 and md126 but I'm told the members are still active.
mdadm /dev/md126 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md125 --fail /dev/sdb2 --remove /dev/sdb2
You want to use --stop for md125 and md126. Those are the raid devices that are not correct. Once they are stopped, you can take the drives from them and return them to md0 and md1, where they belong.

You will need to add back the correct drive that was originally paired in each raid set, but as I mentioned, it won't let you add the wrong drives, so just try adding sdb1 to md0, and if that doesn't work, add it to md1. You can't fail out drives from arrays that only have one drive.
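Putting it together, and assuming sdb1 belongs in md0 and sdb2 in md1 (swap the targets if the UUID check says otherwise), the whole recovery would look roughly like:

mdadm --stop /dev/md126
mdadm --stop /dev/md125
mdadm /dev/md0 -a /dev/sdb1     # re-add each drive to the array it belongs to
mdadm /dev/md1 -a /dev/sdb2
cat /proc/mdstat                # watch the resync progress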
Thanks for the additional information.
I'll try backing up everything this weekend then will take a stab at it.
Someone said earlier that the differing raid superblocks were probably the cause of the misassignment in the first place, but I have no clue how the superblocks could have become messed up. Can any of you comment on that? Will I need to hack at that issue, too, before I can succeed?
thanks again!
Nataraj
mdadm /dev/md126 --fail /dev/sdb1 --remove /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md126
mdadm: hot remove failed for /dev/sdb1: Device or resource busy
with the intention of then re-adding them to md0 and md1.
So I tried:

mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

and got a similar message. At that point I knew I was in over my head.
When I create my RAID arrays, I always use the option --bitmap=internal. With this option set, a bitmap is used to keep track of which regions of the drive are out of date, so only the regions that need updating are resynced instead of recopying the whole drive when this happens. In the past I once added a bitmap to an existing raid1 array; it is done in grow mode, with something like:

mdadm --grow /dev/mdN --bitmap=internal
Adding the bitmap is very worthwhile: it saves time and reduces the risk of data loss by not having to recopy the whole partition.
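For reference, on a running array the internal write-intent bitmap can be added or removed in grow mode, e.g. for md0:

mdadm --grow /dev/md0 --bitmap=internal     # add a write-intent bitmap
cat /proc/mdstat                            # should now show a "bitmap:" line for md0
mdadm --grow /dev/md0 --bitmap=none         # removes it again, if ever needed

and at creation time you can just include --bitmap=internal among the mdadm --create options.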
Nataraj