[CentOS] more software raid questions

Mon Oct 25 21:11:23 UTC 2010
Scott Silva <ssilva at sgvwater.com>

on 10-21-2010 9:13 AM fred smith spake the following:
> On Thu, Oct 21, 2010 at 08:59:13AM -0700, Nataraj wrote:
>> fred smith wrote:
>>> On Tue, Oct 19, 2010 at 07:34:19PM -0700, Nataraj wrote:
>>>   
>>>>
>>>> I've seen this kind of thing happen when the autodetection stuff 
>>>> misbehaves. I'm not sure why it does this or how to prevent it. Anyway, 
>>>> to recover, I would use something like:
>>>>
>>>> mdadm --stop /dev/md125
>>>> mdadm --stop /dev/md126
>>>>
>>>> If for some reason the above commands fail, check and make sure it has 
>>>> not automounted the file systems from md125 and md126. Hopefully this 
>>>> won't happen.
>>>>
>>>> Then use:
>>>> mdadm /dev/md0 -a /dev/sdXX
>>>> To add back the drive which belongs in md0, and similar for md1. In 
>>>> general, it won't let you add the wrong drive, but if you want to check use:
>>>> mdadm --examine /dev/sda1 | grep UUID
>>>> and so forth for all your drives and find the ones with the same UUID.
>>>>     
>>>
>>> Well, I've already tried to use --fail and --remove on md125 and md126
>>> but I'm told the members are still active.
>>>
>>> mdadm /dev/md126 --fail /dev/sdb1 --remove /dev/sdb1
>>> mdadm /dev/md125 --fail /dev/sdb2 --remove /dev/sdb2
>>>   
>> You want to use --stop for the md125 and md126. Those are the raid 
>> devices that are not correct. Once they are stopped, you can take the 
>> drives from them and return them to md0 and md1 where they belong.
> 
>>
>> You will need to add the correct drive that was originally paired in 
>> each raid set, but as I mentioned, it won't let you add the wrong 
>> drives, so just try adding sdb1 to md0, then if it doesn't work, add it 
>> to md1. You can't fail out drives from arrays that only have one drive.
> 
> Thanks for the additional information.
> 
> I'll try backing up everything this weekend then will take a stab at it.
> 
> Someone said earlier that the differing raid superblocks were probably
> the cause of the misassignment in the first place, but I have no clue
> how the superblocks could have become messed up. Can any of you comment
> on that? Will I need to hack at that issue, too, before I can succeed?
> 
> thanks again!
> 
If the system lost power or otherwise went down before all superblock data was
flushed, that could have corrupted the data. I would assume that the oddball
devices were the corrupt ones, but unless you have something to compare to, it
is hard to be sure.
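
One way to get something to compare against is to look at what each member's
superblock reports. The following is only a rough sketch, assuming the
`mdadm --examine` output contains "UUID" (or "Array UUID"), "Update Time" and
"Events" lines. The function names sb_summary and compare_members are made up
for illustration, and the device names in the usage line are placeholders
based on the ones mentioned earlier in the thread.

```shell
#!/bin/sh
# Reduce `mdadm --examine` output (read on stdin) to the fields that matter
# when comparing superblocks: array UUID, update time, and event count.
# "Device UUID" is skipped because it is unique to each member.
sb_summary() {
    awk -F' : ' '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the field name
        }
        key == "UUID" || key == "Array UUID" { split($2, w, " "); print "uuid=" w[1] }
        key == "Update Time"                 { print "updated=" $2 }
        key == "Events"                      { print "events=" $2 }
    '
}

# Hypothetical wrapper: summarize every device passed as an argument.
# Needs root, and only reads the superblocks; it does not change anything.
compare_members() {
    for dev in "$@"; do
        echo "== $dev"
        mdadm --examine "$dev" | sb_summary
    done
}

# Example (not run automatically; substitute your own partitions):
#   compare_members /dev/sda1 /dev/sdb1
```

Members that belong to the same array should show the same uuid. If one of
them has a lower event count or an older update time than its partner, that is
probably the stale half, though this is a hint rather than proof.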