[CentOS] C7, mdadm issues

Wed Jan 30 13:39:26 UTC 2019
Alessandro Baggi <alessandro.baggi at gmail.com>

On 30/01/19 14:02, mark wrote:
> On 01/30/19 03:45, Alessandro Baggi wrote:
>> On 29/01/19 20:42, mark wrote:
>>> Alessandro Baggi wrote:
>>>> On 29/01/19 18:47, mark wrote:
>>>>> Alessandro Baggi wrote:
>>>>>> On 29/01/19 15:03, mark wrote:
>>>>>>
>>>>>>> I've no idea what happened, but the box I was working on last week
>>>>>>> has a *second* bad drive. Actually, I'm starting to wonder about
>>>>>>> that particular hot-swap bay.
>>>>>>>
>>>>>>> Anyway, mdadm --detail shows /dev/sdb1 as removed. I've added
>>>>>>> /dev/sdi1...
>>>>>>> but see both /dev/sdh1 and /dev/sdi1 as spare, and have yet to find
>>>>>>> a reliable way to make either one active.
>>>>>>>
>>>>>>> Actually, I would have expected the linux RAID to replace a failed
>>>>>>> one with a spare....
>>>
>>>>>> can you report your raid configuration like raid level and raid 
>>>>>> devices
>>>>>> and the current status from /proc/mdstat?
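>>>>>>
>>>>>> For example (assuming the array is /dev/md0):
>>>>>>
>>>>>>    cat /proc/mdstat
>>>>>>    mdadm --detail /dev/md0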
>>>>>>
>>>>> Well, nope. I got to the point of rebooting the system (xfs had the
>>>>> RAID volume and wouldn't let go); I also commented out the RAID
>>>>> volume.
>>>>>
>>>>> It's RAID 5; /dev/sdb *also* appears to have died. If I do
>>>>>
>>>>>    mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1
>>>>>
>>>>> mdadm: looking for devices for /dev/md0
>>>>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>>>>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>>>>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>>>>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>>>>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>>>>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>>>>> mdadm: no uptodate device for slot 1 of /dev/md0
>>>>> mdadm: added /dev/sde1 to /dev/md0 as 2
>>>>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>>>>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>>>>> mdadm: no uptodate device for slot 5 of /dev/md0
>>>>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>>>>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>>>>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>>>>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not enough to
>>>>> start the array.
>>>>>
>>>>> --examine shows me /dev/sdd1 and /dev/sdh1, but says both are spares.
>>>> Hi Mark,
>>>> please post the result from
>>>>
>>>> cat /sys/block/md0/md/sync_action
>>>
>>> There is none. There is no /dev/md0. mdadm refuses, saying that it's
>>> lost too many drives.
>>>
>>>        mark
>>>
>>
>>
>> I suppose your config is 5 drives and 1 spare, with 1 drive failed.
>> It's strange that your spare was not used for resync.
>> Then you added a new drive, but the array does not start because it
>> marks the new disk as a spare, leaving you with a raid5 of 4 devices
>> and 2 spares.
>>
>> First, I hope you have a backup of all your data; don't run any exotic
>> commands before backing it up. If you can't back up your data, that's
>> a problem.
> 
> This is at work. We have automated nightly backups, and I do offline 
> backups of the backups every two weeks.
>>
>> Have you tried removing the last added device, sdi1, then restarting
>> the raid and forcing a resync?
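>>
>> A rough sketch (untested; assumes the array is /dev/md0 and the last
>> added disk is sdi1):
>>
>>    # drop the spare that was never activated
>>    mdadm /dev/md0 --remove /dev/sdi1
>>    # stop and force-assemble from the remaining members
>>    mdadm --stop /dev/md0
>>    mdadm --assemble --force -v /dev/md0 /dev/sd[cdefgh]1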
> 
> The thing is, it had one? two? spares when /dev/sdb1 started dying, and 
> it didn't use them.
>>
>> Have you tried removing these 2 devices and re-adding only the device
>> that will be useful for the resync? Maybe you can set 5 devices for
>> your raid instead of 6; if that works (after the resync), you can add
>> your spare device, growing your raid set.
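>>
>> Roughly, and untested (device names taken from your --assemble output,
>> so double-check them):
>>
>>    mdadm --stop /dev/md0
>>    # assemble degraded from the 4 good members (slot 1 missing)
>>    mdadm --assemble --force --run /dev/md0 /dev/sd[cefg]1
>>    # once it is running, add a disk so the resync can start
>>    mdadm /dev/md0 --add /dev/sdh1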
> 
> I tried, and that's when I lost it (again); now it refuses to 
> assemble/start the RAID: "not enough devices".
>>
>> Reading on google, many users run --zero-superblock before re-adding
>> the device.
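>>
>> For example (careful: this destroys the md metadata on that partition,
>> so run it only on the disk you are about to re-add):
>>
>>    mdadm --zero-superblock /dev/sdi1
>>    mdadm /dev/md0 --add /dev/sdi1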
> 
> I can take one out and re-add it, but I think I'm going to have to 
> recreate the RAID again, and again restore from backup.
>>
>> Other users reassemble the raid using --assume-clean, but I don't know
>> what effect it produces.
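>>
>> (For the record, "reassembling" with --assume-clean actually means
>> re-creating the array in place. A sketch only; the level, chunk size,
>> metadata version and device order must exactly match the original,
>> otherwise data is lost:
>>
>>    mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean \
>>          /dev/sdc1 missing /dev/sde1 /dev/sdf1 /dev/sdg1
>>
>> The "missing" keyword holds slot 1 open so nothing gets written over
>> a disk that isn't there.)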
>>
>> Hope that this helps.
> 
> Thanks.
> 
>      mark
> 

Hope that someone gives you better help with this.

Update here when you find a solution.