Re: [CentOS] C7, mdadm issues

30 Jan 2019

On 30/01/19 18:49, mark wrote:
...
Alessandro Baggi wrote:
...
On 30/01/19 16:33, mark wrote:
...
Alessandro Baggi wrote:
...
On 30/01/19 14:02, mark wrote:
...
On 01/30/19 03:45, Alessandro Baggi wrote:
...
On 29/01/19 20:42, mark wrote:
> Alessandro Baggi wrote:
>
>> On 29/01/19 18:47, mark wrote:
>>
>>> Alessandro Baggi wrote:
>>>
>>>> On 29/01/19 15:03, mark wrote:
>>>>
>>>>
>>>>> I've no idea what happened, but the box I was working
>>>>> on last week has a *second* bad drive. Actually, I'm
>>>>> starting to wonder about that particular hot-swap bay.
>>>>>
>>>>> Anyway, mdadm --detail shows /dev/sdb1 removed. I've
>>>>> added /dev/sdi1...
>>>>> but I see both /dev/sdh1 and /dev/sdi1 as spares, and have
>>>>> yet to find a reliable way to make either one active.
>>>>>
>>>>> Actually, I would have expected Linux RAID to
>>>>> replace a failed drive with a spare....
>
>>>> Can you report your RAID configuration (RAID level and
>>>> member devices) and the current status from /proc/mdstat?
>>>>
>>>>
>>> Well, nope. I got to the point of rebooting the system (xfs
>>> had the RAID volume and wouldn't let go); I also commented
>>> out the RAID volume.
>>>
>>> It's RAID 5, and /dev/sdb *also* appears to have died. If I do
>>>
>>> mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1
>>>
>>> I get:
>>>
>>> mdadm: looking for devices for /dev/md0
>>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>>> mdadm: no uptodate device for slot 1 of /dev/md0
>>> mdadm: added /dev/sde1 to /dev/md0 as 2
>>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>>> mdadm: no uptodate device for slot 5 of /dev/md0
>>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not enough to start the array.
>>>
>>> --examine shows me /dev/sdd1 and /dev/sdh1, but both as spares.
>> Hi Mark,
>> please post the result from
>>
>> cat /sys/block/md0/md/sync_action
>
> There is none. There is no /dev/md0. mdadm refuses, saying
> that it's lost too many drives.
>
>         mark
>
>
>
> _______________________________________________
> CentOS mailing list
> CentOS@centos.org
> https://lists.centos.org/mailman/listinfo/centos
>
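Why the forced assemble fails can be read straight off the verbose messages quoted above. The sketch below is a hypothetical helper, not part of mdadm: it parses an excerpt of those messages, treats `slot -1` as a spare (matching what --examine reported for sdd1 and sdh1), and applies the RAID 5 rule that the array can run with at most one member missing. Slots 1 and 5 have no up-to-date device, so two members are missing; with only one member's worth of parity there is nothing to rebuild the spares from, which is why mdadm refuses to start the array.

```python
from __future__ import annotations

import re

# Excerpt of the verbose `mdadm --assemble` output quoted in this thread.
ASSEMBLE_LOG = """\
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: no uptodate device for slot 5 of /dev/md0
"""

MEMBER_RE = re.compile(r"(/dev/\S+) is identified as a member of \S+, slot (-?\d+)\.")
MISSING_RE = re.compile(r"no uptodate device for slot (\d+)")


def summarize(log: str) -> dict:
    """Split members into active slots, spares (slot -1) and missing slots."""
    active: dict[int, str] = {}
    spares: list[str] = []
    for dev, slot in MEMBER_RE.findall(log):
        if int(slot) < 0:
            spares.append(dev)  # slot -1: the superblock records this member as a spare
        else:
            active[int(slot)] = dev
    missing = sorted(int(s) for s in MISSING_RE.findall(log))
    # Assumes every slot 0..N-1 is mentioned either as active or as missing.
    raid_devices = max(list(active) + missing, default=-1) + 1
    return {"active": active, "spares": spares, "missing": missing,
            "raid_devices": raid_devices}


def raid5_can_start(summary: dict) -> bool:
    """RAID 5 holds one member's worth of parity, so at most one slot may be missing."""
    return len(summary["missing"]) <= 1


if __name__ == "__main__":
    s = summarize(ASSEMBLE_LOG)
    print("raid devices:", s["raid_devices"])
    print("active slots:", sorted(s["active"]))
    print("spares:", s["spares"])
    print("missing slots:", s["missing"])
    print("RAID 5 can start:", raid5_can_start(s))
```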
I suppose that your config is 5 drives and 1 spare, with 1 drive
failed. It's strange that your spare was not used for the resync. Then
you added a new drive, but the array does not start because it marks
the new disk as a spare, and you have a RAID 5 with 4 devices and 2
spares.
First, I hope that you have a backup of all your data; don't run
any exotic commands before backing up your data. If you can't
back up your data, that's a problem.
This is at work. We have automated nightly backups, and I do
offline backups of the backups every two weeks.
...
Have you tried removing the last added device, sdi1, restarting
the RAID, and forcing it to start a resync?
The thing is, it had one? two? spares when /dev/sdb1 started dying,
and it didn't use them.
...
Have you tried removing these 2 devices and re-adding only the
device that will be useful for the resync? Maybe you can set 5
devices for your RAID instead of 6; if that works (after the resync),
you can add your spare device by growing your RAID set.
I tried, and that's when I lost it (again), and it refuses to
assemble/start the RAID "not enough devices".
...
Reading around on Google, many users run --zero-superblock before
re-adding the device.
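The remove / zero-superblock / re-add sequence suggested above could look like the following dry-run sketch. `readd_plan` is a hypothetical helper that only builds and prints the mdadm command lines; it runs nothing. The device names are the ones from this thread. Two caveats: `--remove` only applies to spare or failed members (an active member has to be marked with `--fail` first), and `--zero-superblock` erases the md metadata on that partition, which is exactly what mark avoided in the end.

```python
from __future__ import annotations

import shlex


def readd_plan(array: str, device: str, zero_superblock: bool = True) -> list[list[str]]:
    """Build (but do not run) the mdadm commands to take a member out and add it back."""
    # --remove only works on spare or failed members; an active one needs --fail first.
    plan = [["mdadm", "--manage", array, "--remove", device]]
    if zero_superblock:
        # Erases the md superblock on this one device, so it rejoins as a blank disk.
        plan.append(["mdadm", "--zero-superblock", device])
    plan.append(["mdadm", "--manage", array, "--add", device])
    return plan


if __name__ == "__main__":
    # Dry run: print the commands for the device names used in this thread.
    for argv in readd_plan("/dev/md0", "/dev/sdi1"):
        print(shlex.join(argv))
```

Building argument lists instead of shell strings keeps the plan reviewable before anything touches the disks; passing `zero_superblock=False` gives the plain remove/re-add variant.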
I can take one out, and re-add, but I think I'm going to have to
recreate the RAID again, and again restore from backup.
...
Other users reassemble the RAID using --assume-clean, but I don't
know what effect that will have.
I hope someone can give you better help with this.
Post an update here if you find a solution.
Not that I'm into American football, but I seem to have pulled off what
I understand is called a Hail Mary: *without* zeroing the superblocks, I
did a create with all six good drives, excluding /dev/sdb1, and
explicitly told it one spare.
And the array is there, complete with data, with *one* spare and five
good drives, and it's currently rebuilding the spare.
The last resort worked, though we'll see for how long.
So you have recreated the array without the faulty device?
Yep.
mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 /dev/sd[cdefgh]1
It's currently at 2.2% recovered for the extra drive.
  mark
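To follow the rebuild mark mentions (2.2% recovered), the progress can be read from /proc/mdstat. A minimal sketch, assuming the usual `recovery = N%` progress line that md prints under an array while it rebuilds; `rebuild_progress` is an illustrative helper name, and the sample text is hypothetical apart from the 2.2% figure.

```python
from __future__ import annotations

import os
import re

# Progress line md shows under an array that is resyncing, recovering, reshaping or checking.
PROGRESS_RE = re.compile(r"\b(resync|recovery|reshape|check)\s*=\s*([\d.]+)%")
ARRAY_RE = re.compile(r"^(md\S+)\s*:")

# Hypothetical /proc/mdstat excerpt; only the 2.2% figure comes from the thread.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[0] sdd1[1] sde1[2] sdf1[3] sdg1[4] sdh1[6]
      [>....................]  recovery =  2.2% (.../...) finish=...min speed=...K/sec

unused devices: <none>
"""


def rebuild_progress(mdstat: str) -> dict[str, tuple[str, float]]:
    """Map each array name to (operation, percent done) for arrays with a progress line."""
    progress: dict[str, tuple[str, float]] = {}
    current = None
    for line in mdstat.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)  # progress lines belong to the last array header seen
            continue
        match = PROGRESS_RE.search(line)
        if match and current:
            progress[current] = (match.group(1), float(match.group(2)))
    return progress


if __name__ == "__main__":
    path = "/proc/mdstat"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE_MDSTAT
    print(rebuild_progress(text))
```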

How many TB?
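The thread doesn't say how large the drives are, but the usable size follows from the RAID 5 layout: one member's worth of space goes to parity, so n active members whose smallest is S give roughly (n - 1) × S, and a spare adds nothing until it replaces a member. A small sketch; the 4 TB member size in the usage line is purely hypothetical, and the figure ignores the small amount md reserves on each member for its superblock and data offset.

```python
from __future__ import annotations


def raid5_usable(member_sizes: list[float]) -> float:
    """Approximate usable RAID 5 capacity: (n - 1) x the smallest active member.

    Spares are not passed in: they hold no data until they replace a member.
    Ignores the space md reserves on each member for metadata.
    """
    if len(member_sizes) < 2:
        raise ValueError("RAID 5 needs at least two active members")
    return (len(member_sizes) - 1) * min(member_sizes)


if __name__ == "__main__":
    # Hypothetical: six 4 TB members, as with --raid-devices=6 in the create command.
    print(raid5_usable([4.0] * 6))  # 20.0
```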
