[CentOS] C7, mdadm issues

Wed Jan 30 15:49:54 UTC 2019
Simon Matter <simon.matter at invoca.ch>

> On 01/30/19 03:45, Alessandro Baggi wrote:
>> Il 29/01/19 20:42, mark ha scritto:
>>> Alessandro Baggi wrote:
>>>> Il 29/01/19 18:47, mark ha scritto:
>>>>> Alessandro Baggi wrote:
>>>>>> Il 29/01/19 15:03, mark ha scritto:
>>>>>>
>>>>>>> I've no idea what happened, but the box I was working on last week
>>>>>>> has a *second* bad drive. Actually, I'm starting to wonder about
>>>>>>> that particular hot-swap bay.
>>>>>>>
>>>>>>> Anyway, mdadm --detail shows /dev/sdb1 removed. I've added
>>>>>>> /dev/sdi1... but see both /dev/sdh1 and /dev/sdi1 as spares, and
>>>>>>> have yet to find a reliable way to make either one active.
>>>>>>>
>>>>>>> Actually, I would have expected the linux RAID to replace a failed
>>>>>>> one with a spare....
>>>
>>>>>> Can you report your raid configuration, like raid level and raid
>>>>>> devices, and the current status from /proc/mdstat?
>>>>>>
>>>>> Well, nope. I got to the point of rebooting the system (xfs had the
>>>>> RAID volume, and wouldn't let go); I also commented out the RAID
>>>>> volume.
>>>>>
>>>>> It's RAID 5, /dev/sdb *also* appears to have died. If I do
>>>>> mdadm --assemble --force -v /dev/md0  /dev/sd[cefgdh]1
>>>>> mdadm: looking for devices for /dev/md0
>>>>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>>>>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>>>>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>>>>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>>>>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>>>>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>>>>> mdadm: no uptodate device for slot 1 of /dev/md0
>>>>> mdadm: added /dev/sde1 to /dev/md0 as 2
>>>>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>>>>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>>>>> mdadm: no uptodate device for slot 5 of /dev/md0
>>>>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>>>>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>>>>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>>>>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not enough to
>>>>> start the array.
>>>>>
>>>>> --examine shows me /dev/sdd1 and /dev/sdh1, but says that both are
>>>>> spares.
>>>> Hi Mark,
>>>> please post the result from
>>>>
>>>> cat /sys/block/md0/md/sync_action
>>>
>>> There is none. There is no /dev/md0. mdadm refuses, saying that it's
>>> lost too many drives.
>>>
>>>        mark
>>>
>>> _______________________________________________
>>> CentOS mailing list
>>> CentOS at centos.org
>>> https://lists.centos.org/mailman/listinfo/centos
>>>
>>
>>
>> I suppose that your config is 5 drives and 1 spare, with 1 drive failed.
>> It's strange that your spare was not used for the resync.
>> Then you added a new drive, but the array does not start because mdadm
>> marks the new disk as a spare, so you have a raid5 with 4 devices and
>> 2 spares.
>>
>> First, I hope that you have a backup of all your data; don't run any
>> exotic commands before backing up your data. If you can't back up your
>> data, it's a problem.
>
> This is at work. We have automated nightly backups, and I do offline
> backups of the backups every two weeks.
>>
>> Have you tried removing the last added device, sdi1, restarting the raid
>> and forcing a resync?
>
> The thing is, it had one? two? spares when /dev/sdb1 started dying, and it
> didn't use them.

For many years now I have only been using RAID1, because it's simply safer
than RAID5 and easier than RAID6 when the number of disks is low.

I also don't have much experience with spare handling, as I don't use
spares in my setups.

However, in general, I think the problem today is this:
We have very large disks these days. Defects on a disk often go unnoticed
for a long time. Even raid-check, I think, doesn't find errors which only
show up while writing and not while reading.
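
For reference, here is a minimal sketch of how the scrub state of an array
can be inspected through sysfs. The md driver exposes sync_action and
mismatch_cnt per array; the function name md_scrub_status and its directory
argument are made up for illustration. It only reads files, so it is safe
to run as a normal user.

```shell
#!/bin/sh
# md_scrub_status DIR: print the current sync action and mismatch count
# of one md array. DIR is the array's sysfs directory, for example
# /sys/block/md0/md. Hypothetical helper, read-only.
md_scrub_status() {
    dir=$1
    if [ ! -r "$dir/sync_action" ]; then
        echo "no md array at $dir" >&2
        return 1
    fi
    action=$(cat "$dir/sync_action")
    # mismatch_cnt may be missing for some array types; report "unknown".
    mismatches=$(cat "$dir/mismatch_cnt" 2>/dev/null || echo unknown)
    echo "sync_action=$action mismatch_cnt=$mismatches"
}

# Example: report on md0 if it exists on this machine.
md_scrub_status /sys/block/md0/md || true
```

A scrub can be started by hand, as root, with
"echo check > /sys/block/md0/md/sync_action". A check pass reads the member
disks and compares them; it does not write test data, which fits the
concern above about errors that only show up on writes.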

So now, if one disk fails, things are still okay. Then, when a spare is in
place or the defective disk has been replaced, the resync starts. Now, if
there is any error on one of the old disks while the resync happens, boom:
a degraded RAID5 has no redundancy left, so the array fails and is left in
a bad state.

I once had to recover a broken RAID5 from a Linux-based NAS, and what I
did was:
* Dump the complete RAID partition from every disk to a file, ignoring the
read errors on one of the disks.
* Build the RAID5 like this:

mdadm --create --assume-clean --level=5 --raid-devices=4 --spare-devices=0 \
  --metadata=1.0 --layout=left-symmetric --chunk=64 --bitmap=none \
  /dev/md10 /dev/loop0 missing /dev/loop2 /dev/loop3

* Recover 99.9% of the data from /dev/md10.
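
The steps above can be sketched roughly as follows. This is a dry run that
only prints commands, nothing is executed. The image directory, the source
device placeholders and the use of dd with conv=noerror,sync are
assumptions for illustration (GNU ddrescue is a common alternative for
disks with read errors). The mdadm parameters are simply the ones from the
command above; for another array they must match whatever the original
array used, including the device order.

```shell
#!/bin/sh
# Dry run: prints the recovery commands instead of executing them.
# The mdadm command above leaves slot 1 as "missing", so only the
# images for slots 0, 2 and 3 are attached to loop devices here.
IMG_DIR=/recovery        # hypothetical directory with enough free space
SLOTS="0 2 3"

print_recovery_plan() {
    for n in $SLOTS; do
        # /dev/SRCn is a placeholder for the real member partition;
        # verify slot numbers with "mdadm --examine" first.
        echo "dd if=/dev/SRC${n} of=$IMG_DIR/slot${n}.img bs=64k conv=noerror,sync"
        echo "losetup /dev/loop${n} $IMG_DIR/slot${n}.img"
    done
    echo "mdadm --create --assume-clean --level=5 --raid-devices=4" \
         "--spare-devices=0 --metadata=1.0 --layout=left-symmetric" \
         "--chunk=64 --bitmap=none" \
         "/dev/md10 /dev/loop0 missing /dev/loop2 /dev/loop3"
}

print_recovery_plan
```

Working on image files rather than on the original disks means a wrong
guess at the parameters can be retried without touching the source data.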

One more hint for those interested:
Even with RAID1, I don't use the whole disk as one big RAID1. Instead, I
slice it into equally sized parts - not physically :-) - and create
multiple smaller RAID1 arrays on it. If a disk is 8TB, I create 8
partitions of 1TB and then create 8 RAID1 arrays on them. Then I add all 8
arrays to the same VG. Now, if there is a small error in, say, disk 3,
only a 1TB slice of the whole 8TB is degraded. In large arrays you can
even keep some spare slices on a spare disk to temporarily move broken
slices to. You get the idea, right?
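
A minimal sketch of the slicing scheme, again as a dry run that only prints
commands. It assumes two disks that have already been partitioned into 8
equal slices each; the disk names /dev/sda and /dev/sdb, the md numbering
and the VG name vg_data are placeholders, not part of any real setup.

```shell
#!/bin/sh
# Dry run: prints one RAID1 array per pair of slices, then one vgcreate
# that puts all arrays into a single volume group. Nothing is executed.
DISK_A=/dev/sda      # placeholder disk names
DISK_B=/dev/sdb
SLICES=8             # e.g. 8 x 1TB on an 8TB disk
VG=vg_data           # hypothetical VG name

print_slice_plan() {
    pvs=""
    i=1
    while [ "$i" -le "$SLICES" ]; do
        md=/dev/md$i
        echo "mdadm --create $md --level=1 --raid-devices=2 ${DISK_A}$i ${DISK_B}$i"
        echo "pvcreate $md"
        pvs="$pvs $md"
        i=$((i + 1))
    done
    echo "vgcreate $VG$pvs"
}

print_slice_plan
```

With this layout, a bad area inside slice 3 of one disk only degrades
/dev/md3; the other seven arrays keep both of their mirrors.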

Hope that helps,
Simon