On 30/01/19 18:49, mark wrote:
> Alessandro Baggi wrote:
>> On 30/01/19 16:33, mark wrote:
>>
>>> Alessandro Baggi wrote:
>>>
>>>> On 30/01/19 14:02, mark wrote:
>>>>
>>>>> On 01/30/19 03:45, Alessandro Baggi wrote:
>>>>>
>>>>>> On 29/01/19 20:42, mark wrote:
>>>>>>
>>>>>>> Alessandro Baggi wrote:
>>>>>>>
>>>>>>>> On 29/01/19 18:47, mark wrote:
>>>>>>>>
>>>>>>>>> Alessandro Baggi wrote:
>>>>>>>>>
>>>>>>>>>> On 29/01/19 15:03, mark wrote:
>>>>>>>>>>
>>>>>>>>>>> I've no idea what happened, but the box I was working on
>>>>>>>>>>> last week has a *second* bad drive. Actually, I'm starting
>>>>>>>>>>> to wonder about that particular hot-swap bay.
>>>>>>>>>>>
>>>>>>>>>>> Anyway, mdadm --detail shows /dev/sdb1 removed. I've added
>>>>>>>>>>> /dev/sdi1... but I see both /dev/sdh1 and /dev/sdi1 as
>>>>>>>>>>> spares, and have yet to find a reliable way to make either
>>>>>>>>>>> one active.
>>>>>>>>>>>
>>>>>>>>>>> Actually, I would have expected the Linux RAID to replace
>>>>>>>>>>> a failed one with a spare....
>>>>>>>>>>
>>>>>>>>>> Can you report your raid configuration, like raid level and
>>>>>>>>>> raid devices, and the current status from /proc/mdstat?
>>>>>>>>>
>>>>>>>>> Well, nope. I got to the point of rebooting the system (xfs
>>>>>>>>> had the RAID volume and wouldn't let go; I also commented out
>>>>>>>>> the RAID volume).
>>>>>>>>>
>>>>>>>>> It's RAID 5, and /dev/sdb *also* appears to have died. If I do
>>>>>>>>>    mdadm --assemble --force -v /dev/md0 /dev/sd[cefgdh]1
>>>>>>>>> mdadm: looking for devices for /dev/md0
>>>>>>>>> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 0.
>>>>>>>>> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
>>>>>>>>> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 2.
>>>>>>>>> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 3.
>>>>>>>>> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 4.
>>>>>>>>> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot -1.
>>>>>>>>> mdadm: no uptodate device for slot 1 of /dev/md0
>>>>>>>>> mdadm: added /dev/sde1 to /dev/md0 as 2
>>>>>>>>> mdadm: added /dev/sdf1 to /dev/md0 as 3
>>>>>>>>> mdadm: added /dev/sdg1 to /dev/md0 as 4
>>>>>>>>> mdadm: no uptodate device for slot 5 of /dev/md0
>>>>>>>>> mdadm: added /dev/sdd1 to /dev/md0 as -1
>>>>>>>>> mdadm: added /dev/sdh1 to /dev/md0 as -1
>>>>>>>>> mdadm: added /dev/sdc1 to /dev/md0 as 0
>>>>>>>>> mdadm: /dev/md0 assembled from 4 drives and 2 spares - not
>>>>>>>>> enough to start the array.
>>>>>>>>>
>>>>>>>>> --examine shows me /dev/sdd1 and /dev/sdh1, but says that both
>>>>>>>>> are spares.
>>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>> please post the result of
>>>>>>>>
>>>>>>>>    cat /sys/block/md0/md/sync_action
>>>>>>>
>>>>>>> There is none. There is no /dev/md0. mdadm refuses, saying that
>>>>>>> it's lost too many drives.
>>>>>>>
>>>>>>> mark
>>>>>>
>>>>>> I suppose that your config is 5 drives and 1 spare, with 1 drive
>>>>>> failed. It's strange that your spare was not used for the resync.
>>>>>> Then you added a new drive, but the array does not start because
>>>>>> it marks the new disk as a spare, so you have a raid5 with 4
>>>>>> devices and 2 spares.
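
A quick way to double-check how md classifies each member, and whether
their event counters still agree, is something like the following; the
device names are the ones from this thread, and the "Device Role" /
"Events" fields assume v1.x metadata (a 0.90 superblock prints these
differently):

   cat /proc/mdstat
   mdadm --examine /dev/sd[cdefgh]1 | grep -E '^/dev/|Events|Device Role'
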
>>>>>>
>>>>>> First, I hope that you have a backup of all your data; don't run
>>>>>> any exotic commands before backing up your data. If you can't
>>>>>> back up your data, it's a problem.
>>>>>
>>>>> This is at work. We have automated nightly backups, and I do
>>>>> offline backups of the backups every two weeks.
>>>>>
>>>>>> Have you tried removing the last added device, sdi1, restarting
>>>>>> the raid, and forcing a resync to start?
>>>>>
>>>>> The thing is, it had one? two? spares when /dev/sdb1 started
>>>>> dying, and it didn't use them.
>>>>>
>>>>>> Have you tried removing these 2 devices and re-adding only the
>>>>>> device that will be useful for the resync? Maybe you can set 5
>>>>>> devices for your raid instead of 6; if it works, you can add your
>>>>>> spare device (after the resync) and grow your raid set.
>>>>>
>>>>> I tried, and that's when I lost it (again); it refuses to
>>>>> assemble/start the RAID: "not enough devices".
>>>>>
>>>>>> Reading on Google, many users run --zero-superblock before
>>>>>> re-adding the device.
>>>>>
>>>>> I can take one out and re-add it, but I think I'm going to have to
>>>>> recreate the RAID again, and again restore from backup.
>>>>>
>>>>>> Other users reassemble the raid using --assume-clean, but I don't
>>>>>> know what effect it produces.
>>>>
>>>> I hope someone can give you better help with this.
>>>>
>>>> Update here if you find the solution.
>>>
>>> Not that I'm into American football, but I seem to have pulled off
>>> what I understand is called a hail-mary: *without* zeroing the
>>> superblocks, I did a create with all six good drives, excluding
>>> /dev/sdb1, and explicitly told it one spare.
>>>
>>> And the array is there, complete with data, with *one* spare, five
>>> good drives, and it's currently rebuilding the spare.
>>>
>>> The last resort worked, though we'll see how long.
>>
>> So you have recreated the array without the faulty device?
>
> Yep.
>    mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 /dev/sd[cdefgh]1
>
> It's currently at 2.2% recovered for the extra drive.
>
> mark

How many TB?
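
For the archives: if anyone else ends up trying this kind of last-resort
re-create, the create parameters have to match the original array exactly
(chunk size, metadata version, layout, and device order, ideally taken
from a saved "mdadm --examine" of the old members), or the data will not
line up. A rough sketch of the careful variant; the chunk, metadata and
layout values below are placeholders, not necessarily what this array used:

   # placeholders: take chunk/metadata/layout and the device order from
   # a saved "mdadm --examine"; use the keyword "missing" for a dead slot
   mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 \
         --metadata=1.2 --chunk=512 --layout=left-symmetric \
         --assume-clean /dev/sd[cdefgh]1
   # check the filesystem read-only before trusting the result
   # (adjust the device if xfs doesn't sit directly on /dev/md0)
   xfs_repair -n /dev/md0
   # only then kick off a parity check
   echo check > /sys/block/md0/md/sync_action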