[CentOS] LVM failure after CentOS 7.6 upgrade -- possible corruption

Wed Dec 5 19:38:50 UTC 2018
Stephen John Smoogen <smooge at gmail.com>

On Wed, 5 Dec 2018 at 14:27, Benjamin Smith <lists at benjamindsmith.com> wrote:
>
> My gut feeling is that this is related to a RAID1 issue I'm seeing with 7.6.
> See email thread "CentOS 7.6: Software RAID1 fails the only meaningful test"
>

You might want to point out which list you posted it on since it
doesn't seem to be this one.


> I suggest trying to boot from an earlier kernel. Good luck!
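>
> If it helps, on CentOS 7 selecting an older kernel looks roughly like
> the following (a sketch only; it assumes GRUB_DEFAULT=saved in
> /etc/default/grub, which is the CentOS 7 default, and you'd pick the
> right menu index for your own grub.cfg, or just choose the entry from
> the GRUB menu at boot):
>
>       # list the boot entries in menu order
>       awk -F\' '/^menuentry / {print $2}' /etc/grub2.cfg
>       # make, e.g., the second entry (index 1, zero-based) the default
>       grub2-set-default 1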
>
> Ben S
>
>
> On Wednesday, December 5, 2018 9:27:22 AM PST Gordon Messmer wrote:
> > I've started updating systems to CentOS 7.6, and so far I have one failure.
> >
> > This system has two peculiarities which might have triggered the
> > problem.  The first is that one of the software RAID arrays on this
> > system is degraded.  While troubleshooting the problem, I saw error
> > messages similar to those mentioned in bug reports indicating that
> > GNU/Linux systems would not boot with degraded software RAID arrays.
> > The other
> > peculiar aspect is that the system uses dm-cache.
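> >
> > (For anyone who wants to check whether one of their own arrays is in
> > the same degraded state, something along these lines should show it;
> > "/dev/md127" below is just a placeholder name:)
> >
> >       cat /proc/mdstat
> >       mdadm --detail /dev/md127 | grep -E 'State|Devices'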
> >
> > Logs from some of the early failed boots are not available, but before I
> > completely fixed the problem, I was able to bring the system up once,
> > and captured logs which look substantially similar to the initial boot.
> > The content of /var/log/messages is here:
> >       https://paste.fedoraproject.org/paste/n-E6X76FWIKzIvzPOw97uw
> >
> > The output of lsblk (minus some VM logical volumes) is here:
> >       https://paste.fedoraproject.org/paste/OizFvMeGn81vF52VEvUbyg
> >
> > As best I can tell, the LVM tools were treating software RAID component
> > devices as PVs, and detecting a conflict between those and the assembled
> > RAID volume.  When running "pvs" on the broken system, no RAID volumes
> > were listed, only component devices.  At the moment, I don't know if the
> > LVs that were activated by the initrd were backed by component devices
> > or the assembled RAID devices, so this bug might corrupt software
> > RAID arrays.
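> >
> > To see what the active LVs are actually sitting on, something along
> > these lines should show it (the LV path is only an example; healthy
> > output should show the md devices rather than raw sdXY partitions):
> >
> >       # which underlying devices back each LV, according to LVM
> >       lvs -o lv_name,vg_name,devices
> >       # walk the block-device stack underneath one LV
> >       lsblk -s /dev/mapper/centos-root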
> >
> > In order to correct the problem, I had to add a global_filter to
> > /etc/lvm/lvm.conf and rebuild the initrd (dracut -f):
> >       global_filter = [ "r|vm_.*_data|", "a|sdd1|", "r|sd..|" ]
> >
> > This filter excludes the LVs that contain VM data, accepts "/dev/sdd1",
> > which is the dm-cache device, and rejects all other partitions on
> > SCSI (SATA) device nodes, as all of those are RAID component devices.
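> >
> > For reference, the change is roughly the following (a sketch; adjust
> > the filter and the kernel version to match your own system):
> >
> >       # add the filter to the devices { } section of /etc/lvm/lvm.conf:
> >       #   global_filter = [ "r|vm_.*_data|", "a|sdd1|", "r|sd..|" ]
> >       # then rebuild the initramfs for the running kernel
> >       dracut -f /boot/initramfs-$(uname -r).img $(uname -r)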
> >
> > I'm still working on the details of the problem, but I wanted to share
> > what I know now in case anyone else might be affected.
> >
> > After updating, look at the output of "pvs" if you use LVM on software RAID.
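> >
> > On a healthy system the PV names should be the assembled md devices
> > (e.g. /dev/md127), not the raw partitions (e.g. /dev/sda2); those
> > device names are only illustrative:
> >
> >       pvs -o pv_name,vg_name,pv_size
> >       cat /proc/mdstat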
> > _______________________________________________
> > CentOS mailing list
> > CentOS at centos.org
> > https://lists.centos.org/mailman/listinfo/centos



-- 
Stephen J Smoogen.