[CentOS] LVM failure after CentOS 7.6 upgrade -- possible corruption

Wed Dec 5 19:27:31 UTC 2018
Benjamin Smith <lists at benjamindsmith.com>

My gut feeling is that this is related to a RAID1 issue I'm seeing with 7.6. 
See the email thread "CentOS 7.6: Software RAID1 fails the only meaningful test".

I suggest trying to boot from an earlier kernel. Good luck! 

Ben S 


On Wednesday, December 5, 2018 9:27:22 AM PST Gordon Messmer wrote:
> I've started updating systems to CentOS 7.6, and so far I have one failure.
> 
> This system has two peculiarities which might have triggered the
> problem.  The first is that one of the software RAID arrays on this
> system is degraded.  While troubleshooting the problem, I saw similar
> error messages mentioned in bug reports indicating that systems
> would not boot with degraded software RAID arrays.  The other
> peculiar aspect is that the system uses dm-cache.
> 
> Logs from some of the early failed boots are not available, but before I
> completely fixed the problem, I was able to bring the system up once,
> and captured logs which look substantially similar to the initial boot.
> The content of /var/log/messages is here:
> 	https://paste.fedoraproject.org/paste/n-E6X76FWIKzIvzPOw97uw
> 
> The output of lsblk (minus some VM logical volumes) is here:
> 	https://paste.fedoraproject.org/paste/OizFvMeGn81vF52VEvUbyg
> 
> As best I can tell, the LVM tools were treating software RAID component
> devices as PVs, and detecting a conflict between those and the assembled
> RAID volume.  When running "pvs" on the broken system, no RAID volumes
> were listed, only component devices.  At the moment, I don't know if the
> LVs that were activated by the initrd were backed by component devices
> or the RAID devices, so it's possible that this bug might corrupt
> software RAID arrays.
> 
> In order to correct the problem, I had to add a global_filter to
> /etc/lvm/lvm.conf and rebuild the initrd (dracut -f):
> 	global_filter = [ "r|vm_.*_data|", "a|sdd1|", "r|sd..|" ]
> 
> This filter excludes the LVs that contain VM data, accepts "/dev/sdd1"
> which is the dm-cache device, and rejects all other partitions on
> SCSI (SATA) device nodes, as all of those are RAID component devices.
> 
> I'm still working on the details of the problem, but I wanted to share
> what I know now in case anyone else might be affected.
> 
> After updating, look at the output of "pvs" if you use LVM on software RAID.
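
For anyone adapting the global_filter quoted above, the ordering of the
entries matters: as far as I understand lvm.conf, each entry is an action
letter ("a" accept, "r" reject) followed by a delimited regex, the first
entry whose regex matches a device path decides the outcome, and a path
that matches nothing is accepted. The sketch below only models that rule
so a filter can be tried against a few path names before rebuilding the
initrd. It is not LVM code. Assumptions: Python's "re" module stands in
for LVM's own regex matcher, the function names are made up for this
example, the LV path /dev/vg0/vm_web_data is hypothetical, and only one
path name per device is checked (LVM considers every path name a device
has).

```python
import re

# The filter from the quoted message, in the same order as in lvm.conf.
GLOBAL_FILTER = ["r|vm_.*_data|", "a|sdd1|", "r|sd..|"]


def parse_pattern(entry):
    """Split an entry such as 'r|sd..|' into (action, compiled regex).

    Simplification: the closing delimiter is assumed to be the same
    character as the opening one.
    """
    if len(entry) < 3 or entry[0] not in ("a", "r"):
        raise ValueError("not a filter entry: %r" % entry)
    delimiter = entry[1]
    if not entry.endswith(delimiter):
        raise ValueError("missing closing delimiter: %r" % entry)
    return entry[0], re.compile(entry[2:-1])


def is_accepted(path, filter_entries=GLOBAL_FILTER):
    """Return True if the path would be accepted under first-match-wins."""
    for entry in filter_entries:
        action, regex = parse_pattern(entry)
        # Unanchored search: the regex may match anywhere in the path.
        if regex.search(path):
            return action == "a"
    # No entry matched, so the path is accepted by default.
    return True


if __name__ == "__main__":
    for path in ["/dev/sdd1", "/dev/sda1", "/dev/md0", "/dev/vg0/vm_web_data"]:
        print(path, "accept" if is_accepted(path) else "reject")
```

Under this model /dev/sdd1 is accepted only because its "a" entry comes
before the broader "r|sd..|" entry; swapping the two would reject it.
Also note that "sd.." needs two characters after "sd", so a whole-disk
name such as /dev/sda matches no entry here and falls through to the
default accept.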