[CentOS-virt] Xen CentOS 7.3 server + CentOS 7.3 VM fails to boot after CR updates (applied to VM)!

Thu Sep 14 22:16:03 UTC 2017
Anderson, Dave <daveanderson at wsu.edu>

I've been pretty happy using 7.3+CR/7.4 as PVHVM for the last week or two now. No complaints, other than the sudden problem with PV taking me by surprise.

Has anyone tried 7.4 as the dom0 yet? I saw the point in the 7.4 release notes that enough rebasing has taken place that I probably shouldn't update my Xen hypervisor machines yet, but I'm wondering what the status is on that.

Thanks,
-Dave


> On Sep 14, 2017, at 4:00 AM, Adi Pircalabu <adi at ddns.com.au> wrote:
> 
> On 14-09-2017 20:57, Adi Pircalabu wrote:
>> On 08-09-2017 6:17, Kevin Stange wrote:
>>> On 09/06/2017 05:21 PM, Kevin Stange wrote:
>>>> On 09/06/2017 08:40 AM, Johnny Hughes wrote:
>>>>> On 09/05/2017 02:26 PM, Kevin Stange wrote:
>>>>>> On 09/04/2017 05:27 PM, Johnny Hughes wrote:
>>>>>>> On 09/04/2017 03:59 PM, Kevin Stange wrote:
>>>>>>>> On 09/02/2017 08:11 AM, Johnny Hughes wrote:
>>>>>>>>> On 09/01/2017 02:41 PM, Kevin Stange wrote:
>>>>>>>>>> On 08/31/2017 07:50 AM, PJ Welsh wrote:
>>>>>>>>>>> A recently created and fully functional CentOS 7.3 VM fails to boot
>>>>>>>>>>> after applying CR updates:
>>>>>>>>>> <snip>
>>>>>>>>>>> Server OS is CentOS 7.3 using Xen (no CR updates):
>>>>>>>>>>> rpm -qa xen\*
>>>>>>>>>>> xen-hypervisor-4.6.3-15.el7.x86_64
>>>>>>>>>>> xen-4.6.3-15.el7.x86_64
>>>>>>>>>>> xen-licenses-4.6.3-15.el7.x86_64
>>>>>>>>>>> xen-libs-4.6.3-15.el7.x86_64
>>>>>>>>>>> xen-runtime-4.6.3-15.el7.x86_64
>>>>>>>>>>> uname -a
>>>>>>>>>>> Linux tsxen2.xx.com 4.9.39-29.el7.x86_64 #1 SMP
>>>>>>>>>>> Fri Jul 21 15:09:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>>>>>>>>> Sadly, the other issue is that the grub menu will not display for me to
>>>>>>>>>>> select another kernel to see if it is just a kernel issue.
>>>>>>>>>>> The dracut prompt does not show any /dev/disk folder either.
>>>>>>>>>> I'm seeing this as well.  My host is 4.9.44-29 and Xen 4.4.4-26 from
>>>>>>>>>> testing repo, my guest is 3.10.0-693.1.1.  Guest boots fine with
>>>>>>>>>> 514.26.2.  The kernel messages that appear to kick off the failure for
>>>>>>>>>> me start with a page allocation failure.  It eventually reaches dracut
>>>>>>>>>> failures due to systemd/udev not setting up properly, but I think the
>>>>>>>>>> root is this:
>>>> <snip>
>>>>>>>>> Do any of you guys have access to RHEL to try the RHEL 7.4 Kernel?
>>>>>>>> I think I may.  I haven't tried yet, but I'll see if I can get my hands
>>>>>>>> on one and test it when I'm back at the office tomorrow.
>>>>>>>> RH closed my bug as "WONTFIX" so far, saying Red Hat Quality Engineering
>>>>>>>> Management declined the request.  I started to look at the Red Hat
>>>>>>>> source browser to see the list of patches from 693 to 514, but getting
>>>>>>>> the full list seems impossible because the change log only goes back to
>>>>>>>> 644 and there doesn't seem to be a way to obtain full builds of
>>>>>>>> unreleased kernels.  Unless I'm mistaken.
>>>>>>>> I will also do some digging via RH support if I can.
>>>>>>> I would think that RH would want AWS support for RHEL 7.4 and I thought
>>>>>>> AWS was run on Xen // Note:  I could be wrong about that.
>>>>>>> In any event, at the very least, we can make a kernel that boots PV for
>>>>>>> 7.4 at some point.
>>>>>> AWS does run on Xen, but the modifications they make to Xen are not
>>>>>> known to me nor which version of Xen they use.  They may also run the
>>>>>> domains as HVM, which seems to mitigate the issue here.
>>>>>> I just verified this kernel issue exists on a RHEL 7.3 system image
>>>>>> under the same conditions, when it's updated to RHEL 7.4 and kernel
>>>>>> 3.10.0-693.2.1.el7.x86_64.
>>>>> One other option is to run the DomU's as PVHVM:
>>>>> https://wiki.xen.org/wiki/Xen_Linux_PV_on_HVM_drivers
>>>>> That should be much better performance than HVM and may be a workable
>>>>> solution for people who don't want to modify their VM kernel.
>>>>> Here is more info on PVHVM:
>>>>> https://wiki.xen.org/wiki/PV_on_HVM
>>>>> ================
>>>>> Also heard from someone to try this Config file change to the base
>>>>> kernel and rebuild:
>>>>> CONFIG_RANDOMIZE_BASE=n
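
To see whether a given guest kernel was built with that option, the config file shipped next to the kernel can be checked. The following is a minimal sketch, not anything posted in this thread: the function name `kaslr_status` is made up for illustration, and it assumes the config is at /boot/config-<release>, as on a stock CentOS install. Note that a built config records a disabled option as "# CONFIG_RANDOMIZE_BASE is not set" rather than "=n".

```shell
#!/bin/sh
# Hypothetical helper: report whether CONFIG_RANDOMIZE_BASE is enabled
# in a kernel config file.
# Assumption: with no argument, the config for the running kernel is
# /boot/config-$(uname -r).
kaslr_status() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$cfg" ]; then
        # Config file missing or unreadable; nothing can be concluded.
        echo "unknown"
        return 2
    fi
    if grep -q '^CONFIG_RANDOMIZE_BASE=y$' "$cfg"; then
        echo "enabled"
    else
        # Covers both "# CONFIG_RANDOMIZE_BASE is not set" and an absent option.
        echo "disabled"
    fi
}
```

Calling `kaslr_status` with no argument checks the running kernel; passing a path such as /boot/config-3.10.0-693.2.1.el7.x86_64 checks an installed kernel that is not currently booted.
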
>>>> This suggestion was mirrored in the RH bugzilla as well. It worked, but
>>>> the same issue does not exist in newer kernels which have the option on.
>>>> I've posted updated findings in the CentOS bug, which includes a patch
>>>> that I found which seems to fix the issue:
>>>> https://bugs.centos.org/view.php?id=13763#c30014
>>> With many thanks to hughesjr and toracat, I was able to find a patch
>>> that seems to resolve this issue and get it into CentOS Plus
>>> 3.10.0-693.2.1.  I've asked Red Hat to apply it to some future kernel
>>> update, but that is only a dream for now.
>>> In the meantime, if anyone who has been experiencing the issue with PV
>>> domains can try out the CentOS Plus kernel here and provide feedback,
>>> I'd appreciate it!
>>> https://buildlogs.centos.org/c7-plus/kernel-plus/20170907163005/3.10.0-693.2.1.el7.centos.plus.x86_64/
>> Loaded 3.10.0-693.2.2.el7.centos.plus.x86_64 successfully on two
>> CentOS 7.4 PV domUs which failed previously on
>> kernel-3.10.0-693.2.2.el7.x86_64, the 2 hypervisors tested are:
>> 1. CentOS 6.9, kernel 4.9.13-22.el6.x86_64, Xen 4.6.3-8.el6
>> 2. CentOS 7.3, kernel 4.9.31-27.el7.x86_64, Xen 4.9.31-27.el7.x86_64
> 
> Should read:
> 1. CentOS 6.9, kernel 4.9.13-22.el6.x86_64, Xen 4.6.3-8.el6
> 2. CentOS 7.3, kernel 4.9.31-27.el7.x86_64, Xen 4.6.3-15.el7
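
When testing like this, it helps to confirm from inside the domU that it really booted the Plus kernel and not the stock one. A minimal sketch, with a made-up function name (`is_plus_kernel`); it only assumes that Plus builds carry ".centos.plus." in the release string, as the versions quoted above do.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given kernel release
# string looks like a CentOS Plus build.
# Assumption: Plus kernels contain ".centos.plus." in the release string.
# With no argument, the running kernel (uname -r) is checked.
is_plus_kernel() {
    rel="${1:-$(uname -r)}"
    case "$rel" in
        *.centos.plus.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_plus_kernel && echo "plus kernel running"` prints the message only on a guest that booted a Plus build.
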
> 
> ---
> Adi Pircalabu, System Administrator
> _______________________________________________
> CentOS-virt mailing list
> CentOS-virt at centos.org
> https://lists.centos.org/mailman/listinfo/centos-virt