Hi all,
I'm following the instructions at http://wiki.centos.org/QaWiki/Xen4 to set up Xen on CentOS 6.4. Unfortunately, after installing Xen and modifying the boot line, there is a kernel panic during boot that sends the host into a reboot loop. Console log attached.
[<ffffffff81575480>] panic+0xc4/0x1e1
[<ffffffff81054836>] find_new_reaper+0x176/0x180
[<ffffffff81055345>] forget_original_parent+0x45/0x2c0
[<ffffffff81107214>] ? task_function_call+0x44/0x50
[<ffffffff810555d7>] exit_notify+0x17/0x140
[<ffffffff81057053>] do_exit+0x1f3/0x450
[<ffffffff81057305>] do_group_exit+0x55/0xd0
[<ffffffff81057397>] sys_exit_group+0x17/0x20
[<ffffffff815806a9>] system_call_fastpath+0x16/0x1b
xen-4.2.2-23.el6.centos.alt.x86_64
kernel-3.4.53-8.el6.centos.alt.x86_64
title xen
    root (hd0,0)
    kernel /xen.gz dom0_mem=256M,max:256M loglvl=all guest_loglvl=all
    module /vmlinuz-3.4.53-8.el6.centos.alt.x86_64 ro root=/dev/mapper/vg_cs-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=uk LANG=en_US.UTF-8 rd_LVM_LV=vg_cs/lv_root rd_Np
    module /initramfs-3.4.53-8.el6.centos.alt.x86_64.img
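The boot entry above passes both root= and rd_LVM_LV= to the initramfs. On EL6, dracut activates only the logical volumes named by rd_LVM_LV=, so if the root device is on LVM it needs to be one of them. A minimal sketch of that consistency check follows; the function names are hypothetical helpers written for this thread, not part of any tool, and the check only covers the /dev/mapper/VG-LV and /dev/VG/LV spellings of the root device.

```python
# Hypothetical helper: check that root= on a kernel command line refers to
# a logical volume that is also listed in an rd_LVM_LV= option.

def mapper_path(vg_lv):
    """Return the /dev/mapper path for a 'vg/lv' pair.

    device-mapper joins the two names with '-' and doubles any '-'
    that appears inside either name.
    """
    vg, lv = vg_lv.split("/", 1)
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"), lv.replace("-", "--"))


def parse_cmdline(cmdline):
    """Return (root, [rd_LVM_LV values]) from a kernel command line."""
    root = None
    lvs = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # bare flags such as 'ro' or 'rd_NO_LUKS'
        if key == "root":
            root = value
        elif key == "rd_LVM_LV" and "/" in value:
            lvs.append(value)
    return root, lvs


def root_lv_is_listed(cmdline):
    """True if root= matches one of the rd_LVM_LV= volumes."""
    root, lvs = parse_cmdline(cmdline)
    if root is None:
        return False
    candidates = set()
    for vg_lv in lvs:
        candidates.add(mapper_path(vg_lv))
        candidates.add("/dev/" + vg_lv)
    return root in candidates


# Kernel arguments copied from the boot entry in this message.
CMDLINE = (
    "ro root=/dev/mapper/vg_cs-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=uk "
    "LANG=en_US.UTF-8 rd_LVM_LV=vg_cs/lv_root"
)

if __name__ == "__main__":
    print(root_lv_is_listed(CMDLINE))
```

For the entry quoted here the two options agree, so a mismatch between them would not by itself explain the failure.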
This issue was found in an automated environment that actually uses Dave Scott's Xen 4.4 branch (http://xenbits.xen.org/djs/centos-xen-4-4/); however, in trying to diagnose the issue we found that the base xen-c6 combination failed in the same way. Note also that the last successful run (which was a while ago, due to a configuration issue) used the same Xen and kernel as the now-failing environments: both the failing and passing environments are using xen-4.4.0-2.el6.x86_64 and kernel-3.4.53-8.el6.centos.alt.x86_64.
Is the current combination of xen/kernel in xen-c6 working for others at the moment? Are there any thoughts on what might be causing this regression?
Bob
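Read from the bottom up, the call trace in the message above goes from an exit_group system call through do_exit to a panic inside find_new_reaper, which typically indicates that PID 1 (during early boot, the init process of the initramfs) has exited. A minimal sketch for pulling the call chain out of such a trace is shown here; the regular expression and function names are illustrative assumptions, and frames marked with "?" are treated as unreliable and skipped.

```python
import re

# One frame looks like: [<ffffffff81057053>] do_exit+0x1f3/0x450
# A '?' before the symbol marks an address the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(trace):
    """Return one dict per stack frame found in the trace text."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        frames.append({
            "address": int(match.group(1), 16),
            "reliable": match.group(2) is None,
            "function": match.group(3),
            "offset": int(match.group(4), 16),
            "size": int(match.group(5), 16),
        })
    return frames


def call_chain(trace):
    """Return reliable function names, outermost caller first."""
    names = [f["function"] for f in parse_frames(trace) if f["reliable"]]
    return list(reversed(names))


# Trace copied from the message above.
TRACE = """
[<ffffffff81575480>] panic+0xc4/0x1e1
[<ffffffff81054836>] find_new_reaper+0x176/0x180
[<ffffffff81055345>] forget_original_parent+0x45/0x2c0
[<ffffffff81107214>] ? task_function_call+0x44/0x50
[<ffffffff810555d7>] exit_notify+0x17/0x140
[<ffffffff81057053>] do_exit+0x1f3/0x450
[<ffffffff81057305>] do_group_exit+0x55/0xd0
[<ffffffff81057397>] sys_exit_group+0x17/0x20
[<ffffffff815806a9>] system_call_fastpath+0x16/0x1b
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(TRACE)))
```

The parser works equally well on the trace as it appears on a single line, since it matches frames rather than lines.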
On 12/01/2014 04:48 AM, Bob Ball wrote:
> Is the current combination of xen/kernel in xen-c6 working for others at the moment? Are there any thoughts on what might be causing this regression?
It works fine for me .. you might consider using CentOS-6.6 and not CentOS-6.4 .. also, we now use a 3.10 kernel and the latest version of xen is 4.2.5 in the /6.6/xen4/ repo.
Use this link: http://wiki.centos.org/HowTos/Xen/Xen4QuickStart
BUT .. it seems to be a hardware/driver issue.
Thanks, Johnny Hughes
-----Original Message-----
From: Johnny Hughes
> It works fine for me .. you might consider using CentOS-6.6 and not CentOS-6.4 .. also, we now use a 3.10 kernel and the latest version of xen is 4.2.5 in the /6.6/xen4/ repo.
Updated to CentOS-6.6, but I still get the same issue.
By the above I assume you're using the xen4 repo rather than the xen-c6 repository referred to by http://wiki.centos.org/QaWiki/Xen4? Is the xen-c6 repo now considered broken or deprecated with the xen4 repo used in preference?
> BUT .. it seems to be a hardware/driver issue.
The same hardware (a cluster of 10 machines) was previously working with the xen-c6 repository. I'm not sure what could have caused this failure on all hosts, which is why I think it's a software issue. Possibly a driver issue, although the last successful run used the same kernel, so I assume it had roughly the same drivers installed. Note that the 3.4 kernel boots fine without Xen; it is only under Xen that the boot fails and the machine restarts.
Bob
On Tue, Dec 2, 2014 at 1:36 PM, Bob Ball <bob.ball@citrix.com> wrote:
> By the above I assume you're using the xen4 repo rather than the xen-c6 repository referred to by http://wiki.centos.org/QaWiki/Xen4? Is the xen-c6 repo now considered broken or deprecated with the xen4 repo used in preference?
Yes, the top of that page says:
"This is a development release, only meant for testing purposes. We do not recommend anyone deploy production systems using the content mentioned here."
The wiki page Johnny pointed you to covers the officially supported Xen 4 CentOS binaries now.
We should probably delete that wiki page -- thanks for finding it. :-)
-George
On 12/02/2014 07:36 AM, Bob Ball wrote:
>> BUT .. it seems to be a hardware/driver issue.
> The same hardware (a cluster of 10 machines) was previously working with the xen-c6 repository. I'm not sure what could have caused this failure on all hosts, which is why I think it's a software issue. Possibly a driver issue, although the last successful run used the same kernel, so I assume it had roughly the same drivers installed. Note that the 3.4 kernel boots fine without Xen; it is only under Xen that the boot fails and the machine restarts.
What I mean by hardware issue is the way the hardware interacts with the newer versions of xen. I guess what I should have said is that there is some unique issue with your hardware.
The updates we have posted are needed for numerous security fixes, so I would not recommend running older versions long term ... BUT ... all the previously released software is here:
http://vault.centos.org/6.4/xen4/
http://vault.centos.org/6.5/xen4/
and
http://mirror.centos.org/centos/6.6/xen4/
In this unique case (i.e., your exact hardware and software combination), you may need to experiment to find the exact combination of software that works for you.
In any event, all the software we have previously released is in those locations, so getting a combination that works, so that we can isolate the issue that causes it all to die, is likely the best starting point.
Thanks all for the advice.
It seems there is an issue with Dracut booting from these hosts when LVM is used.
dracut: Scanning devices sda2 for LVM logical volumes VolGroup/lv_swap VolGroup/lv_root
dracut: inactive '/dev/VolGroup/lv_swap' [1.94 GiB] inherit
dracut: inactive '/dev/VolGroup/lv_root' [230.69 GiB] inherit
dracut: PARTIAL MODE. Incomplete logical volumes will be processed.
dracut: Operation prohibited while global/metadata_read_only is set.
dracut: Operation prohibited while global/metadata_read_only is set.
...
dracut Warning: LVM VolGroup/lv_swap not found
dracut Warning: LVM VolGroup/lv_root not found
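When the same failure has to be confirmed across several hosts, the relevant dracut messages can be picked out of a captured console log mechanically. The following is a rough sketch that assumes the message wording shown above; the function name and the shape of the summary are hypothetical, and other dracut or LVM versions may word these messages differently.

```python
import re

# Message patterns taken from the console output quoted in this message.
NOT_FOUND_RE = re.compile(r"dracut Warning: LVM (\S+) not found")
READ_ONLY_MSG = "Operation prohibited while global/metadata_read_only is set"
PARTIAL_MSG = "PARTIAL MODE"


def summarize_dracut_log(lines):
    """Collect the LVM-related symptoms from an iterable of log lines."""
    summary = {"missing_lvs": [], "metadata_read_only": 0, "partial_mode": False}
    for line in lines:
        match = NOT_FOUND_RE.search(line)
        if match:
            summary["missing_lvs"].append(match.group(1))
        if READ_ONLY_MSG in line:
            summary["metadata_read_only"] += 1
        if PARTIAL_MSG in line:
            summary["partial_mode"] = True
    return summary


# Sample lines copied from the log above.
SAMPLE_LOG = [
    "dracut: Scanning devices sda2 for LVM logical volumes VolGroup/lv_swap VolGroup/lv_root",
    "dracut: inactive '/dev/VolGroup/lv_swap' [1.94 GiB] inherit",
    "dracut: inactive '/dev/VolGroup/lv_root' [230.69 GiB] inherit",
    "dracut: PARTIAL MODE. Incomplete logical volumes will be processed.",
    "dracut: Operation prohibited while global/metadata_read_only is set.",
    "dracut: Operation prohibited while global/metadata_read_only is set.",
    "dracut Warning: LVM VolGroup/lv_swap not found",
    "dracut Warning: LVM VolGroup/lv_root not found",
]

if __name__ == "__main__":
    print(summarize_dracut_log(SAMPLE_LOG))
```

A host whose summary lists the root logical volume under missing_lvs never gets a root filesystem, which would be consistent with the init process exiting and the kernel panicking.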
Switching my kickstart to use real partitions rather than LVM solved the issue. Not sure if that's enough detail to figure out what's wrong / missing from the kernel / initrd.
Bob
-----Original Message-----
From: centos-virt-bounces@centos.org [mailto:centos-virt-bounces@centos.org] On Behalf Of Johnny Hughes
Sent: 04 December 2014 09:51
To: centos-virt@centos.org
Subject: Re: [CentOS-virt] xen-c6 fails to boot
> In any event, all the software we have previously released is in those locations, so getting a combination that works, so that we can isolate the issue that causes it all to die, is likely the best starting point.
On Thu, Dec 4, 2014 at 12:39 PM, Bob Ball <bob.ball@citrix.com> wrote:
> It seems there is an issue with Dracut booting from these hosts when LVM is used.
> Switching my kickstart to use real partitions rather than LVM solved the issue. Not sure if that's enough detail to figure out what's wrong / missing from the kernel / initrd.
Sorry, it's still not clear from the previous conversation -- in addition to updating to CentOS 6.6, have you also switched to the official Xen4CentOS repos (i.e., by installing centos-release-xen)?
-George
> Sorry, it's still not clear from the previous conversation -- in addition to updating to CentOS 6.6, have you also switched to the official Xen4CentOS repos (i.e., by installing centos-release-xen)?
Sorry for not making it clear! :)
Yes, I upgraded to both CentOS 6.6 and Xen4CentOS, although because we're building packages in mock, we're using the repository (http://mirror.centos.org/centos/6/xen4/x86_64/) directly for both the build and the install.
Bob
On 12/09/2014 04:48 AM, George Dunlap wrote:
> On Thu, Dec 4, 2014 at 12:39 PM, Bob Ball <bob.ball@citrix.com> wrote:
>> It seems there is an issue with Dracut booting from these hosts when LVM is used.
>> Switching my kickstart to use real partitions rather than LVM solved the issue.
LVM not working is something we can look at .. I will try to find a drive to use to test this.