xen-c6 fails to boot

Bob Ball

1 Dec 2014

10:48 a.m.

Hi all,

I am following the instructions at http://wiki.centos.org/QaWiki/Xen4 to set up Xen on CentOS 6.4. Unfortunately, after installing Xen and modifying the boot line, there is a kernel panic during the boot process, causing the host to enter a reboot loop. Console log attached.

[<ffffffff81575480>] panic+0xc4/0x1e1
[<ffffffff81054836>] find_new_reaper+0x176/0x180
[<ffffffff81055345>] forget_original_parent+0x45/0x2c0
[<ffffffff81107214>] ? task_function_call+0x44/0x50
[<ffffffff810555d7>] exit_notify+0x17/0x140
[<ffffffff81057053>] do_exit+0x1f3/0x450
[<ffffffff81057305>] do_group_exit+0x55/0xd0
[<ffffffff81057397>] sys_exit_group+0x17/0x20
[<ffffffff815806a9>] system_call_fastpath+0x16/0x1b

xen-4.2.2-23.el6.centos.alt.x86_64
kernel-3.4.53-8.el6.centos.alt.x86_64

title xen
root (hd0,0)
kernel /xen.gz dom0_mem=256M,max:256M loglvl=all guest_loglvl=all
module /vmlinuz-3.4.53-8.el6.centos.alt.x86_64 ro root=/dev/mapper/vg_cs-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=uk LANG=en_US.UTF-8 rd_LVM_LV=vg_cs/lv_root rd_Np
module /initramfs-3.4.53-8.el6.centos.alt.x86_64.img
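For reference, the kernel module line in this boot entry names the root volume twice: once as root=/dev/mapper/vg_cs-lv_root and once as rd_LVM_LV=vg_cs/lv_root. The Python sketch below checks that the two agree. It is only an illustration and does not diagnose the panic itself. The helper names (parse_cmdline, mapper_path, root_lv_is_listed) are hypothetical, and it assumes the usual device-mapper naming, where the node is /dev/mapper/<vg>-<lv> with any hyphen inside a name doubled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into bare flags and key=value options."""
    flags, options = [], {}
    for token in cmdline.split():
        if "=" in token:
            key, value = token.split("=", 1)
            # A key may legitimately appear more than once (e.g. rd_LVM_LV).
            options.setdefault(key, []).append(value)
        else:
            flags.append(token)
    return flags, options


def mapper_path(vg, lv):
    """Device-mapper node for vg/lv; hyphens inside names are doubled."""
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"), lv.replace("-", "--"))


def root_lv_is_listed(cmdline):
    """Return True if root= matches one of the rd_LVM_LV=vg/lv entries."""
    _, options = parse_cmdline(cmdline)
    root = options.get("root", [None])[-1]
    for entry in options.get("rd_LVM_LV", []):
        vg, _, lv = entry.partition("/")
        if root in (mapper_path(vg, lv), "/dev/{}/{}".format(vg, lv)):
            return True
    return False


# Arguments copied from the module line of the boot entry (rd_Np is kept
# exactly as it appears in the post).
CMDLINE = (
    "ro root=/dev/mapper/vg_cs-lv_root rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=uk "
    "LANG=en_US.UTF-8 rd_LVM_LV=vg_cs/lv_root rd_Np"
)

print(root_lv_is_listed(CMDLINE))
```

For this particular entry the check passes, so a mismatch between root= and rd_LVM_LV= is unlikely to be the explanation here.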

This issue was found in an automated environment that actually uses Dave Scott's Xen 4.4 branch (http://xenbits.xen.org/djs/centos-xen-4-4/); however, while trying to diagnose the issue, we found that the base xen-c6 combination failed in the same way. Note also that the last successful run (which was a while ago, due to a configuration issue) used the same Xen and kernel as the now-failing environments. Both the failing and passing environments use xen-4.4.0-2.el6.x86_64 and kernel-3.4.53-8.el6.centos.alt.x86_64.

Is the current combination of xen/kernel in xen-c6 working for others at the moment? Are there any thoughts on what might be causing this regression?

Bob

Attachments:

console.log (application/octet-stream — 10.9 KB)


Johnny Hughes

1 Dec 2014

4:02 p.m.

It works fine for me. You might consider using CentOS-6.6 rather than CentOS-6.4. Also, we now use a 3.10 kernel, and the latest version of Xen is 4.2.5 in the /6.6/xen4/ repo.

Use this link: http://wiki.centos.org/HowTos/Xen/Xen4QuickStart

BUT .. it seems to be a hardware/driver issue.

Thanks, Johnny Hughes

Bob Ball

2 Dec 2014

1:36 p.m.

Johnny Hughes wrote:

> It works fine for me .. you might consider using CentOS-6.6 and not CentOS-6.4 .. also, we now use a 3.10 kernel and the latest version of xen is 4.2.5 in the /6.6/xen4/ repo.

Updated to CentOS-6.6, but I still get the same issue.

By the above I assume you're using the xen4 repo rather than the xen-c6 repository referred to by http://wiki.centos.org/QaWiki/Xen4? Is the xen-c6 repo now considered broken or deprecated with the xen4 repo used in preference?

> BUT .. it seems to be a hardware/driver issue.

The same hardware (a cluster of 10 machines) previously worked with the xen-c6 repository. I'm not sure what could have caused this failure on all hosts, which is why I think it's a software issue. It is possibly a driver issue, although the last successful run used the same kernel, so I assume it had roughly the same drivers installed. Note that the 3.4 kernel boots fine without Xen; it is only under Xen that the boot fails and the machine restarts.

Bob

George Dunlap

3 Dec 2014

10:45 a.m.

On Tue, Dec 2, 2014 at 1:36 PM, Bob Ball bob.ball@citrix.com wrote:

> By the above I assume you're using the xen4 repo rather than the xen-c6 repository referred to by http://wiki.centos.org/QaWiki/Xen4? Is the xen-c6 repo now considered broken or deprecated with the xen4 repo used in preference?

Yes, the top of that page says:

"This is a development release, only meant for testing purposes. We do not recommend anyone deploy production systems using the content mentioned here."

The wiki page Johnny pointed you to describes what is now the officially supported Xen 4 CentOS binary.

We should probably delete that wiki page -- thanks for finding it. :-)

-George

Johnny Hughes

4 Dec 2014

9:50 a.m.

On 12/02/2014 07:36 AM, Bob Ball wrote:

> The same hardware (cluster of 10 machines) was successfully working with the xen-c6 repository previously; I'm not sure what issue might have occurred to cause this failure on all hosts which is why I think it's a software issue. Possibly a driver issue although the last successful run was using the same kernel so I assume had roughly the same drivers installed. Note that the 3.4 kernel boots fine without Xen, it is only under Xen that the boot fails and the machine restarts.

What I mean by hardware issue is the way the hardware interacts with the newer versions of xen. I guess what I should have said is that there is some unique issue with your hardware.

The updates we have posted are needed for numerous security fixes, so I would not recommend running older versions long term for security reasons ... BUT ... all the previously released software is here:

http://vault.centos.org/6.4/xen4/

http://vault.centos.org/6.5/xen4/

and

http://mirror.centos.org/centos/6.6/xen4/

In this unique case (i.e., your exact hardware and software combination), you may need to experiment to find the exact combination of software that works for you.

In any event, all the software we have previously released is in those locations, so getting a combination that works so we can isolate the issue that causes it all to die is likely the best starting point.

Bob Ball

12:39 p.m.

Thanks all for the advice.

It seems there is an issue with dracut when booting these hosts with LVM in use.

dracut: Scanning devices sda2 for LVM logical volumes VolGroup/lv_swap VolGroup/lv_root
dracut: inactive '/dev/VolGroup/lv_swap' [1.94 GiB] inherit
dracut: inactive '/dev/VolGroup/lv_root' [230.69 GiB] inherit
dracut: PARTIAL MODE. Incomplete logical volumes will be processed.
dracut: Operation prohibited while global/metadata_read_only is set.
dracut: Operation prohibited while global/metadata_read_only is set.
...
dracut Warning: LVM VolGroup/lv_swap not found
dracut Warning: LVM VolGroup/lv_root not found

Switching my kickstart to use real partitions rather than LVM solved the issue. Not sure if that's enough detail to figure out what's wrong / missing from the kernel / initrd.
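When comparing console logs across a cluster of hosts, it can help to pull the relevant dracut lines out automatically. The following is a minimal Python sketch, assuming the log uses exactly the message formats quoted in this message. The function name summarize and the shape of its result are hypothetical, chosen only for illustration.

```python
import re

# Lines as they appear in the console output quoted in this message.
LOG = """\
dracut: Scanning devices sda2 for LVM logical volumes VolGroup/lv_swap VolGroup/lv_root
dracut: inactive '/dev/VolGroup/lv_swap' [1.94 GiB] inherit
dracut: inactive '/dev/VolGroup/lv_root' [230.69 GiB] inherit
dracut: PARTIAL MODE. Incomplete logical volumes will be processed.
dracut: Operation prohibited while global/metadata_read_only is set.
dracut: Operation prohibited while global/metadata_read_only is set.
dracut Warning: LVM VolGroup/lv_swap not found
dracut Warning: LVM VolGroup/lv_root not found
"""

NOT_FOUND = re.compile(r"^dracut Warning: LVM (\S+) not found$")
READ_ONLY = "Operation prohibited while global/metadata_read_only is set."


def summarize(log):
    """Collect the LVs dracut could not find and count read-only refusals."""
    missing, read_only_errors = [], 0
    for line in log.splitlines():
        line = line.strip()
        match = NOT_FOUND.match(line)
        if match:
            missing.append(match.group(1))
        elif line.endswith(READ_ONLY):
            read_only_errors += 1
    return {"missing": missing, "read_only_errors": read_only_errors}


print(summarize(LOG))
```

On the excerpt above this reports both VolGroup/lv_swap and VolGroup/lv_root as missing, together with two metadata_read_only refusals. That matches the symptom: the volumes are seen as inactive but never activated.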

Bob


George Dunlap

9 Dec 2014

10:48 a.m.

On Thu, Dec 4, 2014 at 12:39 PM, Bob Ball bob.ball@citrix.com wrote:

> It seems there is an issue with Dracut booting from these hosts when LVM is used.
>
> ...
>
> Switching my kickstart to use real partitions rather than LVM solved the issue. Not sure if that's enough detail to figure out what's wrong / missing from the kernel / initrd.

Sorry, it's still not clear from the previous conversation -- in addition to updating to CentOS 6.6, have you also switched to the official Xen4CentOS repos (i.e., by installing centos-release-xen)?

-George

Bob Ball

11:21 a.m.

> Sorry, it's still not clear from the previous conversation -- in addition to updating to CentOS 6.6, have you also switched to the official Xen4CentOS repos (i.e., by installing centos-release-xen)?

Sorry for not making it clear! :)

Yes, I upgraded to both CentOS 6.6 and Xen4CentOS, although, because we're building packages in a mock, we're using the repository (http://mirror.centos.org/centos/6/xen4/x86_64/) directly for both the build and the install.

Bob

Johnny Hughes

4:17 p.m.

LVM not working is something we can look at .. I will try to find a drive to use to test this.


virt@lists.centos.org
