Hi,
I'm seeing numerous crashes on the xen 4.6.6-1 / 4.6.6-2 releases, on both the 4.9.34-29 and 4.9.39-29 kernels.
I've attached a text file with output from two different servers.
Xen-028: This crashed this morning while running 4.6.6-1 and 4.9.39-29
Xen-001: This crashed shortly after being upgraded to 4.6.6-2 and 4.9.34-29
The two are on different hardware platforms, and both had a long history of stability before these upgrades.
It sounds potentially related to https://kernel.googlesource.com/pub/scm/linux/kernel/git/tiwai/sound-unstable/+/9ce119f318ba1a07c29149301f1544b6c4bea52a%5E%21/ but I've confirmed this patch is in the above kernels.
Any suggestions / thoughts?
Cheers,
Nathan
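For readers who want to check whether a host is running one of the kernel builds named in this report, here is a minimal Python sketch. The function name `is_reported_kernel` and the assumption that the `uname -r` string starts with the version-release pair (optionally followed by a dotted suffix) are illustrative and not taken from the thread. A match only means the build was named in the report, not that the host will crash.

```python
import platform

# Kernel builds named in the report above as crashing.
REPORTED_KERNELS = ("4.9.34-29", "4.9.39-29")


def is_reported_kernel(release: str) -> bool:
    """Return True if `release` is one of the reported builds.

    Assumes the release string looks like "<version>-<build>[.<suffix>]";
    the suffix format is an assumption, not something stated in the thread.
    """
    for build in REPORTED_KERNELS:
        # Require an exact match or a "." right after the build number,
        # so "4.9.39-290" is not mistaken for "4.9.39-29".
        if release == build or release.startswith(build + "."):
            return True
    return False


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    status = "named in report" if is_reported_kernel(running) else "not named in report"
    print(running, status)
```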
This appears to be a CentOS kernel issue rather than a Xen one.
https://lkml.org/lkml/2016/5/17/440
I've dug through the posts, and it's not clear why this never made it upstream.
I'm going to apply that patch to my systems and see if it resolves the crashes, but I won't know for certain until a week or two of stability goes by.
- Nathan
From: CentOS-virt [mailto:centos-virt-bounces@centos.org] On Behalf Of Nathan March Sent: Wednesday, August 23, 2017 2:48 PM To: centos-virt@centos.org Subject: [CentOS-virt] Major stability problems with xen 4.6.6
Just in case anyone else on this list is running into similar issues, I can confirm that the patch appears to have resolved this.
I've opened https://bugs.centos.org/view.php?id=13713
It was so bad that having the system under load (with rpmbuild) and opening another ssh window or two would almost always cause the oops.
Cheers,
Nathan
From: CentOS-virt [mailto:centos-virt-bounces@centos.org] On Behalf Of Nathan March Sent: Wednesday, August 23, 2017 3:32 PM To: 'Discussion about the virtualization on CentOS' centos-virt@centos.org Subject: Re: [CentOS-virt] Major stability problems with xen 4.6.6
Hi,
On Thu, Aug 24, 2017 at 03:45:46PM -0700, Nathan March wrote:
Just in case anyone else on this list is running into similar issues, I can confirm that the patch appears to have resolved this.
I've opened https://bugs.centos.org/view.php?id=13713
It was so bad that having the system under load (with rpmbuild) and opening another ssh window or two would almost always cause the oops.
It seems the patch you mentioned was merged to upstream Linux here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
and then reverted/removed here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Do you know if there has been a proper/fixed patch after that? Has it been merged to the upstream Linux kernel already?
Thanks,
-- Pasi
It seems the patch you mentioned was merged to upstream Linux here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71472fa9c52b1da27663c275d416d8654b905f05
and then reverted/removed here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=896d81fefe5d1919537db2c2150ab6384e4a6610
Do you know if there has been a proper/fixed patch after that? Has it been merged to the upstream Linux kernel already?
Interesting! I didn't come across that when digging into this.
It looks like this hasn't been followed up on at all since April: https://lists.gt.net/engine?list=linux;do=search_results;search_type=AND;search_forum=forum_1;search_string=ldisc%20reopened&sb=post_time
Currently I've got ~40 dom0s running with the patch on 4.9.44-39, and it's resolved all stability issues. Previously I was seeing multiple crashes a week.
Cheers, Nathan