I've built & tagged 4.6.6-9 packages for CentOS 6 and 7, with XPTI "stage 1" Meltdown mitigation.
This will allow 64-bit PV guests to run safely (with a few caveats), but incurs a fairly significant slowdown for 64-bit PV guests on Intel boxes (including domain 0).
If you prefer using Vixen / Comet, you can turn XPTI off by adding 'xpti=0' to your Xen command line.
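For example, an illustrative grub.conf hypervisor line with XPTI disabled would look something like this (everything before the last option is just an example; the relevant addition is xpti=0):

  kernel /boot/xen.gz dom0_mem=4096M,max:4096M loglvl=all guest_loglvl=all xpti=0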
Detailed information can be found in the XSA-254 advisory:
https://xenbits.xen.org/xsa/advisory-254.html
Please test and report any issues you have. I'll probably tag them with -release tomorrow.
4.8 packages should be coming to buildlogs soon.
-George
Thanks George.
As there are now quite a few options to choose from, what would be the best option performance-wise for running 32-bit domUs under xen-4.6?
Best, Peter
It's worth taking a look at the table in the latest XSA; it helps clarify a fair bit:
https://xenbits.xen.org/xsa/advisory-254.html
Cheers, Nathan
Hi Nathan,
On Thu, Jan 18, 2018 at 9:31 PM, Nathan March nathan@gt.net wrote:
It's worth taking a look at the table in the latest XSA; it helps clarify a fair bit:
Thanks for pointing this out, but there is a disclaimer:
"Everything in this section applies to 64-bit PV x86 guests only."
The advisory also says: "32-bit PV guests cannot exploit SP3".
So I am wondering: if I just "yum update", will I get fixes I do not need that will slow my guests down?
Best, Peter
Just a heads-up that I'm seeing major stability problems on these builds. I didn't have console capture set up, unfortunately, but I've seen my test hypervisor hard-lock twice over the weekend.
This is with xpti being used, rather than the shim.
Cheers, Nathan
Thanks for the heads-up. It's been running through XenServer's tests as well as the XenProject's "osstest" -- I haven't heard of any additional issues, but I'll ask.
It's also possible that it's some interaction with the specific CentOS environment -- the gcc version, the particular dom0 kernel, &c. I'll ask Anthony if he can run the CentOS packages through osstest and see if anything comes up.
-George
Looks like I can reproduce this pretty easily; it happened when ssh'ing into the server while I had a VM migrating into it. The system goes completely unresponsive (I can't even enter a keystroke via the console):
[64722.291300] vlan208: port 4(vif5.0) entered forwarding state [64722.291695] NOHZ: local_softirq_pending 08 [64929.006981] BUG: unable to handle kernel paging request at 0000000000002260 [64929.007020] IP: [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0 [64929.007049] PGD 1f7a53067 [64929.007057] PUD 1ee0d4067 PMD 0 [64929.007069] [64929.007077] Oops: 0000 [#1] SMP [64929.007088] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm [64929.007327] CPU: 15 PID: 17696 Comm: kworker/u48:0 Not tainted 4.9.75-30.el6.x86_64 #1 [64929.007343] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015 [64929.007362] Workqueue: events_unbound flush_to_ldisc [64929.007376] task: ffff8801fbc70580 task.stack: ffffc90048af8000 [64929.007415] RIP: e030:[<ffffffff81533a24>] [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0 [64929.007465] RSP: e02b:ffffc90048afbb08 EFLAGS: 00010296 [64929.007476] RAX: 0000000000002260 RBX: 0000000000000000 RCX: 0000000000000002 [64929.007519] RDX: 0000000000000000 RSI: ffff8801dc0f3c20 RDI: ffff8801f9b8acd8 [64929.007563] RBP: ffffc90048afbb78 R08: 0000000000000001 R09: ffffffff8210f1c0 [64929.007577] R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000000002 [64929.007620] R13: ffff8801f9b8ac00 R14: 0000000000000000 R15: ffff8801dc0f3c20 [64929.007675] FS: 00007fcfc0af8700(0000) GS:ffff880204dc0000(0000) knlGS:0000000000000000 [64929.007718] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [64929.007759] CR2: 0000000000002260 CR3: 00000001f067b000 CR4: 0000000000042660 [64929.007782] Stack: [64929.007806] ffffc90048afbb38 0000000000000000 ffff8801f9b8acd8 0000000104dda030 [64929.007858] 0000000000002260 00000000fbc72700 ffff880204dc48c0 0000000000000000 [64929.007941] ffff880204dce890 ffff8801dc0f3c00 ffff8801f7f25c00 ffffc90048afbbf8 [64929.007994] Call Trace: [64929.008008] [<ffffffff81533b84>] n_tty_receive_buf2+0x14/0x20 [64929.008048] [<ffffffff81536763>] tty_ldisc_receive_buf+0x23/0x50 [64929.008088] [<ffffffff81536b88>] flush_to_ldisc+0xc8/0x100 [64929.008133] [<ffffffff8102eb7b>] ? __switch_to+0x20b/0x690 [64929.008176] [<ffffffff81025375>] ? xen_clocksource_read+0x15/0x20 [64929.008222] [<ffffffff810c0030>] process_one_work+0x170/0x500 [64929.008268] [<ffffffff818dac28>] ? __schedule+0x238/0x530 [64929.008310] [<ffffffff818db00a>] ? schedule+0x3a/0xa0 [64929.008324] [<ffffffff810c1ca6>] worker_thread+0x166/0x530 [64929.008368] [<ffffffff810e6a69>] ? put_prev_entity+0x29/0x140 [64929.008412] [<ffffffff818dac28>] ? __schedule+0x238/0x530 [64929.008458] [<ffffffff810d4082>] ? default_wake_function+0x12/0x20 [64929.008502] [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120 [64929.008518] [<ffffffff818db00a>] ? schedule+0x3a/0xa0 [64929.008555] [<ffffffff818dedf6>] ? _raw_spin_unlock_irqrestore+0x16/0x20 [64929.008599] [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120 [64929.008616] [<ffffffff810c6ae5>] kthread+0xe5/0x100 [64929.008630] [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0 [64929.008643] [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40 [64929.008659] [<ffffffff810d1a16>] ? 
schedule_tail+0x56/0xc0 [64929.008673] [<ffffffff818df5a1>] ret_from_fork+0x41/0x50 [64929.008685] Code: 89 fe 4c 89 ef 89 45 98 e8 aa fb ff ff 8b 45 98 48 63 d0 48 85 db 48 8d 0c 13 48 0f 45 d9 01 45 bc 49 01 d7 41 29 c4 48 8b 45 b0 <48> 8b 30 48 89 75 c0 49 8b 0e 8d 96 00 10 00 00 29 ca 41 f6 85 [64929.008894] RIP [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0 [64929.008914] RSP <ffffc90048afbb08> [64929.008923] CR2: 0000000000002260 [64929.009641] ---[ end trace e1da1cdf77fed144 ]--- [64929.009785] BUG: unable to handle kernel paging request at ffffffffffffffd8 [64929.009804] IP: [<ffffffff810c62c0>] kthread_data+0x10/0x20 [64929.009823] PGD 200d067 [64929.009831] PUD 200f067 PMD 0 [64929.009842] [64929.009850] Oops: 0000 [#2] SMP [64929.009864] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm [64929.010054] CPU: 15 PID: 17696 Comm: kworker/u48:0 Tainted: G D 4.9.75-30.el6.x86_64 #1 [64929.010068] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015 [64929.010127] task: ffff8801fbc70580 task.stack: ffffc90048af8000 [64929.010138] RIP: e030:[<ffffffff810c62c0>] [<ffffffff810c62c0>] kthread_data+0x10/0x20 [64929.010153] RSP: e02b:ffffc90048afbdd8 EFLAGS: 00010086 [64929.010162] RAX: 0000000000000000 RBX: ffff880204dd9fc0 RCX: 000000000000000f [64929.010174] RDX: ffff880200409400 RSI: ffff8801fbc70580 RDI: ffff8801fbc70580 [64929.010185] RBP: ffffc90048afbdd8 R08: ffff880204dc0000 R09: 00000006401f55c3 [64929.010197] R10: dead000000000200 R11: dead000000000200 R12: 0000000000019fc0 [64929.010208] R13: ffff8801fbc70580 R14: 0000000000000000 R15: ffff8801fbc70f40 [64929.010229] FS: 00007fcfc0af8700(0000) GS:ffff880204dc0000(0000) knlGS:0000000000000000 [64929.010241] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [64929.010251] CR2: 0000000000000028 CR3: 00000001f067b000 CR4: 0000000000042660 [64929.010270] Stack: [64929.010275] ffffc90048afbe08 ffffffff810bda72 ffffc90048afbdf8 ffff880204dd9fc0 [64929.010295] 0000000000019fc0 ffff8801fbc70580 ffffc90048afbe78 ffffffff818dae04 [64929.010314] 0000000000000001 ffff8801ef0a2400 ffffc90048afbe48 ffff8801f0e67708 [64929.010333] Call Trace: [64929.010340] [<ffffffff810bda72>] wq_worker_sleeping+0x12/0xa0 [64929.010352] [<ffffffff818dae04>] __schedule+0x414/0x530 [64929.010362] [<ffffffff810d63ec>] do_task_dead+0x3c/0x40 [64929.010373] [<ffffffff810aaade>] do_exit+0x24e/0x480 [64929.010383] [<ffffffff810c6ae5>] ? kthread+0xe5/0x100 [64929.010393] [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0 [64929.010403] [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40 [64929.010415] [<ffffffff818e0db7>] rewind_stack_do_exit+0x17/0x20 [64929.010425] Code: 48 09 00 00 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 87 48 09 00 00 <48> 8b 40 d8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 [64929.010607] RIP [<ffffffff810c62c0>] kthread_data+0x10/0x20 [64929.010620] RSP <ffffc90048afbdd8> [64929.010626] CR2: ffffffffffffffd8 [64929.010638] ---[ end trace e1da1cdf77fed145 ]--- [64929.010647] Fixing recursive fault but reboot is needed!
This is a CentOS 6 system booting with:
kernel /boot/xen.gz dom0_mem=6144M,max:6144M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all msi=off com2=115200,8n1 console=com2,tty1
module /boot/vmlinuz-4.9.75-30.el6.x86_64 ro root=UUID=ffab1fdf-28a4-4239-b112-5e920e3d6c36 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM biosdevname=0 nomodeset max_loop=128 max_loop=512 xencons=ttyS1 console=hvc0,tty0
module /boot/initramfs-4.9.75-30.el6.x86_64.img
Running xen-4.6.6-9.el6. Note that I have msi=off set in the xen parameters due to hitting the same network card bug that Kevin Stange was hitting.
Happy to grab any further info as required, or let me know if this is better suited to xen-devel.
Cheers, Nathan
Hi,
Hmm... isn't this the ldisc bug that was discussed a few months ago on this list, and wasn't a patch applied to the virt-sig kernel as well?
The call trace looks similar...
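A quick way to check whether the kernel build you're running still carries that patch would be the package changelog (assuming the fix was recorded there), e.g.:

  rpm -q --changelog kernel | grep -i ldisc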
-- Pasi
Hi,
Good memory! I'd forgotten about that despite being the one who ran into it.
Looks like that patch was just removed in 4.9.75-30, which I just upgraded this system to: http://cbs.centos.org/koji/buildinfo?buildID=21122
Previously I was on 4.9.63-29, which does not have this problem and does have the ldisc patch. So I guess the question is for Johnny: why was it removed?
In the meantime, I'll revert the kernel and follow up if I see any further problems.
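If anyone else wants to roll back the same way, something along these lines should work on CentOS 6 (assuming the older build is still available in your repos; kernel packages allow multiple installed versions, so this installs it alongside the newer one rather than replacing it):

  yum install kernel-4.9.63-29.el6.x86_64

Then point "default=" in /boot/grub/grub.conf at the 4.9.63 entry and reboot into it.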
Cheers, Nathan
IIRC the patch has been removed from the spec file because it has been merged upstream in 4.9.71.
Karl
The IRC discussion I found in my log indicates that it was removed because it didn't apply cleanly due to changes when updating to 4.9.75, yet I don't think anyone independently validated that the changes made are equivalent to the patch that was removed. I was never able to reproduce this issue, so I didn't investigate it myself.
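If someone wants to sanity-check that, one way would be to compare the dropped patch against what actually changed in the tty code between those two stable releases, e.g. in a clone of the linux-stable tree (just a sketch; adjust the tags and paths as needed):

  git log --oneline v4.9.63..v4.9.75 -- drivers/tty/tty_ldisc.c drivers/tty/tty_buffer.c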
On Tue, Jan 23, 2018 at 06:20:39PM -0600, Kevin Stange wrote:
The IRC discussion I found in my log indicates that it was removed because it didn't apply cleanly due to changes when updating to 4.9.75, yet I don't think anyone independently validated that the changes made are equivalent to the patch that was removed. I was never able to reproduce this issue, so I didn't investigate it myself.
Sounds like the patch is still needed :)
Anyone up to re-porting it to 4.9.75+ ?
-- Pasi
It looked, at first glance, like 4.9.71 fixed it .. I guess not in all cases
On Wed, Jan 24, 2018 at 6:39 AM, Johnny Hughes wrote:
It looked, at first glance, like 4.9.71 fixed it .. I guess not in all cases
I'm happy to do testing here if anyone's able to help with a patch; it does look like reverting to 4.9.63-29 solved it for me in the interim.
Hello Nathan,
I noticed upstream Linux now has some patches related to tty_ldisc. As you seem to have an easily reproducible way of triggering the crashes, could you please check whether these patches fix the issue when applied to the CentOS 4.9 kernel?
"tty: fix data race between tty_init_dev and flush of buf": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/dr...
"tty: Avoid possible error pointer dereference at tty_ldisc_restore().": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/dr...
"tty: Don't call panic() at tty_ldisc_init()": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/dr...
"tty: Use __GFP_NOFAIL for tty_ldisc_get()": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/dr...
Thanks,
-- Pasi