[CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing

Tue Jan 23 21:32:32 UTC 2018
Pasi Kärkkäinen <pasik at iki.fi>

Hi,

On Tue, Jan 23, 2018 at 10:35:24AM -0800, Nathan March wrote:
> > Thanks for the heads-up.  It's been running through XenServer's tests
> > as well as the XenProject's "osstest" -- I haven't heard of any
> > additional issues, but I'll ask.
> 
> Looks like I can reproduce this pretty easily: it happened when I ssh'ed
> into the server while I had a VM migrating into it. The system goes
> completely unresponsive (can't even enter a keystroke via console):
> 
> [64722.291300] vlan208: port 4(vif5.0) entered forwarding state
> [64722.291695] NOHZ: local_softirq_pending 08
> [64929.006981] BUG: unable to handle kernel paging request at
> 0000000000002260
> [64929.007020] IP: [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0
> [64929.007049] PGD 1f7a53067 [64929.007057] PUD 1ee0d4067 
> PMD 0 [64929.007069] 
> [64929.007077] Oops: 0000 [#1] SMP
> [64929.007088] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables
> arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss
> nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding
> xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn
> xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler
> joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb
> dca ptp pps_core uas usb_storage wmi ttm
> [64929.007327] CPU: 15 PID: 17696 Comm: kworker/u48:0 Not tainted
> 4.9.75-30.el6.x86_64 #1
> [64929.007343] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1
> 03/04/2015
> [64929.007362] Workqueue: events_unbound flush_to_ldisc
> [64929.007376] task: ffff8801fbc70580 task.stack: ffffc90048af8000
> [64929.007415] RIP: e030:[<ffffffff81533a24>]  [<ffffffff81533a24>]
> n_tty_receive_buf_common+0xa4/0x1f0
> [64929.007465] RSP: e02b:ffffc90048afbb08  EFLAGS: 00010296
> [64929.007476] RAX: 0000000000002260 RBX: 0000000000000000 RCX:
> 0000000000000002
> [64929.007519] RDX: 0000000000000000 RSI: ffff8801dc0f3c20 RDI:
> ffff8801f9b8acd8
> [64929.007563] RBP: ffffc90048afbb78 R08: 0000000000000001 R09:
> ffffffff8210f1c0
> [64929.007577] R10: 0000000000007ff0 R11: 0000000000000000 R12:
> 0000000000000002
> [64929.007620] R13: ffff8801f9b8ac00 R14: 0000000000000000 R15:
> ffff8801dc0f3c20
> [64929.007675] FS:  00007fcfc0af8700(0000) GS:ffff880204dc0000(0000)
> knlGS:0000000000000000
> [64929.007718] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64929.007759] CR2: 0000000000002260 CR3: 00000001f067b000 CR4:
> 0000000000042660
> [64929.007782] Stack:
> [64929.007806]  ffffc90048afbb38 0000000000000000 ffff8801f9b8acd8
> 0000000104dda030
> [64929.007858]  0000000000002260 00000000fbc72700 ffff880204dc48c0
> 0000000000000000
> [64929.007941]  ffff880204dce890 ffff8801dc0f3c00 ffff8801f7f25c00
> ffffc90048afbbf8
> [64929.007994] Call Trace:
> [64929.008008]  [<ffffffff81533b84>] n_tty_receive_buf2+0x14/0x20
> [64929.008048]  [<ffffffff81536763>] tty_ldisc_receive_buf+0x23/0x50
>

Hmm, isn't this the ldisc bug that was discussed on this list a few months ago, for which a patch was also applied to the virt-sig kernel?

The call trace looks similar.
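
To compare two oopses more systematically than by eye, something like the
following might help. It is only a rough sketch: it assumes frames are printed
as "[<address>] symbol+0xoff/0xlen" (the format in the trace quoted in this
mail), and the names FRAME_RE and trace_functions are made up for illustration.
It also picks up the "IP:" line, since that uses the same format.

```python
import re

# Frame format assumed from the oops quoted in this thread, e.g.
#   [<ffffffff81533b84>] n_tty_receive_buf2+0x14/0x20
# A "? " before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_functions(oops_text, include_unreliable=False):
    """Return the symbol names found in oops_text, in order of appearance."""
    names = []
    for match in FRAME_RE.finditer(oops_text):
        # Skip "? symbol" frames unless the caller asks for them.
        if match.group(1) and not include_unreliable:
            continue
        names.append(match.group(2))
    return names


def common_frames(oops_a, oops_b):
    """Return symbols that appear in both traces, in the order of oops_a."""
    seen_in_b = set(trace_functions(oops_b))
    return [name for name in trace_functions(oops_a) if name in seen_in_b]
```

Feeding the old and the new oops to common_frames() would show whether both
go through the same flush_to_ldisc / n_tty_receive_buf path.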


-- Pasi

> [64929.008088]  [<ffffffff81536b88>] flush_to_ldisc+0xc8/0x100
> [64929.008133]  [<ffffffff8102eb7b>] ? __switch_to+0x20b/0x690
> [64929.008176]  [<ffffffff81025375>] ? xen_clocksource_read+0x15/0x20
> [64929.008222]  [<ffffffff810c0030>] process_one_work+0x170/0x500
> [64929.008268]  [<ffffffff818dac28>] ? __schedule+0x238/0x530
> [64929.008310]  [<ffffffff818db00a>] ? schedule+0x3a/0xa0
> [64929.008324]  [<ffffffff810c1ca6>] worker_thread+0x166/0x530
> [64929.008368]  [<ffffffff810e6a69>] ? put_prev_entity+0x29/0x140
> [64929.008412]  [<ffffffff818dac28>] ? __schedule+0x238/0x530
> [64929.008458]  [<ffffffff810d4082>] ? default_wake_function+0x12/0x20
> [64929.008502]  [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120
> [64929.008518]  [<ffffffff818db00a>] ? schedule+0x3a/0xa0
> [64929.008555]  [<ffffffff818dedf6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> [64929.008599]  [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120
> [64929.008616]  [<ffffffff810c6ae5>] kthread+0xe5/0x100
> [64929.008630]  [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.008643]  [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40
> [64929.008659]  [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.008673]  [<ffffffff818df5a1>] ret_from_fork+0x41/0x50
> [64929.008685] Code: 89 fe 4c 89 ef 89 45 98 e8 aa fb ff ff 8b 45 98 48 63
> d0 48 85 db 48 8d 0c 13 48 0f 45 d9 01 45 bc 49 01 d7 41 29 c4 48 8b 45 b0
> <48> 8b 30 48 89 75 c0 49 8b 0e 8d 96 00 10 00 00 29 ca 41 f6 85 
> [64929.008894] RIP  [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0
> [64929.008914]  RSP <ffffc90048afbb08>
> [64929.008923] CR2: 0000000000002260
> [64929.009641] ---[ end trace e1da1cdf77fed144 ]---
> [64929.009785] BUG: unable to handle kernel paging request at
> ffffffffffffffd8
> [64929.009804] IP: [<ffffffff810c62c0>] kthread_data+0x10/0x20
> [64929.009823] PGD 200d067 [64929.009831] PUD 200f067 
> PMD 0 [64929.009842] 
> [64929.009850] Oops: 0000 [#2] SMP
> [64929.009864] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables
> arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss
> nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding
> xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn
> xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler
> joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb
> dca ptp pps_core uas usb_storage wmi ttm
> [64929.010054] CPU: 15 PID: 17696 Comm: kworker/u48:0 Tainted: G      D
> 4.9.75-30.el6.x86_64 #1
> [64929.010068] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1
> 03/04/2015
> [64929.010127] task: ffff8801fbc70580 task.stack: ffffc90048af8000
> [64929.010138] RIP: e030:[<ffffffff810c62c0>]  [<ffffffff810c62c0>]
> kthread_data+0x10/0x20
> [64929.010153] RSP: e02b:ffffc90048afbdd8  EFLAGS: 00010086
> [64929.010162] RAX: 0000000000000000 RBX: ffff880204dd9fc0 RCX:
> 000000000000000f
> [64929.010174] RDX: ffff880200409400 RSI: ffff8801fbc70580 RDI:
> ffff8801fbc70580
> [64929.010185] RBP: ffffc90048afbdd8 R08: ffff880204dc0000 R09:
> 00000006401f55c3
> [64929.010197] R10: dead000000000200 R11: dead000000000200 R12:
> 0000000000019fc0
> [64929.010208] R13: ffff8801fbc70580 R14: 0000000000000000 R15:
> ffff8801fbc70f40
> [64929.010229] FS:  00007fcfc0af8700(0000) GS:ffff880204dc0000(0000)
> knlGS:0000000000000000
> [64929.010241] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64929.010251] CR2: 0000000000000028 CR3: 00000001f067b000 CR4:
> 0000000000042660
> [64929.010270] Stack:
> [64929.010275]  ffffc90048afbe08 ffffffff810bda72 ffffc90048afbdf8
> ffff880204dd9fc0
> [64929.010295]  0000000000019fc0 ffff8801fbc70580 ffffc90048afbe78
> ffffffff818dae04
> [64929.010314]  0000000000000001 ffff8801ef0a2400 ffffc90048afbe48
> ffff8801f0e67708
> [64929.010333] Call Trace:
> [64929.010340]  [<ffffffff810bda72>] wq_worker_sleeping+0x12/0xa0
> [64929.010352]  [<ffffffff818dae04>] __schedule+0x414/0x530
> [64929.010362]  [<ffffffff810d63ec>] do_task_dead+0x3c/0x40
> [64929.010373]  [<ffffffff810aaade>] do_exit+0x24e/0x480
> [64929.010383]  [<ffffffff810c6ae5>] ? kthread+0xe5/0x100
> [64929.010393]  [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.010403]  [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40
> [64929.010415]  [<ffffffff818e0db7>] rewind_stack_do_exit+0x17/0x20
> [64929.010425] Code: 48 09 00 00 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 66
> 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 87 48 09 00 00
> <48> 8b 40 d8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 
> [64929.010607] RIP  [<ffffffff810c62c0>] kthread_data+0x10/0x20
> [64929.010620]  RSP <ffffc90048afbdd8>
> [64929.010626] CR2: ffffffffffffffd8
> [64929.010638] ---[ end trace e1da1cdf77fed145 ]---
> [64929.010647] Fixing recursive fault but reboot is needed!
> 
> This is a CentOS 6 system booting with:
> 
>         kernel /boot/xen.gz dom0_mem=6144M,max:6144M cpuinfo com1=115200,8n1
> console=com1,tty loglvl=all guest_loglvl=all msi=off com2=115200,8n1
> console=com2,tty1
>         module /boot/vmlinuz-4.9.75-30.el6.x86_64 ro
> root=UUID=ffab1fdf-28a4-4239-b112-5e920e3d6c36 rd_NO_LUKS rd_NO_LVM
> LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto
> KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM   biosdevname=0 nomodeset max_loop=128
> max_loop=512 xencons=ttyS1 console=hvc0,tty0
>         module /boot/initramfs-4.9.75-30.el6.x86_64.img
> 
> Running xen-4.6.6-9.el6. Note that I have msi=off set in the Xen parameters
> because I hit the same network card bug that Kevin Stange was hitting.
> 
> Happy to grab any further info as required, or let me know if this is better
> suited to xen-devel.
> 
> Cheers,
> Nathan
>