[CentOS-virt] Xen PV DomU running Kernel 4.14.5-1.el7.elrepo.x86_64: xl -v vcpu-set <domU> <val> triggers domU kernel WARNING, then domU becomes unresponsive
Adi Pircalabu
adi at ddns.com.au
Thu Dec 14 22:01:19 UTC 2017
On 15-12-2017 4:10, Akemi Yagi wrote:
> On Mon, Dec 11, 2017 at 4:52 PM, Adi Pircalabu <adi at ddns.com.au> wrote:
>
>> Has anyone seen this recently? I couldn't replicate it on:
>> - CentOS 6 running kernel-2.6.32-696.16.1.el6.x86_64, kernel-lt-4.4.105-1.el6.elrepo.x86_64
>> - CentOS 7 running 4.9.67-1.el7.centos.x86_64
>>
>> But I can replicate it consistently by running "xl -v vcpu-set <domU> <val>" on:
>> - CentOS 6 running 4.14.5-1.el6.elrepo.x86_64
>> - CentOS 7 running 4.14.5-1.el7.elrepo.x86_64
>>
>> dom0 versions tested with similar results in the domU:
>> - 4.6.6-6.el7 on kernel 4.9.63-29.el7.x86_64
>> - 4.6.3-15.el6 on kernel 4.9.37-29.el6.x86_64
>>
>> Observed behaviour:
>> - These commands stall:
>> top
>> ls -l /var/tmp
>> ls -l /tmp
>> - Processes stuck in D state on the CentOS 7 domU:
>> root 5 0.0 0.0 0 0 ? D 11:20 0:00 [kworker/u8:0]
>> root 316 0.0 0.0 0 0 ? D 11:20 0:00 [jbd2/xvda1-8]
>> root 1145 0.0 0.2 116636 4776 ? Ds 11:20 0:00 -bash
>> root 1289 0.0 0.1 25852 2420 ? Ds 11:35 0:00 /usr/bin/systemd-tmpfiles --clean
>> root 1290 0.0 0.1 125248 2696 pts/1 D+ 11:44 0:00 ls --color=auto -l /tmp/
>> root 1293 0.0 0.1 125248 2568 pts/2 D+ 11:44 0:00 ls --color=auto -l /var/tmp
>> root 1296 0.0 0.2 116636 4908 pts/3 Ds+ 11:44 0:00 -bash
>> root 1358 0.0 0.1 125248 2612 pts/4 D+ 11:47 0:00 ls --color=auto -l /var/tmp
>>
>> At first glance it appears the issue is in the 4.14.5 kernel. Stack traces follow:
>>
>> Adi Pircalabu
>
> Can you test-install 4.15-rcX to see if the problem persists in the latest kernel?
>
> http://elrepo.org/people/ajb/devel/kernel-ml/el7/x86_64/RPMS/
>
> Akemi
Thanks for that. I tested it on both CentOS 6 and 7 PV domUs and got similar panics:
-----CentOS 6-----
[...]
dracut: Switching root
Welcome to CentOS
Starting udev: udev: starting version 147
input: PC Speaker as /devices/platform/pcspkr/input/input0
xen_netfront: Initialising Xen virtual ethernet driver
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: coretemp_cpu_online+0x116/0x190 [coretemp]
PGD 7b5c7067 P4D 7b5c7067 PUD 7b5cd067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: coretemp(+) hwmon xen_netfront pcspkr ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc1.el6.elrepo.x86_64 #1
task: ffff88007c8f43c0 task.stack: ffffc90040390000
RIP: e030:coretemp_cpu_online+0x116/0x190 [coretemp]
RSP: e02b:ffffc90040393cd8 EFLAGS: 00010246
RAX: 0000000000000010 RBX: 0000000000000000 RCX: ffff88007c87c248
RDX: 0000000000000000 RSI: ffff880077720c28 RDI: ffff8800069ea020
RBP: ffffc90040393d18 R08: 0000000000000000 R09: ffffc90040393a08
R10: 0000000000000000 R11: 000000000000005f R12: 0000000000000000
R13: ffff8800069ea000 R14: ffff88007f60a040 R15: 0000000000000000
FS: 00007f685ca0a700(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000683a000 CR4: 0000000000042660
Call Trace:
? coretemp_add_core+0x50/0x50 [coretemp]
cpuhp_invoke_callback+0xe9/0x700
? put_prev_task_fair+0x26/0x40
? __schedule+0x2d0/0x6e0
? __wake_up_common+0x84/0x130
? __wake_up_common+0x84/0x130
cpuhp_thread_fun+0xee/0x170
smpboot_thread_fn+0x10c/0x160
? smpboot_create_threads+0x80/0x80
kthread+0x10a/0x140
? kthread_probe_data+0x40/0x40
ret_from_fork+0x1f/0x30
Code: 11 15 41 e1 49 89 c5 b8 f4 ff ff ff 4d 85 ed 0f 84 66 ff ff ff 4c 89 ef e8 88 11 41 e1 85 c0 75 6e 48 8b 05 75 17 00 00 4d 63 ff <4e> 89 2c f8 49 81 fd 00 f0 ff ff 44 89 e8 0f 87 3c ff ff ff 49
RIP: coretemp_cpu_online+0x116/0x190 [coretemp] RSP: ffffc90040393cd8
CR2: 0000000000000010
---[ end trace 8253bafacf228cf2 ]---
-----CentOS 6-----
-----CentOS 7-----
[...]
[ OK ] Found device /dev/xvda2.
Activating swap /dev/xvda2...
[ 4.998940] alg: No test for pcbc(aes) (pcbc-aes-aesni)
[ 5.001054] Adding 1048572k swap on /dev/xvda2. Priority:-2 extents:1 across:1048572k SSFS
[ OK ] Activated swap /dev/xvda2.
[ OK ] Reached target Swap.
[ 5.020760] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 5.020767] IP: coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[ 5.020769] PGD 0 P4D 0
[ 5.020771] Oops: 0002 [#1] SMP
[ 5.020773] Modules linked in: coretemp(+) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd pcspkr intel_rapl_perf nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 xen_netfront xen_blkfront crc32c_intel
[ 5.020786] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 4.15.0-0.rc3.el7.elrepo.x86_64 #1
[ 5.020789] RIP: e030:coretemp_cpu_online+0xf8/0x1f7 [coretemp]
[ 5.020790] RSP: e02b:ffffc90040387e10 EFLAGS: 00010246
[ 5.020793] RAX: 0000000000000010 RBX: ffff8800040d8800 RCX: 0000000000000000
[ 5.020794] RDX: ffff880079761e70 RSI: ffff88007c438cc8 RDI: ffff8800040d8820
[ 5.020796] RBP: ffffc90040387e40 R08: 0000000000000000 R09: ffffffff81f8aff0
[ 5.020798] R10: ffff88007d01f400 R11: 0000000000000000 R12: 0000000000000000
[ 5.020800] R13: 0000000000000000 R14: 0000000000000000 R15: ffff88007d00a020
[ 5.020804] FS: 0000000000000000(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
[ 5.020806] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.020808] CR2: 0000000000000010 CR3: 0000000004731000 CR4: 0000000000042660
[ 5.020810] Call Trace:
[ 5.020814] ? create_core_data+0x5f0/0x5f0 [coretemp]
[ 5.020817] cpuhp_invoke_callback+0xae/0x5c0
[ 5.020820] ? __schedule+0x295/0x880
[ 5.020823] cpuhp_thread_fun+0xce/0x170
[ 5.020825] smpboot_thread_fn+0x110/0x160
[ 5.020827] kthread+0x102/0x140
[ 5.020828] ? sort_range+0x30/0x30
[ 5.020831] ? kthread_associate_blkcg+0xa0/0xa0
[ 5.020833] ret_from_fork+0x1f/0x30
[ 5.020834] Code: 21 a0 41 0f b7 f6 e8 38 73 30 e1 48 89 c3 b8 f4 ff ff ff 48 85 db 74 c0 48 89 df e8 d3 69 30 e1 85 c0 75 7c 48 8b 05 40 18 00 00 <4a> 89 1c f0 48 81 fb 00 f0 ff ff 0f 87 e7 00 00 00 49 8b 47 4c
[ 5.020852] RIP: coretemp_cpu_online+0xf8/0x1f7 [coretemp] RSP: ffffc90040387e10
[ 5.020854] CR2: 0000000000000010
[ 5.020856] ---[ end trace 9ce91afe6b362317 ]---
[ 5.020858] Kernel panic - not syncing: Fatal exception
[ 5.020861] Kernel Offset: disabled
-----CentOS 7-----
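For anyone reading the two oopses: the "Oops: 0002" value is the x86 page-fault error code, and CR2 holds the faulting address. The minimal Python sketch here decodes the error-code bits and recomputes the store address. It assumes my own manual reading of the marked code bytes (<4e> 89 2c f8 as mov %r13,(%rax,%r15,8) on CentOS 6, and <4a> 89 1c f0 as mov %rbx,(%rax,%r14,8) on CentOS 7), which I have not checked with a disassembler. The helper names decode_pf_error and effective_address are hypothetical, made up for illustration.

```python
from typing import List

# x86 page-fault error code bits: (bit, meaning if set, meaning if clear).
# None means the clear state is not worth reporting.
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set in page table entry", None),
    (4, "instruction fetch", None),
    (5, "protection-key violation", None),
]


def decode_pf_error(code: int) -> List[str]:
    """Hypothetical helper: turn the 'Oops: XXXX' value into readable flags."""
    flags = []
    for bit, set_msg, clear_msg in PF_BITS:
        if code & (1 << bit):
            flags.append(set_msg)
        elif clear_msg is not None:
            flags.append(clear_msg)
    return flags


def effective_address(base: int, index: int, scale: int) -> int:
    """Hypothetical helper: address of a (base,index,scale) memory operand."""
    return base + index * scale


if __name__ == "__main__":
    print(decode_pf_error(0x0002))
    # CentOS 6: RAX = 0x10, R15 = 0; CentOS 7: RAX = 0x10, R14 = 0.
    print(hex(effective_address(0x10, 0, 8)))
```

With RAX = 0x10 and the index register at 0 in both traces, the computed address equals CR2 = 0x10, and the error code reads as a kernel-mode write to a page that is not present, consistent with the "NULL pointer dereference at 0000000000000010" message.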
For CentOS 7 I also tried booting with vcpus = 1 and no maxvcpus, with the same outcome. Looks like 4.15-rc is a no-go for me :)
Cheers,
---
Adi Pircalabu