[CentOS] Old kernel bug back in CentOS 6.10?

Thu Aug 16 09:48:13 UTC 2018
Matthias Bethke <matthias at towiski.de>

I updated a few hypervisors and their VMs to CentOS 6.10 on Monday;
today I awoke to an alert saying all VMs are down. It looks like a very
old bug crept back in.

The machine is a ProLiant DL380 G7 with Xeon X5675 and 96 GB of RAM, running
half a dozen smallish VMs. Hypervisor and all VMs have kernel
2.6.32-754.2.1.el6.x86_64. Around the time the VMs must have gone down,
there are quite a few error messages like the following in the system
log:

Aug 16 03:10:13 hyper-7 kernel: [265397.382552] vmwrite error: reg 6000 value fffffffffffffff7 (err -9)
Aug 16 03:10:13 hyper-7 kernel: [265397.421372] Pid: 9375, comm: qemu-kvm Not tainted 2.6.32-754.2.1.el6.x86_64 #1
Aug 16 03:10:13 hyper-7 kernel: [265397.464985] Call Trace:
Aug 16 03:10:13 hyper-7 kernel: [265397.481530]  [<ffffffffa0532a9c>] ? vmwrite_error+0x2c/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.520737]  [<ffffffffa0532ac0>] ? vmcs_writel+0x20/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.560028]  [<ffffffffa0535e63>] ? vmx_fpu_activate+0x93/0xc0 [kvm_intel]
Aug 16 03:10:14 hyper-7 kernel: [265397.600072]  [<ffffffffa04cd1e7>] ? kvm_arch_vcpu_create+0x37/0x50 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.638183]  [<ffffffffa04c72a1>] ? kvm_vm_ioctl+0x601/0x1050 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.674367]  [<ffffffff8113f461>] ? free_one_page+0x191/0x440
Aug 16 03:10:14 hyper-7 kernel: [265397.708101]  [<ffffffff811b4159>] ? vfs_ioctl+0x29/0xc0
Aug 16 03:10:14 hyper-7 kernel: [265397.739124]  [<ffffffff81142d86>] ? __free_pages+0x46/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.773193]  [<ffffffff811b463a>] ? do_vfs_ioctl+0x3aa/0x590
Aug 16 03:10:14 hyper-7 kernel: [265397.805774]  [<ffffffff81142e29>] ? free_pages+0x49/0x50
Aug 16 03:10:14 hyper-7 kernel: [265397.839147]  [<ffffffff811b48a1>] ? sys_ioctl+0x81/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.870109]  [<ffffffff810f1d0e>] ? __audit_syscall_exit+0x25e/0x290
Aug 16 03:10:14 hyper-7 kernel: [265397.909358]  [<ffffffff81560351>] ? system_call_fastpath+0x2f/0x34

Curiously, the messages don't seem to indicate anything fatal in and of
themselves; there are two like this a minute after bootup and about a
dozen more after about a day, none of which seems to have crashed
anything. However, it's the only obvious anomaly I could find around the
time, and as it's VT-x related, I reckon there's a connection.
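
For anyone who wants to check their own logs for the same pattern, here
is a minimal Python sketch that tallies "vmwrite error" lines per hour.
It assumes the syslog line format shown above; the function names are
made up for illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel log lines of the form shown above, e.g.
#   Aug 16 03:10:13 hyper-7 kernel: [265397.382552] vmwrite error: reg 6000 value fffffffffffffff7 (err -9)
# Assumption: classic syslog timestamps ("Mon DD HH:MM:SS") and bracketed uptime.
VMWRITE_RE = re.compile(
    r"^(?P<hour>\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+vmwrite error: reg (?P<reg>[0-9a-f]+) "
    r"value (?P<value>[0-9a-f]+) \(err (?P<err>-?\d+)\)"
)


def parse_vmwrite_errors(lines):
    """Yield one dict per vmwrite error line; every other line is skipped."""
    for line in lines:
        match = VMWRITE_RE.match(line)
        if match:
            yield match.groupdict()


def errors_per_hour(lines):
    """Count vmwrite errors per 'Mon DD HH' bucket, in order of first appearance."""
    return Counter(entry["hour"] for entry in parse_vmwrite_errors(lines))
```

Passing an open log file (for example /var/log/messages, if that is where
kernel messages end up on your system) to errors_per_hour() gives a quick
view of whether the errors cluster around the time the VMs went down.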

The stack trace closely resembles this bug that turned up in 2015 and was
fixed long ago: https://lkml.org/lkml/2015/7/3/288

Has anyone seen this recently, and can you confirm or refute any of my
guesswork?

Cheers,
	Matthias