[CentOS-virt] RHEL/CentOS 7.x KVM kernel instability & unreliability

Tue Jun 23 15:41:01 UTC 2015
Vlastimil Holer <vlastimil.holer at gmail.com>

Hello,

I manage a small private virtualization infrastructure, currently based
on CentOS 7.x. In the past we ran mostly on Debian 7 (kernel 3.2.x) and
CentOS 6.x without problems.

After we upgraded to CentOS 7.x, I experienced occasional physical host
crashes when I, for example, suspended or resumed several virtual machines.
In other cases a random virtual machine checkpoint was invalid and the VM
could not be resumed.

I ran a few intensive tests on the same hardware with:
- CentOS 6.6 ... worked fine
- CentOS 7.1 with
 1. CentOS distribution kernel ... failed
 2. Binary RHEL 7.1 distribution kernel ... failed
 3. vanilla 3.10.80 kernel ... failed
(plus various firmware releases and BIOS configurations)

So far, the only setup I could run reliably is CentOS 7.x with the latest
4.0.5 kernel from ELRepo.

The 7.x kernel is based on 3.10.x, which failed for me as well. So I think
there is a bug in KVM in that kernel series which leads to memory
corruption. The result was either a kernel oops, or a broken checkpoint
with a kernel oops occurring later.

I have opened a bug in the Red Hat Bugzilla
https://bugzilla.redhat.com/show_bug.cgi?id=1231964
but since it is a private bug, I have also created a duplicate in the
CentOS bug tracker
http://bugs.centos.org/view.php?id=8949

The report describes how to reproduce the problem and includes a stress
test script.
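
For readers who cannot open the bug reports, the kind of loop involved can
be sketched roughly as below. This is not the script attached to the bug,
only a minimal illustration. It assumes that "suspend" and "resume" mean
saving VM state to a file with libvirt's `virsh save` and loading it back
with `virsh restore`; the function names, the state directory and the
domain names are all hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def cycle_commands(domains: Sequence[str], state_dir: str) -> List[List[str]]:
    """Build one round of commands: save every domain, then restore every one."""
    # Assumption: each checkpoint is stored as <state_dir>/<domain>.save
    paths = {d: f"{state_dir}/{d}.save" for d in domains}
    saves = [["virsh", "save", d, paths[d]] for d in domains]
    restores = [["virsh", "restore", paths[d]] for d in domains]
    return saves + restores


def run_cycles(
    domains: Sequence[str],
    rounds: int,
    state_dir: str = "/var/tmp",
    runner: Callable[[List[str]], object] = subprocess.check_call,
) -> int:
    """Repeat the save/restore round `rounds` times.

    With the default runner, a failing virsh call (for example an invalid
    checkpoint that cannot be restored) raises CalledProcessError and
    stops the loop. Returns the number of commands executed.
    """
    executed = 0
    for _ in range(rounds):
        for cmd in cycle_commands(domains, state_dir):
            runner(cmd)
            executed += 1
    return executed
```

Calling run_cycles(["vm1", "vm2"], rounds=50) on a test host would then
save and restore both (hypothetical) domains fifty times. The runner
argument exists only so the loop can be dry-run without touching libvirt.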

I would appreciate it if anybody could confirm whether this happens for
them as well.

Best regards,
Vlastimil Holer