[CentOS-virt] Xen DomU's randomly freezing

Mon Apr 30 15:51:27 UTC 2018
George Dunlap <dunlapg at umich.edu>

On Mon, Apr 30, 2018 at 1:08 PM, Daz Day <dazday60 at gmail.com> wrote:
> Hi,
>
> I've tried hitting up the CentOS forums and thought I'd try here too as I
> don't seem to be getting any bites.
>
> We've been in the process of migrating all our hypervisors over to CentOS 7
> using Xen. Once we had a few up and running we started to notice that the
> DomU's would randomly freeze. They become unresponsive to any network
> traffic, stop consuming CPU resources on the hypervisor and it's not
> possible to log in to the console locally using:
> virsh console <domain>
> We can sometimes get as far as typing a username and hitting return, but the
> DomU just hangs there. It doesn't seem to matter what Linux distro the DomU
> is running, it affects them all. The only way we can get them back is by
> destroying and recreating them (far from ideal!).
>
> After a bit of research and digging around, we eventually found these 2
> nuggets:
> https://wiki.gentoo.org/wiki/Xen#Xen_domU_hanging_with_kernel_4.3.2B
> https://www.novell.com/support/kb/doc.php?id=7018590
>
> They both advise adding the command line argument:
> gnttab_max_frames=256 (the default is 32).
> We applied this change and all hypervisors ran stable for around a week
> until DomU's started freezing again (we've since tried even higher values,
> to no avail). More research later led me to
> https://bugs.centos.org/view.php?id=14258 and
> https://bugs.centos.org/view.php?id=14284 (which are essentially the same
> report). There hasn't really been any movement on these tickets
> unfortunately, but I have +1'd them.
>
> Have any others had issues with Xen and DomU's locking up in CentOS 7? Are
> there any other fixes/workarounds? If any additional info is needed that
> isn't already in the bug tickets or forum post, please let me know and I'll
> be happy to provide whatever is required (these freezes are happening at
> least once a day).
>
> Any help would be much appreciated and would mean my Ops guys could get a
> decent sleep!
> Cheers
> Darren

Darren,

Would you mind reposting this to xen-users, along with:

* The config file for your guests
* The output of `dmesg` from inside one of the guests before it hangs
* The output of `dmesg` run on your dom0 after one of these machines hangs
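
Since a hung guest can no longer be logged in to, the guest-side `dmesg` has to be saved ahead of time. Below is a minimal sketch of one way to do that, assuming Python 3.7+ is available on the guest and on dom0. The `capture` helper, the output directory and the file labels are illustrative names, not part of any Xen or CentOS tooling.

```python
import subprocess
import time
from pathlib import Path


def capture(cmd, out_dir, label):
    """Run cmd and save its stdout to a timestamped file under out_dir.

    cmd is an argument list such as ["dmesg"]; label is a hypothetical
    prefix used only to tell guest and dom0 snapshots apart.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    path = out_dir / f"{label}-{stamp}.log"
    # check=True raises CalledProcessError if the command fails,
    # so an empty file is never mistaken for a clean log.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    path.write_text(result.stdout)
    return path


# Example calls (not executed here; the paths are assumptions):
#   inside a guest, run periodically (e.g. from cron) so that a
#   snapshot from shortly before the hang exists:
#       capture(["dmesg"], "/var/tmp/dmesg-snapshots", "guest")
#   on dom0, once after a guest has hung:
#       capture(["dmesg"], "/var/tmp/dmesg-snapshots", "dom0")
```

Reading the kernel log may require root, depending on the system's `dmesg_restrict` setting.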

Thanks,
 -George