[CentOS-virt] Stability issues since moving to 4.6 - Kernel paging request bug + VM left in null state

Wed Nov 8 00:57:47 UTC 2017
Sarah Newman <srn at prgmr.com>

On 11/07/2017 03:12 PM, Nathan March wrote:
> Since moving from 4.4 to 4.6, I've been seeing an increasing number of
> stability issues on our hypervisors. I'm not clear if there's a singular
> root cause here, or if I'm dealing with multiple bugs.
> 
> One of the more common ones I've seen is that a VM on shutdown remains in
> the null state and a kernel bug is thrown:
> 
> xen001 log # xl list
> Name                                        ID   Mem VCPUs      State   Time(s)
> Domain-0                                     0  6144    24     r-----    6639.7
> (null)                                       3     0     1     --pscd      36.3
> 
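For anyone reading the `xl list` output above: the State column packs six positional flags, r, b, p, s, c and d (running, blocked, paused, shutdown, crashed, dying, per the xl(1) man page), each shown as its letter when set or `-` when clear. Read that way, `--pscd` means the stuck domain is flagged paused, shutdown, crashed and dying at once. Below is a minimal Python sketch that decodes such a string, assuming the fixed six-character layout xl prints; `decode_xl_state` is a hypothetical helper for illustration, not part of any Xen tool.

```python
# Flag letters in the order xl prints them in the State column (per xl(1)).
XL_STATE_FLAGS = [
    ("r", "running"),
    ("b", "blocked"),
    ("p", "paused"),
    ("s", "shutdown"),
    ("c", "crashed"),
    ("d", "dying"),
]


def decode_xl_state(state: str) -> list[str]:
    """Return the names of the flags set in a six-character xl State string.

    Hypothetical helper: assumes each position holds either its own flag
    letter or '-' when that flag is clear.
    """
    if len(state) != len(XL_STATE_FLAGS):
        raise ValueError(f"expected {len(XL_STATE_FLAGS)} characters, got {state!r}")
    names = []
    for ch, (letter, name) in zip(state, XL_STATE_FLAGS):
        if ch == letter:
            names.append(name)
        elif ch != "-":
            raise ValueError(f"unexpected {ch!r} where {letter!r} or '-' was expected")
    return names


if __name__ == "__main__":
    # State strings taken from the xl list output quoted in this thread.
    print(decode_xl_state("r-----"))  # ['running']
    print(decode_xl_state("--pscd"))  # ['paused', 'shutdown', 'crashed', 'dying']
```

Checking positions rather than just testing for letters anywhere in the string keeps a malformed column from being silently misread.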
> [89920.839074] BUG: unable to handle kernel paging request at ffff88020ee9a000
> 
<snip>

> This is on xen 4.6.6-4.el6 with 4.9.58-29.el6.x86_64. I see these issues
> across a large number of systems from both Dell and Supermicro, although
> we run the same Intel x540 10gb NICs in each system with the same NetApp
> NFS backend storage.

We don't use NFS and have not seen the exact same issue.

--Sarah