On 11/07/2017 04:57 PM, Sarah Newman wrote:
On 11/07/2017 03:12 PM, Nathan March wrote:
Since moving from Xen 4.4 to 4.6, I've been seeing an increasing number of stability issues on our hypervisors. I'm not sure whether there's a single root cause here or whether I'm dealing with multiple bugs.
One of the more common ones I've seen is that a VM, on shutdown, remains in the (null) state and a kernel bug is thrown:
xen001 log # xl list
Name          ID    Mem  VCPUs   State    Time(s)
Domain-0       0   6144     24   r-----    6639.7
(null)         3      0      1   --pscd      36.3
[89920.839074] BUG: unable to handle kernel paging request at ffff88020ee9a000
<snip>
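The State column in the `xl list` output above packs six flags into one string; per the xl(1) man page the positions are r (running), b (blocked), p (paused), s (shutdown), c (crashed) and d (dying), with `-` meaning the flag is unset. A minimal Python sketch for decoding it and spotting domains stuck in the dying state, assuming the six-column layout shown in the report. `decode_state` and `find_stuck_domains` are hypothetical helper names, and the input is captured text rather than a live call to `xl`.

```python
# Hypothetical helpers (not part of Xen): decode the State column of `xl list`.
# Flag order follows the xl(1) man page; '-' means the flag is not set.
STATE_FLAGS = ("running", "blocked", "paused", "shutdown", "crashed", "dying")


def decode_state(state: str) -> list[str]:
    """Return the names of the flags set in a six-character xl state string."""
    if len(state) != len(STATE_FLAGS):
        raise ValueError(f"unexpected state string: {state!r}")
    return [name for ch, name in zip(state, STATE_FLAGS) if ch != "-"]


def find_stuck_domains(xl_list_output: str) -> list[tuple[str, int, list[str]]]:
    """Return (name, domid, flags) for every listed domain with the dying flag set."""
    stuck = []
    for line in xl_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:  # ignore blank or unexpected lines
            continue
        name, domid, _mem, _vcpus, state, _time = fields
        flags = decode_state(state)
        if "dying" in flags:
            stuck.append((name, int(domid), flags))
    return stuck


# Sample input copied from the report above.
SAMPLE = """\
Name ID Mem VCPUs State Time(s)
Domain-0 0 6144 24 r----- 6639.7
(null) 3 0 1 --pscd 36.3
"""

if __name__ == "__main__":
    for name, domid, flags in find_stuck_domains(SAMPLE):
        print(f"{name} (domid {domid}): {', '.join(flags)}")
```

On the sample, only domid 3 is reported, with the paused, shutdown, crashed and dying flags all set; Domain-0 decodes to running only.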
This is on Xen 4.6.6-4.el6 with kernel 4.9.58-29.el6.x86_64. I see these issues across a wide range of systems from both Dell and Supermicro, although every system runs the same Intel X540 10Gb NICs with the same NetApp NFS backend storage.
We don't use NFS and have not seen the exact same issue.
Additionally, we aren't using Xen 4.6 any more; we're using 4.8, but we didn't see issues like that when we were on Xen 4.6. We're also still on kernel 4.9.39. You might try an older kernel or a newer version of Xen, in addition to looking for NFS-specific issues.
--Sarah