Hi All,
One of our developers managed to trigger a kernel oops on a 4.4.2 dom0. The oops text is attached. He was setting up network namespaces / bridging inside a CentOS domU, activated the bridge and lost networking (probably a config error), so he rebooted the VM. The oops appeared on that reboot, along with various Xen processes hanging:
root 23388 0.0 1.6 132796 32264 ? SLsl Oct02 0:04 /usr/sbin/xl create /mnt/xen/gx/xen/metrixc7
root 26119 0.0 0.0 0 0 ? Z Oct02 0:00 _ [block] <defunct>
root 26127 0.0 0.0 0 0 ? Z Oct02 0:00 _ [block] <defunct>
root 26137 0.0 0.0 0 0 ? Z Oct02 0:00 _ [block] <defunct>
root 26157 0.0 0.0 0 0 ? Z Oct02 0:00 _ [block] <defunct>
root 26169 0.0 0.0 0 0 ? Z Oct02 0:00 _ [block] <defunct>
root 26195 0.0 0.0 0 0 ? Z Oct02 0:00 _ [vif-bridge] <defunct>
root 24625 0.0 0.0 0 0 ? Ds Oct02 0:06 [tapdisk]
At this point the dom0 is still up and running its existing VMs fine. I can also live-migrate VMs off of it successfully, although the post-migration cleanup fails and hangs:
libxl: error: libxl_device.c:935:device_backend_callback: unable to remove device with path /local/domain/0/backend/vbd/17/51712
The host is running CentOS 7 with the Xen 4.4.2-7 package and kernel 3.10.68-11.el6.centos.alt.x86_64. I've also attached the output of xl dmesg.
This is the first time I've seen anything like this, and I'm not sure whether his networking/bridging work in the domU is related or just coincidental. Any thoughts or ideas? I'm going to try to reproduce it on a test dom0 later this week, so I'm happy to grab any additional debugging output if required.
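For the reproduction attempt, here is a minimal sketch of a helper that could snapshot processes stuck in uninterruptible sleep (D) or left defunct (Z), like the tapdisk and hotplug script entries in the listing above. It assumes the 11-column `ps aux` layout shown in that listing; the function names `find_stuck` and `snapshot` are hypothetical and not part of any Xen tooling.

```python
import subprocess


def find_stuck(ps_text):
    """Return (pid, stat, command) tuples for processes whose state is D or Z.

    Expects `ps aux` style lines:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    stuck = []
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        # Skip the header line and anything that is not a full process row.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        # The first character of STAT is the process state.
        if stat[0] in ("D", "Z"):
            stuck.append((pid, stat, command))
    return stuck


def snapshot():
    """Capture the current `ps aux` output as text (assumes ps is installed)."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `find_stuck(snapshot())` on the affected dom0 right after the domU reboot should list the hung entries with their PIDs, which can then be matched against the oops text and xl dmesg output.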
- Nathan