Hi All,

One of our developers managed to trigger a kernel oops on a Xen 4.4.2 dom0; the oops text is attached. He was setting up network namespaces and bridging inside a CentOS domU, activated the bridge, and lost networking (probably a config error), so he rebooted the VM. The oops appeared on reboot, along with various Xen processes hanging:

root     23388  0.0  1.6 132796 32264 ?        SLsl Oct02   0:04 /usr/sbin/xl create /mnt/xen/gx/xen/metrixc7
root     26119  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [block] <defunct>
root     26127  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [block] <defunct>
root     26137  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [block] <defunct>
root     26157  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [block] <defunct>
root     26169  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [block] <defunct>
root     26195  0.0  0.0      0     0 ?        Z    Oct02   0:00  \_ [vif-bridge] <defunct>
root     24625  0.0  0.0      0     0 ?        Ds   Oct02   0:06 [tapdisk]

At this point the dom0 is still up and running its existing VMs fine. I can also live-migrate VMs off it successfully, although the post-migration cleanup fails and hangs:

libxl: error: libxl_device.c:935:device_backend_callback: unable to remove device with path /local/domain/0/backend/vbd/17/51712

The host is running CentOS 7 with the Xen 4.4.2-7 package and kernel 3.10.68-11.el6.centos.alt.x86_64. I've also attached the xl dmesg output.

This is the first time I've seen anything like this, and I'm not sure whether his networking/bridging work in the domU is related or just coincidental. Any thoughts or ideas? I'm going to try to reproduce it on a test dom0 later this week, so I'm happy to grab any additional debugging output if required.

- Nathan