[CentOS-virt] Tapdisk processes being left behind when hvm domu's migrate/shutdown

Thu Mar 12 18:11:27 UTC 2015
Nathan March <nathan at gt.net>

Hi All,

 

I'm seeing tapdisk processes left running after an HVM VM is shut down or migrated away. I don't see this problem with Linux paravirt domUs, only with Windows HVM ones.

 

xl.cfg:

 

name = 'nathanwin'
memory = 4096
vcpus = 2
disk = [ 'file:/mnt/gtc_disk_p1/nathanwin/drive_c,hda,w' ]
vif = [ 'mac=00:16:3D:01:03:E0,bridge=vlan208' ]
builder = "hvm"
kernel = "/usr/lib/xen/boot/hvmloader"

localtime = 0
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "destroy"

vnc = 1
vncunused = 1

cpuid  = [
    '0:eax=00000000000000000000000000001011',
    '1:eax=00000000000000100000011011000010,ecx=10000011101110100010001000000011,edx=00010111100010111111101111111111',
    '2:eax=01010101000000110101101000000001',
    '7,0:eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,ebx=00000000000000000000000000000000,ecx=00000000000000000000000000000000,edx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    '13,1:eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0',
    '10:ebx=00000000000000000000000000000000',
    '11:edx=00000000000000000000000000000000',
    '2147483650:eax=01100101011101000110111001001001,ebx=00101001010100100010100001101100,ecx=01101111011001010101100000100000,edx=00101001010100100010100001101110',
    '2147483651:eax=01010101010100000100001100100000,ebx=00100000001000000010000000100000,ecx=00100000001000000010000000100000,edx=01001100001000000010000000100000',
    '2147483652:eax=00110000001101000011011000110101,ebx=00100000010000000010000000100000,ecx=00110111001100100010111000110010,edx=00000000011110100100100001000111',
    '2147483656:eax=00000000000000000011000000101000',
]

 

Starting with the VM running initially on another host, I migrate it in:

 

migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/1450)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/1450)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
xc: progress: Reloading memory pages: 56320/1114193    5%
xc: progress: Reloading memory pages: 1003520/1114193   90%
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev0
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
DEBUG libxl__device_destroy_tapdisk 66 type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c disk=:/mnt/gtc_disk_p1/nathanwin/drive_c
Migration successful.

 

I now have two tapdisk processes:

 

gtc-vana-005 ~ # ps auxf | grep tapdisk

root     32491  0.1  0.2  20364  4636 ?        SLs  11:06   0:00 tapdisk

root     32520  0.0  0.2  20364  4636 ?        SLs  11:06   0:00 tapdisk

 

This seems odd, given that the VM in question has only a single disk attached and the qemu process indicates it is using tapdev2:

 

root     32524  0.4  0.7 323208 15040 ?        SLsl 11:06   0:00 /usr/lib/xen/bin/qemu-system-i386 -xen-domid 3 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-3,server,nowait -mon chardev=libxl-cmd,mode=control -nodefaults -name nathanwin--incoming -vnc 127.0.0.1:0,to=99 -device cirrus-vga -global vga.vram_size_mb=8 -boot order=cda -smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3d:01:03:e0 -netdev type=tap,id=net0,ifname=vif3.0-emu,script=no,downscript=no -incoming fd:13 -machine xenfv -m 4088 -drive file=/dev/xen/blktap-2/tapdev2,if=ide,index=0,media=disk,format=raw,cache=writeback

 

gtc-vana-005 ~ # lsof -p 32520 | grep blktap-2

tapdisk 32520 root  mem    CHR              246,2               886671 /dev/xen/blktap-2/blktap2

tapdisk 32520 root   19u   CHR              246,2         0t0   886671 /dev/xen/blktap-2/blktap2

 

gtc-vana-005 ~ # lsof -p 32491 | grep blktap-2   

tapdisk 32491 root  mem    CHR              246,0               903999 /dev/xen/blktap-2/blktap0

tapdisk 32491 root   14u   CHR              246,0         0t0   903999 /dev/xen/blktap-2/blktap0
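
The cross-check above can be scripted. The following is a minimal Python sketch, not part of any Xen tooling: it takes saved lsof output and the qemu command line as text and reports which tapdisk PIDs hold a blktap device that qemu does not reference. It assumes that /dev/xen/blktap-2/blktapN and /dev/xen/blktap-2/tapdevN share the same minor number N, as the output above suggests. The function names are made up for illustration.

```python
import re

# Assumption: the character device blktapN pairs with the block device tapdevN.
LSOF_RE = re.compile(r"tapdisk\s+(\d+)\s.*?/dev/xen/blktap-2/blktap(\d+)\s*$")
QEMU_RE = re.compile(r"file=/dev/xen/blktap-2/tapdev(\d+)")


def tapdisk_minors(lsof_text):
    """Map each tapdisk PID to the set of blktap minors it has open."""
    mapping = {}
    for line in lsof_text.splitlines():
        match = LSOF_RE.match(line.strip())
        if match:
            pid, minor = int(match.group(1)), int(match.group(2))
            mapping.setdefault(pid, set()).add(minor)
    return mapping


def qemu_tapdev_minors(qemu_cmdline):
    """Collect the tapdev minors named in qemu -drive arguments."""
    return {int(n) for n in QEMU_RE.findall(qemu_cmdline)}


def unreferenced_tapdisks(lsof_text, qemu_cmdline):
    """Return tapdisk PIDs whose blktap minors qemu does not use."""
    used = qemu_tapdev_minors(qemu_cmdline)
    return sorted(
        pid
        for pid, minors in tapdisk_minors(lsof_text).items()
        if not (minors & used)
    )
```

Fed the two lsof outputs and the qemu command line shown above, unreferenced_tapdisks() reports PID 32491, the tapdisk holding blktap0.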

 

I then migrate this VM off to another host:

 

migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/1450)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/1450)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware
xc: progress: Reloading memory pages: 56320/1114193    5%
xc: progress: Reloading memory pages: 1003520/1114193   90%
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev3
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
DEBUG libxl__device_destroy_tapdisk 66 type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c disk=:/mnt/gtc_disk_p1/nathanwin/drive_c
Migration successful.

 

and I'm down to one tapdisk proc that didn't get cleaned up:

 

gtc-vana-005 ~ # ps auxf | grep tapdisk

root     32520  0.0  0.2  20364  4636 ?        SLs  11:06   0:00 tapdisk
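
To make the before/after comparison repeatable, here is a small Python sketch that extracts tapdisk PIDs from two saved "ps auxf | grep tapdisk" captures and lists the ones that survived. It assumes the standard 11-column "ps aux" layout with the command in the last column; the helper names are hypothetical.

```python
def tapdisk_pids(ps_text):
    """Return the set of PIDs whose command column is exactly 'tapdisk'."""
    pids = set()
    for line in ps_text.splitlines():
        fields = line.split()
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        if len(fields) >= 11 and fields[1].isdigit() and fields[10] == "tapdisk":
            pids.add(int(fields[1]))
    return pids


def surviving_tapdisks(ps_before, ps_after):
    """PIDs present both before and after the domain left the host."""
    return sorted(tapdisk_pids(ps_before) & tapdisk_pids(ps_after))
```

With the two ps captures from this host, surviving_tapdisks() returns only PID 32520. Since no domain using that disk remains here, any PID it returns is a candidate leftover.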

 

So it seems that Xen creates a second tapdisk process at startup that it doesn't need, and then on cleanup kills only one of the two.

 

Any thoughts? This is on the latest 4.4.1-7.el6 packages.

 

Thanks!

 

- Nathan

 
