Hi there,
I created the first experimental Vagrant images for Hyper-V on March 14, after the CBS upgrade. They seem to be functional, but they generate a lot of disk I/O during "vagrant up", according to Thomas and Michael. I think we end up with 40GB files as disk images - this is the maximum size of the virtual hard drives we create, but most of that is empty. I compared the contents of our box file to the one from kozo/centos-7, which is produced natively using Packer and Hyper-V: our image contains a real 40GB disk image, while the kozo image only has a 2.1GB image.
Image Factory converts the KVM .qcow2 image to a .vhd by calling "qemu-img convert -O vpc <source> <destination>". This produces a 40GB sparse file taking just 1.1GB of space (both with the fixed and dynamic subformats). The problem occurs while packing the sparse image into the .box file: Python's tarfile module only has read-only support for sparse files (a GNU tar extension) - even in Python 3, so the archive ends up with a regular 40GB file.
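The tarfile behaviour described above can be reproduced with a small experiment. This is only a sketch: the file name disk.vhd is a stand-in (the file is just a hole created with truncate, not a real VHD), the function name is made up for illustration, and it uses an uncompressed tar so that the size is directly visible.

```python
import os
import tarfile
import tempfile


def tar_size_of_sparse_file(apparent_size=8 * 1024 * 1024):
    """Archive a sparse file with tarfile and report the resulting sizes.

    Returns (apparent_size, size_recorded_in_tar, archive_size).
    """
    with tempfile.TemporaryDirectory() as tmp:
        disk = os.path.join(tmp, "disk.vhd")  # stand-in name, not a real VHD
        with open(disk, "wb") as f:
            # Extend the file without writing data; on filesystems with
            # sparse-file support this allocates (almost) no blocks.
            f.truncate(apparent_size)

        archive = os.path.join(tmp, "test.tar")
        with tarfile.open(archive, "w") as tar:
            # tarfile reads the holes back as zeros and stores them all.
            tar.add(disk, arcname="disk.vhd")

        with tarfile.open(archive) as tar:
            member_size = tar.getmember("disk.vhd").size

        return apparent_size, member_size, os.path.getsize(archive)


if __name__ == "__main__":
    apparent, in_tar, archive = tar_size_of_sparse_file()
    print("apparent size:", apparent)
    print("size recorded in tar:", in_tar)
    print("archive size:", archive)
```

The member stored in the archive is as large as the apparent size of the source file, regardless of how few blocks the source occupied on disk.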
Creating 40GB sparse files seems to be specific to qemu-img: "VBoxManage convertmedium disk box.vmdk box.vhd" produces a 1.1GB regular file (VirtualBox's conversion tool doesn't seem to support QCOW2, so I first had to convert the QCOW2 image to VMDK).
We should probably either:
- call GNU tar from Image Factory to preserve the sparseness of our image files (and hope that Windows handles them properly)
- use VBoxManage to convert to .vhd (VirtualBox is not in our repos, but Oracle offers a .rpm for EL7 systems).
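For the first option, a minimal sketch of how a Python program could shell out to GNU tar instead of using tarfile. The function names and the directory layout are hypothetical and this is not Image Factory's actual code; the only tar options used are --sparse, -c, -z, -f and -C, and it assumes the tar on PATH is GNU tar.

```python
import shutil
import subprocess


def gnu_tar_command(box_path, source_dir):
    """Build a GNU tar command line that packs source_dir into box_path.

    --sparse asks GNU tar to detect holes and store them compactly.
    """
    return ["tar", "--sparse", "-czf", box_path, "-C", source_dir, "."]


def pack_box(box_path, source_dir):
    """Create the .box archive by running GNU tar (assumed to be on PATH).

    box_path should live outside source_dir, otherwise tar would try to
    include the archive it is writing.
    """
    if shutil.which("tar") is None:
        raise RuntimeError("tar not found on PATH")
    subprocess.run(gnu_tar_command(box_path, source_dir), check=True)


if __name__ == "__main__":
    # Print the command only; the paths here are placeholders.
    print(" ".join(gnu_tar_command("c7.box", "build")))
```

Whether the extracting side (Vagrant on Windows) copes with GNU sparse entries is exactly the open question, so this only addresses the packing half.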
In the meantime, we will probably release v1703 without Hyper-V support, since CentOS Linux 6.9 was already released.
Best regards, Laurențiu
[vagrant@localhost ~]$ tar -tvzf c7.box # that's ours
drwxr-xr-x root/root           0 2017-03-14 11:23 Virtual Machines/
-rw-r--r-- root/root       39100 2017-03-14 11:23 Virtual Machines/vm.XML
-rw-r--r-- root/root         114 2017-03-14 11:23 Vagrantfile
drwxr-xr-x root/root           0 2017-03-14 11:22 Virtual Hard Disks/
-rwxr-xr-x root/root 42949672960 2017-03-14 11:23 Virtual Hard Disks/disk.vhd
-rw-r--r-- root/root          22 2017-03-14 11:23 metadata.json
[vagrant@localhost ~]$ unzip -l hyperv.box # kozo/centos-7
Archive:  hyperv.box.zip
    Length      Date    Time   Name
----------  ---------- -----   ----
     36898  02-25-2017 13:05   Virtual Machines/1BF518E8-0055-4ABF-8750-E1D703B23A47.vmcx
     61440  02-25-2017 13:05   Virtual Machines/1BF518E8-0055-4ABF-8750-E1D703B23A47.VMRS
        64  02-25-2017 12:47   metadata.json
2185232384  02-25-2017 13:05   Virtual Hard Disks/CentOS7.vhdx
----------                     -------
2185330786                     4 files
On Thu, Apr 06, 2017 at 02:03:55PM +0200, Laurentiu Pancescu wrote:
Image Factory converts the KVM .qcow2 image to a .vhd by calling "qemu-img convert -O vpc <source> <destination>". This produces a 40GB sparse file taking just 1.1GB of space (both with the fixed and dynamic subformats). The problem occurs while packing the sparse image into the .box file: Python's tarfile module only has read-only support for sparse files (a GNU tar extension) - even in Python 3, so the archive ends up with a regular 40GB file.
Maybe you can post-process the large .vhd file with "virt-sparsify"? I do not know if it would recognize the format, but it is worth trying. Otherwise it should be possible to "punch holes" in the large .vhd file with a little program that calls fallocate(2) on zero-filled areas.
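A rough sketch of the "little program" idea, with one substitution that should be named plainly: instead of punching holes in place with fallocate(2), it writes a new copy and seeks over all-zero blocks, which leaves holes on filesystems that support them. The function name and block size are arbitrary choices for illustration.

```python
import os


def copy_sparse(src, dst, block_size=64 * 1024):
    """Copy src to dst, skipping all-zero blocks so they become holes."""
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Move the write position forward without writing anything.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in a run of zeros, nothing was written there,
        # so set the final size explicitly.
        fout.truncate(fout.tell())
```

The copy has the same contents and apparent size as the source; only the on-disk allocation may differ.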
HTH, Niels
Hi Niels,
On 06/04/17 14:57, Niels de Vos wrote:
Maybe you can post-process the large .vhd file with "virt-sparsify"? I do not know if it would recognize the format, but it is worth trying. Otherwise it should be possible to "punch holes" in the large .vhd file with a little program that calls fallocate(2) on zero-filled areas.
The .vhd files that qemu-img creates are already sparse: their real size is 1.1GB (as reported by du, while ls reports 40GB). The problem occurs when they are added to a .tar.gz file, because Python's tarfile module reads them as regular files. The resulting archive is just 480MB, since gzip handles long runs of zeros quite well. This archive has to be extracted by Vagrant on Windows, so even if the file were still sparse inside the archive, we would depend on both Ruby for Windows and Hyper-V to handle it properly. I'll have to wait until at least Monday, when Michael can hopefully test whether a locally generated sparse file works properly on Windows.
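Both observations (ls versus du, and gzip shrinking runs of zeros) can be checked with a few lines of Python. This is a sketch with made-up function names; st_blocks is counted in 512-byte units on Linux and is not available on Windows.

```python
import os
import zlib


def apparent_and_allocated(path):
    """Return (apparent size as shown by ls, allocated bytes as shown by du)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def compressed_size_of_zeros(n_bytes, chunk=1024 * 1024):
    """Return how many bytes a gzip stream needs for n_bytes of zeros."""
    # wbits=31 selects the gzip container around the deflate stream.
    comp = zlib.compressobj(6, zlib.DEFLATED, 31)
    zeros = bytes(chunk)
    total = 0
    remaining = n_bytes
    while remaining > 0:
        piece = zeros[:min(chunk, remaining)]
        total += len(comp.compress(piece))
        remaining -= len(piece)
    total += len(comp.flush())
    return total


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        apparent, allocated = apparent_and_allocated(path)
        print(path, "apparent:", apparent, "allocated:", allocated)
    print("64 MiB of zeros as gzip:", compressed_size_of_zeros(64 * 1024 * 1024))
```

A large gap between the two numbers from apparent_and_allocated indicates a sparse file.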
Perhaps there are no issues with sparse files and Hyper-V, but we're the only ones generating such files. The .vhdx images exported by Hyper-V are regular files (everyone else uses Packer's Hyper-V plugin), as are the .vhd files produced by VirtualBox, whose apparent and real sizes are both 1.1GB. Only qemu-img produces huge sparse .vhd files, although it produces regular 1.1GB .vmdk files (used by the VirtualBox variant of our Vagrant boxes). Even worse, the .vhdx files produced by qemu-img from EL7 are huge non-sparse 41GB files. This bug was allegedly fixed upstream around January 2015, but maybe the fix hasn't been backported yet.
Using VirtualBox for the .vhd conversion would be the least likely to generate surprises, but maybe using GNU tar to create the .ova archive for Vagrant (instead of Python's tarfile) will be enough.
Best regards, Laurențiu
Some more information after Michael and I both experimented with sparse images.
After creating a .vhd image in a Vagrant guest, transferring it to a Windows machine while preserving its sparse nature proved to be less than straightforward. Samba, scp and rsync produce a regular 40GB file at the destination, not a sparse one. Probably the only chance is to use GNU tar or bsdtar on the guest to create a tar archive containing the sparse file, transfer the archive (which is a regular file), and use GNU tar to expand it on Windows. That's just for a quick test: an official Vagrant image would need to be extracted by Vagrant on Windows, and we still need to figure out what it will do. Does anyone have experience with backup software typically used on Windows, especially whether it can back up and restore sparse files?
On OS X, the HFS+ filesystem has no support for sparse files, and Apple removed sparse file support from their implementation of UFS (which used to have them).
On 06/04/17 14:03, Laurentiu Pancescu wrote:
We should probably either:
- call GNU tar from Image Factory to preserve the sparseness of our
image files (and hope that Windows handles them properly)
- use VBoxManage to convert to .vhd (VirtualBox is not in our repos, but
Oracle offers a .rpm for EL7 systems).
I would prefer the second option, using VBoxManage to produce small non-sparse .VHD images for our Vagrant boxes, and avoid the compatibility problems posed by sparse files. But this would need to be done in Image Factory, so it's actually for Ian to decide if using VirtualBox is acceptable.
On Fri, Apr 7, 2017 at 7:51 PM, Laurentiu Pancescu <lpancescu@gmail.com> wrote:
I would prefer the second option, using VBoxManage to produce small non-sparse .VHD images for our Vagrant boxes, and avoid the compatibility problems posed by sparse files. But this would need to be done in Image Factory, so it's actually for Ian to decide if using VirtualBox is acceptable.
After using qemu-img to convert the qcow2 image to VHD and then extracting the resulting tar file on the Hyper-V host, the resulting image was back to being 40GB 'non-sparse'.
Using VBoxManage instead, 'vagrant up' time was reduced to be comparable to the kozo/centos-7 box:
centos7-hyperv (original image): 4m19.232s
kozo/centos-7: 51.069s
centos7-hyperv (converted with virtualbox): 48.041s
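For a quick comparison, the three timings can be converted to seconds and expressed relative to the original image. The helper name is made up for illustration; the values are the ones reported above, and the VirtualBox-converted image comes out roughly 5.4 times faster than the original.

```python
import re


def to_seconds(text):
    """Convert a duration such as '4m19.232s' or '51.069s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


TIMINGS = {
    "centos7-hyperv (original image)": "4m19.232s",
    "kozo/centos-7": "51.069s",
    "centos7-hyperv (converted with virtualbox)": "48.041s",
}

if __name__ == "__main__":
    baseline = to_seconds(TIMINGS["centos7-hyperv (original image)"])
    for name, value in TIMINGS.items():
        seconds = to_seconds(value)
        print("%s: %.3fs (%.2fx vs original)" % (name, seconds, baseline / seconds))
```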
So I would also say the second option is the way to go if possible :)
On 10/04/17 16:57, Michael Vermaes wrote:
After using qemu-img to convert the qcow2 image to VHD and then extracting the resulting tar file on the Hyper-V host, the resulting image was back to being 40GB 'non-sparse'.
Using VBoxManage instead, 'vagrant up' time was reduced to be comparable to the kozo/centos-7 box:
[snip]
So I would also say the second option is the way to go if possible :)
qemu-img has several options for the vpc and vhdx formats ("qemu-img convert -O vpc -o ?" will show them). Earlier today I tried all possible combinations of subformat, preallocation and block_state_zero, without any visible effect: it always produces sparse files for vpc (.vhd) and huge non-sparse files for .vhdx.
qemu-img seems to be right to produce sparse .vhd files, if I correctly understand the documentation from libvhdi. [1] VirtualBox doesn't, but what they're doing seems to work better in practice.
[1] https://github.com/libyal/libvhdi/blob/master/documentation/Virtual%20Hard%2...
Any thoughts about using VBoxManage to convert the images to .vhd? VirtualBox supports neither .vhdx nor QCOW2, so we'd need to convert via .vmdk or another format.
On 07/04/17 13:51, Laurentiu Pancescu wrote:
I would prefer the second option, using VBoxManage to produce small non-sparse .VHD images for our Vagrant boxes, and avoid the compatibility problems posed by sparse files. But this would need to be done in Image Factory, so it's actually for Ian to decide if using VirtualBox is acceptable.
We now have a working solution for producing regular .vhd images as part of our release process: CBS is now using qemu-img-ev and qemu-kvm-ev from the Virt SIG, instead of the versions in CentOS Linux base. We don't need either VirtualBox or changes in Image Factory.
If anyone has access to Hyper-V on Windows and would be willing to test two Vagrant images, please let me know. I received conflicting test results from Thomas and Michael, and I'd like to have at least an additional test before deciding whether to release them or not.
Thanks, Laurențiu
On Wed, Apr 26, 2017 at 12:34 PM, Laurentiu Pancescu <lpancescu@gmail.com> wrote:
If anyone has access to Hyper-V on Windows and would be willing to test two Vagrant images, please let me know. I received conflicting test results from Thomas and Michael, and I'd like to have at least an additional test before deciding whether to release them or not.
I just reached out to a friend on the Hyper-V team at MS to see if we can figure something out. It would be nice to have a way to automate this testing going forward.
-Jeff