Hi Niels,
On 06/04/17 14:57, Niels de Vos wrote:
Maybe you can post-process the large .vhd file with "virt-sparsify"? I do not know if it would recognize the format, but it is worth a try. Otherwise it should be possible to "punch holes" in the large .vhd file with a little program that calls fallocate(2) on zero-filled areas.
The .vhd files that qemu-img creates are already sparse: their real size is 1.1GB as reported by du, while ls reports 40GB. The problem occurs when they are added to a .tar.gz file, because Python's tarfile module reads them as regular files, zeros included. The resulting archive is still only 480MB, since gzip compresses long runs of zeros quite well. This archive has to be extracted by Vagrant on Windows, so even if the file were stored as sparse inside the archive, we would depend on Ruby for Windows, and on Hyper-V as well, to handle it properly. I'll have to wait until at least Monday, when Michael can hopefully test whether a locally generated sparse file works properly on Windows.
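To make the difference between the two sizes concrete, here is a minimal Python sketch, assuming a Linux filesystem that supports holes. It creates a small sparse file, compares its apparent size (what ls shows) with its allocated size (what du shows), and then adds it to a .tar.gz with tarfile. The file names disk.vhd and box.tar.gz and both helper functions are made up for illustration; this is not the code our build uses.

```python
import os
import tarfile
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated size) in bytes, roughly `ls -l` vs `du`."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, which is the case on Linux.
    return st.st_size, st.st_blocks * 512


def archive_sparse_file(workdir, size=64 * 1024 * 1024):
    """Create a sparse file of `size` bytes, add it to a .tar.gz, report sizes."""
    image = os.path.join(workdir, "disk.vhd")      # hypothetical file name
    archive = os.path.join(workdir, "box.tar.gz")  # hypothetical file name

    # truncate() extends the file without writing data, so it normally
    # stays sparse on filesystems that support holes.
    with open(image, "wb") as f:
        f.truncate(size)

    apparent, allocated = apparent_and_allocated(image)

    # tarfile reads the file like any regular file, so every zero byte
    # is passed through gzip and the member keeps its full apparent size.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(image, arcname="disk.vhd")

    with tarfile.open(archive, "r:gz") as tar:
        member_size = tar.getmember("disk.vhd").size

    return {
        "apparent": apparent,
        "allocated": allocated,
        "member_size": member_size,
        "archive_size": os.path.getsize(archive),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in archive_sparse_file(tmp).items():
            print(f"{key}: {value}")
```

The member size recorded in the archive should equal the apparent size, while the archive itself stays small because gzip collapses the zeros. Whoever extracts it gets the full-size file back unless the extractor re-creates the holes on its own.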
Perhaps there are no issues with sparse files and Hyper-V, but we're the only ones generating such files. The .vhdx images exported by Hyper-V (everyone else uses Packer's Hyper-V plugin) are regular files, as are the .vhd files produced by VirtualBox: both their apparent and their real size are 1.1GB. Only qemu-img produces huge sparse .vhd files, although it produces regular 1.1GB .vmdk files (used by the VirtualBox variant of our Vagrant boxes). Even worse, the .vhdx files produced by qemu-img from EL7 are huge non-sparse 41GB files. This bug was allegedly fixed upstream around January 2015, but maybe the fix hasn't been backported yet.
Using VirtualBox for the .vhd conversion would be the least likely to generate surprises, but maybe using GNU tar to create the .ova archive for Vagrant (instead of Python's tarfile) will be enough.
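For the GNU tar variant, a rough sketch of how the build script could call it from Python follows. It relies on GNU tar's --sparse option, which records holes in the archive instead of storing the zeros. The helper names and the example file names are hypothetical, and whether the extracting side on Windows understands GNU sparse entries is exactly the open question above.

```python
import subprocess


def build_tar_command(archive, members, directory="."):
    """Build a GNU tar command line that stores sparse files as sparse."""
    return [
        "tar",
        "--sparse",        # GNU tar: detect holes and record them in the archive
        "-czf", archive,   # create a gzip-compressed archive at this path
        "-C", directory,   # add the following members relative to this directory
        *members,
    ]


def create_archive(archive, members, directory="."):
    """Run GNU tar; raises CalledProcessError if tar exits with an error."""
    subprocess.run(build_tar_command(archive, members, directory), check=True)


if __name__ == "__main__":
    # Hypothetical names, only to show the resulting command line.
    print(" ".join(build_tar_command("box.tar.gz", ["disk.vhd"])))
```

Keeping the command construction separate from the subprocess call makes it easy to log or inspect the exact tar invocation before running it.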
Best regards, Laurențiu