[CentOS] KVM vs. incremental remote backups

Sun Apr 4 00:11:13 UTC 2021
Gordon Messmer <gordon.messmer at gmail.com>

On 3/31/21 12:50 PM, Nicolas Kovacs wrote:
> The problem with using Rsnapshot on the VM's filesystems rather than backing up
> the whole VM is the time it takes to restore all the mess.


All the same, backing up the filesystems from within the VMs is the 
best way to back them up using rsnapshot.

rsnapshot's approach of hard links and rsync means that if any byte in 
the origin file has changed, the copy in the backup set consumes the 
entire file size.  If you're backing up whole VM images, you're giving 
up all of the efficiency that rsnapshot was designed for.
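
As a rough illustration (the paths and the daily.N names are just the 
usual rsnapshot layout; adjust for your config), you can watch this 
happen by comparing inodes inside the backup set:

     # right after a run where the image did not change, the two newest
     # copies share one inode (and one copy's worth of disk space):
     ls -li daily.0/vm.img daily.1/vm.img
     # after any byte of the image changes and rsnapshot runs again,
     # the newest copy has its own inode, so the backup volume grows by
     # the full size of the image:
     ls -li daily.0/vm.img daily.1/vm.img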

I'd note that your original message said that you were transferring the 
entire VM image.  That *shouldn't* be the case. rsync should be 
transferring only the changed bits over the network, but on disk you'll 
have an entirely new file.

There are a few ways you can work around that with rsnapshot, but I'm 
not aware of an easy solution.

One option would be to use btrfs as your backup volume and write wrapper 
scripts for cmd_cp and cmd_rm.  Rather than the default behavior, you'd 
want to create a snapshot (for cmd_cp) and remove snapshots (for cmd_rm).
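
A minimal sketch of those wrappers (untested, and it assumes rsnapshot 
invokes them with cp/rm-style options followed by the source and 
destination, or the target directory, as the trailing arguments) might 
look something like this:

     #!/bin/sh
     # cmd_cp wrapper: snapshot the previous interval instead of
     # hard-linking it.  Take the last two arguments as src and dst,
     # ignoring any cp options rsnapshot prepends.
     eval "src=\${$(($# - 1))}"
     eval "dst=\${$#}"
     exec btrfs subvolume snapshot "$src" "$dst"

     #!/bin/sh
     # cmd_rm wrapper: delete the expired interval as a subvolume,
     # ignoring any rm options rsnapshot prepends.
     eval "dir=\${$#}"
     exec btrfs subvolume delete "$dir"

Note that this only works if the interval directories are btrfs 
subvolumes to begin with, so the very first one would need to be 
created with "btrfs subvolume create" rather than mkdir.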

The other option that comes to mind would be to use either XFS or btrfs 
as your backup volume and write a wrapper script for cmd_cp.  This would 
be simpler; the script would just be:

     #!/bin/sh
     # Copy with reflinks so data blocks are shared (CoW) instead of
     # being duplicated on disk.
     exec cp --reflink=always "$@"
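
You'd then point rsnapshot at the wrapper in rsnapshot.conf (the path 
is just an example, and the fields must be tab-separated):

     cmd_cp    /usr/local/bin/reflink-cp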

If you pursued either option, you'd also want to modify rsnapshot's 
rsync_long_args setting to add --inplace, so that rsync updates changed 
files in place instead of replacing them with new copies (which would 
defeat the block sharing).
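
For example (rsnapshot.conf requires tabs between fields; the options 
other than --inplace are just rsnapshot's documented defaults, so keep 
whatever you already use and append --inplace):

     rsync_long_args    --delete --numeric-ids --relative --delete-excluded --inplace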

Those two approaches would take advantage of CoW filesystem capabilities 
to conserve disk space.  If you decide to pursue them, bear in mind that 
"du" will report that each of the resulting VM images is full size, 
even though that's not really the case.  The only way (that I know of) 
to accurately measure disk use is to run "df" before and after a backup 
and compare the filesystem's usage.
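
A quick sketch of that measurement (the /backup mount point and the 
"daily" interval name are just placeholders for whatever you use; 
df --output needs a reasonably recent GNU coreutils):

     before=$(df --output=used -B1 /backup | tail -n 1)
     rsnapshot daily
     after=$(df --output=used -B1 /backup | tail -n 1)
     echo "this backup run consumed $((after - before)) bytes"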