[CentOS] Understanding VDO vs ZFS

Hi David,

In my opinion, VDO isn't worth the effort. I tried VDO for the same use case: backups. My dataset is 2-3 TB and I back up daily. Even with a smaller dataset, VDO couldn't live up to its promises. It used tons of CPU and memory, and with a lot of tuning I could get it to sort of work, but it became corrupted at the slightest problem (even a shutdown could do this, and shutdowns could also take hours).

I tried a number of things and now use a combination of two:
1. a btrfs volume with the compress-force mount option enabled to store the intermediate data; it compresses my data to about 60% of its original size, and that's enough for me
2. bup (https://bup.github.io/) to store long-term backups.
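If you want a rough idea of what a compressing filesystem would save on your own data before setting one up, the Python sketch below compresses a file in independent chunks with zlib (one of the algorithms btrfs offers) and reports the compressed size as a fraction of the original. It is an estimate under stated assumptions, not what btrfs itself computes: the 128 KiB chunk size is assumed to approximate how btrfs compresses data, zlib's default level is used, and chunks that don't shrink are counted at their raw size, as a filesystem would store them uncompressed.

```python
import sys
import zlib

# Assumption: approximate btrfs, which compresses data in pieces of up to 128 KiB.
CHUNK_SIZE = 128 * 1024


def compressed_fraction(stream, chunk_size: int = CHUNK_SIZE) -> float:
    """Return compressed bytes / original bytes for a binary stream, chunk by chunk."""
    original = compressed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        original += len(chunk)
        # A chunk that does not shrink is counted at its raw size (stored uncompressed).
        compressed += min(len(zlib.compress(chunk)), len(chunk))
    return compressed / original if original else 1.0


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: compresses to about {compressed_fraction(f):.0%} of its size")
```

Run it with one or more file paths as arguments; the file is read in chunks, so large VM images don't need to fit in memory.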

bup is incredibly efficient for my use case (full VM backups). Over the course of a whole month, the dataset grows by only about 30% over its initial size (I start a new full backup set each month), and this is with FULL backups of all VMs every day. bup backup sets can also be mounted via FUSE, giving you access to all stored versions in a filesystem-like manner.
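Much of bup's efficiency here comes from chunk-level deduplication: bup splits data at content-defined boundaries found with a rolling checksum and stores each distinct chunk only once, in git-format packfiles. The Python sketch below illustrates that idea. It is a simplified model, not bup's code: the polynomial rolling hash, the 64-byte window, the 13-bit cut mask and the size limits are illustrative assumptions. Because cut points depend only on nearby content, inserting data into a file changes only the chunks around the insertion, and storing an unchanged file adds nothing.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not bup's exact values or code):
WINDOW = 64                       # bytes covered by the rolling hash
MASK = (1 << 13) - 1              # cut when the low 13 bits are all ones: ~8 KiB average
MIN_SIZE, MAX_SIZE = 2048, 65536  # keep chunks from getting too small or too large
BASE, MOD = 257, (1 << 31) - 1
BASE_POW = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window


def split_chunks(data: bytes) -> list[bytes]:
    """Cut data where a rolling hash of the last WINDOW bytes matches MASK."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Remove the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * BASE_POW) % MOD
        length = i + 1 - start
        if (length >= MIN_SIZE and (h & MASK) == MASK) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Keep each distinct chunk once, keyed by its SHA-1 digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def add(self, data: bytes) -> int:
        """Store one backup and return how many bytes were actually new."""
        new_bytes = 0
        for chunk in split_chunks(data):
            key = hashlib.sha1(chunk).hexdigest()
            if key not in self.blobs:
                self.blobs[key] = chunk
                new_bytes += len(chunk)
        return new_bytes


if __name__ == "__main__":
    image = random.Random(1).randbytes(1 << 19)  # stand-in for a VM image
    store = ChunkStore()
    print("day 1:", store.add(image), "new bytes")
    changed = image[:200_000] + b"new data" + image[200_000:]
    print("day 2:", store.add(changed), "new bytes")
```

A fixed-size split would behave badly here: inserting a few bytes would shift every later block boundary, so every later block would look new.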

If you can back up at will, you can probably forgo the btrfs volume for intermediate storage; that is just a band-aid to work around a specific issue here.

Stefan

--

________________________________
From: CentOS <centos-bounces at centos.org> on behalf of david <david at daku.org>
Sent: Sunday, May 3, 2020 2:50 AM
To: centos at centos.org <centos at centos.org>
Subject: [CentOS] Understanding VDO vs ZFS

Folks

I'm looking for a solution for backups because ZFS has failed on me
too many times.  In my environment, I have a large amount of data
(around 2 TB) that I periodically back up.  I keep the last 5
"snapshots".  I use rsync so that when I overwrite the oldest backup,
most of the data is already there and the backup completes quickly,
because only a small number of files have actually changed.
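The rotation described above can be sketched in Python as follows. This is a hypothetical helper, not David's actual script: the `snapshot.N` directory names, the `.done` stamp files and all function names are assumptions; only the rsync options `-a` and `--delete` are real flags. The point is that rsync into the oldest slot only has to transfer what changed since that slot was last written.

```python
import subprocess
from pathlib import Path

SLOTS = 5  # the last 5 "snapshots" are kept


def stamp_for(slot: Path) -> Path:
    # The completion stamp sits next to the slot, not inside it: rsync -a copies the
    # source's mtimes onto the slot, and --delete would remove a stamp placed inside.
    return slot.with_name(slot.name + ".done")


def next_slot(root: Path, slots: int = SLOTS) -> Path:
    """Pick the slot to overwrite: an unfinished one first, else the oldest finished one."""
    candidates = [root / f"snapshot.{n}" for n in range(slots)]
    for slot in candidates:
        if not stamp_for(slot).exists():
            return slot
    return min(candidates, key=lambda s: stamp_for(s).stat().st_mtime)


def rsync_command(source: str, target: Path) -> list[str]:
    # -a: recurse and preserve permissions, times, symlinks; --delete: drop files
    # that no longer exist in the source. The trailing slash copies source's contents.
    return ["rsync", "-a", "--delete", source.rstrip("/") + "/", str(target)]


def run_backup(source: str, root: Path) -> Path:
    slot = next_slot(root)
    stamp_for(slot).unlink(missing_ok=True)  # an interrupted run stays marked unfinished
    subprocess.run(rsync_command(source, slot), check=True)
    stamp_for(slot).touch()  # creates the stamp or refreshes its mtime
    return slot
```

Removing the stamp before rsync starts means a backup that dies halfway is picked again on the next run instead of being mistaken for a complete snapshot.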

Because of this low change rate, I have used ZFS with its
deduplication feature to store the data.  I started with a CentOS 6
installation, and upgraded years ago to CentOS 7.  CentOS 8 is on my
agenda.  However, I've had several data-loss events with ZFS where,
because of a combination of errors and/or mistakes, the entire store
was lost.  I've also noticed that ZFS is maintained separately from
CentOS.  At the moment, the CentOS 8 update causes ZFS to
fail.  Looking for an alternative, I'm trying VDO.

In the VDO installation, I created a logical volume containing two
hard drives, and defined VDO on top of that logical volume.  It
appears to be running, yet I find the deduplication numbers don't
pass the smell test.  I would expect that if the logical volume
contains three copies of essentially identical data, I should see
deduplication numbers close to 3.00, but instead I'm seeing numbers
like 1.15.  I compute this ratio as follows:
  1. Use df and take the number of 1K blocks used (the third column).
  2. Use vdostats --verbose and take the value titled "1K-blocks used".
  3. Divide the first by the second.
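To make the expectation and the measurement concrete, here is a small Python sketch. `expected_ratio` models block-level deduplication at VDO's 4 KiB granularity; in this model, three identical, block-aligned copies give exactly 3.0. The other helpers reproduce the df/vdostats computation above from captured command output. All function names are hypothetical, and the layout of `vdostats --verbose` output ("key : value" lines) is an assumption to check against your own output. `df -kP` is assumed so that each filesystem is reported on a single line.

```python
import hashlib

BLOCK = 4096  # VDO deduplicates in 4 KiB blocks


def expected_ratio(copies: list[bytes], block: int = BLOCK) -> float:
    """Logical blocks written / distinct blocks stored, assuming block-aligned copies."""
    distinct = set()
    logical = 0
    for data in copies:
        for start in range(0, len(data), block):
            distinct.add(hashlib.sha256(data[start:start + block]).digest())
            logical += 1
    return logical / len(distinct) if distinct else 1.0


def df_used_kib(df_output: str) -> int:
    # `df -kP MOUNTPOINT`: one header line, then one data line; "Used" is column 3.
    return int(df_output.splitlines()[1].split()[2])


def vdostats_used_kib(verbose_output: str) -> int:
    # Assumption: `vdostats --verbose` prints "key : value" lines, one of them "1K-blocks used".
    for line in verbose_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "1K-blocks used":
            return int(value.strip())
    raise ValueError("no '1K-blocks used' line found")


def observed_ratio(df_output: str, vdostats_output: str) -> float:
    """The figure computed above: df's used blocks over vdostats' used blocks."""
    return df_used_kib(df_output) / vdostats_used_kib(vdostats_output)
```

Comparing `expected_ratio` on sample data with `observed_ratio` on the real volume separates "the data doesn't deduplicate as expected" from "the two numbers being divided don't measure what was intended".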

Can you provide any advice on my use of ZFS or VDO without telling me
that I should be doing backups differently?

Thanks

David

_______________________________________________
CentOS mailing list
CentOS at centos.org
https://lists.centos.org/mailman/listinfo/centos