[CentOS] Understanding VDO vs ZFS

Sun May 3 02:50:30 UTC 2020
david <david at daku.org>


I'm looking for a solution for backups because ZFS has failed on me 
too many times.  In my environment, I have a large amount of data 
(around 2 TB) that I periodically back up.  I keep the last 5 
"snapshots".  I use rsync so that when I overwrite the oldest backup, 
most of the data is already there and the backup completes quickly, 
because only a small number of files have actually changed.

Because of this low change rate, I have used ZFS with its 
deduplication feature to store the data.  I started on a CentOS 6 
installation and upgraded to CentOS 7 years ago.  CentOS 8 is on my 
agenda.  However, I've had several data-loss events with ZFS where a 
combination of errors and/or mistakes caused the entire store to be 
lost.  I've also noticed that ZFS is maintained separately from 
CentOS.  At the moment, updating to CentOS 8 causes ZFS to 
fail.  Looking for an alternative, I'm trying VDO.

In the VDO installation, I created a logical volume spanning two 
hard drives, and defined VDO on top of that logical volume.  It 
appears to be running, yet I find the deduplication numbers don't 
pass the smell test.  I would expect that if the logical volume 
contains three copies of essentially identical data, I should see 
deduplication numbers close to 3.00, but instead I'm seeing numbers 
like 1.15.  I compute the deduplication number as follows:
  Use df and extract the "Used" value (in 1K blocks) from the third column.
  Use vdostats --verbose and extract the number titled "1K-blocks used".
  Divide the first by the second.
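
To make the calculation above concrete, here is a rough Python sketch 
of the same three steps.  It is only an illustration: the function 
names and the sample text are made up, the df output is assumed to be 
in the one-line-per-filesystem form (as from "df -kP"), and the 
vdostats --verbose output is assumed to contain a line of the form 
"1K-blocks used : <number>".

```python
import re


def df_used_kb(df_output: str) -> int:
    """Return the 'Used' value (third column) from the last line of df output.

    Assumes one filesystem per line, as produced by `df -kP <mountpoint>`.
    """
    lines = [ln for ln in df_output.strip().splitlines() if ln.strip()]
    fields = lines[-1].split()
    return int(fields[2])


def vdo_used_kb(vdostats_output: str) -> int:
    """Return the number labelled '1K-blocks used' from verbose vdostats text.

    Assumes a 'label : value' layout; adjust the pattern if yours differs.
    """
    match = re.search(
        r"^\s*1K-blocks used\s*:\s*(\d+)\s*$", vdostats_output, re.MULTILINE
    )
    if match is None:
        raise ValueError("no '1K-blocks used' line found")
    return int(match.group(1))


def savings_ratio(logical_kb: int, physical_kb: int) -> float:
    """Divide the space df reports as used by the space VDO reports as used."""
    if physical_kb <= 0:
        raise ValueError("physical usage must be positive")
    return logical_kb / physical_kb


# Hypothetical sample text, not taken from a real system.
SAMPLE_DF = """\
Filesystem        1024-blocks       Used  Available Capacity Mounted on
/dev/mapper/vdo0   6442450944 3000000000 3442450944      47% /backup
"""

SAMPLE_VDOSTATS = """\
  1K-blocks                           : 4000000000
  1K-blocks used                      : 1000000000
  1K-blocks available                 : 3000000000
"""

ratio = savings_ratio(df_used_kb(SAMPLE_DF), vdo_used_kb(SAMPLE_VDOSTATS))
print(f"{ratio:.2f}")  # prints 3.00 for the made-up numbers above
```

With real output pasted in place of the two sample strings, the 
printed value is the same number I described computing by hand.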

Can you provide any advice on my use of ZFS or VDO without telling me 
that I should be doing backups differently?