[CentOS] Understanding VDO vs ZFS

Sun May 3 07:39:35 UTC 2020
Alessandro Baggi <alessandro.baggi at gmail.com>

On 03/05/20 04:50, david wrote:
> Folks
>
> I'm looking for a solution for backups because ZFS has failed on me 
> too many times.  In my environment, I have a large amount of data 
> (around 2 TB) that I periodically back up.  I keep the last 5 
> "snapshots".  I use rsync so that when I overwrite the oldest backup, 
> most of the data is already there and the backup completes quickly, 
> because only a small number of files have actually changed.
>
> Because of this low change rate, I have used ZFS with its 
> deduplication feature to store the data.  I started with a CentOS 6 
> installation and upgraded years ago to CentOS 7.  CentOS 8 is on my 
> agenda.  However, I've had several data-loss events with ZFS where, 
> because of a combination of errors and/or mistakes, the entire store 
> was lost.  I've also noticed that ZFS is maintained separately from 
> CentOS.  At this moment, the CentOS 8 update causes ZFS to fail.  
> Looking for an alternative, I'm trying VDO.
>
> In the VDO installation, I created a logical volume containing two 
> hard-drives, and defined VDO on top of that logical volume.  It 
> appears to be running, yet I find the deduplication numbers don't pass 
> the smell test.  I would expect that if the logical volume contains 
> three copies of essentially identical data, I should see deduplication 
> numbers close to 3.00, but instead I'm seeing numbers like 1.15.  I 
> compute the deduplication number as follows:
>  Use df and extract the value for "1k blocks used" from the third column.
>  Use vdostats --verbose and extract the number titled "1K-blocks used".
>
> Divide the first by the second.
>
> Can you provide any advice on my use of ZFS or VDO without telling me 
> that I should be doing backups differently?
>
> Thanks
>
> David
>
Hi David, I'm not an expert on VDO, but I plan to try it for backups 
with rsync + hard links. I know this is not the answer you asked for, 
sorry about that.
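
For what it's worth, the ratio you describe (the "used" figure from df 
divided by the "1K-blocks used" figure from vdostats) can be written 
down as a small sketch. The function name and the numbers below are 
only illustrative assumptions, not values taken from your system, and 
the 3.00 result is the ideal case where all three copies are 
block-for-block identical:

```python
def savings_ratio(logical_used_kb: int, physical_used_kb: int) -> float:
    """Logical space used divided by physical space used (both in 1K blocks)."""
    if physical_used_kb <= 0:
        raise ValueError("physical usage must be positive")
    return logical_used_kb / physical_used_kb


# Hypothetical figures: one backup copy of about 2 TB, in 1K blocks.
one_copy_kb = 2 * 1024**3

# Three identical copies stored logically, one copy stored physically.
print(savings_ratio(3 * one_copy_kb, one_copy_kb))  # ideal case: 3.0
```

A value like 1.15 would mean the physical usage is nearly as large as 
the logical usage.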

Many users have told me to use a more specialized backup tool with 
built-in deduplication (borg in my case). I'm testing it, and I'm not 
sure yet whether I will adopt it in the long term. Since you are using 
an rsync-based solution, I would ask: why not use a more specialized 
tool? What are the benefits of staying with rsync for you?

Thank you in advance.