[CentOS] Understanding VDO vs ZFS

Mon May 4 14:31:53 UTC 2020
Andrew Walsh <awalsh at redhat.com>

On Mon, May 4, 2020 at 10:02 AM Stefan S <stefan at kalaam.org> wrote:
>
> Hi David,
>
> in my opinion, VDO isn't worth the effort. I tried VDO for the same use case: backups. My dataset is 2-3 TB and I back up daily. Even with a smaller dataset, VDO couldn't live up to its promises. It used tons of CPU and memory, and with a lot of tuning I could get it to kind of work, but it became corrupted at the slightest problem (even a shutdown could do this, and shutdowns could also take hours).

I'm sorry to hear you feel that way.  I would be interested to
understand the situations in which you experienced this problem so
that it can be addressed better in the future.  Did you reach out for
any guidance when it was happening?

>
> I have tried a number of things and I use a combination of two things now:
> 1. a btrfs volume with force-compress enabled to store the intermediate data - it compresses my data to about 60% and that's enough for me
> 2. use of bup (https://bup.github.io/) to store long-term backups.
>
> bup is incredibly efficient for my use case (full VM backups). Over the course of a whole month, the dataset only increases by about 30% from the initial size (I create a new full backup each month) - and this is with FULL backups of all VMs every day. bup backup sets can also be mounted via FUSE, giving you access to all stored versions in a filesystem-like manner.
>
> If you can back up at will you can probably forgo the btrfs volume for intermediate storage - that is just a band-aid to work around a specific issue here.
>
>
> Stefan
>
>
> --
>
> ________________________________
> From: CentOS <centos-bounces at centos.org> on behalf of david <david at daku.org>
> Sent: Sunday, May 3, 2020 2:50 AM
> To: centos at centos.org <centos at centos.org>
> Subject: [CentOS] Understanding VDO vs ZFS
>
> Folks
>
> I'm looking for a solution for backups because ZFS has failed on me
> too many times.  In my environment, I have a large amount of data
> (around 2 TB) that I periodically back up.  I keep the last 5
> "snapshots".  I use rsync so that when I overwrite the oldest backup,
> most of the data is already there and the backup completes quickly,
> because only a small number of files have actually changed.
>
> Because of this low change rate, I have used ZFS with its
> deduplication feature to store the data.  I started using a CentOS 6
> installation, and upgraded years ago to CentOS 7.  CentOS 8 is on my
> agenda.  However, I've had several data-loss events with ZFS where,
> because of a combination of errors and/or mistakes, the entire store
> was lost.  I've also noticed that ZFS is maintained separately from
> CentOS.  At this moment, the CentOS 8 update causes ZFS to
> fail.  Looking for an alternative, I'm trying VDO.
>
> In the VDO installation, I created a logical volume containing two
> hard-drives, and defined VDO on top of that logical volume.  It
> appears to be running, yet I find the deduplication numbers don't
> pass the smell test.  I would expect that if the logical volume
> contains three copies of essentially identical data, I should see
> deduplication numbers close to 3.00, but instead I'm seeing numbers
> like 1.15.  I compute the compression number as follows:
>   Use df and extract the value for "1k blocks used" from the third column
>   Use vdostats --verbose and extract the number titled "1K-blocks used"
>
> Divide the first by the second.
>
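
For reference, the calculation described above can be written out as a
short Python sketch.  The function name and the sample numbers are made
up for illustration; the two inputs are simply the "used" values read
off by hand from df and from vdostats --verbose, both in 1K blocks.
Whether this ratio is the right measure of deduplication is a separate
question that the sketch does not settle.

```python
def space_ratio(df_used_kib: int, vdo_used_kib: int) -> float:
    """Divide the space df reports as used by the space VDO reports as used.

    Both arguments are counts of 1K blocks, copied by hand from the
    output of df and of vdostats --verbose respectively.
    """
    if vdo_used_kib <= 0:
        raise ValueError("VDO used-block count must be positive")
    return df_used_kib / vdo_used_kib


# Hypothetical figures, chosen only to give a ratio like the one reported.
print(f"{space_ratio(2_300_000, 2_000_000):.2f}")
```

With three essentially identical copies stored, the expectation stated
above corresponds to space_ratio(3 * n, n) for some n, which is 3.0.
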
> Can you provide any advice on my use of ZFS or VDO without telling me
> that I should be doing backups differently?
>
> Thanks
>
> David
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos