Hi David,

In my opinion, VDO isn't worth the effort. I tried VDO for the same use case: backups. My dataset is 2-3 TB and I back up daily. Even with a smaller dataset, VDO couldn't live up to its promises. It used tons of CPU and memory, and with a lot of tuning I could get it to more or less work, but it became corrupted at the slightest problem (even a shutdown could cause this, and shutdowns could also take hours).

I have tried a number of things, and I now use a combination of two:

1. A btrfs volume with compress-force enabled to store the intermediate data. It compresses my data to about 60% of its original size, and that's enough for me.
2. bup (https://bup.github.io/) to store long-term backups. bup is incredibly efficient for my use case (full VM backups). Over the course of a whole month, the dataset grows by only about 30% over its initial size (I create a new full backup each month), and this is with FULL backups of all VMs every day. bup backup sets can also be mounted via FUSE, giving you access to all stored versions in a filesystem-like manner.

If you can back up at will, you can probably forgo the btrfs volume for intermediate storage; that is just a band-aid to work around a specific issue here.

Stefan

________________________________
From: CentOS <centos-bounces at centos.org> on behalf of david <david at daku.org>
Sent: Sunday, May 3, 2020 2:50 AM
To: centos at centos.org <centos at centos.org>
Subject: [CentOS] Understanding VDO vs ZFS

Folks,

I'm looking for a backup solution because ZFS has failed on me too many times. In my environment, I have a large amount of data (around 2 TB) that I periodically back up. I keep the last 5 "snapshots". I use rsync so that when I overwrite the oldest backup, most of the data is already there and the backup completes quickly, because only a small number of files have actually changed. Because of this low change rate, I have used ZFS with its deduplication feature to store the data.
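For a rotation like this, a rough upper bound on the deduplication ratio can be written as n / (1 + (n - 1) * c), where n is the number of full copies kept and c is the fraction of blocks that change between consecutive copies. The minimal Python sketch below assumes perfect block-level deduplication, no compression and no metadata overhead; the function name `expected_dedup_ratio` is hypothetical, and real volumes will report less than this idealized figure.

```python
def expected_dedup_ratio(copies: int, change_fraction: float) -> float:
    """Idealized logical/physical ratio for `copies` full copies of a dataset.

    Assumes each copy differs from the previous one by `change_fraction` of
    its blocks, perfect block-level deduplication, no compression and no
    metadata overhead, so a real volume will report less than this.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= change_fraction <= 1.0:
        raise ValueError("change_fraction must be between 0 and 1")
    logical = copies                               # in units of one full copy
    physical = 1 + (copies - 1) * change_fraction  # first copy plus changed blocks of each later copy
    return logical / physical


if __name__ == "__main__":
    for c in (0.0, 0.01, 0.05):
        print(f"5 copies, {c:.0%} changed per copy: {expected_dedup_ratio(5, c):.2f}")
```

With three identical copies it returns 3.0, which is the figure David expects; with five copies and 5% change per copy it gives about 4.17.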
I started with a CentOS 6 installation and upgraded to CentOS 7 years ago. CentOS 8 is on my agenda. However, I've had several data-loss events with ZFS where, because of a combination of errors and/or mistakes, the entire store was lost. I've also noticed that ZFS is maintained separately from CentOS. At the moment, the CentOS 8 update causes ZFS to fail.

Looking for an alternative, I'm trying VDO. In the VDO installation, I created a logical volume containing two hard drives and defined VDO on top of that logical volume. It appears to be running, yet the deduplication numbers don't pass the smell test. If the logical volume contains three copies of essentially identical data, I would expect a deduplication ratio close to 3.00, but instead I'm seeing numbers like 1.15.

I compute the compression number as follows:

1. Run df and take the value for "1K blocks used" from the third column.
2. Run vdostats --verbose and take the number labeled "1K-blocks used".
3. Divide the first by the second.

Can you give me any advice on my use of ZFS or VDO without telling me that I should be doing backups differently?

Thanks,

David
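David's three measurement steps can be scripted so the ratio is captured the same way every time. This is a minimal Python sketch, not a tested tool: it assumes GNU `df -Pk` output (one data row per filesystem, "Used" in the third field) and assumes `vdostats --verbose` prints a line of the form `1K-blocks used : N`; adjust the pattern if your version formats it differently. The helper names are hypothetical, and it sticks to calls available in Python 3.6.

```python
import re
import subprocess


def df_used_kib(df_output: str) -> int:
    """Return the 'Used' column (third field) from `df -Pk <mountpoint>` output.

    -P keeps each filesystem on a single line, so the data row is the last line.
    """
    data_row = df_output.strip().splitlines()[-1]
    return int(data_row.split()[2])


def vdostats_used_kib(vdostats_output: str) -> int:
    """Return the '1K-blocks used' value from `vdostats --verbose` output.

    Assumes lines like '  1K-blocks used : 12345' (an assumption about the format).
    """
    match = re.search(r"^\s*1K-blocks used\s*:\s*(\d+)", vdostats_output, re.MULTILINE)
    if match is None:
        raise ValueError("'1K-blocks used' not found in vdostats output")
    return int(match.group(1))


def savings_ratio(mountpoint: str, vdo_device: str) -> float:
    """Logical usage seen by the filesystem divided by physical usage reported by VDO."""
    df_out = subprocess.check_output(["df", "-Pk", mountpoint], universal_newlines=True)
    vdo_out = subprocess.check_output(["vdostats", "--verbose", vdo_device], universal_newlines=True)
    return df_used_kib(df_out) / vdostats_used_kib(vdo_out)
```

Usage, with a hypothetical mount point and device: `print("%.2f" % savings_ratio("/backup", "/dev/mapper/vdo0"))`. Keeping the parsing separate from the `subprocess` calls makes it easy to check the parsing against saved command output before trusting the numbers.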