[CentOS] VDO killed my server

Mon Sep 3 19:25:00 UTC 2018
Yan Li <elliot.li.tech at gmail.com>

Interesting observation! I'm thinking about trying VDO too.

On 09/03/2018 11:40 AM, david wrote:
> I was impressed with the description of VDO (Virtual Device Optimizer?) 
> in the Red Hat documentation, so much that I tried to use it.  The 
> tutorials led me to a few commands.  I built a VDO device on top of two 
> USB disks which I made into a Logical Volume, and I was ready to go.

USB connections are notoriously flimsy. They are prone to randomly 
dropping operations and to silent, intermittent disconnects, and they are 
known to cause hard-to-debug problems when used with more complex 
filesystems such as ZFS and btrfs. Of course we shouldn't blame USB for 
every problem, but I wouldn't be surprised if USB is misbehaving here.
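If you want to check whether USB is involved, the kernel log is the place 
to look. Below is a minimal Python sketch, assuming you have saved the 
output of `journalctl -k` or `dmesg` to a text file. It counts lines that 
look like the kernel's "USB disconnect" and "reset ... USB device" 
messages. The exact wording can vary between kernel versions, so treat 
the patterns as an assumption and adjust them to what your log shows.

```python
import re

# Patterns assumed to match the kernel's USB hub messages, for example:
#   "usb 2-1: USB disconnect, device number 3"
#   "usb 2-1: reset high-speed USB device number 3 using xhci_hcd"
# The wording can differ between kernel versions; adjust as needed.
USB_PATTERNS = {
    "disconnect": re.compile(r"USB disconnect, device number \d+"),
    "reset": re.compile(r"reset .*USB device number \d+"),
}


def count_usb_events(lines):
    """Count USB disconnect and reset messages in an iterable of log lines."""
    counts = {name: 0 for name in USB_PATTERNS}
    for line in lines:
        for name, pattern in USB_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "usb 2-1: USB disconnect, device number 3",
    "usb 2-1: new high-speed USB device number 4 using xhci_hcd",
    "usb 2-1: reset high-speed USB device number 4 using xhci_hcd",
]
print(count_usb_events(sample))  # {'disconnect': 1, 'reset': 1}
```

On the real machine, save the log with `journalctl -k > kernel.log` and 
call `count_usb_events(open("kernel.log"))`. Non-zero counts during the 
copy window would point at the USB link rather than at VDO.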

> In my test case, I had a file set of about 600 GB.  There was 5 TB of 
> space between the two disk LVMs.  So, I thought, let's see if I can 
> activate deduplication and compression, and see if VDO can take two, or 
> three, or four identical copies of that file set, at different points in 
> the file system tree.
> Needless to say, all worked well with the first set.  It took 24 hours 
> to copy.  The second set took another 24 hours, and all seemed well.  As 
> I was copying the third set, I started to observe some problems.  The 
> computer was serving other functions (internal DHCPD, DNS, internal 
> HTTPD), and these started to fail.  There were no obvious alerts or 
> warnings from VDO, but the other functions of the system started to 
> die.  The diagnostics from journalctl were vague (failure to create a 
> file...), 

Did these failures to create a file occur only on the file system on VDO, 
or also on other file systems?

> but when I went looking with 'df', all the file systems seemed 
> to have enough room for everything.  Even the 'top' program showed 
> available space in the pools it revealed.

How about free memory? What did `free -m` say?
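`free` takes its numbers from /proc/meminfo, so the same figures can be 
logged from a script while a long copy is running. A minimal Python 
sketch, assuming a Linux kernel that reports MemAvailable (older kernels 
may not; the sketch returns None for any missing field):

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary_mib(text):
    """Return total, free and available memory in MiB (None if a field is missing)."""
    info = parse_meminfo(text)

    def to_mib(kib):
        return None if kib is None else kib // 1024

    return {
        "total": to_mib(info.get("MemTotal")),
        "free": to_mib(info.get("MemFree")),
        "available": to_mib(info.get("MemAvailable")),
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_summary_mib(f.read()))
```

Running this every minute or so during the third copy would show whether 
available memory was shrinking when the other services started failing.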

> After hours of my internal clients complaining, I finally removed the 
> 'mount' in /etc/fstab that loaded the VDO system, killed the file 
> copies, and rebooted.  The system then resumed normal healthy functions, 
> but without the VDO files.
> In my mind, there are a few points:
> - If VDO is competing for a finite resource (Memory?), it probably 
> should start posting warnings, and eventually rejecting new files when 
> the pool is nearly full.  Or maybe, use a pool other than what the other 
> services use so as to minimize the impact on them.
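The escalation you describe (warn first, then refuse new files as the 
pool fills) fits in a small policy function. This is a hypothetical 
illustration of that proposal, not how VDO actually behaves; the function 
name and the 80 % / 95 % thresholds are made up for the example.

```python
def pool_action(used_bytes, total_bytes, warn_at=0.80, reject_at=0.95):
    """Hypothetical policy: map pool usage to 'ok', 'warn' or 'reject'."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    usage = used_bytes / total_bytes
    if usage >= reject_at:
        return "reject"  # refuse new files before the pool is exhausted
    if usage >= warn_at:
        return "warn"    # post a warning but keep accepting writes
    return "ok"


print(pool_action(30, 50))  # ok     (60 % used)
print(pool_action(42, 50))  # warn   (84 % used)
print(pool_action(49, 50))  # reject (98 % used)
```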


> - The documentation talks about 'tuning', but if this resource is one of 
> concern, please don't bury it in the footnotes to the appendix.

I agree. Tuning should only affect performance, never normal functionality.

Yan Li