On 12/14/2013 04:00, lists@benjamindsmith.com wrote:
We checked lsyncd out and it's most certainly a very interesting tool. I *will* be using it in the future!
However, we found that it has some issues scaling up to really big file stores, issues we haven't (yet) seen with ZFS.
For example, the first thing it has to do when it comes online is a full rsync of the watched file area. This makes sense; you need to do this to ensure integrity. But if you have a large file store, e.g. many millions of files and dozens of TB, then this first step can take days, even if the window of downtime that triggered it was mere minutes due to a restart. Since we're already at this stage now (and growing rapidly!) we've decided to keep looking for something more elegant, and ZFS appears to be almost an exact match. We have not tested the stability of lsyncd managing many millions of inode write notifications, but just trying to satisfy the write needs of two smaller customers (out of hundreds) with lsyncd led to crashes and the need to modify kernel parameters.
As another example, lsyncd solves the (highly useful!) problem of replication, which is a distinctly different problem from backups. Replication is useful, for example as a read-only cache for remote application access, or for disaster recovery with near-real-time replication, but it's not a backup. If somebody deletes a file accidentally, you can't go to the replicated host and expect it to be there. And unless you are lsyncd'ing to a remote file system with its own snapshot capability, there isn't an easy way to version a backup short of running rsync (again) on the target to create hard links or something - itself a very slow, intensive process that can take days on very large filesystems.
I'll still be experimenting with lsyncd further to evaluate its real usefulness and performance compared to ZFS, and will report the results. As said before, we'll know much more in another month or two, once our next stage of rollout is complete.
-Ben
Hi Ben,
Yes, the initial replication of a large filesystem is *very* time consuming! But it makes sleeping at night much easier. I did have to crank up the inotify kernel parameters by a significant amount.
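For reference, the knobs in question are the fs.inotify sysctls. Something along these lines in /etc/sysctl.conf is the usual approach - the values below are only illustrative guesses, not what I actually run, so size them to your own directory count:

    # /etc/sysctl.conf -- illustrative values only
    fs.inotify.max_user_watches = 8388608     # lsyncd needs one watch per directory
    fs.inotify.max_queued_events = 1048576    # queue depth before events get dropped
    fs.inotify.max_user_instances = 1024

Then 'sysctl -p' applies them without a reboot.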
I did the initial replication using rsync directly, rather than asking lsyncd to do it. I notice that if I reboot the primary server, it takes a while for the inotify tables to be rebuilt ... after that it's smooth sailing.
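For the record, the seeding step was just a plain rsync, roughly like this (the paths are placeholders, not my real pools):

    # one-time seed of the replica before starting lsyncd
    # -a archive mode, -H preserve hard links, --numeric-ids for a faithful copy
    rsync -aH --numeric-ids /tank/data/ replica:/tank/data/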
If you want to prevent deletion of files from your replicated filesystem (which I do), you can modify the rsync{} array in the lsyncd.lua file by adding the line 'delete = false' to it. This has saved my butt a few times when a user has accidentally deleted a file on the primary server.
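Roughly like this, as a sketch from memory assuming lsyncd 2.x syntax, with placeholder paths (note that 'delete' actually sits at the sync{} level rather than inside the rsync{} table in the versions I've used):

    -- lsyncd.lua (sketch only; source/target are placeholders)
    sync {
        default.rsync,
        source = "/tank/data",           -- watched filesystem
        target = "replica:/tank/data",   -- replication target
        delete = false,                  -- never propagate deletions to the replica
        rsync  = {
            archive  = true,             -- rsync -a
            compress = true,             -- compress over the wire
        },
    }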
I agree that filesystem replication isn't really a backup, but for now it's all I have available; at least the replicated fs is on a separate machine.
As a side note for anyone using a file server for hosting OS X Time Machine backups, the 'delete' parameter in rsync{} must be set to 'true' in order to prevent chaos should a user need to point their Mac at the replicated filesystem (which should be a very rare event). I put all TM backups in a separate ZFS sub-pool for this reason.
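In practice that just means a second stanza for the TM dataset, something like this (again a sketch, with made-up dataset names):

    -- Time Machine dataset: deletions MUST propagate so the replica
    -- stays an exact mirror of the sparsebundles
    sync {
        default.rsync,
        source = "/tank/tm_backups",         -- hypothetical TM dataset mountpoint
        target = "replica:/tank/tm_backups",
        delete = true,                       -- exact mirror, unlike the stanza above
    }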
Chuck
On 12/14/2013 08:50 AM, Chuck Munro wrote:
Hi Ben,
Yes, the initial replication of a large filesystem is *very* time consuming! But it makes sleeping at night much easier. I did have to crank up the inotify kernel parameters by a significant amount.
I did the initial replication using rsync directly, rather than asking lsyncd to do it. I notice that if I reboot the primary server, it takes a while for the inotify tables to be rebuilt ... after that it's smooth sailing.
I may be being presumptuous, and if so, I apologize in advance...
It sounds to me like you might want to consider a disk-to-disk backup solution. I'd suggest dirvish, BackupPC, or our own home-rolled rsync-based solution that works rather well: http://www.effortlessis.com/backupbuddy/
Note that with these solutions you get multiple save points that are deduplicated with hard links, so you can (usually) keep dozens of save points in perhaps 2x the disk space of a single copy. Also, because of this, you can go back a few days / weeks / whatever when somebody deletes a file. In our case, we make the backed-up directories available via read-only FTP so that end users can recover their files.
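The underlying trick, if you want to roll it by hand, is rsync's --link-dest option - something like the following, with invented dates and paths. Each run hard-links unchanged files against the previous save point, so a new save point only costs the space of the files that changed:

    # yesterday's save point serves as the hard-link reference
    rsync -a --delete \
        --link-dest=/backups/2013-12-13 \
        user@primary:/srv/data/ \
        /backups/2013-12-14/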
I don't know if dirvish offers this, but backupbuddy also allows you to run pre- and post-backup shell scripts, which we use (for example) for off-site archiving to permanent storage, since backup save points expire.
-Ben