[CentOS] ZFS on Linux testing

Sat Dec 14 16:50:58 UTC 2013
Chuck Munro <chuckm at seafoam.net>

On 12/14/2013, 04:00, lists at benjamindsmith.com wrote:

> We checked lsyncd out and it's most certainly a very interesting tool.
> I *will* be using it in the future!
>
> However, we found that it has some issues scaling up to really big file
> stores that we haven't seen (yet) with ZFS.
>
> For example, the first thing it has to do when it comes online is a
> full rsync of the watched file area. This makes sense; you need to do
> this to ensure integrity. But if you have a large file store, e.g. many
> millions of files and dozens of TB, then this first step can take days,
> even if the window of downtime is mere minutes due to a restart. Since
> we're already at this stage now (and growing rapidly!) we've decided to
> keep looking for something more elegant and ZFS appears to be almost an
> exact match. We have not tested the stability of lsyncd managing the
> many millions of inode write notifications in the meantime, but just
> trying to satisfy the write needs for two smaller customers (out of
> hundreds) with lsyncd led to crashes and the need to modify kernel
> parameters.
>
> As another example, lsyncd solves a (highly useful!) problem of
> replication, which is a distinctly different problem than backups.
> Replication is useful, for example as a read-only cache for remote
> application access, or for disaster recovery with near-real-time
> replication, but it's not a backup. If somebody deletes a file
> accidentally, you can't go to the replicated host and expect it to be
> there. And unless you are lsyncd'ing to a remote file system with its
> own snapshot capability, there isn't an easy way to version a backup
> short of running rsync (again) on the target to create hard links or
> something - itself a very slow, intensive process with very large
> filesystems. (days)
>
> I'll still be experimenting with lsyncd further to evaluate its real
> usefulness and performance compared to ZFS and report results. As said
> before, we'll know much more in another month or two once our next stage
> of roll out is complete.
>
> -Ben

Hi Ben,

Yes, the initial replication of a large filesystem is *very* time 
consuming!  But it makes sleeping at night much easier.  I did have to 
crank up the inotify kernel parameters by a significant amount.

I did the initial replication using rsync directly, rather than asking 
lsyncd to do it.  I notice that if I reboot the primary server, it takes 
a while for the inotify tables to be rebuilt ... after that it's smooth 
sailing.

If you want to prevent deletion of files from your replicated filesystem 
(which I do), you can modify the rsync{} array in the lsyncd.lua file by 
adding the line 'delete = false' to it.  This has saved my butt a few 
times when a user has accidentally deleted a file on the primary server.
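
A minimal sketch of what that setting might look like in lsyncd.lua follows. The source path and target host are made up for illustration, and it assumes lsyncd 2.1.x, where (as far as I know) 'delete' is documented as a key of the sync{} block itself; on other versions its placement may differ, so check the manual for yours.

```lua
-- Hypothetical example: paths and host name are placeholders.
sync {
    default.rsync,
    source = "/tank/data",           -- assumed local directory to watch
    target = "replica:/tank/data",   -- assumed rsync destination
    delete = false,                  -- never remove files on the target
}
```

With this in place, a file deleted on the primary stays on the replica until someone removes it by hand, so the target will grow over time.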

I agree that filesystem replication isn't really a backup, but for now 
it's all I have available, and at least the replicated fs is on a 
separate machine.

As a side note for anyone using a file server for hosting OS X Time 
Machine backups, the 'delete' parameter in rsync{} must be set to 'true' 
in order to prevent chaos should a user need to point their Mac at the 
replicated filesystem (which should be a very rare event).  I put all TM 
backups in a separate ZFS sub-pool for this reason.
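
As a rough sketch, a separate sync{} block for the Time Machine area might look like the following. The dataset path and host name are hypothetical, and the top-level placement of 'delete' assumes lsyncd 2.1.x.

```lua
-- Hypothetical example: a dedicated sync for Time Machine backups only.
sync {
    default.rsync,
    source = "/tank/timemachine",           -- assumed path of the TM dataset
    target = "replica:/tank/timemachine",   -- assumed rsync destination
    delete = true,                          -- mirror deletions so the copy matches exactly
}
```

Keeping the backups under their own path is what lets this block use a different 'delete' setting from the rest of the replicated data.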

Chuck