[CentOS] Was: Re: Are linux distros redundant?, is zfs

Thu Apr 25 20:24:13 UTC 2019
Benjamin Smith <lists at benjamindsmith.com>

On Wednesday, April 24, 2019 1:18:46 PM PDT mark wrote:
> Benjamin Smith wrote:
> > On Wednesday, April 24, 2019 11:25:00 AM PDT Andrew Holway wrote:
> >>> Btw, right now, we've just built a new server as Ubuntu, because my
> >>> manager wants to use it to test zfs, including its ability to a) act
> >>> as a RAID, directly, without an underlying RAID, and b) encrypt the
> >>> whole thing natively.
> >> 
> >> ZFS on linux was originally an EL project. Ubuntu support came later.
> > 
> > I've been running ZoL on CentOS for years. Wonderful stuff. SysAdmin's
> > dream, although we keep all ZoL boxes off any public access and update on
> > a carefully tested schedule to ensure that no RPM version weirdness
> > happens.
> 
> Ok, how's it scale on large filesystems? My manager was running some tests
> this morning, 37TB f/s, and he said it seemed slow to respond. Also, dedup
> seemed to take actual time - minutes. I saw him as he copied a 1G test
> file to two or three other names, and it took time to go up, then level
> off in size, with "size" being what we saw with df.

We don't use dedup, so I can't comment there. Memory requirements jump 
*sharply* with dedup, and our use case doesn't call for it anyway. 
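To put a rough number on "sharply": the dedup table (DDT) keeps one entry per unique block, and ideally the whole table fits in RAM. The sketch below assumes the commonly quoted figure of about 320 bytes per in-memory DDT entry. That figure is a rule of thumb that varies by ZFS version, not something measured on our boxes. The 16 KiB average block size in the second example is hypothetical, standing in for a pool full of small files.

```python
# Rough in-RAM size of the ZFS dedup table (DDT).
# ASSUMPTION: ~320 bytes per DDT entry, a commonly quoted rule of thumb;
# the real per-entry cost varies by ZFS version and platform.
DDT_ENTRY_BYTES = 320
GIB = 2**30
TIB = 2**40


def ddt_ram_bytes(unique_bytes: int, avg_block_bytes: int = 128 * 1024) -> int:
    """One DDT entry per unique block; 128 KiB is the default recordsize."""
    entries = -(-unique_bytes // avg_block_bytes)  # ceiling division
    return entries * DDT_ENTRY_BYTES


if __name__ == "__main__":
    # 1 TiB of unique data stored in full 128 KiB records:
    print(ddt_ram_bytes(1 * TIB) / GIB)             # 2.5
    # HYPOTHETICAL: the same data as small files averaging 16 KiB per block:
    print(ddt_ram_bytes(1 * TIB, 16 * 1024) / GIB)  # 20.0
```

Smaller blocks mean more entries, so a many-small-files workload pushes the table size up fast relative to a 32 GB box.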

We store roughly 500M files in a 48TB pool made of three 6-drive RAIDZ2 
vdevs, using 4TB HGST 3.5" drives, under a random read/write load. System 
memory is 32 GB ECC on (now) older Xeon processors. Nothing too fancy. We 
don't add any caching beyond ZFS's built-in ARC (no L2ARC, for example). We 
use compression and get a compressratio of about 1.5. 
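The 48TB figure follows from the layout: each RAIDZ2 vdev gives up two drives' worth of space to parity, so 3 vdevs x (6 - 2) drives x 4TB = 48TB. A minimal sketch of that arithmetic, plus what a compressratio of 1.5 means for logical capacity. It ignores metadata, RAIDZ padding, ZFS's reserved "slop" space, and the TB vs TiB difference, so the numbers `zfs list` reports will be lower.

```python
# Usable-capacity arithmetic for a pool built from identical RAIDZ vdevs.
# Ignores ZFS metadata, RAIDZ padding and reserved "slop" space, and treats
# drive sizes as vendor (decimal) terabytes.


def raidz_usable_tb(vdevs: int, disks_per_vdev: int, parity: int, drive_tb: float) -> float:
    """Each RAIDZ-p vdev gives up `parity` disks' worth of space to parity."""
    if not 1 <= parity <= 3:
        raise ValueError("RAIDZ parity level must be 1, 2 or 3")
    if disks_per_vdev <= parity:
        raise ValueError("a vdev needs more disks than its parity level")
    return vdevs * (disks_per_vdev - parity) * drive_tb


def logical_tb(physical_tb: float, compressratio: float) -> float:
    """compressratio = logical (uncompressed) size / physical size on disk."""
    return physical_tb * compressratio


if __name__ == "__main__":
    usable = raidz_usable_tb(vdevs=3, disks_per_vdev=6, parity=2, drive_tb=4)
    print(usable)                   # 48
    print(logical_tb(usable, 1.5))  # 72.0
```

So a full pool at that ratio would hold on the order of 72TB of uncompressed data, before the overheads above.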

For simple file reads/writes, it's somewhat slower than ext* (perhaps half the 
IOPS), but that's more than made up for by the fact that backups take minutes 
rather than the days rsync needed, and they're always consistent thanks to 
snapshots. (You do back up, right?) Not having an rsync process running 24x7 
saves far more IOPS than the halved raw IOPS cost us, so in practice we come 
out way ahead, with a sharp net performance increase. We considered lsyncd, 
but a 3-day replication window every time we update our master file servers 
seemed like a non-starter. 
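Snapshot-based backups like this are usually done with `zfs snapshot` followed by an incremental `zfs send -i` piped into `zfs receive` on another machine. I haven't said exactly what tooling we use, so treat this as a minimal sketch of the general technique. The dataset names (`tank/files`, `backup/files`), the host `backup-host`, and the date-style snapshot names are all hypothetical. An incremental stream carries only the blocks changed since the previous snapshot, which is why nothing has to walk 500M files the way rsync does.

```python
import subprocess
from typing import List, Optional, Tuple

# HYPOTHETICAL names throughout: dataset "tank/files", host "backup-host",
# target dataset "backup/files", date-based snapshot names.


def snapshot_cmd(dataset: str, snap: str) -> List[str]:
    """Atomic point-in-time snapshot; this is what makes the backup consistent."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def send_recv_cmds(dataset: str, snap: str, prev_snap: Optional[str],
                   remote_host: str, remote_dataset: str) -> Tuple[List[str], List[str]]:
    """Build sender and receiver argv for a full or incremental stream."""
    send = ["zfs", "send"]
    if prev_snap is not None:
        # Incremental: only blocks changed since prev_snap go over the wire.
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{snap}")
    # -F rolls the target back to its most recent snapshot before receiving.
    recv = ["ssh", remote_host, "zfs", "receive", "-F", remote_dataset]
    return send, recv


def replicate(dataset: str, snap: str, prev_snap: Optional[str],
              remote_host: str, remote_dataset: str) -> None:
    """Take a snapshot, then pipe `zfs send` into `ssh ... zfs receive`."""
    subprocess.run(snapshot_cmd(dataset, snap), check=True)
    send_argv, recv_argv = send_recv_cmds(dataset, snap, prev_snap,
                                          remote_host, remote_dataset)
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_argv, stdin=sender.stdout)
    sender.stdout.close()  # lets the sender get SIGPIPE if the receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    if send_rc != 0 or recv_rc != 0:
        raise RuntimeError(f"replication failed: send={send_rc}, receive={recv_rc}")


if __name__ == "__main__":
    # Print the commands instead of running them (running needs ZFS privileges).
    print(" ".join(snapshot_cmd("tank/files", "2019-04-25")))
    for argv in send_recv_cmds("tank/files", "2019-04-25", "2019-04-24",
                               "backup-host", "backup/files"):
        print(" ".join(argv))
```

The first replication has to be a full send (`prev_snap=None`); after that, each run only needs the name of the last snapshot the other side already has.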

We have 3 systems of identical configuration, and our application reads/writes 
to all 3 concurrently using a custom-written daemon that operates over 
unencrypted TCP sockets on a private LAN. For safety, one of the systems still 
runs ext*, just in case we get bitten by a ZFS bug. (Knock on wood, that hasn't 
happened yet in 7 years.) 
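Our daemon is custom, so this is not its code. It's a minimal asyncio sketch of the general fan-out idea: send the same write to every file server concurrently over plain TCP and report which ones acknowledged. Everything here is a hypothetical illustration: the wire format (one JSON header line followed by raw bytes, answered with `OK`), the function names, the path, and the toy replica that keeps data in a dict instead of on disk. The read path is left out.

```python
import asyncio
import json

# HYPOTHETICAL wire format: one JSON header line {"path": ..., "size": N},
# then N raw bytes; the replica answers b"OK\n" once the data is stored.


async def handle_client(reader, writer, store):
    """Toy replica daemon: keeps writes in a dict instead of on disk."""
    header = json.loads(await reader.readline())
    data = await reader.readexactly(header["size"])
    store[header["path"]] = data
    writer.write(b"OK\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def start_replica(store):
    """Start a toy replica on an ephemeral localhost port; return (server, port)."""
    server = await asyncio.start_server(
        lambda r, w: handle_client(r, w, store), "127.0.0.1", 0)
    return server, server.sockets[0].getsockname()[1]


async def send_write(host, port, path, data, timeout=5.0):
    """Send one write to one replica; True only if it acknowledges."""
    reader, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout)
    try:
        header = json.dumps({"path": path, "size": len(data)}).encode() + b"\n"
        writer.write(header + data)
        await writer.drain()
        return await asyncio.wait_for(reader.readline(), timeout) == b"OK\n"
    finally:
        writer.close()
        await writer.wait_closed()


async def replicated_write(replicas, path, data):
    """Write to every replica concurrently; a failed replica shows up as False."""
    results = await asyncio.gather(
        *(send_write(host, port, path, data) for host, port in replicas),
        return_exceptions=True)
    return [r is True for r in results]


async def demo():
    stores = [{}, {}, {}]
    started = [await start_replica(s) for s in stores]
    replicas = [("127.0.0.1", port) for _, port in started]
    ok = await replicated_write(replicas, "/files/test.bin", b"hello world")
    for server, _ in started:
        server.close()
        await server.wait_closed()
    return ok, stores


if __name__ == "__main__":
    print(asyncio.run(demo())[0])  # [True, True, True]
```

Returning per-replica results rather than raising leaves the policy (retry, alert, mark a server stale) to the caller, which matters when one of the three boxes is deliberately different, like our ext* fallback.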

Performance is more than adequate and has never been a bottleneck, even with 
up to 15,000 concurrent users. If we needed more performance, we'd probably 
switch to mirror vdevs instead of RAIDZ2, and then perhaps add L2ARC. Beyond 
that, we could shard the file store at the application layer if needed, but we 
passed our biggest estimates of when we'd need to do this long ago, and so far 
we've had no significant issues.
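The reason mirrors come first on that list: a common ZFS rule of thumb says random IOPS scale with the number of vdevs. A RAIDZ vdev delivers roughly one disk's worth of random IOPS, while a mirror can also spread reads across both of its sides. This sketch compares the same 18 x 4TB drives in both layouts under that assumption. The results are rough multiples of a single disk's random IOPS, not benchmarks.

```python
from dataclasses import dataclass

# ASSUMPTION (rule of thumb, not a benchmark): random IOPS scale with vdev
# count; a RAIDZ vdev behaves roughly like one disk for random I/O, while a
# mirror can serve reads from every side. Units: multiples of one disk's IOPS.


@dataclass(frozen=True)
class Layout:
    name: str
    vdevs: int
    disks_per_vdev: int
    redundant_disks: int  # parity disks (RAIDZ) or extra copies (mirror) per vdev
    mirror: bool

    def usable_tb(self, drive_tb: float) -> float:
        return self.vdevs * (self.disks_per_vdev - self.redundant_disks) * drive_tb

    def random_read_x(self) -> int:
        # Mirrors can read from every member; RAIDZ acts roughly like one disk.
        return self.vdevs * (self.disks_per_vdev if self.mirror else 1)

    def random_write_x(self) -> int:
        # Each write lands on every member of a vdev, so writes scale with vdevs.
        return self.vdevs


# Same 18 drives either way.
LAYOUTS = [
    Layout("3 x 6-disk RAIDZ2", vdevs=3, disks_per_vdev=6, redundant_disks=2, mirror=False),
    Layout("9 x 2-way mirror", vdevs=9, disks_per_vdev=2, redundant_disks=1, mirror=True),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(f"{layout.name}: {layout.usable_tb(4)} TB usable, "
              f"~{layout.random_read_x()}x random read, "
              f"~{layout.random_write_x()}x random write")
```

Under this model the trade is 48TB down to 36TB usable in exchange for about 3x the random write and 6x the random read capacity, which is why capacity headroom would have to be checked before switching.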