On Wednesday, April 24, 2019 1:18:46 PM PDT mark wrote:
Benjamin Smith wrote:
On Wednesday, April 24, 2019 11:25:00 AM PDT Andrew Holway wrote:
Btw, right now we've just built a new server running Ubuntu, because my manager wants to use it to test ZFS, including its ability to a) act as a RAID directly, without an underlying RAID, and b) encrypt the whole thing natively.
ZFS on Linux was originally an EL project. Ubuntu support came later.
I've been running ZoL on CentOS for years. Wonderful stuff, a sysadmin's dream, although we keep all ZoL boxes off any public access and update on a carefully tested schedule to ensure that no RPM version weirdness happens.
OK, how does it scale on large filesystems? My manager was running some tests this morning on a 37 TB filesystem, and he said it seemed slow to respond. Dedup also seemed to take real time, on the order of minutes: I watched him copy a 1 GB test file to two or three other names, and the size took a while to go up and then level off, with "size" being what we saw with df.
We don't use dedup so I can't comment there. Memory requirements jump *sharply* with dedup and our use case doesn't call for it anyway.
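To put "jump sharply" into rough numbers, here is a back-of-envelope sketch. It assumes the often-quoted rule of thumb of about 320 bytes of dedup-table (DDT) memory per unique block and the default 128 KiB recordsize; real entry sizes and block sizes vary, so treat the result as an order-of-magnitude estimate only. The 37 TB figure is the manager's test filesystem, assumed here to be entirely unique data stored in full-size records.

```python
# Rough estimate of dedup-table (DDT) memory for a ZFS pool.
# Assumptions (not from the thread): ~320 bytes per DDT entry, one entry per
# unique block, every block a full 128 KiB record, and no duplicate data.

DDT_ENTRY_BYTES = 320          # commonly cited rule of thumb, not exact
RECORDSIZE_BYTES = 128 * 1024  # ZFS default recordsize


def ddt_memory_bytes(data_bytes: int,
                     recordsize: int = RECORDSIZE_BYTES,
                     entry_bytes: int = DDT_ENTRY_BYTES) -> int:
    """Estimate DDT size: number of unique blocks times bytes per entry."""
    blocks = -(-data_bytes // recordsize)  # ceiling division
    return blocks * entry_bytes


if __name__ == "__main__":
    data = 37 * 10**12  # 37 TB (decimal), as df would report it
    est = ddt_memory_bytes(data)
    print(f"{data / 1e12:.0f} TB -> ~{est / 1e9:.0f} GB of DDT")
```

Smaller records (lots of small files, for example) mean more blocks per terabyte, so the estimate grows accordingly.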
We store roughly 500M files in a 48 TB pool: three 6-drive RAIDZ2 vdevs of 4 TB HGST 3.5" drives, under a random read/write load. System memory is 32 GB ECC on (now) older Xeon processors. Nothing too fancy. We don't use any particular caching schemes beyond the default ARC. We use compression and get a compressratio of about 1.5.
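The pool size can be checked with simple arithmetic, reading "3x 6 RAIDZ2 vdevs" as three vdevs of six drives each, where each RAIDZ2 vdev gives up two drives' worth of space to parity. This sketch ignores ZFS metadata, allocation padding and TB-versus-TiB differences, so real usable space will come out somewhat lower.

```python
# Back-of-envelope usable capacity for a pool built from identical RAIDZ vdevs.
# Ignores metadata, allocation padding, slop space, and TB vs TiB.


def raidz_usable_tb(vdevs: int, drives_per_vdev: int,
                    parity: int, drive_tb: float) -> float:
    """Usable TB = vdevs * (drives per vdev - parity drives) * drive size."""
    if drives_per_vdev <= parity:
        raise ValueError("each vdev needs more drives than parity")
    return vdevs * (drives_per_vdev - parity) * drive_tb


if __name__ == "__main__":
    # Three 6-drive RAIDZ2 vdevs of 4 TB drives, as described above.
    usable = raidz_usable_tb(vdevs=3, drives_per_vdev=6, parity=2, drive_tb=4)
    print(f"usable: {usable:g} TB")  # usable: 48 TB
    # At a compressratio of ~1.5, that space holds roughly 1.5x as much logical data.
    print(f"logical at 1.5x: {usable * 1.5:g} TB")
```

The 48 TB result matches the stated pool size, which supports the three-vdevs-of-six reading.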
For simple file reads/writes, it's somewhat slower than ext* (perhaps half the IOPS), but that's more than made up for by the fact that backups take minutes rather than rsync's days and are always consistent, thanks to snapshots. (You do back up, right?) Not having to run rsync 24x7 saves far more IOPS than the halved IOPS cost, so in practice we come out way ahead, with a sharp net performance increase. We considered lsyncd, but a 3-day replication window every time we update our master file servers seemed like a non-starter.
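The message doesn't say exactly how the snapshot-based backups are done. One common approach is an incremental `zfs send -i` between two snapshots, piped into `zfs receive` on the backup host, which transfers only blocks changed since the previous snapshot instead of walking every file the way rsync does. The sketch below only builds the command lines and does not run them; the dataset `tank/files`, the host `backup-host`, the target `backup/files` and the `auto-` snapshot naming are all hypothetical.

```python
# Sketch: build the commands for a snapshot-based incremental ZFS backup.
# Dataset, host, target and snapshot names are hypothetical placeholders.
import shlex
from datetime import datetime, timezone


def snapshot_name(when: datetime) -> str:
    """Timestamped snapshot label, e.g. 'auto-20190424-2018'."""
    return when.strftime("auto-%Y%m%d-%H%M")


def backup_commands(dataset: str, prev_snap: str, new_snap: str,
                    host: str, target: str) -> list[str]:
    """Return shell commands: take a new snapshot, then send only the delta."""
    new = f"{dataset}@{new_snap}"
    prev = f"{dataset}@{prev_snap}"
    take = ["zfs", "snapshot", new]
    send = ["zfs", "send", "-i", prev, new]
    # -F rolls the target back to its most recent snapshot before receiving.
    recv = ["ssh", host, "zfs", "receive", "-F", target]
    return [shlex.join(take), f"{shlex.join(send)} | {shlex.join(recv)}"]


if __name__ == "__main__":
    new = snapshot_name(datetime(2019, 4, 24, 20, 18, tzinfo=timezone.utc))
    for cmd in backup_commands("tank/files", "auto-20190423-2018", new,
                               "backup-host", "backup/files"):
        print(cmd)
```

Because the incremental stream is computed from block birth times rather than by comparing files, its cost tracks the amount of changed data, not the number of files.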
We have 3 systems of identical configuration, and our application layer reads from and writes to all 3 concurrently using a custom-written daemon that communicates over unencrypted TCP sockets on a private LAN. For safety, one of the systems still runs ext* in case we get bitten by a ZFS bug (knock on wood, that hasn't happened yet in 7 years).
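The custom daemon's protocol isn't described here. As a purely hypothetical illustration of the fan-out idea, this sketch uses Python's asyncio to send the same write to three replica servers concurrently and reports success only if all three acknowledge it. The `PUT <path> <length>` framing, the `OK` reply and every function name are invented for the example; like the setup described, it uses plain unencrypted TCP and so only belongs on a trusted private network.

```python
# Hypothetical fan-out write: send one file to every replica concurrently
# and treat the write as successful only if all replicas acknowledge it.
# The wire format ("PUT <path> <length>\n" + bytes, reply "OK\n") is invented.
import asyncio


async def put_one(host: str, port: int, path: str, data: bytes) -> bool:
    """Send one PUT to a single replica and wait for its acknowledgement."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(f"PUT {path} {len(data)}\n".encode() + data)
        await writer.drain()
        return await reader.readline() == b"OK\n"
    finally:
        writer.close()
        await writer.wait_closed()


async def replicated_put(replicas, path: str, data: bytes) -> bool:
    """Write to all replicas at once; any error or missing ack means failure."""
    results = await asyncio.gather(
        *(put_one(host, port, path, data) for host, port in replicas),
        return_exceptions=True,  # a dead replica yields an exception, not a crash
    )
    return all(r is True for r in results)


def make_handler(store: dict):
    """Toy replica server that keeps files in a dict (stand-in for disk)."""
    async def handle(reader, writer):
        header = (await reader.readline()).decode().rstrip("\n")
        head, length = header.rsplit(" ", 1)
        path = head[len("PUT "):]
        store[path] = await reader.readexactly(int(length))
        writer.write(b"OK\n")
        await writer.drain()
        writer.close()
        await writer.wait_closed()
    return handle


async def demo():
    stores = [{}, {}, {}]
    servers = [await asyncio.start_server(make_handler(s), "127.0.0.1", 0)
               for s in stores]
    replicas = [("127.0.0.1", srv.sockets[0].getsockname()[1]) for srv in servers]
    ok = await replicated_put(replicas, "/docs/a.txt", b"hello")
    for srv in servers:
        srv.close()
        await srv.wait_closed()
    return ok, stores


if __name__ == "__main__":
    ok, stores = asyncio.run(demo())
    print(ok, [s.get("/docs/a.txt") for s in stores])
```

Requiring every replica to acknowledge keeps the three copies in lockstep at the cost of waiting for the slowest server; a real daemon would also need retries and a way to resync a replica that missed writes.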
Performance is more than adequate and has never been a bottleneck, even with up to 15,000 concurrent users. If we needed more performance, we'd probably switch to mirror vdevs instead of RAIDZ2, and then perhaps add L2ARC. Finally, we can shard the file store at the application layer if needed, but we long ago passed our biggest estimates of when we'd need to do this, and so far there have been no significant issues.
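For a feel of what the mirror option trades, here is a rough comparison assuming the same eighteen 4 TB drives were rebuilt as nine 2-way mirrors. It leans on the common rule of thumb that random IOPS scale roughly with the number of vdevs, so vdev count serves as a crude relative IOPS figure; actual performance depends heavily on the workload, and the capacity numbers ignore ZFS overhead.

```python
# Rough comparison of two ways to lay out the same drives in a ZFS pool.
# Rule of thumb (approximate): random IOPS scale with vdev count, since each
# vdev behaves roughly like a single drive for small random writes.
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int
    drives_per_vdev: int
    redundancy: int  # drives per vdev lost to parity or mirror copies

    def drives(self) -> int:
        return self.vdevs * self.drives_per_vdev

    def usable_tb(self, drive_tb: float) -> float:
        # Ignores metadata, padding and TB vs TiB.
        return self.vdevs * (self.drives_per_vdev - self.redundancy) * drive_tb


if __name__ == "__main__":
    current = Layout("3 x 6-drive RAIDZ2", vdevs=3, drives_per_vdev=6, redundancy=2)
    mirrors = Layout("9 x 2-way mirror", vdevs=9, drives_per_vdev=2, redundancy=1)
    for layout in (current, mirrors):
        print(f"{layout.name}: {layout.drives()} drives, "
              f"{layout.usable_tb(4):g} TB usable, {layout.vdevs} vdevs")
```

Under these assumptions the mirror layout gives up a quarter of the usable space (48 TB down to 36 TB) for three times as many vdevs. It also tolerates only one drive failure per vdev, where RAIDZ2 tolerates any two.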