Hey,
http://zfsonlinux.org/epel.html
If you have a little time and resource please install and report back any problems you see.
A filesystem or volume sits within a zpool; a zpool is made up of vdevs; vdevs are made up of block devices.
A zpool is similar to an LVM volume group; a vdev is similar to a RAID set.
The devices can even be files.
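To make that concrete, a quick sketch (the pool name and devices here are just examples; adjust for your own hardware):

    # Create a pool named "tank" from one raidz vdev made of three disks.
    zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd

    # Filesystems and volumes then live inside the pool.
    zfs create tank/data            # a filesystem, mounted at /tank/data by default
    zfs create -V 10G tank/vol1     # a block volume (zvol)

    # Show the pool -> vdev -> device hierarchy.
    zpool status tank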
Thanks,
Andrew
Andrew,
We've been testing ZFS since about 10/24; see my original post (and replies) asking about its suitability, "ZFS on Linux in production", on this list. So far, it's been rather impressive. Enabling compression more than halved the disk space utilization in a low/medium-bandwidth (mainly archival) usage case.
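For the record, enabling it amounts to setting a property on the dataset; a minimal sketch (the pool/dataset name is invented):

    # Turn on compression for a dataset; only data written afterwards is compressed.
    zfs set compression=on tank/backups

    # See how much space it's saving.
    zfs get compressratio tank/backups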
Dealing with many TB of data in a "real" environment is a very slow, conservative process; our ZFS implementation has, so far, been limited to a single redundant copy of a file system on a server that only backs up other servers.
Our next big test is to try out ZFS filesystem send/receive in lieu of our current backup processes based on rsync. Rsync is a fabulous tool, but is beginning to show performance/scalability issues dealing with the many millions of files being backed up, and we're hoping that ZFS filesystem replication solves this.
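Roughly what we intend to try, sketched with invented host and dataset names:

    # One-time full copy of a snapshot to the backup host.
    zfs snapshot tank/data@2014-01-01
    zfs send tank/data@2014-01-01 | ssh backuphost zfs receive backup/data

    # Thereafter, send only the blocks that changed between snapshots.
    zfs snapshot tank/data@2014-01-02
    zfs send -i tank/data@2014-01-01 tank/data@2014-01-02 | \
        ssh backuphost zfs receive -F backup/data   # -F rolls back stray changes on the replica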
This stage of deployment is due to be in place by about 1/2014.
-Ben
On 11/30/2013 06:20 AM, Andrew Holway wrote:
Hey,
http://zfsonlinux.org/epel.html
If you have a little time and resource please install and report back any problems you see.
A filesystem or volume sits within a zpool; a zpool is made up of vdevs; vdevs are made up of block devices.
A zpool is similar to an LVM volume group; a vdev is similar to a RAID set.
The devices can even be files.
Thanks,
Andrew
From: Lists lists@benjamindsmith.com
Our next big test is to try out ZFS filesystem send/receive in lieu of our current backup processes based on rsync. Rsync is a fabulous tool, but is beginning to show performance/scalability issues dealing with the many millions of files being backed up, and we're hoping that ZFS filesystem replication solves this.
Not sure if I already mentioned it but maybe have a look at: http://code.google.com/p/lsyncd/
JD
On 04.12.2013 14:05, John Doe wrote:
From: Lists lists@benjamindsmith.com
Our next big test is to try out ZFS filesystem send/receive in lieu of our current backup processes based on rsync. Rsync is a fabulous tool, but is beginning to show performance/scalability issues dealing with the many millions of files being backed up, and we're hoping that ZFS filesystem replication solves this.
Not sure if I already mentioned it but maybe have a look at: http://code.google.com/p/lsyncd/
I'm not so sure inotify works well with millions of files, not to mention it uses rsync. :D
On 12/04/2013 06:05 AM, John Doe wrote:
Not sure if I already mentioned it but maybe have a look at: http://code.google.com/p/lsyncd/
We checked lsyncd out and it's most certainly a very interesting tool. I *will* be using it in the future!
However, we found that it has some issues scaling up to really big file stores that we haven't seen (yet) with ZFS.
For example, the first thing it has to do when it comes online is a full rsync of the watched file area. This makes sense; you need to do this to ensure integrity. But if you have a large file store, e.g. many millions of files and dozens of TB, this first step can take days, even if the window of downtime was mere minutes due to a restart. Since we're already at this stage now (and growing rapidly!), we've decided to keep looking for something more elegant, and ZFS appears to be almost an exact match. We have not tested the stability of lsyncd managing the many millions of inode write notifications in the meantime, but just trying to satisfy the write needs of two smaller customers (out of hundreds) with lsyncd led to crashes and the need to modify kernel parameters.
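For anyone who hits the same wall: I won't swear these were the exact parameters in our case, but the usual knobs for inotify-based tools are the inotify limits. Illustrative values only, not a recommendation:

    # Raise inotify limits so a watcher like lsyncd can track a large tree (runtime only).
    sysctl -w fs.inotify.max_user_watches=1048576
    sysctl -w fs.inotify.max_queued_events=1048576
    sysctl -w fs.inotify.max_user_instances=1024
    # Add the same settings to /etc/sysctl.conf to make them survive a reboot.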
As another example, lsyncd solves a (highly useful!) problem, replication, which is a distinctly different problem from backups. Replication is useful, for example, as a read-only cache for remote application access, or for disaster recovery with near-real-time replication, but it's not a backup. If somebody deletes a file accidentally, you can't go to the replicated host and expect it to still be there. And unless you are lsyncd'ing to a remote file system with its own snapshot capability, there isn't an easy way to version a backup short of running rsync (again) on the target to create hard links or something similar - itself a very slow, intensive process (taking days) on very large filesystems.
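That's part of the appeal of doing this with ZFS on the receiving side: snapshots of the replica give cheap read-only versions without another rsync pass. A sketch only, with invented names and retention:

    # On the backup host, snapshot the replica after each sync so accidentally
    # deleted or overwritten files stay recoverable from older snapshots.
    zfs snapshot backup/data@$(date +%Y-%m-%d)

    # List versions, and expire old ones when they're no longer needed.
    zfs list -t snapshot -r backup/data
    zfs destroy backup/data@2013-11-01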
I'll still be experimenting with lsyncd further to evaluate its real usefulness and performance compared to ZFS, and will report results. As said before, we'll know much more in another month or two once our next stage of rollout is complete.
-Ben
On Sat, Nov 30, 2013 at 9:20 AM, Andrew Holway andrew.holway@gmail.com wrote:
Hey,
http://zfsonlinux.org/epel.html
If you have a little time and resource please install and report back any problems you see.
A filesystem or volume sits within a zpool; a zpool is made up of vdevs; vdevs are made up of block devices.
A zpool is similar to an LVM volume group; a vdev is similar to a RAID set.
The devices can even be files.
Thanks,
Andrew
Andrew,
I've been using ZFS 0.6.1 on CentOS 6.4 for the past 6 months. For a few years before that I was using mdadm with ext4 on CentOS 5. The main reasons for upgrading were snapshots integrated with Samba for file shares, and compression. So far so good.
Ryan
On 11/30/2013 06:20 AM, Andrew Holway wrote:
Hey,
http://zfsonlinux.org/epel.html
If you have a little time and resource please install and report back any problems you see.
Andrew,
I want to run /var on ZFS, but when I try to move /var over, the system won't boot thereafter, with errors about /var/log missing. Reading the Ubuntu howto for ZFS indicates that while it's even possible to boot from ZFS, it's a rather long and complicated process.
I don't want to boot from ZFS, but it appears that grub needs to be set up to support ZFS in order to be able to mount ZFS filesystems, and it's possible that EL6's grub just isn't new enough. Are there instructions or a howto for setting up ZFS on CentOS 6 so that it's available at boot?
Thanks,
Ben
Grub only needs to know about the filesystems that it uses to boot the system. Mounting of the other file systems including /var is the responsibility of the system that has been booted. I suspect that you have something else wrong if you can't boot with /var/ on ZFS.
I may be wrong, but I don't think so. If grub needed to know about the file systems other than the one it is using to boot, then it would have parameters to describe the other file systems.
Cheers,
Cliff
On Tue, Jan 7, 2014 at 11:54 AM, Lists lists@benjamindsmith.com wrote:
On 11/30/2013 06:20 AM, Andrew Holway wrote:
Hey,
http://zfsonlinux.org/epel.html
If you have a little time and resource please install and report back any problems you see.
Andrew,
I want to run /var on ZFS, but when I try to move /var over, the system won't boot thereafter, with errors about /var/log missing. Reading the Ubuntu howto for ZFS indicates that while it's even possible to boot from ZFS, it's a rather long and complicated process.
I don't want to boot from ZFS, but it appears that grub needs to be set up to support ZFS in order to be able to mount ZFS filesystems, and it's possible that EL6's grub just isn't new enough. Are there instructions or a howto for setting up ZFS on CentOS 6 so that it's available at boot?
Thanks,
Ben
On 1/6/2014 3:26 PM, Cliff Pratt wrote:
Grub only needs to know about the filesystems that it uses to boot the system. Mounting of the other file systems including /var is the responsibility of the system that has been booted. I suspect that you have something else wrong if you can't boot with /var/ on ZFS.
I may be wrong, but I don't think so. If grub needed to know about the file systems other than the one it is using to boot, then it would have parameters to describe the other file systems.
More likely, syslog is trying to start before the ZFS loadable kernel modules are set up.
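If that's what's happening, one untested way around it might be to take the dataset out of ZFS's automatic mounting and let the normal fstab machinery handle it, assuming the zfs module and pool import happen early enough in the boot sequence (the dataset name below is just an example):

    # Untested sketch: hand the mount over to /etc/fstab instead of the ZFS automounter.
    zfs set mountpoint=legacy tank/var

    # /etc/fstab entry:
    # tank/var   /var   zfs   defaults   0 0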
I had promised to weigh in on my experiences using ZFS in a production environment. We've been testing it for a few months now, and confidence is building. We started using it in production about a month ago, after months of non-production testing.
I'll append my thoughts in a cross-post from another thread because I think it's an excellent summary for anybody looking for an enterprise-scale file system.
--- ORIGINAL POST ----
Ditto on ZFS! I've been experimenting with it for about 5-6 months and it really is the way to go for any filesystem greater than about 10 GB IMHO. We're in the process of transitioning several of our big data pools to ZFS because it's so obviously better.
Just remember that ZFS isn't casual! You have to take the time to understand what it is and how it works, because if you make the wrong move, it's curtains for your data. ZFS has a few maddening limitations** that you have to plan for. But it is far and away the leader in Copy-On-Write, large-scale file systems, and once you know how to plan for it, its capabilities are jaw-dropping. Here are a few off the top of my head:
1) Check for and fix filesystem errors without ever taking it offline.
2) Replace failed HDDs from a raidz pool without ever taking it offline.
3) Works best with inexpensive JBOD drives - it's actually recommended to not use expensive HW RAID devices.
4) Native, built-in compression: double your usable disk space for free.
5) Extend (grow) your ZFS pool without ever taking it offline.
6) Create a snapshot in seconds that you can keep or expire at any time. (Snapshots are read-only, and take no disk space initially.)
7) Send a snapshot (entire filesystem) to another server. Binary-perfect copies in a single command, much faster than rsync when you have a large data set.
8) Ability to make a clone - a writable copy of a snapshot - in seconds. A clone of a snapshot is writable, and snapshots can be created of a clone. A clone initially uses no disk space, and as you use it, it only uses the disk space of the changes between the current state of the clone and the snapshot it's derived from.
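Several of these map directly to one-liners. Hedged examples only; pool, dataset, and device names are placeholders:

    zpool scrub tank                         # 1) check and repair while the pool stays online
    zpool replace tank /dev/sdc /dev/sdf     # 2) swap out a failed disk in place
    zfs snapshot tank/data@before-upgrade    # 6) instant snapshot, no space used initially
    zfs clone tank/data@before-upgrade tank/data-test    # 8) writable clone of that snapshot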
** Limitations? ZFS? Say it isn't so! But here they are:
1) You can't add redundancy after creating a vdev in a ZFS pool. So if you make a ZFS vdev and don't make it raidz at the start, you can't add more drives later to get raidz. You also can't "add" redundancy to an existing raidz vdev: once you've made it raidz1, you can't add a drive to get raidz2. I've found a workaround, where you create "fake" drives from sparse files, include them in your raidz vdev at creation, and immediately take them offline (a sketch of this follows after the list). But you have to do this at initial creation! http://jeanbruenn.info/2011/01/18/setting-up-zfs-with-3-out-of-4-discs-raidz...
2) Zpools are built from vdevs, which you can think of as block devices made from one or more HDs. You can add vdevs without issue, but you can't remove them. EVER. Combine this fact with #1 and you had better plan carefully when you extend a file system (see the example after this list). See the "Hating your data" section in this excellent ZFS walkthrough: http://arstechnica.com/information-technology/2014/02/ars-walkthrough-using-...
3) Like any COW file system, ZFS tends to fragment. This cuts into performance, especially once you have less than about 20-30% free space. It isn't as bad as it sounds: enabling compression roughly doubles your usable space, which makes it much easier to stay above that threshold.
Bug) ZFS on Linux has been quite stable in my testing but, as of this writing, has a memory leak. The workaround is manageable, but if you don't apply it, ZFS servers will eventually lock up. It's fairly simple; google for "zfs /bin/echo 3 > /proc/sys/vm/drop_caches;"
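To put rough commands to #1, #2 and the memory leak above - these are sketches with invented device names, sizes and scheduling, not tested recipes:

    # Limitation 1 workaround: include a sparse "fake" member in the raidz vdev at
    # creation, then take it offline before anything is written to it. The sparse
    # file should be at least as large as the real disks so it doesn't cap capacity.
    truncate -s 2T /tmp/fake0.img
    zpool create tank raidz /dev/sdb /dev/sdc /dev/sdd /tmp/fake0.img
    zpool offline tank /tmp/fake0.img
    # Later, swap a real disk into the degraded slot:
    # zpool replace tank /tmp/fake0.img /dev/sde

    # Limitation 2: growing the pool is one line, and there is no inverse, so use
    # the dry-run flag to double-check what you're about to commit to forever.
    zpool add -n tank raidz /dev/sdf /dev/sdg /dev/sdh   # preview only
    zpool add tank raidz /dev/sdf /dev/sdg /dev/sdh      # permanent

    # Memory leak workaround: drop caches periodically, e.g. hourly from cron.
    # (The interval here is a guess; tune it to your workload.)
    echo '0 * * * * root /bin/echo 3 > /proc/sys/vm/drop_caches' \
        > /etc/cron.d/zfs-drop-caches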