[CentOS] ZFS @ centOS

Tue Apr 5 03:09:29 UTC 2011
Warren Young <warren at etr-usa.com>

On 4/2/2011 2:54 PM, Dawid Horacio Golebiewski wrote:
> I do want to
> use ZFS and thus far I have only found information about the ZFS-Fuse
> implementation and unclear hints that there is another way.

Here are some benchmark numbers I came up with just a week or two ago. 
(View with fixed-width font.)

Test                             ZFS-FUSE raidz1       Hardware RAID-6
-------------------------------  --------------------  --------------------
Sequential write, per character   11.5 MB/s (15% CPU)   71.1 MB/s (97% CPU)
Sequential write, block           12.3 MB/s  (1% CPU)  297.9 MB/s (50% CPU)
Sequential write, rewrite         11.8 MB/s  (2% CPU)  137.4 MB/s (27% CPU)
Sequential read, per character    48.8 MB/s (63% CPU)   72.5 MB/s (95% CPU)
Sequential read, block           148.3 MB/s  (5% CPU)  344.3 MB/s (31% CPU)
Random seeks                     103.0/s               279.6/s

The fact that the write speeds on the ZFS-FUSE test seem capped at ~12 
MB/s strikes me as odd.  It doesn't seem to be a FUSE bottleneck, since 
the read speeds are so much faster, but I can't think where else the 
problem could be since the hardware was identical for both tests. 
Nevertheless, it means ZFS-FUSE performed about as well as a Best Buy 
bus-powered USB drive on this hardware.  On only one test did it even 
exceed the performance of a single one of the drives in the array, and 
then not by very much.  Pitiful.
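
To put the gap in perspective, the short Python sketch below divides each
hardware RAID figure by the matching ZFS-FUSE figure.  The numbers are
copied from the table above; the names `results` and `speedup` are made up
for this illustration and aren't part of Bonnie++ or anything else.

```python
# Throughput figures copied from the benchmark table
# (MB/s for the sequential tests, seeks per second for the last row).
# Each tuple is (ZFS-FUSE raidz1, hardware RAID-6).
results = {
    "Sequential write, per character": (11.5, 71.1),
    "Sequential write, block": (12.3, 297.9),
    "Sequential write, rewrite": (11.8, 137.4),
    "Sequential read, per character": (48.8, 72.5),
    "Sequential read, block": (148.3, 344.3),
    "Random seeks": (103.0, 279.6),
}


def speedup(zfs, hw):
    """Return how many times faster the hardware RAID result is."""
    return hw / zfs


ratios = {test: speedup(zfs, hw) for test, (zfs, hw) in results.items()}

for test, ratio in ratios.items():
    print(f"{test:<33} {ratio:5.1f}x")
```

Run as-is, it shows the hardware RAID ahead on every test, by roughly 24x
on block writes and by only about 1.5x on per-character reads.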

I did this test with Bonnie++ on a 3ware/LSI 9750-8i controller, with 
eight WD 3 TB disks attached.  Both tests were done with XFS on CentOS 
5.5, 32-bit.  (Yes, 32-bit.  Hard requirement for this application.) 
The base machine was a low-end server with a Core 2 Duo E7500 in it.  I 
interpret several of the results above, particularly the 95%+ CPU on the 
per-character tests, as suggesting that the 3ware numbers could have been 
higher if the array were in a faster box.

For the ZFS configuration, I exported each disk from the 3ware BIOS as a 
separate single-disk volume, then collected them together into a single 
~19 TB raidz1 pool.  (This controller doesn't have a JBOD mode.)  I 
broke that up into three ~7 TB slices, each formatted with XFS.  I did 
the test on only one of the slices, figuring that they'd all perform 
about equally.

For the RAID-6 configuration, I used the 3ware card's hardware RAID, 
creating a single ~16 TB volume, formatted XFS.
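
For anyone checking the capacity figures, here is a rough Python sketch of
the arithmetic.  It assumes, and this is an assumption rather than something
stated in this post, that each "3 TB" drive holds exactly 3 x 10^12 bytes
and that the array sizes quoted here are in binary units (TiB).  It ignores
filesystem and metadata overhead, and the function name `usable_tib` is
hypothetical.

```python
DISKS = 8                  # eight drives on the controller
DISK_BYTES = 3 * 10**12    # assumption: "3 TB" means decimal terabytes
TIB = 2**40                # bytes per binary terabyte (TiB)


def usable_tib(disks, parity_disks, disk_bytes=DISK_BYTES):
    """Raw usable capacity in TiB once parity disks are subtracted."""
    return (disks - parity_disks) * disk_bytes / TIB


raidz1_tib = usable_tib(DISKS, 1)   # raidz1 gives up one disk to parity
raid6_tib = usable_tib(DISKS, 2)    # RAID-6 gives up two disks to parity

print(f"raidz1: {raidz1_tib:.2f} TiB")
print(f"RAID-6: {raid6_tib:.2f} TiB")
```

Under those assumptions the results come out to about 19.10 TiB for raidz1
and 16.37 TiB for RAID-6, which lines up with the ~19 TB and ~16 TB figures
quoted here.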

You might be asking why I didn't make a ~19 TB RAID-5 volume for the 
native 3ware RAID test, to minimize the number of unnecessary 
differences.  I went with RAID-6 because after testing the ZFS-based 
system for about a week, we decided we'd rather have the extra 
redundancy than the capacity.  Dropping to 16.37 TB on the RAID 
configuration by switching to RAID-6 let us put almost the entire array 
under a single 16 TB XFS filesystem.

Realize that this switch from single redundancy to dual is a handicap 
for the native RAID test, yet it performs better across the board. 
In-kernel ZFS might have beaten the hardware RAID on at least a few of the 
tests, due to that handicap.

(Please don't ask me to test one of the in-kernel ZFS patches for Linux. 
We can't delay putting this box into production any longer, and in any 
case, we're building this server for another organization, so we 
couldn't send the patched box out without violating the GPL.)

Oh, and in case anyone is thinking I somehow threw the test, realize 
that I was rooting for ZFS from the start.  I only did the benchmark 
when it so completely failed to perform under load.  ZFS is beautiful 
tech.  Too bad it doesn't play well with others.

> Phoronix
> reported that http://kqinfotech.com/ would release some form of ZFS for the
> kernel but I have found nothing.

What a total cock-up that was.

Here we had this random company no one had ever heard of before 
putting out a press release that they *will be releasing* something in a 
few months.

Maybe it's easy to say this now that it's 6 months on and we're all 
sitting here listening to the crickets, but I called it at the time: 
Phoronix should 
have tossed that press release into the trash, or at least held off on 
saying anything about it until something actually shipped.  Reporting a 
clearly BS press release, seriously?  Are they *trying* to destroy their