I have trouble finding definitive information about this. I am considering using SME 7.5.1 (CentOS-based) for my server needs, but I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way. Phoronix reported that http://kqinfotech.com/ would release some form of ZFS for the kernel but I have found nothing.
Can someone tell me if ZFS-FUSE is more trouble than it's worth?
Thanks in advance.
Dawide
Can someone tell me if ZFS-FUSE is more trouble than it's worth?
I've tried both ZFS-FUSE and ZFS installed from the RPMs on zfsonlinux.org, both on CentOS 5.5.
ZFS-FUSE is more polished, but it cut write speeds in half on my RAID 5.
I ended up going with ext4.
SME Server is great by the way - been using it for years.
On 04/02/11 1:54 PM, Dawid Horacio Golebiewski wrote:
I have trouble finding definitive information about this. I am considering using SME 7.5.1 (CentOS-based) for my server needs, but I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way. Phoronix reported that http://kqinfotech.com/ would release some form of ZFS for the kernel but I have found nothing.
Can someone tell me if ZFS-FUSE is more trouble than it's worth?
ZFS isn't GPL-compatible, therefore it can't be integrated into the kernel, where a file system belongs, and so it is pretty much relegated to user space (FUSE); it's just not very well supported on Linux.
If you really want to use ZFS, I'd suggest using Solaris or one of its derivatives (OpenIndiana, etc.), where it's native.
I pondered Solaris for some time, but as I do not intend to build the OS "from scratch" and Nexenta was too GUI-oriented for me, I started researching SME. What puzzles me is the gap between theory and practice: RAID-Z is the best solution from a theoretical standpoint (maximum features available), but RAID 5, 6, 5+0, etc. are still used. Why?
OpenSolaris development supposedly stopped last February.
John R Pierce wrote:
On 04/02/11 1:54 PM, Dawid Horacio Golebiewski wrote:
I have trouble finding definitive information about this. I am considering using SME 7.5.1 (CentOS-based) for my server needs, but I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way. Phoronix reported that http://kqinfotech.com/ would release some form of ZFS for the kernel but I have found nothing.
Can someone tell me if ZFS-FUSE is more trouble than it's worth?
ZFS isn't GPL-compatible, therefore it can't be integrated into the kernel, where a file system belongs, and so it is pretty much relegated to user space (FUSE); it's just not very well supported on Linux.
If you really want to use ZFS, I'd suggest using Solaris or one of its derivatives (OpenIndiana, etc.), where it's native.
On Apr 2, 2011, at 5:28 PM, Dawid Horacy Golebiewski dawid.golebiewski@tu-harburg.de wrote:
I pondered Solaris for some time, but as I do not intend to build the OS "from scratch" and Nexenta was too GUI-oriented for me, I started researching SME. What puzzles me is the gap between theory and practice: RAID-Z is the best solution from a theoretical standpoint (maximum features available), but RAID 5, 6, 5+0, etc. are still used. Why?
RAID-Z/Z2/Z3 isn't RAID 5/6; it's similar, but not the same.
In ZFS you create vdevs, which can be individual disks, mirrors, or raidz groups. Then you create a pool out of multiple vdevs. The vdevs should be of the same size and type for performance and capacity-planning reasons, but it's not a requirement. Currently you cannot add drives to a vdev, but you can add vdevs to a pool. I/O written to a pool is round-robined across the vdevs, giving RAID 0-style striping performance.
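To make that concrete, a rough sketch of what it looks like on the command line (pool and device names here are made up; see the zpool man page on your platform):

    # create a pool from one 4-disk raidz vdev
    zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0

    # later, grow the pool by adding a second raidz vdev of the same shape
    zpool add tank raidz c0t5d0 c0t6d0 c0t7d0 c0t8d0

    # writes are now striped across both vdevs
    zpool status tank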
-Ross
On Sat, Apr 02, 2011 at 02:13:01PM -0700, John R Pierce spake thusly:
ZFS isn't GPL-compatible, therefore it can't be integrated into the kernel, where a file system belongs, and so it is pretty much relegated to user space
It can be patched in by the end user. So if someone were to distribute a patch that could be dropped into the current kernel SRPM by the end user, along with a change to the spec file to apply the patch during the RPM build, the user would be all set.
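Roughly the sort of thing I mean, sketched from memory (patch name, patch number, and paths are placeholders; the exact mechanics depend on how the kernel spec applies its patches):

    # drop the patch next to the other kernel sources
    cp zfs-linux.patch ~/rpmbuild/SOURCES/

    # in kernel.spec, declare it alongside the existing patches:
    #   Patch90000: zfs-linux.patch
    # and apply it in %prep where the other patches are applied:
    #   %patch90000 -p1

    # then rebuild the kernel packages
    rpmbuild -bb ~/rpmbuild/SPECS/kernel.spec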
2011/4/6 Tracy Reed treed@ultraviolet.org:
On Sat, Apr 02, 2011 at 02:13:01PM -0700, John R Pierce spake thusly:
ZFS isn't GPL-compatible, therefore it can't be integrated into the kernel, where a file system belongs, and so it is pretty much relegated to user space
It can be patched in by the end user. So if someone were to distribute a patch that could be dropped into the current kernel SRPM by the end user, along with a change to the spec file to apply the patch during the RPM build, the user would be all set.
Look at: http://zfsonlinux.org/
Currently the best way to get ZFS with the full feature set (deduplication) is OpenSolaris (Nexenta?).
Anyway, btrfs also supports compression nowadays.
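For reference, both are just a property or mount option away (dataset, device, and mount point names below are only examples):

    # ZFS deduplication on a dataset (wants a recent pool version and plenty of RAM)
    zfs set dedup=on tank/data

    # btrfs transparent compression via a mount option
    mount -o compress /dev/sdb1 /mnt/data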
-- Eero
On 4/2/2011 2:54 PM, Dawid Horacio Golebiewski wrote:
I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way.
Here are some benchmark numbers I came up with just a week or two ago. (View with fixed-width font.)
Test                               ZFS raidz1             Hardware RAID-6
--------------------------------   --------------------   --------------------
Sequential write, per character     11.5 MB/s (15% CPU)    71.1 MB/s (97% CPU)
Sequential write, block             12.3 MB/s  (1% CPU)   297.9 MB/s (50% CPU)
Sequential write, rewrite           11.8 MB/s  (2% CPU)   137.4 MB/s (27% CPU)
Sequential read, per character      48.8 MB/s (63% CPU)    72.5 MB/s (95% CPU)
Sequential read, block             148.3 MB/s  (5% CPU)   344.3 MB/s (31% CPU)
Random seeks                       103.0 /s               279.6 /s
The fact that the write speeds on the ZFS-FUSE test seem capped at ~12 MB/s strikes me as odd. It doesn't seem to be a FUSE bottleneck, since the read speeds are so much faster, but I can't think where else the problem could be since the hardware was identical for both tests. Nevertheless, it means ZFS-FUSE performed about as well as a Best Buy bus-powered USB drive on this hardware. On only one test did it even exceed the performance of a single one of the drives in the array, and then not by very much. Pitiful.
I did this test with Bonnie++ on a 3ware/LSI 9750-8i controller, with eight WD 3 TB disks attached. Both tests were done with XFS on CentOS 5.5, 32-bit. (Yes, 32-bit. Hard requirement for this application.) The base machine was a low-end server with a Core 2 Duo E7500 in it. I interpret several of the results above as suggesting that the 3ware numbers could have been higher if the array were in a faster box.
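For anyone who wants to run something comparable, a typical bonnie++ invocation looks roughly like this (directory, size, and user are placeholders; the file size should be at least twice RAM so the page cache can't hide the disks):

    bonnie++ -d /mnt/array/bench -s 16g -u nobody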
For the ZFS configuration, I exported each disk from the 3ware BIOS as a separate single-disk volume, then collected them together into a single ~19 TB raidz1 pool. (This controller doesn't have a JBOD mode.) I broke that up into three ~7 TB slices, each formatted with XFS. I did the test on only one of the slices, figuring that they'd all perform about equally.
For the RAID-6 configuration, I used the 3ware card's hardware RAID, creating a single ~16 TB volume, formatted XFS.
You might be asking why I didn't choose to make a ~19 TB RAID-5 volume for the native 3ware RAID test to minimize the number of unnecessary differences. I did that because after testing the ZFS-based system for about a week, we decided we'd rather have the extra redundancy than the capacity. Dropping to 16.37 TB on the RAID configuration by switching to RAID-6 let us put almost the entire array under a single 16 TB XFS filesystem.
Realize that this switch from single redundancy to dual is a handicap for the native RAID test, yet it still performs better across the board. Given that handicap, in-kernel ZFS might have beaten the hardware RAID on at least a few of the tests.
(Please don't ask me to test one of the in-kernel ZFS patches for Linux. We can't delay putting this box into production any longer, and in any case, we're building this server for another organization, so we couldn't send the patched box out without violating the GPL.)
Oh, and in case anyone is thinking I somehow threw the test, realize that I was rooting for ZFS from the start. I only did the benchmark when it so completely failed to perform under load. ZFS is beautiful tech. Too bad it doesn't play well with others.
Phoronix reported that http://kqinfotech.com/ would release some form of ZFS for the kernel but I have found nothing.
What a total cock-up that was.
Here we had this random company no one had ever heard from before putting out a press release that they *will be releasing* something in a few months.
Maybe it's easy to say this 6 months hence and we're all sitting here listening to the crickets, but I called it at the time: Phoronix should have tossed that press release into the trash, or at least held off on saying anything about it until something actually shipped. Reporting a clearly BS press release, seriously? Are they *trying* to destroy their credibility?
On 04/04/11 8:09 PM, Warren Young wrote:
On 4/2/2011 2:54 PM, Dawid Horacio Golebiewski wrote:
I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way.
Here are some benchmark numbers I came up with just a week or two ago. (View with fixed-width font.)
try iozone and bonnie++
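For example, an iozone run in automatic mode looks something like this (path and size cap are placeholders):

    # -a: full automatic mode, -g: maximum file size to test
    iozone -a -g 4g -f /mnt/array/iozone.tmp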
On 04/04/11 8:09 PM, Warren Young wrote:
You might be asking why I didn't choose to make a ~19 TB RAID-5 volume for the native 3ware RAID test
That is really a no-brainer. In the time it takes to rebuild such a "RAID", another disk might just fail, and the "R" in "RAID" goes down the toilet. Your ~19 TB RAID 5 just got turned into 25 kg of scrap metal.
As for ZFS - we're using it with FreeBSD, with mixed results. The truth is, you've got to follow the development very closely and work with the developers (via the mailing lists), potentially testing patches/backports from -CURRENT, or tracking -CURRENT from the start. It works much better with Solaris. Frankly, I don't know why people want to do this ZFS-on-Linux thing. It works perfectly well with Solaris, which runs most stuff that runs on Linux just as well. I wouldn't try to run Linux binaries on Solaris with lxrun, either.
On Monday, April 04, 2011 11:09:29 PM Warren Young wrote:
I did this test with Bonnie++ on a 3ware/LSI 9750-8i controller, with eight WD 3 TB disks attached. Both tests were done with XFS on CentOS 5.5, 32-bit. (Yes, 32-bit. Hard requirement for this application.)
[snip]
For the RAID-6 configuration, I used the 3ware card's hardware RAID, creating a single ~16 TB volume, formatted XFS.
[snip]
Dropping to 16.37 TB on the RAID configuration by switching to RAID-6 let us put almost the entire array under a single 16 TB XFS filesystem.
You really, really, really don't want to do this. Not on 32-bit. When you roll one byte over 16TB you will lose access to your filesystem, silently, and it will not remount on a 32-bit kernel. XFS works best on a 64-bit kernel for a number of reasons; the one you're likely to hit first is the 16TB hard limit for *occupied* file space; you can mkfs an XFS filesystem on a 17TB or even larger partition or volume, but the moment the occupied data rolls over the 16TB boundary you will be in disaster recovery mode, and a 64-bit kernel will be required for rescue.
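(As I understand it, the 16TB figure falls straight out of the 32-bit page cache index: 2^32 pages x 4 KiB per page = 2^44 bytes = 16 TiB.)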
The reason I know this? I had it happen. On a CentOS 32-bit backup server with a 17TB LVM logical volume on EMC storage. Worked great, until it rolled 16TB. Then it quit working. Altogether. /var/log/messages told me that the filesystem was too large to be mounted. Had to re-image the VM as a 64-bit CentOS, and then re-attached the RDM's to the LUNs holding the PV's for the LV, and it mounted instantly, and we kept on trucking.
There's a reason upstream doesn't do XFS on 32-bit.
On Tue, Apr 5, 2011 at 10:21 AM, Lamar Owen lowen@pari.edu wrote:
You really, really, really don't want to do this. Not on 32-bit. When you roll one byte over 16TB you will lose access to your filesystem, silently, and it will not remount on a 32-bit kernel. XFS works best on a 64-bit kernel for a number of reasons; the one you're likely to hit first is the 16TB hard limit for *occupied* file space; you can mkfs an XFS filesystem on a 17TB or even larger partition or volume, but the moment the occupied data rolls over the 16TB boundary you will be in disaster recovery mode, and a 64-bit kernel will be required for rescue.
The reason I know this? I had it happen. On a CentOS 32-bit backup server with a 17TB LVM logical volume on EMC storage. Worked great, until it rolled 16TB. Then it quit working. Altogether. /var/log/messages told me that the filesystem was too large to be mounted. Had to re-image the VM as a 64-bit CentOS, and then re-attached the RDM's to the LUNs holding the PV's for the LV, and it mounted instantly, and we kept on trucking.
There's a reason upstream doesn't do XFS on 32-bit.
Afaik 32-bit binaries do run on the 64-bit build and compat libraries exist for most everything. You should evaluate if you really *really* need 32-bit.
On 4/5/2011 11:24 AM, Brandon Ooi wrote:
Afaik 32-bit binaries do run on the 64-bit build and compat libraries exist for most everything. You should evaluate if you really *really* need 32-bit.
Yes, thanks for assuming I don't know what I was talking about when I wrote that we had a hard requirement for 32-bit in this application.
Since you seem to care, we're stuck with 32-bit for this particular server because it needs an uncommon PCI card that does have Linux drivers, but they only work with 32-bit kernels. The driver will rebuild against a 64-bit kernel, but it oopses the kernel when you try to use it. The card is a legacy design, so no one has bothered to debug this, and likely no one ever will.
And before you ask, no, there is no direct replacement for this PCI card that does support 64-bit kernels. The path forward is to use an entirely different technology, which is great, but using it requires changing physical infrastructure ($$$) that the server plugs into.
Legacy is hard. Next time someone tells you they can't use the latest and greatest for some reason, you might take them at their word.
On 6.4.2011 17:27, Warren Young wrote:
On 4/5/2011 11:24 AM, Brandon Ooi wrote:
Afaik 32-bit binaries do run on the 64-bit build and compat libraries exist for most everything. You should evaluate if you really *really* need 32-bit.
Yes, thanks for assuming I don't know what I was talking about when I wrote that we had a hard requirement for 32-bit in this application.
Since you seem to care, we're stuck with 32-bit for this particular server because it needs an uncommon PCI card that does have Linux drivers, but they only work with 32-bit kernels. The driver will rebuild against a 64-bit kernel, but it oopses the kernel when you try to use it. The card is a legacy design, so no one has bothered to debug this, and likely no one ever will.
And before you ask, no, there is no direct replacement for this PCI card that does support 64-bit kernels. The path forward is to use an entirely different technology, which is great, but using it requires changing physical infrastructure ($$$) that the server plugs into.
Legacy is hard. Next time someone tells you they can't use the latest and greatest for some reason, you might take them at their word.
Hi,
Just a shot in the dark, but can't you have an x86_64 NFS server export a filesystem larger than 16TB and mount that on your x86 machine for use with your application?
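i.e., something like this on the 64-bit box (export path, network, and mount point are only examples):

    # /etc/exports on the x86_64 server
    /export/bigarray  192.168.0.0/24(rw,no_root_squash,sync)
    # (run 'exportfs -ra' after editing /etc/exports)

    # then on the 32-bit client
    mount -t nfs bigserver:/export/bigarray /mnt/bigarray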
Bgrds, FOG
On 4/6/2011 11:40 AM, Finnur Örn Guðmundsson wrote:
Just a shot in the dark, but can't you have an x86_64 NFS server export a filesystem larger than 16TB and mount that on your x86 machine for use with your application?
I already ran the two-server idea past the decision makers. It was rejected, even though this server I just built is going to replace an existing one purely to add the extra storage, and so it could have just acted as a storage side-car to the existing server instead. I don't know whether it was rejected on rack space limits or "cleanliness" or whatever. All I got was, "no".
On 04/06/11 11:08 AM, Warren Young wrote:
I already ran the two-server idea past the decision makers. It was rejected, even though this server I just built is going to replace an existing one purely to add the extra storage, and so it could have just acted as a storage side-car to the existing server instead. I don't know whether it was rejected on rack space limits or "cleanliness" or whatever. All I got was, "no".
Then the decision-making process is faulty. 32-bit kernels are in no way capable of sanely supporting giant file systems like the one you have. So your 'musts' are: 1) a legacy application that requires 32-bit kernel drivers, and 2) a giant file system that requires a 64-bit kernel. That won't be easy to resolve without a separate storage server or NAS.
centos-bounces@centos.org wrote:
On 6.4.2011 17:27, Warren Young wrote:
On 4/5/2011 11:24 AM, Brandon Ooi wrote:
Afaik 32-bit binaries do run on the 64-bit build and compat libraries exist for most everything. You should evaluate if you really *really* need 32-bit.
Yes, thanks for assuming I don't know what I was talking about when I wrote that we had a hard requirement for 32-bit in this application.
Since you seem to care, we're stuck with 32-bit for this particular server because it needs to use an uncommon PCI card that does have Linux drivers but they only work with 32-bit kernels.
Don't scream: I'm using RedHat 7.3 for related reasons.
Legacy is hard. Next time someone tells you they can't use the latest and greatest for some reason, you might take them at their word.
Worthy note.
Just a shot in the dark, but can't you have an x86_64 NFS server export a filesystem larger than 16TB, and mount that on your x86 machine for use with your application?
Bgrds, FOG
If a separate server can't be done, can a 64-bit system KVM-host a 32-bit OS to handle just that board and its application?
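If someone does try that, the usual route would be PCI device assignment, roughly along these lines (the PCI address and guest name are placeholders; it needs VT-d/IOMMU support in the host, and whether the legacy driver behaves any better inside the guest is another question entirely):

    # hostdev.xml: hand the card to the 32-bit guest
    cat > hostdev.xml <<'EOF'
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    EOF

    virsh attach-device guest32 hostdev.xml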
Insert spiffy .sig here: Life is complex: it has both real and imaginary parts.
On 4/6/2011 12:25 PM, Brunner, Brian T. wrote:
Don't scream: I'm using RedHat 7.3 for related reasons.
Yep, we've still got a bunch of those running in the field, too, and many older besides. We still build new boxes using CentOS 3, also for legacy compatibility reasons.
can a 64-bit system KVM-host a 32-bit OS to handle just that board and its application?
It's something we could try on the next server, but I don't hold much hope that it would actually work.
I know KVM has the VirtIO feature for paravirtualizing a block or network device into the guest space, but that doesn't help here. The /dev nodes for this device are character devices.
Additionally, this device doesn't fit into any of the standard categories; it isn't a network card, it isn't a disk controller, it isn't a VGA card, etc. VM systems tend to cope poorly with hardware that doesn't fit predefined categories.
On Wednesday, April 06, 2011 01:27:10 PM Warren Young wrote:
Legacy is hard. Next time someone tells you they can't use the latest and greatest for some reason, you might take them at their word.
Yes, it is.
To give another for-instance: we do work with some interfaces for telescopes (optical and radio) where, for insurance and assurance reasons, and because a PE's seal is on the prints, a tried, tested, and proven ISA interface card is a requirement.
Ever try to find a Pentium 4 or better motherboard with ISA slots?
It's easy to say 'just go PCI', but that's not as easy as it sounds. In fact, when picking power supplies for the PCs into which this ISA card goes (it's a custom A/D, D/A, encoder-input, counter, and DIO card rolled into one), I have to be careful to choose ones with enough -5V current capability. Yes, negative 5 volts. That's only used by ISA cards, right? Well.....
Do you know how many ATX12V power supplies don't even *have* a -5V output? Most of them, it seems.
On 4/5/2011 11:21 AM, Lamar Owen wrote:
Dropping to 16.37 TB on the RAID configuration by switching to RAID-6 let us put almost the entire array under a single 16 TB XFS filesystem.
You really, really, really don't want to do this.
Actually, it seems that you can't do it any more. I tried, just to see what would happen. (I already knew about everything you're talking about.) You can still convince gparted to create a > 16 TB partition on 32-bit CentOS 5, but when you go to format it, it gives some bogus error that doesn't tell you what actually went wrong. Repartition to get under 16 TB, and the mkfs step succeeds.
I expect they added some checks for this since you last tried XFS on 32-bit.
Perhaps it wasn't clear from what I wrote, but the big partition on this system is actually 15.9mumble TB, just to be sure we don't even get 1 byte over the limit. The remaining 1/3 TB is currently unused.
On Wednesday, April 06, 2011 01:16:19 PM Warren Young wrote:
I expect they added some checks for this since you last tried XFS on 32-bit.
Perhaps it wasn't clear from what I wrote, but the big partition on this system is actually 15.9mumble TB, just to be sure we don't even get 1 byte over the limit. The remaining 1/3 TB is currently unused.
I didn't get there in one step. Perhaps that's the difference. What you say in the last paragraph will prevent the effect I saw. Just hope you don't need to do an xfs_repair. No, it wasn't completely clear that you were keeping below 16TB from what you wrote, at least not to me.
Now, I didn't do mkfs on a 16.x TB disk initially; I got there in steps with LVM, lvextend, and xfs_growfs. The starting size of the filesystem was ~4TB in two ~2TB LUNs/PVs; VMware is limited to 2TB LUNs, so I added storage, as needed, in ~2TB chunks (actually 2,000GB chunks; pvscan reports these as 1.95TB, with some at 1.92TB for RAID-group setup reasons). The 1.32TB and 1.37TB LUNs are there due to the way the RAID groups are set up on the Clariion CX3-10c this is on. So after a while of doing this, I had a hair over 14TB; xfs_growfs going from 14TB to a hair over 16TB didn't complain. But when the data hit 16TB, it quit mounting. So I migrated to a CentOS 5 x86_64 VM, and things started working again. I've added one more 1.95TB PV to the VG since then.
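For the record, each growth step was basically the standard LVM sequence (device name, size, and mount point here are placeholders):

    pvcreate /dev/sdN1                         # the new ~2,000GB LUN
    vgextend pachy-mirror /dev/sdN1
    lvextend -L +2000G /dev/pachy-mirror/home
    xfs_growfs /path/to/mountpoint             # XFS grows while mounted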
Current setup:

    PV /dev/sdd1    VG pachy-mirror   lvm2 [1.92 TB / 0 free]
    PV /dev/sdg1    VG pachy-mirror   lvm2 [1.92 TB / 0 free]
    PV /dev/sde1    VG pachy-mirror   lvm2 [1.95 TB / 0 free]
    PV /dev/sdu1    VG pachy-mirror   lvm2 [1.95 TB / 0 free]
    PV /dev/sdl1    VG pachy-mirror   lvm2 [1.37 TB / 0 free]
    PV /dev/sdm1    VG pachy-mirror   lvm2 [1.32 TB / 0 free]
    PV /dev/sdx1    VG pachy-mirror   lvm2 [1.95 TB / 0 free]
    PV /dev/sdz1    VG pachy-mirror   lvm2 [1.95 TB / 0 free]
    PV /dev/sdab1   VG pachy-mirror   lvm2 [1.95 TB / 0 free]
    PV /dev/sdt1    VG pachy-mirror   lvm2 [1.95 TB / 0 free]

    ACTIVE '/dev/pachy-mirror/home' [18.24 TB] inherit
The growth was over a period of two years, incidentally.
There are other issues with XFS and 32-bit; see: http://bugs.centos.org/view.php?id=3364 and http://www.mail-archive.com/scientific-linux-users@listserv.fnal.gov/msg0534... and google for 'XFS 32-bit 4K stacks' for more of the gory details.
On 4/6/2011 1:16 PM, Lamar Owen wrote:
There are other issues with XFS and 32-bit; see: http://bugs.centos.org/view.php?id=3364 and http://www.mail-archive.com/scientific-linux-users@listserv.fnal.gov/msg0534... and google for 'XFS 32-bit 4K stacks' for more of the gory details.
Thanks for the info.
The problem seems to be tied to LVM and high amounts of I/O, particularly writes. None of that applies to this application. The filesystem is a plain-old partition, the array is mostly going to be read-only, and due to a bottleneck elsewhere in the system, the peak read rate can't be higher than 20 Mbyte/s.
(If you're wondering, then, why I bothered to benchmark the system at all, it's because we get much higher I/O rates when initially loading the array up, so that the low write speed of ZFS-FUSE would have increased that initial load from days to weeks.)
That load went off without a hitch, so we've probably already done the worst thing to this server that it will ever see. (Kind of like the old advice for happiness: eat a bug every morning, and nothing worse will happen to you for the rest of the day.)
We'll test it under load before shipping it anyway, however.
On 4/2/11 10:54 PM, Dawid Horacio Golebiewski wrote:
I have trouble finding definitive information about this. I am considering using SME 7.5.1 (CentOS-based) for my server needs, but I do want to use ZFS, and thus far I have only found information about the ZFS-FUSE implementation and unclear hints that there is another way. Phoronix reported that http://kqinfotech.com/ would release some form of ZFS for the kernel but I have found nothing.
Can someone tell me if ZFS-FUSE is more trouble than it's worth?
Thanks in advance.
Dawide
If you need ZFS, I suggest you try out FreeBSD, where ZFS has native support. FreeBSD is also an excellent OS, even better than Linux.