Francois Caen frcaen@gmail.com wrote:
Wow! Odd! RH says 8TB but ext3 FAQ says 4TB.
Any filesystem originally designed for 32-bit x86 is full of signed 32-bit structures. The 2^31 * 512 = 1.1TB (1TiB) limit comes from those structures addressing 512-byte sectors.
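For the record, the arithmetic is just 2^31 sectors x 512 bytes/sector = 2^40 bytes = 1 TiB, which is roughly 1.1 x 10^12 bytes -- hence "1.1TB" in decimal units.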
Ext3 has used a couple of different techniques over time to support progressively larger volumes. Depending on the hardware, kernel (especially 2.4), etc..., there can be limits at 1, 2, 4, 8 and 16TiB.
Which is why the "common denominator" is 1.1TB (1TiB). It was rather infuriating in front of a client when I attempted to mount a 2TB Ext3 volume over a SAN, created on one Red Hat Enterprise Linux system, from another. For the "heck of it" I created a 1TB volume and tried again ... it worked.
ReiserFS 3 has the same issue; it grew up as a PC LBA32 filesystem. ReiserFS 4 is supposedly 64-bit clean. Although JFS for Linux came from OS/2 (32-bit PC) and not AIX/Power (true 64-bit), it was designed to be largely "64-bit clean" too. XFS came from Irix/MIPS4000+ (true 64-bit).
Both JFS and XFS would _not_ work on 32-bit Linux until patched with complete POSIX32 Large File Support (LFS). LFS became standard in the x86-targeted Linux kernel 2.4 and GLibC 2.2 (Red Hat Linux 7.x / Red Hat Enterprise Linux 2.1).
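As an aside, if you want to see what your GLibC expects for LFS-enabled userspace builds, getconf will report it. The output below is what a typical 32-bit LFS-capable GLibC prints; it may be empty on other setups:

  $ getconf LFS_CFLAGS
  -D_FILE_OFFSET_BITS=64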
Joshua, thanks for the reply on this. There's something kludgy about having to do softraid across 2 partitions before formatting.
RAID-0 is an _ideal_ software RAID. Striping is best handled by the OS, which can schedule over multiple I/O paths. In 2x and 4x S940 Opteron systems with at least one AMD8131 (dual PCI-X channels), I put a 3Ware card on each PCI-X channel connected to the same CPU and stripe with LVM. The CPU interleaves writes directly over two (2) PCI-X channels to two (2) 3Ware cards. Ultimate I/O affinity, no bus arbitration overhead, etc..., as well as the added performance of striping.
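For anyone who hasn't done it, the LVM side is only a few commands. A rough sketch -- the device names, size and stripe size below are placeholders, not my actual config:

  # each 3Ware unit shows up to the OS as a single SCSI disk
  pvcreate /dev/sda /dev/sdb
  vgcreate data /dev/sda /dev/sdb
  # -i 2 = stripe across both PVs, -I 64 = 64KiB stripe size
  lvcreate -i 2 -I 64 -L 2000G -n stripe0 data
  mke2fs -b 4096 -j /dev/data/stripe0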
The only negative is if one 3Ware card dies. But that's why I keep a spare per N servers (typically 1 for every 4 servers, 8 cards total).
It adds a layer of complexity and reduces reliability.
That varies. Yes, various kernel storage approaches -- especially LVM2/Device Mapper (DM) at this point -- have race conditions if you combine more than one operation. E.g., resizing and snapshots, RAID-1 (DM) atop RAID-0, etc... But I _only_ use LVM/LVM2 with its native RAID-0 stripe, and across two (2) 3Ware cards.
I've yet to have an issue. But that's probably because LVM2 doesn't require DM for RAID-0. DM is required for RAID-1, snapshots, FRAID meta-data, etc...
Joshua Baker-LePain jlb17@duke.edu wrote:
I wouldn't call it that odd. RH patches their kernels to a fair extent, both for stability and features.
Yep. They are _very_ well trusted. Now if they'd put that into XFS too, I'd be a happy camper.
mke2fs -b 4096 -j -m 0 -R stride=1024 -T largefile4 /dev/md0
BTW, aren't you worried about running out of inodes?
At the same time, have you benchmarked how much faster a full fsck takes using 1 inode per 4MiB versus the standard one per 16-64KiB?
That would be an interesting test IMHO.
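If someone does want to test it, the inode density is easy to read back and a read-only fsck can be timed without risk. Something like the following (device and mount point are just examples; run the fsck on an unmounted or idle filesystem):

  # inode/block counts and inode size as created
  tune2fs -l /dev/md0 | egrep -i 'inode count|block count|inode size'
  # how many inodes are actually in use
  df -i /mnt/bigraid
  # forced, no-change check, timed
  time e2fsck -f -n /dev/md0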
Err, it's not a kludge and it's not a trick. Those 2 "disks" are hardware RAID5 arrays from two 12-port 3ware 9500 cards. I like 3ware's hardware RAID, and those are the biggest (in terms of ports) cards 3ware makes. So, I hook 12 disks up to each card, and the OS sees those as 2 SCSI disks. I then do the software RAID to get 1) speed and 2) one partition to present to the users. Folks (myself included) have been doing this for years.
I am in total agreement with you, with one exception. I always make 2 volumes (one System, one Data) per card (yes, I'm aware of the 9.2 firmware bug, which is largely why I have avoided the 9500S, although 9.2.1.1 seems promising now that it's officially released). So in my case, I'd have two RAID-0 stripes.
BTW, supposedly 3Ware supports volumes across up to 4 cards. Have you tried this? I have not myself.
The one gotcha in this setup (other than not being able to boot from the big RAID5 arrays, since each is >2TiB)
Another reason to create a "System" volume and a "Data" volume. My "System" volume is typically 2/4 drives in RAID-1/10. My "Data" volume is typically RAID-5, or if I really need performance, RAID-10.
is that the version of mdadm shipped with RHEL4 does not support array members bigger than 2TiB. I had to upgrade to an upstream release to get that support.
Which is why I use LVM (and now LVM2) for RAID-0. I know there are claims it is slower than MD (at least LVM2), but I just like the management of LVM. I guess I'm a typical commercial UNIX weenie.
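For completeness, the MD route Joshua describes -- striping the two hardware units with mdadm -- would look something like this (device names and chunk size are placeholders):

  # stripe the two 3Ware units into one md device, 64KiB chunks
  mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 /dev/sda /dev/sdb
  # stride = chunk size / block size = 64KiB / 4KiB = 16
  mke2fs -b 4096 -j -R stride=16 /dev/md0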
Chris Mauritz chrism@imntv.com wrote:
For what it's worth, I have also done RAID0 stripes of 2 RAID arrays to get *really* fast read/write performance when used for storing uncompressed video. Recently, when I was at Apple for a meeting, that was their engineers' preferred method for getting huge RAIDs ... running software RAID volumes across multiple Xserve RAID devices.
Software RAID-0 at the OS level (and not some FRAID driver) is _always_ going to be the _ultimate_ because you can span peripheral interconnects and cards.
Perhaps I'm just extremely lucky, but I've not run into this magic 1TB barrier that I see bandied about here.
As I said, I've only run into it on kernel 2.4 distributions.
Any filesystem that grows up on a POSIX32 implementation (especially pre-kernel 2.4 / GLibC 2.2 before LFS was standard) is going to have signed 32-bit int structures.
I'm sure Tweedie and the gang have gotten around all of them in kernel 2.6 now. But at the same time, I don't entirely trust how they are doing it.
Unfortunately, a lot of the documentation and FAQs are quite out of date which can lead to some confusion.
Yeah. LVM2 and Device Mapper (DM) are a real PITA if you start playing with newer developments, and race conditions seem to be never-ending.
But when it comes to using intelligent 3Ware RAID with just LVM2 for RAID-0, it has worked flawlessly for me on kernel 2.6.