Francois Caen <frcaen at gmail.com> wrote:
> Wow! Odd! RH says 8TB but ext3 FAQ says 4TB.

Any filesystem originally designed for 32-bit x86 is full of signed
32-bit structures.  With a 512-byte sector size, those structures top
out at 2^31 sectors * 512 bytes = 2^40 bytes = 1TiB (~1.1TB decimal).
Ext3 has used a couple of different techniques over the years to
support larger and larger volumes, so depending on the hardware,
kernel (especially 2.4), etc..., there can be limits at 1, 2, 4, 8
and 16TiB.  Which is why the "common denominator" is 1TiB (1.1TB).

It was rather infuriating to be in front of a client when I attempted
to mount a 2TB Ext3 volume over a SAN, created on one Red Hat
Enterprise Linux box, from another.  For the "heck of it" I created a
1TB volume and tried again ... it worked.

ReiserFS 3 has the same issue; it grew up as a PC LBA32 filesystem.
ReiserFS 4 is supposedly 64-bit clean.  Although JFS for Linux came
from OS/2 (32-bit PC) and not AIX/Power (true 64-bit), it was designed
to be largely "64-bit clean" too.  XFS came from Irix/MIPS4000+ (true
64-bit).  Both JFS and XFS would _not_ work on 32-bit Linux until
patched with complete POSIX32 Large File Support (LFS).  LFS became
standard in the x86-target Linux kernel 2.4 and GLibC 2.2 (Red Hat
Linux 7.x / Red Hat Enterprise Linux 2.1).

> Joshua, thanks for the reply on this.
> There's something kludgy about having to do softraid across
> 2 partitions before formatting.

RAID-0 is an _ideal_ software RAID.  Striping is best handled by the
OS, which can schedule I/O over multiple peripheral interconnects and
cards.  In 2x and 4x S940 Opteron systems with at least one AMD8131
(dual PCI-X channels), I put a 3Ware card on each PCI-X channel
connected to the same CPU and stripe with LVM.  The CPU interleaves
writes directly over two (2) PCI-X channels to two (2) 3Ware cards.
Ultimate I/O affinity, no bus arbitration overhead, etc..., as well
as the added performance of striping.

The only negative is if one 3Ware card dies.  But that's why I keep a
spare per N servers (typically 1 for every 4 servers, 8 cards total).

> It adds a layer of complexity and reduces reliability.

That varies.  Yes, various kernel storage approaches -- especially
LVM2/Device Mapper (DM) at this point -- have race conditions if you
combine more than one operation, e.g., resizing and snapshots, RAID-1
(DM) atop RAID-0, etc...  But I _only_ use LVM/LVM2 with its native
RAID-0 stripe, and across two (2) 3Ware cards, and I've yet to have
an issue.  That's probably because LVM2 doesn't require DM for
RAID-0; DM is required for RAID-1, snapshots, FRAID meta-data, etc...
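For what it's worth, the LVM2 side of that setup is nothing exotic.
A minimal sketch, assuming the two 3Ware units show up as /dev/sdb
and /dev/sdc -- the device names, the size and the 64KiB stripe width
below are just placeholder examples, not a recommendation:

  # One physical volume per 3Ware unit (one unit per PCI-X channel)
  pvcreate /dev/sdb /dev/sdc

  # Both PVs go into a single volume group
  vgcreate vg_data /dev/sdb /dev/sdc

  # -i 2 stripes the logical volume across both PVs, -I 64 sets a
  # 64KiB stripe width
  lvcreate -i 2 -I 64 -L 800G -n lv_data vg_data

  # Then mke2fs (with whatever -R stride / -T options fit the load)
  mke2fs -j /dev/vg_data/lv_data

One PV per card means every stripe touches both PCI-X channels, which
is where the I/O affinity I mentioned above comes from.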
Joshua Baker-LePain <jlb17 at duke.edu> wrote:
> I wouldn't call it that odd.  RH patches their kernels to a
> fair extent, both for stability and features.

Yep.  They are _very_ well trusted.  Now if they'd put that effort
into XFS too, I'd be a happy camper.

> > > mke2fs -b 4096 -j -m 0 -R stride=1024 -T largefile4
> > > /dev/md0

BTW, aren't you worried about running out of inodes?  At the same
time, have you benchmarked how much faster a full fsck runs with 1
inode per 4MiB versus the standard 1 per 16-64KiB?  That would be an
interesting test IMHO.

> Err, it's not a kludge and it's not a trick.  Those 2
> "disks" are hardware RAID5 arrays from 2 12 port 3ware 9500
> cards.  I like 3ware's hardware RAID, and those are the
> biggest (in terms of ports) cards 3ware makes.
> So, I hook 12 disks up to each card, and the OS sees those
> as 2 SCSI disks.  I then do the software RAID to get 1)
> speed and 2) one partition to present to the users.  Folks
> (myself included) have been doing this for years.

I am in total agreement with you, with one exception.  I always make
2 volumes (one System, one Data) per card (yes, I'm aware of the 9.2
firmware bug, hence why I have largely avoided the 9500S, although
9.2.1.1 seems promising now that it's officially released).  So in my
case, I'd have two RAID-0 stripes.

BTW, supposedly 3Ware supports volumes across up to 4 cards.  Have
you tried this?  I have not myself.

> The one gotcha in this setup (other than not being able to
> boot from the big RAID5 arrays, since each is >2TiB)

Another reason to create a "System" volume and a "Data" volume.  My
"System" volume is typically 2/4 drives in RAID-1/10.  My "Data"
volume is typically RAID-5, or if I really need performance, RAID-10.

> is that the version of mdadm shipped with RHEL4 does not
> support array members bigger than 2TiB.  I had
> to upgrade to an upstream release to get that support.

Which is why I use LVM (and now LVM2) for RAID-0.  I know there are
claims it is slower than MD (at least LVM2 is), but I just like the
management of LVM.  I guess I'm a typical commercial UNIX weenie.

Chris Mauritz <chrism at imntv.com> wrote:
> For what it's worth, I have also done RAID0 stripes of 2
> raid arrays to get *really* fast read/write performance
> when used for storing uncompressed video.  Recently, when
> I was at Apple for a meeting, that was their engineer's
> preferred method for getting huge RAIDs....running
> software RAID volumes across multiple Xserve-RAID devices

Software RAID-0 at the OS level (and not some FRAID driver) is
_always_ going to be the _ultimate_, because you can span peripheral
interconnects and cards.

> Perhaps I'm just extremely lucky, but I've not run into
> this magic 1TB barrier that I see bandied about here.

As I said, I've just run into it on kernel 2.4 distributions.  Any
filesystem that grew up on a POSIX32 implementation (especially
pre-kernel 2.4 / pre-GLibC 2.2, before LFS was standard) is going to
have signed 32-bit int structures.  I'm sure Tweedie and the gang
have gotten around all of them in kernel 2.6 by now, but at the same
time, I don't entirely trust how they've done it.

> Unfortunately, a lot of the documentation and FAQs are
> quite out of date which can lead to some confusion.

Yeah.  LVM2 and Device Mapper (DM) are a real PITA if you start
playing with newer developments, and the race conditions seem to be
never-ending.  But when it comes to using intelligent 3Ware RAID with
just LVM2 for RAID-0, it has worked flawlessly for me on kernel 2.6.

-- 
Bryan J. Smith                 | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org   |  (please excuse any
http://thebs413.blogspot.com/  |   missing headers)