[CentOS] XFS on a 25 TB device

Wed Sep 29 18:53:07 UTC 2010

On Wednesday, September 29, 2010 01:25:11 pm Peter Kjellstrom wrote:
> You are a bit mistaken. The raid controller does not "copy data around as it 
> sees fit". It stores data on each disk in chunk-size'ed pieces. It then 
> stripes this across all drives giving you a stripe-size'ed piece of chunk 
> size times the number of data drives.

[Snip math]

> Then again, for other workloads the effect could be insignificant. YMMV.

For a simple RAID controller I can see some benefit.  

However, in my case the 'RAID controller' is on a SAN consisting of three EMC Clariion arrays: a CX3-10c, a CX3-80, and a CX700.  The EMC Navisphere/Unisphere tools allow LUN migration across RAID groups; I could very well take a LUN from a RAID1/0 with 16 drives to a RAID5 with 9 drives to a RAID6 with 10 drives to a RAID6 with 16 drives, possibly with a different stripe size each time.  Further, since this is all being accessed through VMware ESX, I'm limited to 2TB LUNs anyway, even with raw device mappings (which I do use, but for a different reason).  LVM to the rescue to get this:
[root@backup-rdc ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       37G   18G   18G  50% /
/dev/sda1              99M   26M   69M  28% /boot
/dev/mapper/dasch--backup-volume1
                       21T   19T  2.6T  88% /opt/backups
tmpfs                1006M     0 1006M   0% /dev/shm
/dev/mapper/dasch--rdc-cx3--80
                       23T   19T  4.2T  82% /opt/dasch-rdc
[root@backup-rdc ~]#

Yeah, the output of pvscan is pretty long (it has been longer, and seeing things like /dev/sdak1 is strange....).
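
To give a rough idea of why the pvscan output gets long: a back-of-the-envelope sketch of how many LUNs it takes to build volumes of the sizes shown by df above, assuming every LUN is carved at the full 2TB limit.  The function name and the "always use the maximum LUN size" assumption are mine, for illustration only; the actual LUN sizes on these arrays are not stated here.

```python
import math


def luns_needed(volume_tb: float, max_lun_tb: float = 2.0) -> int:
    """Minimum number of LUNs to reach volume_tb if no LUN exceeds max_lun_tb.

    Hypothetical helper: assumes every LUN is created at the maximum size.
    """
    if volume_tb <= 0 or max_lun_tb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(volume_tb / max_lun_tb)


# Sizes taken from the df output above (df rounds, so these are approximate).
for name, size_tb in [("/opt/backups", 21), ("/opt/dasch-rdc", 23)]:
    print(f"{name}: at least {luns_needed(size_tb)} physical volumes")
```

Each of those LUNs shows up as its own SCSI disk and LVM physical volume, which is how device names run past /dev/sdz.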

Using XFS at the moment.  The two volume groups are on two different arrays; one is on the CX700 and the other on the CX3-80, and they're physically separated at two locations on-campus, with single-mode 4Gb/s FC ISLs between switches.  They're soon to be connected to different VMware ESX hosts; the dual fibre-channel connection was there so the initial sync time would be reasonable.  

I looked through all the performance optimization howtos for XFS that I could find, but then realized how futile that would be with these 'RAID controllers' and their massive caches.  Our CX3-80 SPs have 8GB of RAM each, holding the shared write cache and the variable-sized read cache; I have set those up rather large on the CX3-80, at 3GB on each SP for read and 2GB for write.  The CX700 has 4GB (actually 3968MB), split 1GB read and 2GB write.  The benchmarks that I did (which I can't release due to prohibitions in both EMC's and VMware's EULAs) showed that the performance difference between aligned and unaligned was insignificant with these 'RAID controllers'.

But for something inside the server, like a 3ware 9500 or similar, it might be worthwhile to align to stripe size, since that is a fixed constant for the logical drives that controller exports.
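
For that fixed-geometry case, here is a minimal sketch of the arithmetic Peter describes (stripe size = chunk size times the number of data drives), turned into the su/sw values that mkfs.xfs accepts under -d.  The 64 KiB chunk size and the helper name are assumptions for illustration, not values read from any of the controllers mentioned; check what your controller actually reports.

```python
def stripe_geometry(chunk_kib: int, total_drives: int, parity_drives: int):
    """Return (full stripe size in KiB, mkfs.xfs -d option string).

    Hypothetical helper.  For RAID1/0, pass the number of mirror copies'
    worth of drives as parity_drives (half the drives for 2-way mirrors).
    """
    data_drives = total_drives - parity_drives
    if chunk_kib <= 0 or data_drives < 1:
        raise ValueError("need a positive chunk size and at least one data drive")
    stripe_kib = chunk_kib * data_drives
    # su = stripe unit (one chunk), sw = stripe width in units of su
    return stripe_kib, f"su={chunk_kib}k,sw={data_drives}"


# Assumed 64 KiB chunk; RAID layouts taken from the LUN migration example above.
layouts = [
    ("RAID1/0, 16 drives", 16, 8),
    ("RAID5, 9 drives", 9, 1),
    ("RAID6, 10 drives", 10, 2),
    ("RAID6, 16 drives", 16, 2),
]
for label, total, parity in layouts:
    stripe, opts = stripe_geometry(64, total, parity)
    print(f"{label}: full stripe {stripe} KiB -> mkfs.xfs -d {opts}")
```

With the assumed chunk size, the first three layouts happen to share a geometry while the 16-drive RAID6 does not, so alignment chosen at mkfs time only holds as long as the LUN stays on a RAID group with the same geometry.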

And Peter is very right: YMMV depending upon workload.  Our load for this system is, as can be inferred from the name of the machine, backups of a raw data set that are processed once and then archived.  I/Os per second aren't even on the radar for this workload; throughput, on the other hand, is.  And man, these Clariions are fast.