[CentOS] suggestions for large filesystem server setup (n * 100 TB)

Fri Feb 28 15:15:41 UTC 2014
Lamar Owen <lowen at pari.edu>

On 02/28/2014 08:15 AM, Götz Reinicke - IT Koordinator wrote:
> ...
> Big in that context is up to couple of 100 TB may be.
> ...
>  From my POV I could e.g. use hardware raidboxes, and use LVM and
> filesystem growth options to extend the final share, but what if one of
> the boxes crash totally? The whole Filesystem would be gone.
> ...
> So how do you handle big filesystems/storages/shares?

We handle it with EMC Clariion fibre channel units and LVM.  Always
make your PVs relatively small, use RAID6 on the array, and mirror it,
either at the SAN level with MirrorView or at the OS level, using LUNs
as mdraid components for the PVs.  Today that would become a set of VNX
systems with SAS on the backend and iSCSI on the front end, but, well, a
SAN is a SAN.  Surplus FC HBAs are cheap; licenses won't be, nor will
the array.  But how valuable is that data?  More to the point, if you
were to lose all of that 100TB, what would it cost you to recreate it?
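
A rough sketch of the OS-level mirroring option mentioned above, not the
exact setup described in this message.  It assumes each pair holds one
LUN from each of two separate arrays, so that losing a whole box still
leaves one half of every mirror.  The device paths and the VG name are
made up for illustration, and the function only builds the command
lines; it does not run anything.

```python
from typing import List, Tuple


def mirrored_pv_commands(
    lun_pairs: List[Tuple[str, str]], vg_name: str = "vg_share"
) -> List[str]:
    """Build (but do not run) commands for mdraid-mirrored PVs in one VG.

    Each pair becomes one RAID1 md device, each md device becomes one PV,
    and all PVs are collected into a single volume group.
    """
    commands = []
    md_devices = []
    for index, (lun_a, lun_b) in enumerate(lun_pairs):
        md = f"/dev/md{index}"
        # RAID1 across the two LUNs; RAID6 is assumed to be done inside each array.
        commands.append(
            f"mdadm --create {md} --level=1 --raid-devices=2 {lun_a} {lun_b}"
        )
        commands.append(f"pvcreate {md}")
        md_devices.append(md)
    commands.append(f"vgcreate {vg_name} " + " ".join(md_devices))
    return commands


if __name__ == "__main__":
    # Hypothetical multipath device names; substitute your own.
    pairs = [
        ("/dev/mapper/arrayA_lun0", "/dev/mapper/arrayB_lun0"),
        ("/dev/mapper/arrayA_lun1", "/dev/mapper/arrayB_lun1"),
    ]
    for line in mirrored_pv_commands(pairs):
        print(line)
```

Keeping each LUN (and so each PV) relatively small means a later
vgextend adds capacity in modest steps, one mirrored pair at a time.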

With this size of data, rolling your own should be the last resort, and
only if you can't afford something properly engineered for high
availability, like basically anything from NetApp, Nimble, or EMC (among
others; those are the first three off the top of my head).  The value-add
with these vendors is the long track record of reliability and the
software management tools that make life so much easier when a drive or
other component inevitably fails.

An enterprise-grade SAN or NAS from a serious vendor is going to cost
serious money, but you do get what you pay for, again primarily on the
software side.  Our four Clariions (two CX3-10c units, one CX3-80, and a
CX4-480) simply don't go down, and upgrades are very easy and reliable,
in my experience.  The two CX3-10c units have been online continuously
since mid-2007, and while they are way out of warranty, past the normal
service life, even, they just run and run and run and run.  (I even used
the redundancy features in one of them to good effect while moving the
array, slowly and carefully, from one room to another.  Long extension
power cords and long fiber jumpers worked to my advantage, and of
course a stable rack on wheels made it possible.  The array stayed up,
and no servers lost connectivity to the SAN during the move.  Not that
I would recommend it for normal operations, but this wasn't a normal
operation.)  The storage processor sends alerts when drives fault, and a
drive fault is an easy hotswap, with the DAE and the drive clearly
identified at the front of the array.  Everything (drives, power
supplies, storage processor modules, LCCs) except a whole DAE or storage
processor enclosure is hotswap, and I haven't had a DAE fault yet that
required pulling the whole DAE out of service.

If you do roll your own, do not use consumer-class drives.  One reason
NetApp and the rest charge so much for drives is the extra testing, and
sometimes the custom firmware, that goes into them.  In a nutshell, you
do NOT want the drive doing its own error recovery; that's the array
storage processor's job!
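
For roll-your-own setups, one knob related to this is SCT Error Recovery
Control, which smartctl can set on drives that support it.  This is an
assumption about the mechanism, not something the message above names,
and it does not turn a consumer drive into an enterprise one.  The
sketch below only builds the command line; the 7-second limits and the
device name are illustrative, not a recommendation.

```python
def scterc_command(
    device: str, read_seconds: float = 7.0, write_seconds: float = 7.0
) -> str:
    """Build (but do not run) a smartctl call that caps drive error recovery time."""
    # smartctl takes the limits in tenths of a second, so 7.0 s becomes 70.
    read_ds = int(round(read_seconds * 10))
    write_ds = int(round(write_seconds * 10))
    return f"smartctl -l scterc,{read_ds},{write_ds} {device}"


if __name__ == "__main__":
    # Hypothetical device name; substitute a real one.
    print(scterc_command("/dev/sdX"))
```

Many consumer drives reject the command outright, and on some drives
the setting does not survive a power cycle, so check the result with
"smartctl -l scterc <device>" rather than assuming it stuck.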

Those are my opinions and experience.  YMMV.