[CentOS] suggestions for large filesystem server setup (n * 100 TB)

Fri Feb 28 17:35:54 UTC 2014
James A. Peltier <jpeltier at sfu.ca>

----- Original Message -----
| Hi,
| 
| Over time, the requirements and possibilities regarding filesystems
| have changed for our users.
| 
| Currently I'm faced with the question:
| 
| What might be a good way to provide one big filesystem for a few
| users which could also be enlarged? Backing up the data is not part
| of the question.
| 
| Big in this context means up to a couple of hundred TB, maybe.
| 
| O.K., I could install one hardware RAID with, e.g., N big drives,
| format it with XFS and export one big share. Done.
| 
| On the other hand, using e.g. 60 4 TB disks in one storage system
| would be a lot of space, but rebuilding after a disk crash would be
| a nightmare.
| 
| Now when a share fills up, my users "complain" that they usually
| get a new share (which means a new RAID box).
| 
| From my POV I could, e.g., use hardware RAID boxes and use LVM and
| filesystem growth options to extend the final share, but what if one
| of the boxes crashes completely? The whole filesystem would be gone.
| 
| hm.
| 
| So how do you handle big filesystems/storages/shares?
| 
| 	Regards, Götz

My personal view is that you don't want any single machine to contain a 100TB file system.  You'd be best served by a distributed file system such as GlusterFS or Lustre.  If you insist on having a single machine with a 100TB file system on it, make sure that you install at least 300GB of memory if you think you'll ever have to perform a file system check on it.  You're going to need it.

Note, it's not that difficult or expensive to build a Supermicro box with 48 x 4TB drives and scale out to the size that you need with GlusterFS; however, building it is the easiest part.  The hard part is maintaining it and troubleshooting it when things go wrong.  Choosing a platform to support also depends on I/O access patterns, the number of clients and connectivity (IB vs Ethernet vs iSCSI/FC/AoE, etc.).
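As a rough illustration of the scale-out arithmetic, here is a Python sketch that estimates the usable space of one 48 x 4TB node and how many such nodes a target size needs.  The RAID-6 groups of 12 drives, the one-brick-per-node layout and the replica counts are assumptions for illustration only, not a recommendation from this thread; sizes are decimal TB and ignore filesystem overhead.

```python
"""Rough capacity planning for scaling out with 48 x 4TB storage nodes.

Assumptions (hypothetical, not from the original post):
  * each node's drives are split into RAID-6 groups of `group_size` drives,
    with 2 drives per group used for parity;
  * GlusterFS keeps `replica` copies of every file, one brick per node;
  * sizes are decimal TB and ignore filesystem overhead.
"""
import math


def usable_tb_per_node(drives: int = 48, drive_tb: float = 4.0,
                       group_size: int = 12, parity_per_group: int = 2) -> float:
    """Usable capacity of one node after RAID-6 parity is subtracted."""
    if drives % group_size != 0:
        raise ValueError("drives must divide evenly into RAID groups")
    groups = drives // group_size
    data_drives = groups * (group_size - parity_per_group)
    return data_drives * drive_tb


def nodes_needed(target_tb: float, replica: int = 1, **node_kwargs) -> int:
    """Smallest node count whose usable space, divided by replica, reaches target_tb."""
    per_node = usable_tb_per_node(**node_kwargs)
    nodes = math.ceil(target_tb * replica / per_node)
    # With one brick per node, a replicated volume needs a brick count
    # that is a multiple of the replica count, so round up to that.
    return math.ceil(nodes / replica) * replica
```

Under those assumptions `usable_tb_per_node()` gives 160.0 TB per node, so `nodes_needed(300)` returns 2 and `nodes_needed(300, replica=2)` returns 4.  Smaller RAID groups cost more parity drives but keep each rebuild confined to fewer disks, which is the trade-off raised in the original question.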

Currently we're not using any clustered file system for our data access.  We have a single NFS machine which is the "front-end" to the data.  It contains a whole bunch of symlinks to other NFS servers (Dell R720XD/36TB each) which the machines automount.  This is really simple to maintain and if we want to do replication on a per volume level we can.  We are looking into GlusterFS though for certain things.
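A minimal Python sketch of the symlink "front-end" idea: one directory exported by the front-end NFS server, holding one symlink per volume that points into an automounter path on the clients.  The `/net/<server>/<export>` targets, the server names and the `build_frontend` helper are hypothetical, and the automounter configuration itself is not shown.  Symlinks on an NFS export are resolved on the client, so every target must be a path the client can reach, here via its automounter.

```python
"""Sketch of a symlink "front-end" tree pointing at automounted NFS volumes.

Hypothetical layout: clients mount the front-end export, and each entry is a
symlink into an automounter path such as /net/<server>/<export>.  Server
names, export paths and the /net prefix are illustrative assumptions.
"""
from __future__ import annotations

import os
from pathlib import Path


def build_frontend(root: Path, volumes: dict[str, str]) -> None:
    """Create or update one symlink per volume under `root`."""
    root.mkdir(parents=True, exist_ok=True)
    for name, target in volumes.items():
        link = root / name
        if link.is_symlink():
            # Already points at the right place: nothing to do.
            if os.readlink(link) == target:
                continue
            # Stale link (volume moved): remove it before recreating (not atomic).
            link.unlink()
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        # The target need not exist on the front-end server itself;
        # it only has to resolve on the clients.
        link.symlink_to(target)
```

Because clients only ever see the front-end paths, moving a volume to another server means repointing a single symlink, e.g. `build_frontend(Path("/export/data"), {"projectA": "/net/nfs03/export/projectA"})` (paths hypothetical).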

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices

"Around here, however, we don’t look backwards for very long.  We KEEP MOVING FORWARD, opening up new doors and doing things because we’re curious and curiosity keeps leading us down new paths." - Walt Disney