[CentOS] suggestions for large filesystem server setup (n * 100 TB)

Fri Feb 28 17:35:54 UTC 2014
James A. Peltier <jpeltier at sfu.ca>

| Hi,
| over time the requirements and possibilities regarding filesystems
| changed for our users.
| currently I'm faced with the question:
| What might be a good way to provide one big filesystem for a few users
| which could also be enlarged; backuping the data is not the question.
| Big in that context is up to couple of 100 TB may be.
| O.K. I could install one hardware raid with e.g. N big drives format
| with xfs. And export one big share. Done.
| On the other hand, e.g. using 60 4 TB Disks in one storage would be a
| lot of space, but a nightmare in rebuilding on a disk crash.
| Now if the share fills up, my users "complain", that they usually get a
| new share (what is a new raidbox).
| From my POV I could e.g. use hardware raidboxes, and use LVM and
| filesystem growth options to extend the final share, but what if one of
| the boxes crash totally? The whole Filesystem would be gone.
| hm.
| So how do you handle big filesystems/storages/shares?
| 	Regards . Götz

My personal view is that you don't want any single machine to contain a 100TB file system.  You'd be best served by a distributed file system such as GlusterFS or Lustre.  If you insist on having a single machine with a 100TB file system on it, make sure you install at least 300GB of memory, or more, if you think you'll ever have to perform a file system check on it.  You're going to need it.

Note that it's not that difficult or expensive to build a Supermicro box with 48 x 4TB drives and scale out to the size you need with GlusterFS.  Building it is the easiest part, however; the hard part is maintaining it and troubleshooting it when things go wrong.  Choosing a platform to support also depends on I/O access patterns, number of clients, and connectivity (IB vs Ethernet vs iSCSI/FC/AoE, etc.).
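
As a rough illustration of the scale-out arithmetic, here is a minimal Python sketch that estimates how many 48 x 4TB boxes a given target capacity would take.  The RAID layout (4 groups with 2 parity drives each) and the replica count of 2 are assumptions made up for the example, not a recommendation, and the function names are hypothetical.  It also ignores TB vs TiB and file system overhead.

```python
import math


def usable_tb_per_node(drives=48, drive_tb=4.0, raid_groups=4, parity_per_group=2):
    """Usable capacity of one box after RAID parity (assumed layout)."""
    data_drives = drives - raid_groups * parity_per_group
    return data_drives * drive_tb


def nodes_needed(target_tb, replicas=2, **node_kwargs):
    """Boxes required to hold target_tb when every file is stored `replicas` times."""
    per_node = usable_tb_per_node(**node_kwargs)
    return math.ceil(target_tb * replicas / per_node)


if __name__ == "__main__":
    for target in (100, 300, 500):
        print(target, "TB ->", nodes_needed(target), "boxes")
```

With these assumed numbers each box yields 160TB usable, so even a replicated 300TB volume is only a handful of machines.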

Currently we're not using any clustered file system for our data access.  We have a single NFS machine which is the "front-end" to the data.  It contains a whole bunch of symlinks to other NFS servers (Dell R720XD, 36TB each) which the machines automount.  This is really simple to maintain, and if we want to do replication at a per-volume level we can.  We are looking into GlusterFS for certain things, though.
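
To give an idea of how little there is to the symlink front-end, here is a minimal Python sketch that (re)builds such a directory of links.  The volume names and the /net/... automount paths are hypothetical, as is the function name; our real layout differs.  The link targets do not need to exist when the links are created, since the automounter resolves them on first access.

```python
import os


def build_front_end(front_dir, volume_map):
    """Create or refresh one symlink per volume inside front_dir.

    volume_map maps a link name to the automounted path of the NFS
    server that actually holds the data (hypothetical paths).
    """
    os.makedirs(front_dir, exist_ok=True)
    for name, target in sorted(volume_map.items()):
        link = os.path.join(front_dir, name)
        if os.path.islink(link):
            if os.readlink(link) == target:
                continue  # already points at the right server
            os.remove(link)  # volume moved to another server: repoint it
        elif os.path.exists(link):
            # Refuse to clobber real files or directories.
            raise FileExistsError(link + " exists and is not a symlink")
        os.symlink(target, link)


if __name__ == "__main__":
    build_front_end("/tmp/front-end-demo", {
        "projects": "/net/nfs01/export/projects",  # hypothetical
        "scratch": "/net/nfs02/export/scratch",    # hypothetical
    })
```

Moving a volume to a different back-end server then only means changing one entry in the map and re-running the script; clients keep using the same path.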

