----- Original Message -----
| Hi,
|
| over time the requirements and possibilities regarding filesystems
| changed for our users.
|
| currently I'm faced with the question:
|
| What might be a good way to provide one big filesystem for a few users
| which could also be enlarged; backuping the data is not the question.
|
| Big in that context is up to couple of 100 TB may be.
|
| O.K. I could install one hardware raid with e.g. N big drives format
| with xfs. And export one big share. Done.
|
| On the other hand, e.g. using 60 4 TB Disks in one storage would be a
| lot of space, but a nightmare in rebuilding on a disk crash.
|
| Now if the share fills up, my users "complain", that they usually get a
| new share (what is a new raidbox).
|
| From my POV I could e.g. use hardware raidboxes, and use LVM and
| filesystem growth options to extend the final share, but what if one of
| the boxes crash totally? The whole Filesystem would be gone.
|
| hm.
|
| So how do you handle big filesystems/storages/shares?
|
| Regards,
| Götz
My personal view is that you don't want any single machine to hold a 100TB file system. You'd be best served by a distributed file system such as GlusterFS or Lustre. If you insist on a single machine with a 100TB file system, make sure it has at least 300GB of memory if you think you'll ever have to run a file system check on it. You're going to need it.
Note, it's not that difficult or expensive to build Supermicro boxes with 48 x 4TB drives and scale out to the size you need with GlusterFS. However, building it is the easy part; the hard part is maintaining it and troubleshooting it when things go wrong. Which platform to choose also depends on I/O access patterns, the number of clients, and connectivity (IB vs. Ethernet vs. iSCSI/FC/AoE, etc.).
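To put rough numbers on "scale out to the size you need", here is a back-of-the-envelope sketch. The layout is an assumption, not something stated above: each 48-drive box is split into RAID6 groups of 12 drives (2 parity drives per group), each box is one GlusterFS brick, and the volume is distributed-replicated with replica 2 (GlusterFS requires the brick count to be a multiple of the replica count). Sizes are decimal TB, and filesystem overhead and hot spares are ignored.

```python
import math


def boxes_needed(target_tb: float, drives_per_box: int = 48, drive_tb: float = 4,
                 raid_group: int = 12, parity_per_group: int = 2,
                 replica: int = 2) -> int:
    """Estimate how many storage boxes a replicated scale-out volume needs.

    Assumes one brick per box, RAID groups that divide the drive count,
    decimal TB, and no allowance for filesystem overhead or spares.
    """
    groups = drives_per_box // raid_group
    # Usable space per box after RAID parity (e.g. 4 groups x 10 data drives x 4TB).
    usable_per_box = groups * (raid_group - parity_per_group) * drive_tb
    # Every byte is stored `replica` times, each copy on a different box.
    boxes = math.ceil(target_tb * replica / usable_per_box)
    # Brick count must be a multiple of the replica count, so round up.
    return math.ceil(boxes / replica) * replica
```

Under these assumptions a box yields 160TB after parity, so `boxes_needed(300)` comes out at 4 boxes (320TB usable with replica 2). That is the hardware for a "couple of 100 TB" share, before you count the ongoing maintenance cost mentioned above.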
Currently we're not using any clustered file system for data access. We have a single NFS machine that acts as the "front-end" to the data. It contains a whole bunch of symlinks pointing to other NFS servers (Dell R720XD, 36TB each), which the client machines automount. This is really simple to maintain, and if we want to do replication at a per-volume level, we can. We are looking into GlusterFS for certain things, though.
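The symlink front-end could be maintained with something like the following minimal sketch. Everything concrete in it is hypothetical: the server names, export paths and the `/net/<server>/<export>` target layout (the convention used by an autofs `-hosts` map on many Linux distributions) are assumptions, not a description of the setup above. Clients mount the front-end share, and following a link triggers their automounter to mount the backing NFS server.

```python
from pathlib import Path

# Hypothetical volume map: front-end name -> (backing NFS server, export path).
VOLUMES = {
    "projects": ("nfs01", "/export/projects"),
    "scratch": ("nfs02", "/export/scratch"),
}


def build_frontend(root: Path, volumes: dict[str, tuple[str, str]],
                   automount_base: str = "/net") -> None:
    """Create or refresh one symlink per volume under `root`.

    Assumes clients resolve `<automount_base>/<server><export>` through
    an autofs map; the front-end machine itself never mounts the targets.
    """
    root.mkdir(parents=True, exist_ok=True)
    for name, (server, export) in volumes.items():
        target = f"{automount_base}/{server}{export}"
        link = root / name
        # Drop an existing link (even a dangling one) so a volume can be
        # moved to another server by editing the map and re-running.
        if link.is_symlink():
            link.unlink()
        link.symlink_to(target)
```

For example, `build_frontend(Path("/srv/frontend"), VOLUMES)` would create `/srv/frontend/projects -> /net/nfs01/export/projects`. Moving a volume to a new server then only means changing one map entry, which is part of why this layout stays simple to maintain.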