I've been asked for ideas on building a rather large archival storage system for inhouse use, on the order of 100-400TB. Probably using CentOS 6. The existing system this would replace is using Solaris 10 and ZFS, but I want to explore using Linux instead.
We have our own tomcat based archiving software that would run on this storage server, along with NFS client and server. Its a write once, read almost never kind of application, storing compressed batches of archive files for a year or two. 400TB written over 2 years translates to about 200TB/year or about 7MB/second average write speed. The very rare and occasional read accesses are done by batches where a client makes a webservice call to get a specific set of files, then they are pushed as a batch to staging storage where the user can then browse them, this can take minutes without any problems.
My general idea is a 2U server with 1-4 SAS cards connected to strings of about 48 SATA disks (4 x 12 or 3 x 16), all configured as JBOD, so there would potentially be 48 or 96 or 192 drives on this one server. I'm thinking they should be laid as as 4 or 8 or 16 seperate RAID6 sets of 10 disks each, then use LVM to put those into a larger volume. About 10% of the disks would be reserved as global hot spares.
So, my questions...
A) Can CentOS 6 handle that many JBOD disks in one system? is my upper size too big and I should plan for 2 or more servers? What happens with the device names when you've gone past /dev/sdz ?
B) What is the status of large file system support in CentOS 6? I know XFS is frequently mentioned with such systems, but I/we have zero experience with it, its never been natively supported in EL up to 5, anyways.
C) Is GFS suitable for this, or is it strictly for clustered storage systems?
D) anything important I've neglected?
-- john r pierce N 37, W 122 santa cruz ca mid-left coast