On 7/14/2011 1:32 AM, John R Pierce wrote:
I've been asked for ideas on building a rather large archival storage system for in-house use, on the order of 100-400TB. Probably using CentOS 6. The existing system this would replace is using Solaris 10 and ZFS, but I want to explore using Linux instead.
We have our own Tomcat-based archiving software that would run on this storage server, along with an NFS client and server. It's a write-once, read-almost-never kind of application, storing compressed batches of archive files for a year or two. 400TB written over 2 years translates to about 200TB/year, or about 7MB/second average write speed. The very rare and occasional read accesses are done in batches: a client makes a webservice call to get a specific set of files, which are then pushed as a batch to staging storage where the user can browse them; this can take minutes without any problems.
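(That write-rate figure checks out, for what it's worth; a quick back-of-the-envelope in Python, assuming decimal TB/MB and writes spread evenly over the two years:

    total_bytes = 400 * 10**12            # 400TB written over the 2-year window
    seconds = 2 * 365 * 24 * 3600
    print(total_bytes / seconds / 1e6)    # ~6.3 MB/s, i.e. the "about 7MB/second" quoted

)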
If it doesn't have to look exactly like a file system, you might like Luwak, which is a layer over the Riak NoSQL distributed database for handling large files (http://wiki.basho.com/Luwak.html). The underlying storage is distributed across any number of nodes with a scheme that lets you add more as needed, and it keeps redundant copies to handle node failures. A downside of Luwak for most purposes is that because it chunks the data and deduplicates identical chunks, you can't remove anything, but for archive purposes it might work well.
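To make that deletion caveat concrete, here is a toy sketch of content-addressed chunking in Python (an illustration of the general technique, not Luwak's actual API): chunks are stored once under their hash, so two archives containing the same data share chunks, and you can't safely delete one archive's chunks without tracking references across everything ever written.

    import hashlib

    store = {}                  # chunk hash -> chunk bytes (stands in for the node cluster)
    CHUNK = 1024 * 1024         # an assumed 1MB chunk size

    def put(data):
        """Split data into chunks, store each once under its SHA-256, return the key list."""
        keys = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            key = hashlib.sha256(chunk).hexdigest()
            store.setdefault(key, chunk)    # a duplicate chunk is not stored again
            keys.append(key)
        return keys

    def get(keys):
        """Reassemble a file from its chunk keys."""
        return b"".join(store[k] for k in keys)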
For something that looks more like a filesystem, but is also distributed and redundant: http://www.moosefs.org/.