James A. Peltier wrote:
There isn't a good file system for this type of thing. File systems with many very small files are always slow. Ext3, XFS, JFS are all terrible for this type of thing.
I can think of one, though you'll pay out the ass for it: the Silicon file system from BlueArc (served over NFS), where the file system runs on FPGAs. Our BlueArcs never had more than 50,000-100,000 files in any particular directory (millions in any particular tree), though they are supposed to be able to handle this sort of thing quite well.
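If you want to know how many files per directory you are actually dealing with before shopping for storage, a rough sketch along these lines would do it. This is plain Python using os.walk, nothing BlueArc-specific; the function names and the 100,000 threshold are just assumptions picked to match the numbers above, and walking a tree with millions of files will itself be slow on the file systems being discussed.

```python
import os
from collections import Counter

def files_per_directory(root):
    """Map each directory under root to its number of non-directory entries."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # filenames holds every entry in dirpath that is not a directory
        counts[dirpath] = len(filenames)
    return counts

def crowded_directories(root, threshold=100_000):
    """Return (directory, count) pairs above threshold, largest first.

    The default threshold is an arbitrary illustrative value.
    """
    counts = files_per_directory(root)
    return [(d, n) for d, n in counts.most_common() if n > threshold]
```

Calling crowded_directories("/some/export") on a hypothetical path returns the directories holding more than the threshold number of files, which tells you whether the problem is a few huge directories or just a very large tree.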
I think entry-level list pricing starts at about $80-100k for one NAS gateway (no disks).
Our BlueArcs went end of life earlier this year and we migrated to an Exanet cluster (it runs on top of CentOS 4.4, though it uses its own file system, clustering and NFS services), which is still very fast, though not as fast as BlueArc.
And with block-based replication it doesn't matter how many files there are: performance is excellent for backup, whether you send data to another rack in your data center or to another continent over the WAN. In BlueArc's case it can also transparently send data to a dedupe device or tape drive based on dynamic access patterns (and move it back automatically when needed).
http://www.bluearc.com/html/products/file_system.shtml
http://www.exanet.com/default.asp?contentID=231
Both systems scale linearly to gigabytes/second of throughput, and to petabytes of storage without downtime. The only downside to BlueArc is their back-end storage: they mostly offer tier 2 storage, and HDS is the only tier 1 option. You can make an HDS perform, but it'll cost you even more. The tier 2 stuff (LSI Logic) is too unreliable. Exanet at least supports almost any storage out there (we went with 3PAR).
Don't even try to get a NetApp to do such a thing.
nate