On Mon, 2011-03-14 at 13:10 -0700, Dr. Ed Morbius wrote:
> on 13:10 Sat 12 Mar, Alain Spineux (aspineux at gmail.com) wrote:
> > Hi
> >
> > I need to store about 250.000.000 files. Files are less than 4k.
> >
> > On ext4 (Fedora 14) the system crawls at 10.000.000 files in the
> > same directory.
> >
> > I tried to create hash directories, two levels of 4096 dirs =
> > 16.000.000, but I had to stop the script creating these dirs after
> > hours, and "rm -rf" would have taken days! mkfs was my friend.
> >
> > I tried two levels, the first of 4096 dirs, the second of 64 dirs.
> > Creating the hash dirs took "only" a few minutes, but copying 10000
> > files made my HD scream for 120s! It takes only 10s when working in
> > the same directory.
> >
> > The filenames are all 27 chars and the first chars can be used to
> > hash the files.

Exactly. {XY}/{XY}/{ABCDEFGHIJKLMNOPQRSTUVW} will probably work just
fine: two characters give 676 combinations, hardly a large directory,
and that puts fewer than 1,000 entries in each leaf directory. (A short
sketch of this layout follows at the end of this message.)

> > My question is: which filesystem, and how should I store these
> > files?
>
> I'd also question the architecture and suggest an alternate approach:
> hierarchical directory tree, database, "nosql" hashing lookup, or
> other approach. See squid for an example of using directory trees to
> handle very large numbers of objects.

Exactly. Squid and Cyrus IMAPd both manage to store massive numbers of
objects in a filesystem using hashing. It is simple and reliable.

I'd wonder whether the layout is really the issue, though, or whether
you simply have an actual I/O throughput problem. Certainly trying any
solution on a single disk is going to be awful.
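
For concreteness, here is a minimal sketch of that {XY}/{XY}/{rest}
layout in Python. The names (shard_path, store) and the fixed
27-character assumption are mine, not from the original posts:

    import os

    def shard_path(root: str, name: str) -> str:
        """Map a 27-char filename onto the {XY}/{XY}/{rest} layout:
        the first two characters pick the top-level directory, the
        next two pick the second level, and the remaining 23
        characters name the file itself."""
        assert len(name) == 27, "this scheme assumes fixed 27-char names"
        return os.path.join(root, name[:2], name[2:4], name[4:])

    def store(root: str, name: str, data: bytes) -> None:
        """Write one small file, creating its two shard directories
        on demand rather than pre-creating all 676 * 676 up front."""
        path = shard_path(root, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

Creating directories on first write, instead of pre-creating the whole
tree, also sidesteps the "hours to mkdir, days to rm -rf" problem
described above: only directories that actually receive files ever
exist.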
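
And if the leading characters of the names turned out not to be
uniformly distributed, a variant closer in spirit to what Squid does is
to derive the two levels from a digest of the name instead. This is
just the idea, not Squid's exact algorithm, and the l1/l2 fan-out sizes
here are arbitrary:

    import hashlib
    import os

    def hashed_path(root: str, name: str, l1: int = 64, l2: int = 64) -> str:
        """Fan objects out over l1 * l2 directories using a digest of
        the name, so the spread no longer depends on the names."""
        h = hashlib.md5(name.encode()).digest()
        return os.path.join(root,
                            "%02x" % (h[0] % l1),
                            "%02x" % (h[1] % l2),
                            name)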