On Mon, 2011-03-14 at 13:10 -0700, Dr. Ed Morbius wrote:
on 13:10 Sat 12 Mar, Alain Spineux (aspineux@gmail.com) wrote:
Hi, I need to store about 250,000,000 files, each less than 4 KB. On ext4 (Fedora 14) the system crawls once 10,000,000 of them sit in the same directory.

I tried to create hash directories, two levels of 4096 dirs = 16,000,000 leaf directories, but I had to stop the script creating them after hours, and "rm -rf" would have taken days! mkfs was my friend.

I then tried two levels, the first of 4096 dirs, the second of 64 dirs. Creating the hash dirs took "only" a few minutes, but copying 10,000 files made my HD scream for 120 s, versus only 10 s when working in a single directory.

The filenames are all 27 chars, and the first chars can be used to hash the files.
Exactly. {XY}/{XY}/{ABCDEFGHIJKLMNOPQRSTUVW} will probably work just fine. Two characters give 676 combinations (26^2), hardly a large directory, and with 676 x 676 = 456,976 leaf directories that puts fewer than 1,000 entries (roughly 550 on average for 250,000,000 files) in any one folder.
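As a rough illustration, here is how that two-level split might look in Python; the root path and the helper name hash_path are invented for the example, and it assumes the first four characters of the 27-char name are reasonably well distributed:

    import os

    STORE_ROOT = "/srv/filestore"   # hypothetical mount point for the file store

    def hash_path(name):
        """Map a 27-char filename to STORE_ROOT/XY/XY/name using its first 4 chars."""
        if len(name) != 27:
            raise ValueError("expected a 27-character filename: %r" % name)
        return os.path.join(STORE_ROOT, name[:2], name[2:4], name)

    # e.g. hash_path("abcdefghijklmnopqrstuvwxy01")
    #   -> "/srv/filestore/ab/cd/abcdefghijklmnopqrstuvwxy01"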
My question is: which filesystem should I use, and how should I store these files?
I'd also question the architecture and suggest an alternative: a hierarchical directory tree, a database, a "nosql" hashed lookup, or something similar. See Squid for an example of using directory trees to handle very large numbers of objects.
Exactly. Squid and Cyrus IMAPd both manage to store massive numbers of objects in a filesystem using hashed directories. It is simple and reliable.
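For completeness, a hedged sketch of the write/read side of such a scheme, reusing hash_path from the earlier example; store_file and load_file are invented names, and creating the two directory levels lazily with os.makedirs is only one option (Squid pre-creates its fan-out up front):

    import os

    def store_file(name, data):
        """Write the payload under its hashed path, creating the two levels lazily."""
        path = hash_path(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def load_file(name):
        """Read a payload back via the same deterministic path."""
        with open(hash_path(name), "rb") as f:
            return f.read()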
I'd wonder whether the filesystem layout is really the issue, or whether you simply have an I/O throughput problem. Certainly trying to do any of this on a single disk is going to be awful.