[CentOS] Question about optimal filesystem with many small files.

Wed Jul 8 15:56:29 UTC 2009
Les Mikesell <lesmikesell at gmail.com>

oooooooooooo ooooooooooooo wrote:
> Hi,
> I have a program that writes lots of files to a directory tree (around 15 million files), and a node can have up to 400000 files (and I don't have any way to split this amount into smaller ones). As the number of files grows, my application gets slower and slower (the app works something like a cache for another app and I can't redesign the way it distributes files on disk due to the other app's requirements).
> The filesystem I use is ext3 with the following options enabled:
> Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
> Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
> Thanks in advance (and please excuse my bad English).

I haven't done, or even seen, any recent benchmarks but I'd expect 
reiserfs to still be the best at that sort of thing.   However even if 
you can improve things slightly, do not let whoever is responsible for 
that application ignore the fact that it is a horrible design that 
ignores a very well-known problem that has easy solutions.  And don't
ever do business with someone who would write a program like that again.

Any way you approach it, when you want to write a file the system must
check to see if the name already exists, and if not, create it in an
empty space that it must also find - and this must be done atomically, so
the directory must be locked against other concurrent operations until
the update is complete.  If you don't index the contents, the lookup is a
slow linear scan - if you do, you then have to update the index on
every change, so you can't win.  Sensible programs that expect to access
a lot of files will build a tree structure to break up the number that 
land in any single directory (see squid for an example).  Even more 
sensible programs would re-use some existing caching mechanism like 
squid or memcached instead of writing a new one badly.
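
To make the tree idea concrete, here is a rough Python sketch of one way
to spread files across subdirectories by hashing the file name.  It is
only an illustration and not what squid itself does internally; the
function names (sharded_path, write_sharded) and the choice of two
levels of two hex digits are my own assumptions.  With those settings
there are 256 * 256 = 65536 leaf directories, so 15 million files would
average roughly 230 per directory.

```python
import hashlib
import os


def sharded_path(root, name, levels=2, width=2):
    """Return root/ab/cd/name, where ab and cd are hex digits of a hash of name.

    Hypothetical helper: the hash only needs to be stable and well spread,
    so SHA-256 is used here for convenience, not for security.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Take consecutive slices of the digest, one per directory level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def write_sharded(root, name, data):
    """Write data (bytes) under the sharded path, creating directories as needed."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Because the path is computed from the name alone, a reader can find the
file again with sharded_path(root, name) without scanning anything.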

   Les Mikesell
    lesmikesell at gmail.com