oooooooooooo ooooooooooooo wrote:
Hi,
I have a program that writes lots of files to a directory tree (around 15 million files), and a node can have up to 400000 files (and I don't have any way to split this amount into smaller ones). As the number of files grows, my application gets slower and slower (the app works something like a cache for another app, and I can't redesign the way it distributes files on disk due to the other app's requirements).
The filesystem I use is ext3 with the following options enabled:
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
Thanks in advance (and please excuse my bad English).
I haven't done, or even seen, any recent benchmarks, but I'd expect reiserfs to still be the best at that sort of thing. However, even if you can improve things slightly, do not let whoever is responsible for that application ignore the fact that it is a horrible design that disregards a very well-known problem with easy solutions. And don't ever again do business with someone who would write a program like that.
Any way you approach it, when you want to write a file the system must check whether the name already exists and, if not, create it in an empty space that it must also find - and this must be done atomically, so the directory must be locked against other concurrent operations until the update is complete. If you don't index the directory contents, the lookup is a slow linear scan; if you do, you then have to rewrite the index on every change, so you can't win.
Sensible programs that expect to access a lot of files will build a tree structure to limit the number that land in any single directory (see squid for an example). Even more sensible programs would re-use some existing caching mechanism like squid or memcached instead of writing a new one badly.
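A minimal sketch of the kind of tree structure meant here, in Python. It is not squid's actual on-disk layout, just an illustration of the idea: hash the file name and use the first few hex digits as subdirectory names. The function names (sharded_path, store), the choice of SHA-256, and the two levels of two hex characters are all assumptions made for the example.

```python
import hashlib
import os


def sharded_path(root, name, levels=2, width=2):
    """Map a file name to root/xx/yy/name, where xx and yy are
    taken from the hex digest of a hash of the name (hypothetical layout)."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Consecutive slices of the digest become the subdirectory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)


def store(root, name, data):
    """Write data under the sharded path, creating directories as needed."""
    path = sharded_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With two levels of two hex characters there are 256 * 256 = 65536 leaf directories, so 15 million files would average roughly 230 per directory, assuming the hash spreads names evenly. Because the path is computed from the name alone, a reader can find a file without any separate lookup table.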