[CentOS] Question about optimal filesystem with many small files.

Wed Jul 8 22:03:42 UTC 2009
oooooooooooo ooooooooooooo <hhh735 at hotmail.com>

(I resent this message as the previous one seems badly formatted, sorry for the mess).


>Perhaps think about running tune2fs maybe also consider adding noatime 
 
Yes, I added it and got a performance increase. However, as the number of files grows, the speed still drops below an acceptable level.
 


>I saw this article some time back.
 
http://www.linux.com/archive/feature/127055


Good idea. I already use mysql for indexing the files, so every time I need to do a lookup I don't need to read the entire dir and then get the file. However, my requirement is to keep the files on disk.


 
>The only way to deal with it (especially if the
>application adds and removes these files regularly) is to every once in a
>while copy the files to another directory, nuke the directory and restore
>from the copy.


Thanks, but there will not be many file updates once the cache is built, so recreating directories would not be very helpful here. The issue is that as the number of files grows, both reads from existing files and new insertions get slower and slower.


 
>I haven't done, or even seen, any recent benchmarks but I'd expect
>reiserfs to still be the best at that sort of thing.

I've been looking at some benchmarks and reiser seems a bit faster in my scenario. However, my problem happens when I have a large number of files and, from what I have seen, I'm not sure reiser would be a fix.

>However even if
>you can improve things slightly, do not let whoever is responsible for
>that application ignore the fact that it is a horrible design that
>ignores a very well known problem that has easy solutions.

My original idea was to store each file under a hash of its name, and keep a hash -> real filename mapping in mysql. That way I have direct access to the file and I can build a directory hierarchy from the first characters of the hash (/c/0/2/a), so I would have 16^4 = 65536 leaves in the directory tree, and the files would be evenly distributed, with around 200 files per dir (which should not give any performance issues). But the requirement is to use the real file name for the directory tree, which causes the issue.
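
A minimal sketch of that hashed layout, just to illustrate the idea. The function names (hashed_path, store), the choice of MD5, and the in-memory dict standing in for the mysql hash -> real filename table are all assumptions for illustration, not what my application actually does:

```python
import hashlib
from pathlib import Path


def hashed_path(root, real_name, levels=4):
    """Map a real file name to root/<h0>/<h1>/<h2>/<h3>/<full hash>."""
    # MD5 is used here only to spread names evenly, not for security
    # (assumption: any stable hash with uniform output would do).
    digest = hashlib.md5(real_name.encode("utf-8")).hexdigest()
    # One hex character per level: 16**4 = 65536 leaf directories.
    parts = list(digest[:levels])
    return Path(root, *parts, digest)


def store(root, real_name, data, index):
    """Write data under the hashed path and record hash -> real name."""
    path = hashed_path(root, real_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    # 'index' is a plain dict standing in for the mysql lookup table.
    index[path.name] = real_name
    return path


# Average number of files per leaf directory for 15 million files.
files_per_dir = 15_000_000 / 16 ** 4  # about 229
```

With this layout a lookup never has to scan a large directory: the path is computed from the name alone, and the index is only needed to go back from a hash to the real file name.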

 
 
>Did that program also write your address header ?
:)


 
Thanks for the help.
 
 
----------------------------------------
> From: hhh735 at hotmail.com
> To: centos at centos.org
> Date: Wed, 8 Jul 2009 06:27:40 +0000
> Subject: [CentOS] Question about optimal filesystem with many small files.
>
>
> Hi,
>
> I have a program that writes lots of files to a directory tree (around 15 million files), and a node can have up to 400000 files (and I don't have any way to split this amount into smaller ones). As the number of files grows, my application gets slower and slower (the app works something like a cache for another app and I can't redesign the way it distributes files on disk due to the other app's requirements).
>
> The filesystem I use is ext3 with the following options enabled:
>
> Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
>
> Is there any way to improve performance in ext3? Would you suggest another FS for this situation (this is a production server, so I need a stable one)?
>
> Thanks in advance (and please excuse my bad english).
>
>
> _________________________________________________________________
> Connect to the next generation of MSN Messenger
> http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
 