[CentOS] Question about optimal filesystem with many small files.

Les Mikesell

lesmikesell at gmail.com
Fri Jul 10 18:52:56 UTC 2009


oooooooooooo ooooooooooooo wrote:
> Hi, after talking with the customer, I finally managed to convince him to use the first characters of the hash as directory names.
> 
> Now I'm in doubt about the following options:
> 
> a) Using 4 directory levels /c/2/a/4/ (200 files per directory) and mysql with a hash->filename table, so I can get the file name from the hash and then access it directly (I first query mysql for the hash of the file, and then I read the file).
> 
> b) Using 5 levels without mysql, and making a dir listing (due to technical issues, I can only know an approximate file name, so I can't make a direct access here), matching the file name and then reading it. The issue here is that I would have 16^5 leaf directories (more than a million).
> 
> I could also try other combinations of mysql/no mysql and numbers of levels.
> 
> What do you think would give the best performance on ext3?

I don't think you've explained the constraint that would make you use
mysql or not.  I'd avoid it if everything involved can compute the hash
or is passed the whole path, since a database lookup is bound to be
slower than doing the math.  Just on general principles I'd use a tree
like 00/AA/FF/filename (three levels of 2 hex characters) as the first
cut, although squid uses just two levels, with a default of 16
first-level and 256 second-level directories, and probably has a good
reason for it.
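As a rough illustration, here is a minimal Python sketch of what I mean
(untested; the function name, the choice of SHA-1, and the example path
are just placeholders for whatever hash your application already
computes):

  import hashlib
  import os

  def hashed_path(root, filename, data, levels=3, chars_per_level=2):
      # Derive the bucket directories from the leading hex digits of
      # the content hash, so any process that knows the hash (or has
      # the file contents) can recompute the full path without a
      # database lookup.
      digest = hashlib.sha1(data).hexdigest()
      parts = [digest[i * chars_per_level:(i + 1) * chars_per_level]
               for i in range(levels)]
      return os.path.join(root, *(parts + [filename]))

  # hashed_path("/var/store", "report.pdf", contents)
  # -> something like /var/store/3f/78/6a/report.pdf

With two hex characters per level, each directory fans out to at most
256 subdirectories, so three levels give 256^3 buckets; the squid-style
layout is the same idea cut down to two levels (16 first-level and 256
second-level directories).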

-- 
   Les Mikesell
    lesmikesell at gmail.com



