[CentOS] Question about optimal filesystem with many small files.

Fri Jul 10 22:19:52 UTC 2009
Les Mikesell <lesmikesell at gmail.com>

oooooooooooo ooooooooooooo wrote:
>> I don't think you've explained the constraint that would make you use
>> mysql or not.
> 
> My original idea was to use just the hash as the filename, so that I could have direct access. But the customer rejected this and requested that part of the long file name (from 11 to 1023 characters) be kept. As Linux only allows 255 characters in a file name and I could get duplicates within the first 255 characters, I trim the real filename to around 200 characters and add the hash at the end (plus a couple of small metadata fields).
> 
> Yes, these requirements do not make much sense, but I've tried to convince the customer to use just the hash, with no luck (he does not seem to understand what a hash is, although I've tried to explain it several times).

You mentioned that the data can be retrieved from somewhere else.  Is 
some part of this filename a unique key?  Do you have to track this 
relationship anyway - or age/expire content?  I'd try to arrange things 
so the most likely scenario would take the fewest operations.  Perhaps a 
mix of hash+filename would give direct access 99+% of the time and you 
could move all copies of collisions to a different area.  Then you could 
keep the database mapping the full name to the hashed path but you'd 
only have to consult it when the open() attempt fails.
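
A rough sketch of that lookup order, in Python. Everything here is an
assumption for illustration only: the SHA-1 hash, the two-level xx/yy
layout, the 200-character trim (ASCII names assumed, to stay under the
file name limit), and the lookup_in_db callback standing in for
whatever mysql query you would really run.

```python
import hashlib
import os


def hashed_path(root, full_name, keep=200):
    """Compute the expected location directly from the full name.

    Layout (assumed): root/xx/yy/<trimmed name>_<sha1 hex>
    """
    digest = hashlib.sha1(full_name.encode("utf-8")).hexdigest()
    # Trim the long name and make it safe to use as one path component.
    trimmed = full_name[:keep].replace("/", "_")
    return os.path.join(root, digest[:2], digest[2:4], trimmed + "_" + digest)


def open_cached(root, full_name, lookup_in_db):
    """Try the computed path first; consult the database only on failure.

    lookup_in_db is a hypothetical callback that returns the path of a
    file that was moved to the collision area, or None if unknown.
    """
    try:
        return open(hashed_path(root, full_name), "rb")
    except FileNotFoundError:
        alt = lookup_in_db(full_name)
        if alt is None:
            raise
        return open(alt, "rb")
```

In the common case this costs a single open() and no database round
trip; the callback only runs for the few names that had to be moved.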

> That's why I need to either a) use mysql or b) do a directory listing.
> 
>> 00/AA/FF/filename
> That would make up to 256^3 leaf directories, which is more than 16 million. Since I have around 15M files, I think this is an excessive number of directories.

I guess that's why squid only uses 16 x 256...
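
To put numbers on it: 16 x 256 is 4096 leaf directories, so 15M files
would average roughly 3,700 per directory, versus less than one file
per directory with 256^3 leaves. A small sketch of the arithmetic and
of one way to pick a bucket follows. The bucket_path function and its
use of SHA-1 are my own illustration of a 16 x 256 layout, not a
description of how squid assigns its directories.

```python
import hashlib


def bucket_path(name, l1=16, l2=256):
    """Map a name to one of l1 * l2 leaf directories, e.g. '0A/3F'."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    n = int.from_bytes(digest[:4], "big")
    # First level has l1 choices, second level has l2 choices.
    return "%02X/%02X" % (n % l1, (n // l1) % l2)


files = 15_000_000
leaves_3x256 = 256 ** 3    # three levels of 256 directories each
leaves_16x256 = 16 * 256   # 16 first-level, 256 second-level

print(leaves_3x256, files / leaves_3x256)    # leaf count, avg files per leaf
print(leaves_16x256, files // leaves_16x256)
print(bucket_path("example-file-name"))
```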

-- 
   Les Mikesell
     lesmikesell at gmail.com