[CentOS] Question about optimal filesystem with many small files.

Sat Jul 11 15:47:55 UTC 2009
JohnS <jses27 at gmail.com>

On Sat, 2009-07-11 at 00:01 +0000, oooooooooooo ooooooooooooo wrote:
> > You mentioned that the data can be retrieved from somewhere else. Is
> > some part of this filename a unique key? 
> 
> The real key is up to 1023 characters long and it's unique, but I have to trim it to 256 characters; that way it is not unique unless I add the hash.
> 
> > Do you have to track this
> > relationship anyway - or age/expire content? 
> 
> I have to track the long filename -> short filename relationship. Age is not relevant here.
> 
> > I'd try to arrange things
> > so the most likely scenario would take the fewest operations. Perhaps a
> > mix of hash+filename would give direct access 99+% of the time and you
> > could move all copies of collisions to a different area. 
> 
> Yes, it's a good idea, but at this point I don't want to add more complexity to my app, and having a separate area for collisions would make it more complex.
> 
> > Then you could
> > keep the database mapping the full name to the hashed path but you'd
> > only have to consult it when the open() attempt fails.
> 
> As the long filename is up to 1023 chars long, I can't index it with MySQL (it has a lower max limit). That's why I use the hash (which is indexed). What I do is keep a list of just the md5 of the cached files in memory in my app; before going to MySQL, I first check if it's in the list (really an RB-tree).
---
It is 1024 chars long, which still won't help. MSSQL 2005 and up is
longer, if you're interested:
http://msdn.microsoft.com/en-us/library/ms143432.aspx
But that greatly depends on your data size: 900 bytes is the limit, but
it can be exceeded.
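
As a rough illustration of why indexing the hash sidesteps the key-size
problem, here is a minimal Python sketch. The 900-byte constant is simply
the figure mentioned above, and the helper names are made up for the
example; check the real limit for whatever engine you use.

```python
import hashlib

# Figure quoted in this thread for MSSQL; an assumption for any other engine.
INDEX_KEY_LIMIT_BYTES = 900


def index_key_bytes(key: str) -> int:
    """Size of the key in bytes once encoded as UTF-8."""
    return len(key.encode("utf-8"))


def hashed_key(key: str) -> str:
    """Fixed-length stand-in for the long key: its md5 hex digest."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


long_key = "k" * 1023  # hypothetical worst-case key from the thread
print(index_key_bytes(long_key) > INDEX_KEY_LIMIT_BYTES)  # too big to index directly
print(index_key_bytes(hashed_key(long_key)))              # md5 hex digest is 32 bytes
```

The digest is the same length whatever the size of the original key, so
it fits an index comfortably.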

You can use either one if you use a unique key ID for the index, mapping
the file name to a unique short name. I would not store images in either
one, as your SELECT LIKE and Random queries will kill it. As much as I
like DBs, I have to say the flat file system is the place for those.
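
A minimal sketch of that arrangement, in Python: the database holds only
the long name -> short name mapping, indexed by the md5 hash, and the
files themselves would live on the flat file system under the short
name. Everything here is an assumption made for illustration: sqlite3
stands in for MySQL/MSSQL, a plain set stands in for the in-memory
RB-tree, and the table, column and function names are hypothetical. It
also treats the md5 as unique and does not handle collisions.

```python
import hashlib
import sqlite3


def md5_hex(long_key: str) -> str:
    return hashlib.md5(long_key.encode("utf-8")).hexdigest()


def short_name(long_key: str, keep: int = 200) -> str:
    """Hypothetical scheme: md5 digest plus a trimmed, sanitized piece of the key."""
    safe = "".join(
        c if (c.isascii() and c.isalnum()) else "_" for c in long_key[:keep]
    )
    return f"{md5_hex(long_key)}_{safe}"  # at most 32 + 1 + 200 characters


def open_store() -> sqlite3.Connection:
    """In-memory database holding only the mapping, never the file contents."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE name_map ("
        " id INTEGER PRIMARY KEY,"
        " md5 TEXT NOT NULL UNIQUE,"  # the short, indexed key
        " short_name TEXT NOT NULL,"
        " long_key TEXT NOT NULL)"    # stored but not indexed
    )
    return db


def remember(db: sqlite3.Connection, cached: set, long_key: str) -> str:
    """Record the mapping and note the digest in the in-memory set."""
    name = short_name(long_key)
    db.execute(
        "INSERT OR IGNORE INTO name_map (md5, short_name, long_key)"
        " VALUES (?, ?, ?)",
        (md5_hex(long_key), name, long_key),
    )
    cached.add(md5_hex(long_key))
    return name


def lookup(db: sqlite3.Connection, cached: set, long_key: str):
    """Check the in-memory set first; only query the database on a hit."""
    digest = md5_hex(long_key)
    if digest not in cached:
        return None
    row = db.execute(
        "SELECT short_name FROM name_map WHERE md5 = ?", (digest,)
    ).fetchone()
    return row[0] if row else None
```

Usage would be along the lines of `name = remember(db, cached, key)`
when a file is written and `lookup(db, cached, key)` before opening it;
the returned short name is what goes on disk.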

John