[CentOS] Question about optimal filesystem with many small files.

JohnS

jses27 at gmail.com
Sat Jul 11 17:10:57 UTC 2009


On Sat, 2009-07-11 at 11:48 -0400, JohnS wrote:
> On Sat, 2009-07-11 at 00:01 +0000, oooooooooooo ooooooooooooo wrote:
> > > You mentioned that the data can be retrieved from somewhere else. Is
> > > some part of this filename a unique key? 
> > 
> > The real key is up to 1023 chracters long and it's unique, but I have to trim to 256 charactes, by this way is not unique unless I add the hash.
> > 
> > >Do you have to track this
> > > relationship anyway - or age/expire content? 
> > 
> > I have to track the long filename -> short file name realation ship. Age is not relevant here.
> > 
> > I'd try to arrange things
> > > so the most likely scenario would take the fewest operations. Perhaps a
> > > mix of hash+filename would give direct access 99+% of the time and you
> > > could move all copies of collisions to a different area. 
> > 
> > yes its a good idea, but at this point I don't want to add more complexity tomy app, and having a separate area for collisions would make it more complex.
> > 
> > >Then you could
> > > keep the database mapping the full name to the hashed path but you'd
> > > only have to consult it when the open() attempt fails.
> > 
> > As the long filename is up to 1023 chars long i can't index it with mysql (it has a lower max limit). That's why I use the hash which is indexed). What I do is keeping a list of just the md5 of teh cached files in memory in my app, before going to mysql, I frist check if it's in the list (realy a RB-Tree).
> ---
> It is 1024 chars long. Witch want still help. MSSQL 2005 and up is
> longer, if your interested:
> http://msdn.microsoft.com/en-us/library/ms143432.aspx
> But that greatly depends on your data size 900 bytes is the limit but
> can be exceeded.
> 
> You can use either one if you do a unique key id name for the index.
> File name to Unique short name. I would not store images in either one
> as your SELECT LIKE and Random will kill it. As much as I like DBs I
> have to say the flat file system is for those.
> 
> John
---
Just a random thought on Hashes VIA DB that none hardly give any thought
about.

Using Extended Stored Procedures like:MSSQL. You can make your on hashes
on the file insert.

USE master;
EXEC sp_extendedproc 'your_md5', 'your_md5.dll'

Of course you will have to create your own .DLL to to do the Hashing. 

Then create your on functions:
SELECT dbo.your_md5('YourHash');

Direct:
EXEC master.dbo.your_md5 'YourHash'

However I have not a clue that this is even doable in MySQL.

John




More information about the CentOS mailing list