[CentOS] Question about optimal filesystem with many small files.

Mon Jul 13 15:37:44 UTC 2009
Les Mikesell <lesmikesell at gmail.com>

JohnS wrote:
> On Mon, 2009-07-13 at 05:49 +0000, oooooooooooo ooooooooooooo wrote:
>>> It is 1024 chars long. Witch want still help.
>> I'm usng mysam and according to: http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html
>> "The maximum key length is 1000 bytes. This can also be changed by changing the source and recompiling. For the case of a key longer than 250 bytes, a larger key block size than the default of 1024 bytes is used. "
>>> I would not store images in either one
>> as your SELECT LIKE and Random will kill it. 
>> Well, I think that this can be avoided, using just searches in teh key fields should not give these issues. Does somebody have experience storing a large amount of medium (1KB-150KB) blob objects in mysql?
> True
> An option would be to encode them to Base64 on INSERT but if you Index
> all of you BLOBS on INSERT really there should be no problem. Besides
> 150Kb is not a big for a BLOB. Consider 20MB to 100MB with multiple
> joins on MSSQL, 64Bit although. Apparently size is based on the maximum
> amount of memory the client has. VARBLOB apparently has no limit per
> docs. As doing this on MySQL I can not relate to. I can on DB2 and
> MSSQL. I can say you can rival the 32Bit MSSQL performance by at least
> 15 percent. I can only say that I have experiance with raw DB
> predictions in Graphing. Edge and Adjacency Modeling on MySQL.
> What I see slowing you down is the TQSL and SPROCS. The dll for the md5
> I posted earlier will scale to 1000s of inserts at the time. If speed is
> really your essence then use RAW Partitions for the DB and RAM. Use the
> MySQL Connector or the ODBC or you will hit size limits on INSERT and
>>> However I have not a clue that this is even doable in MySQL.
>> In mysql there is already a MD5 funtion: http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_md5
> Yes, I was informed that a call from a SPROC to "md5()" would do the
> trick and take the load of the client. At least that was my intent of
> the idea to balance the load. That is if this is client/server.
> I do wonder about your memory allocation and disk. It is all about the
> DB design. Think about a Genealogy DB. Where do you end design? You
> don't. Where does predictions end? They don't.

I think you are making this way too complicated.  You are going to end 
up filling a large disk with small bits of data and your speed is going 
to be limited by how fast the disk head can get to the right place for 
anything that isn't already in a buffer.  Other than the special case of 
too many entries in a single directory, the software overhead isn't 
going to make much difference unless you can effectively predict what 
you are likely to want next or keep the most popular things in your 
buffers.  Hardware-wise, adding RAM is likely to help even if it is just 
for the filesystem inode/directory cache - and if you are lucky, the LRU 
data buffering.  Also, spreading your data over several disks would help 
by reducing the head contention.

   Les Mikesell
    lesmikesell at gmail.com