[CentOS] which filesystem to store > 250E6 small files in same or hashed directories

Les Mikesell lesmikesell at gmail.com
Mon Mar 14 18:43:22 UTC 2011


On 3/14/2011 12:33 PM, Alain Spineux wrote:
>
>> File aaa12345 goes in
>>
>>    $DIR/a/a/a/12345
>>
>> File abc6789 goes in
>>
>>     $DIR/a/b/c/6789
>
> Try to create this kind of tree yourself and, when done, remove it.
> It took hours on my box, and it is even faster to keep all files in
> the same directory.
> It looks like working in multiple directories slows down the process too.

Normally, directory/inode caching would help, but you are probably 
exceeding any reasonable cache size and making it thrash.
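
The splitting scheme quoted above could be sketched like this (a minimal
illustration only; the function names and the three-level depth are my own,
not code from the thread):

```python
import os

def hashed_path(base_dir, name, depth=3):
    """Split the first `depth` characters of a file name into nested
    directory levels, e.g. 'aaa12345' -> base_dir/a/a/a/12345."""
    prefix, rest = name[:depth], name[depth:]
    return os.path.join(base_dir, *prefix, rest)

def store(base_dir, name, data):
    """Write `data` under the hashed path, creating directories as needed."""
    path = hashed_path(base_dir, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
```

With ~250 million files this still creates millions of entries per leaf
directory at depth 3, which is where the caching problem above bites.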

> I read other posts and articles; handling more than 100M files becomes
> a problem!
> 256M is a problem, and more than 1G files is a big problem.
>
> I was splitting the data into many files to help me. I will keep it in big files instead.

Depending on your access needs you might investigate some of the 
scalable NoSQL databases like Riak or Cassandra.  These would let you 
distribute the data (and access contention) across a cluster of 
machines.  Riak has an extension called luwak that handles large data 
streams by chunking them into smaller key/value sets which are then 
distributed over the cluster.  The way the implementation works, you also 
get de-duplication of chunks - with the downside that you can't really 
delete anything.  And Riak in general wants to keep all the keys in RAM, 
which might be an issue for your use unless you can spread it over 
several machines.
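
The content-addressed chunking that gives luwak its de-duplication can be
illustrated roughly like this (a simplified sketch with invented names and a
plain dict standing in for the cluster; this is not Riak's actual API):

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size, chosen for illustration

def chunk_stream(data, store, chunk_size=CHUNK_SIZE):
    """Split a byte stream into fixed-size chunks keyed by their SHA-256
    hash.  Identical chunks hash to the same key, so the key/value store
    de-duplicates them automatically - which is also why nothing can be
    safely deleted while any stream might still reference a chunk."""
    keys = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store[key] = chunk      # idempotent: duplicates overwrite themselves
        keys.append(key)
    return keys                 # the manifest needed to reassemble the stream

def reassemble(keys, store):
    """Rebuild the original byte stream from its chunk manifest."""
    return b"".join(store[k] for k in keys)
```

Storing two streams that share content only costs one copy of each shared
chunk, at the price of never knowing when a chunk is safe to remove.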

-- 
   Les Mikesell
    lesmikesell at gmail.com
