[CentOS] which filesystem to store > 250E6 small files in same or hashed dire
Les Mikesell
lesmikesell at gmail.com
Mon Mar 14 18:43:22 UTC 2011
On 3/14/2011 12:33 PM, Alain Spineux wrote:
>
>> File aaa12345 goes in
>>
>> $DIR/a/a/a/12345
>>
>> File abc6789 goes in
>>
>> $DIR/a/b/c/6789
>
> Try to create this kind of tree yourself and, when done, remove it.
> It took hours on my box, and it is even faster to keep all files in the
> same directory.
> It looks like working in multiple directories also slows down the process.
Normally, directory/inode caching would help, but you are probably
exceeding any reasonable cache size and making it thrash.
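For reference, the nested layout quoted above (each of the first three
characters of the name becomes one directory level) can be sketched as
below. This is only an illustration: the function names, the defaults and
the MD5-based variant are assumptions, not something proposed in the
thread, and neither helper does anything about the cache thrashing.

```python
import hashlib
import os


def prefix_path(base_dir: str, name: str, levels: int = 3) -> str:
    """Map 'aaa12345' to base_dir/a/a/a/12345: one directory per leading char.

    Hypothetical helper reproducing the layout quoted in the thread.
    """
    if len(name) <= levels:
        raise ValueError("name must be longer than the number of levels")
    return os.path.join(base_dir, *name[:levels], name[levels:])


def digest_path(base_dir: str, name: str, levels: int = 2, width: int = 2) -> str:
    """Hypothetical variant: take directory names from an MD5 digest of the name.

    Names that share a prefix (e.g. many 'aaa...') then spread evenly over
    the 16**(levels*width) leaf directories; the full name is kept as-is.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *parts, name)
```

With the defaults, `digest_path` gives 65536 leaf directories, so 250E6
files would average roughly 3,800 entries per directory. Whether that beats
one huge directory still depends on the filesystem and on how much of the
directory/inode data stays cached.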
> I read other posts and articles: handling more than 100M files becomes
> a problem!
> 256M is a problem, and more than 1G files is a big problem.
>
> I was splitting data into files to help me. I will keep them in big files.
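One way to keep many small records in a big file is an append-only pack
with a small header per record, so the key-to-offset index can be rebuilt
by scanning the file once. The sketch below is a hypothetical format
(big-endian uint16 key length, uint32 data length, key bytes, data bytes),
not an existing tool; it keeps the index in a Python dict, which for
hundreds of millions of keys would itself need to live on disk or in a
database.

```python
import os
import struct
import tempfile
from typing import BinaryIO, Dict, Tuple

# Hypothetical record header: key length (uint16), data length (uint32).
HEADER = struct.Struct(">HI")


def append_record(f: BinaryIO, key: str, data: bytes) -> int:
    """Append one record to a file opened in 'ab' mode; return its offset."""
    k = key.encode("utf-8")
    offset = f.seek(0, os.SEEK_END)
    f.write(HEADER.pack(len(k), len(data)))
    f.write(k)
    f.write(data)
    return offset


def scan_index(path: str) -> Dict[str, Tuple[int, int]]:
    """Rebuild {key: (data_offset, data_length)}; later records win."""
    index: Dict[str, Tuple[int, int]] = {}
    with open(path, "rb") as f:
        while True:
            hdr = f.read(HEADER.size)
            if len(hdr) < HEADER.size:
                break  # end of file (or a truncated trailing header)
            klen, dlen = HEADER.unpack(hdr)
            key = f.read(klen).decode("utf-8")
            index[key] = (f.tell(), dlen)
            f.seek(dlen, os.SEEK_CUR)  # skip over the data bytes
    return index


def read_record(path: str, index: Dict[str, Tuple[int, int]], key: str) -> bytes:
    """Fetch one record with a single seek and read."""
    offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pack = os.path.join(d, "pack.bin")
        with open(pack, "ab") as f:
            append_record(f, "aaa12345", b"hello")
            append_record(f, "abc6789", b"world")
        print(read_record(pack, scan_index(pack), "abc6789"))
```

Rewriting a key just appends a newer copy; reclaiming the space of the old
copy would need a separate compaction pass.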
Depending on your access needs, you might investigate some of the
scalable NoSQL databases like Riak or Cassandra. These would let you
distribute the data (and the access contention) across a cluster of
machines. Riak has an extension called Luwak that handles large data
streams by chunking them into smaller key/value sets, which are then
distributed over the cluster. Because of the way the implementation works,
you also get de-duplication of chunks, with the downside that you can't
really delete anything. And Riak in general wants to keep all the keys in
RAM, which might be an issue for your use unless you can spread it over
several machines.
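The general idea of chunking a stream into content-addressed key/value
pairs can be illustrated with a toy in-memory store. This is a sketch of
the technique only, not Luwak's actual data format or API: the class name,
the SHA-256 keys and the tiny chunk size are assumptions chosen for
illustration.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # tiny for illustration; real systems use far larger chunks


class ChunkStore:
    """Hypothetical store: chunks keyed by the hash of their content."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # every key stays in RAM here

    def put_stream(self, data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
        """Split data into chunks; return the manifest (ordered chunk keys)."""
        manifest = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # identical chunks stored once
            manifest.append(key)
        return manifest

    def get_stream(self, manifest: List[str]) -> bytes:
        """Reassemble a stream from its manifest."""
        return b"".join(self.chunks[k] for k in manifest)
```

Because two streams can point at the same chunk, removing one stream's
chunks safely needs reference counting or garbage collection across all
manifests, which is one reason such designs make deletion awkward.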
--
Les Mikesell
lesmikesell at gmail.com