On 3/14/2011 12:33 PM, Alain Spineux wrote:
>
>> File aaa12345 goes in
>>
>> $DIR/a/a/a/12345
>>
>> File abc6789 goes in
>>
>> $DIR/a/b/c/6789
>
> Try to create this kind of tree yourself and, when done, remove it.
> It took hours on my box, and it is even faster to keep all files in the
> same directory.
> It looks like working in multiple directories slows down the process also.

Normally, directory/inode caching would help, but you are probably
exceeding any reasonable caching attempt and making it thrash.

> I read other posts and articles, and handling more than 100M files becomes
> a problem!
> 256M is a problem and more than 1G files is a big problem.
>
> I was splitting data into files to help me. I will keep them in big files.

Depending on your access needs, you might investigate some of the
scalable NoSQL databases like Riak or Cassandra. These would let you
distribute the data (and access contention) across a cluster of
machines. Riak has an extension called Luwak that handles large data
streams by chunking them into smaller key/value sets, which are then
distributed over the cluster. The way the implementation works, you also
get de-duplication of chunks, with the downside that you can't really
delete anything. And Riak in general wants to keep all the keys in RAM,
which might be an issue for your use unless you can spread it over
several machines.

-- 
Les Mikesell
lesmikesell at gmail.com
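
For reference, the directory layout quoted at the top of this message (the first three characters of a file name become three directory levels, and the remainder becomes the file name) can be sketched roughly as below. This is only an illustration: `sharded_path`, `base_dir`, and `depth` are hypothetical names that do not come from the thread, and the function only computes a path without touching the filesystem.

```python
from pathlib import PurePosixPath


def sharded_path(base_dir: str, name: str, depth: int = 3) -> PurePosixPath:
    """Map a flat file name to a nested path keyed on its leading characters.

    Hypothetical helper: the first `depth` characters each become one
    directory level, and the rest of the name is the leaf file name.
    """
    if len(name) <= depth:
        raise ValueError(f"name {name!r} is too short for depth {depth}")
    prefix, rest = name[:depth], name[depth:]
    # Unpacking the prefix string yields one path component per character.
    return PurePosixPath(base_dir, *prefix, rest)


if __name__ == "__main__":
    for example in ("aaa12345", "abc6789"):
        print(example, "->", sharded_path("$DIR", example))
```

With lowercase letters only, a depth of 3 gives at most 26^3 = 17,576 leaf directories, so at the file counts discussed above each leaf would still hold thousands of files, and every lookup walks three extra directory levels.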