[CentOS] lots of small files in a folder on Linux centos

R P Herrold herrold at centos.org
Sun Jul 24 21:50:11 UTC 2011


On Sun, 24 Jul 2011, Keith Roberts wrote:

>> By using a hash, we remove those constraints, and also gain
>> the virtuous effect for free of self-organizing a relatively
>> level dispersion of files to the destination directories
>
> Not followed the whole thread, but a SQL database index of
> the actual picture files, giving the path into the directory
> structure. Would that work?

Fortunately there is a full, and freely accessible, archive of 
all posts to this mailing list.  The link to that archive is in 
the header of every message through this list.  As such you 
need not speculate

As I read the post initially, the problem was as stated in the 
subject line, and the database issue was not in the forefront

Per the initial problem description, the files were all 
splatted into a single directory.  The fastest database I know 
of is using the filesystem as a database; the addition of the 
hashing is just a pointer, and so also O(1)
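
As a minimal sketch of the hashed layout discussed above: the 
function names, the choice of SHA-1, and the two-level fan-out 
of two hex digits each are assumptions for illustration, not 
anything specified in this thread.  The destination directory 
is computed from the file name alone, so no separate index has 
to be consulted

```python
import hashlib
import os
import shutil


def hashed_path(root, filename, levels=2, width=2):
    """Return root/ab/cd/filename, where 'abcd...' is the hex digest of the name.

    The hash is used only to spread files evenly, not for security.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Slice the leading hex digits into one directory name per level.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def store(root, src):
    """Move src into its hashed bucket under root and return the new path."""
    dest = hashed_path(root, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest
```

With these assumed defaults there are 256 * 256 = 65536 
possible buckets, so a million files would average roughly 15 
per directory.  Finding a file later is just a call to 
hashed_path() with the same root and name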

Adding a database engine, with the overhead that it brings, 
and, as the thread has already pointed out, doing so in a domU 
as well (not usually the best place to add the overhead of a 
database), are simply additional points of mis-design

“We should forget about small efficiencies, say about 97% of 
the time: premature optimization is the root of all evil. Yet 
we should not pass up our opportunities in that critical 3%. A 
good programmer will not be lulled into complacency by such 
reasoning, he will be wise to look carefully at the critical 
code; but only after that code has been identified”
   - Donald Knuth [1]

Once the implementation is 'correct', then it is time to do 
A:B testing to see where the real problem lies ... which 
testing was at the head of my initial post on this topic

-- Russ herrold

[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

A person not willing to pony up $2.73 for a used copy of 'The 
Art of Computer Programming: Sorting and Searching. Volume 3', 
which discusses the specific problem space here, may wish to 
read and consider his rather nice lecture published by the 
ACM


