On Sun, 24 Jul 2011, Keith Roberts wrote:
By using a hash, we remove those constraints, and we also get for free the virtuous effect of a self-organizing, relatively level dispersion of files across the destination directories.
I have not followed the whole thread, but what about a SQL database index of the actual picture files, giving the path into the directory structure? Would that work?
Fortunately there is a full and freely accessible archive of all posts to this mailing list. The link to that archive is in the header of every message sent through this list. As such, you need not speculate.
As I read the initial post, the problem was as stated in the subject line, and the database issue was not at the forefront.
Per the initial problem description, the files were all splatted into a single directory. The fastest database I know of is the filesystem itself, used as a database; adding the hash is just a pointer computation, and so also O(1).
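A minimal sketch of the filesystem-as-database idea, in Python (an assumption; the thread names no language). The hash function (MD5), the two-level `ab/cd/` layout, and the names `hashed_path`, `store`, `fetch`, and `top_level_spread` are hypothetical choices for illustration, not the scheme actually proposed in the thread. What it shows: the path is recomputed from the filename, so a lookup needs no separate index, and the hash spreads files fairly evenly over the subdirectories. The name-to-path step is constant time; how quickly the filesystem then resolves each directory entry still depends on the filesystem.

```python
import hashlib
import os
import tempfile
from collections import Counter


def hashed_path(root: str, filename: str) -> str:
    """Return root/ab/cd/filename, where 'abcd' are the first four hex digits of MD5(filename).

    Hypothetical layout: two directory levels of two hex digits each.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], filename)


def store(root: str, filename: str, data: bytes) -> str:
    """Write data to its hashed location, creating the subdirectories on demand."""
    path = hashed_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def fetch(root: str, filename: str) -> bytes:
    """Read a file back; the path is recomputed from the name, so no index is consulted."""
    with open(hashed_path(root, filename), "rb") as fh:
        return fh.read()


def top_level_spread(filenames) -> Counter:
    """Count how many names fall into each of the 256 first-level directories."""
    return Counter(hashlib.md5(n.encode("utf-8")).hexdigest()[:2] for n in filenames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store(root, "IMG_000042.jpg", b"fake jpeg bytes")
        print(fetch(root, "IMG_000042.jpg"))  # b'fake jpeg bytes'

    # Made-up file names, used only to check how level the dispersion is.
    spread = top_level_spread(f"IMG_{n:06d}.jpg" for n in range(100_000))
    print(len(spread), min(spread.values()), max(spread.values()))
```

Two levels of two hex digits give 256 × 256 = 65,536 leaf directories, so a single flat directory split this way leaves each leaf holding only a small fraction of the files.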
Adding a database engine, with the overhead it brings, and, as the thread has already pointed out, doing so in a domU at that (not usually the best place to add a database's overhead), simply introduces additional points of mis-design.
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified” - Donald Knuth [1]
Once the implementation is 'correct', it is time to do A:B testing to see where the real problem lies ... which testing was at the head of my initial post on this topic.
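A rough sketch of such an A:B test, in Python and under stated assumptions: layout A is the single flat directory from the original problem, layout B is a hypothetical two-level MD5 layout, and the file names, counts, and helper names (`flat_path`, `hashed_path`, `populate`, `time_lookups`, `ab_test`) are made up for illustration. It only times `os.stat` on empty files with a warm page cache, so treat it as a starting point; a meaningful test would run on the actual domU, against the real file set, with repeated runs.

```python
import hashlib
import os
import tempfile
import time


def flat_path(root: str, name: str) -> str:
    """Layout A: every file directly in one directory."""
    return os.path.join(root, name)


def hashed_path(root: str, name: str) -> str:
    """Layout B (hypothetical): root/ab/cd/name from the first four hex digits of MD5(name)."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], name)


def populate(root: str, names, layout) -> None:
    """Create an empty file for every name, placed according to the given layout."""
    for name in names:
        path = layout(root, name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()


def time_lookups(root: str, names, layout) -> float:
    """Return wall-clock seconds spent stat()ing every file once."""
    start = time.perf_counter()
    for name in names:
        os.stat(layout(root, name))
    return time.perf_counter() - start


def ab_test(n_files: int = 20_000) -> dict:
    """Build each layout in a fresh temporary directory and time lookups in it."""
    names = [f"IMG_{i:06d}.jpg" for i in range(n_files)]
    results = {}
    for label, layout in (("A: flat", flat_path), ("B: hashed", hashed_path)):
        with tempfile.TemporaryDirectory() as root:
            populate(root, names, layout)
            results[label] = time_lookups(root, names, layout)
    return results


if __name__ == "__main__":
    for label, secs in ab_test().items():
        print(f"{label}: {secs:.3f} s")
```

Whether layout A is actually slower depends heavily on the filesystem and its options (for example, whether directory indexing is enabled), which is exactly why the measurement should come before any redesign.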
-- Russ herrold
[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf
A person not willing to pony up $2.73 for a used copy of 'The Art of Computer Programming, Volume 3: Sorting and Searching', which discusses the specific problem space here, may wish to read and consider his rather nice lecture published by the ACM.