On Sun, 24 Jul 2011, yonatan pingle wrote:
the coder is not tech savvy as one might expect, so it's really hard for me to explain the issue of having lots of files in one folder to the site owner or to the coder.
I do not expect coders to remain 'not tech savvy'
If the coder is not willing to learn and to test, you are already doomed, and should walk away from the project
To show the problem, take a pile of pennies, and ask the coder to find one with a given year. The coder will have to do a linear search, even to know whether the target exists. Then show an egg carton with another pile of pennies sorted and labelled by year in each section, and ask them to repeat the task -- in the latter case, it is a 'single seek' to solve the problem
Obviously, the target year may not even be present. With a single pile (directory) the full linear search is still required to find that out, but with 'binning' by years, an empty bin makes that obvious by inspection as well
One approach to lots of files in a single directory (which can cause problems in getting timely access to a specific file) is to build a permuted directory tree from the file names to spread the load around. If the files have nearly identical names [pix00001.jpg, pix00002.jpg, etc], first build a 'hashed' version of the file name with md5sum, or such, so that the leading characters are evenly distributed
[herrold@localhost ~]$ ./hashdemo.sh
pix00001.jpg fd8f49c6487588989cd764eb493251ec
pix00002.jpg 12955d9587d99becf3b2ede46305624c
pix00003.jpg bfdc8f593676e4f1e878bb6959f14ce2
[herrold@localhost ~]$ cat hashdemo.sh
#!/bin/sh
#
CANDIDATES="pix00001.jpg pix00002.jpg pix00003.jpg"
for i in `echo "${CANDIDATES}"`; do
        HASH=`echo "$i" | md5sum - | awk {'print $1'}`
        echo "$i ${HASH}"
done
[herrold@localhost ~]$
Then we look at the leading character of the hash to design our egg carton bins. We place pix00001.jpg in directory ./f/, pix00002.jpg in directory ./1/, pix00003.jpg in directory ./b/, and so forth -- if the directories get too full again, you might use the first two characters of the hash to perform the 'binning' process (16 bins with one hex character, 256 with two)
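A rough sketch of the filing step, in the same style as hashdemo.sh. The names bin_for, store and DEPTH are made up for illustration, and it assumes md5sum, awk and cut are on the PATH; treat it as a starting point rather than a finished tool.

```shell
#!/bin/sh
# Sketch only: bin_for, store and DEPTH are hypothetical names.

# Number of leading hash characters used to name the bin (1 => 16 bins).
DEPTH="${DEPTH:-1}"

bin_for() {
    # Hash the name the same way hashdemo.sh does (echo adds a newline),
    # then keep only the first DEPTH characters of the hash.
    hash=`echo "$1" | md5sum - | awk '{print $1}'`
    echo "$hash" | cut -c "1-${DEPTH}"
}

store() {
    # $1 = file to file away, $2 = root of the binned tree
    name=`basename "$1"`
    bin=`bin_for "$name"`
    mkdir -p "$2/$bin" && mv "$1" "$2/$bin/$name"
}
```

Something like `store /incoming/pix00001.jpg /var/www/pix` (both paths invented here) would then land the file in one of the sixteen bins, and running with DEPTH=2 would spread new files over 256 bins instead.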
An MD5 function is readily available in PHP (md5()), as are directory creation (mkdir()) and so forth, so positioning the files and computing the indexes are straightforward there
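Finding a file again is the same computation: hash the name, take the leading character, and look in exactly one directory. A sketch in shell, to match the demo above; path_for, lookup and TREE are hypothetical names, and it assumes one-character bins. Note that echo appends a newline before hashing, so a PHP md5() of the bare name gives a different hash; pick one convention and use it for both storing and looking up.

```shell
#!/bin/sh
# Sketch only: path_for, lookup and TREE are hypothetical names.

# Root of the binned tree; defaults to the current directory.
TREE="${TREE:-.}"

path_for() {
    # Compute where a name belongs: <TREE>/<first hash character>/<name>
    hash=`echo "$1" | md5sum - | awk '{print $1}'`
    bin=`echo "$hash" | cut -c 1`
    echo "${TREE}/${bin}/$1"
}

lookup() {
    # Print the path and succeed if the file is there; otherwise fail.
    # Only one directory is consulted, whether or not the file exists.
    p=`path_for "$1"`
    if [ -f "$p" ]; then
        echo "$p"
    else
        return 1
    fi
}
```

This is the egg carton 'single seek': a missing file is reported after checking one bin, not after walking the whole pile.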
This is all pretty basic stuff, covered by Knuth in TAOCP long ago
-- Russ herrold