I would like to know what you do about the number of files in a folder, or if that is even a concern. I think there is a limitation or a slowdown if it gets too big, but what is optimal (if it matters at all)?
So what is best for file management and system resources?
Using dir_index (hashed directory indexes) on ext3, or a hashing file system, helps... but in many such contexts I've found that if you can do a multi-level directory hashing scheme (compute some reproducible hash on a file name or user name/ID) and index into a directory structure, that can help. -Alan
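To make Alan's idea concrete, here is a minimal sketch in Python, assuming an MD5 hash of the file name and two directory levels taken from the first hex characters of the digest (the function and path names are just illustrative, not from any particular library):

import hashlib
import os

def hashed_path(base_dir, filename, levels=2, chars_per_level=2):
    # Derive a reproducible bucket from the file name so the same name
    # always lands in the same subdirectory, spread across up to 256
    # directories per level.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return os.path.join(base_dir, *parts, filename)

def store(base_dir, filename, data):
    # Write the file under its hashed path, creating directories as needed.
    path = hashed_path(base_dir, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)

# A name like "photo_1234.jpg" ends up somewhere like images/0e/a2/photo_1234.jpg
print(hashed_path("images", "photo_1234.jpg"))

Because the hash is reproducible, you never need a lookup table to find a file again; the name alone tells you which subdirectory it lives in.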
I set up using ext3, and with CentOS I believe that means 4 KB blocks, which gives an 8 TB size limit overall. However, I believe that is per logical drive. Also, the total number of files per logical drive comes from some strange formula, something like volume size divided by 2 to the 23rd power... but I'm not sure; it may be size divided by 2 and then raised to the 23rd power. Either way, that is a lot of files, I think. 32,000 is the max subdirectory count for a single directory.
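I'm not certain of the exact divisor either; as far as I know the real cap on total files for ext3 is the inode count chosen at mkfs time, which "df -i" will show you directly. A rough sketch of the arithmetic, assuming the divisor is the filesystem's bytes-per-inode ratio (the 8192 default below is only an assumption, check the real value with tune2fs -l or df -i):

def max_files(volume_bytes, bytes_per_inode=8192):
    # The real limit is the inode count picked when the filesystem was
    # created; 8192 bytes per inode is only an assumed default here.
    return volume_bytes // bytes_per_inode

# An 8 TB volume at one inode per 8 KB works out to roughly a billion files.
print(max_files(8 * 1024**4))  # 1073741824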
I am going with a max of 1,000 files per folder and, for let's say an image folder, a max of 10,000 subdirectories. I think this will keep the application I am building fine on most computers.
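A minimal sketch of how I'd enforce that cap, assuming the images have a sequential numeric ID (the names here are purely illustrative):

import os

def image_path(base_dir, image_id, files_per_dir=1000):
    # Image N goes into subdirectory N // files_per_dir, so no folder
    # ever holds more than files_per_dir images.  With 1,000 files per
    # folder and up to 10,000 subfolders, that covers about 10 million
    # images before the base directory itself needs splitting.
    bucket = image_id // files_per_dir
    return os.path.join(base_dir, str(bucket), "%d.jpg" % image_id)

print(image_path("images", 1234567))  # images/1234/1234567.jpg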
For my own sites, I think that when approaching a huge volume it will be time to just get some bigger drives with a different file system to host those specific directories, and that should solve it all.
The only way I can see to not slow the computer down is to limit the number of files in a directory and the number of folders in a directory (such as no more than 1,000 first-tier subdirectories in the image folder). And trying to make sure a folder holds either folders or files, not both, should also help.
Of course, it would be nice to be able to benchmark the process by number of files, subdirectories, and files per subdirectory... there might be a way.
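There probably is; a rough benchmark is easy to script yourself. A minimal sketch in Python, assuming all you want is to compare lookup time as the file count in a single directory grows (it creates and deletes its own scratch directory, so run it on the filesystem you actually care about):

import os
import random
import shutil
import tempfile
import time

def bench(file_count, lookups=10000):
    # Create file_count empty files in a scratch directory, then time
    # random stat() lookups against it.  Returns seconds per lookup.
    workdir = tempfile.mkdtemp(prefix="dirbench_")
    try:
        for i in range(file_count):
            open(os.path.join(workdir, "f%08d" % i), "w").close()
        names = ["f%08d" % random.randrange(file_count) for _ in range(lookups)]
        start = time.perf_counter()
        for name in names:
            os.stat(os.path.join(workdir, name))
        return (time.perf_counter() - start) / lookups
    finally:
        shutil.rmtree(workdir)

for count in (1000, 10000, 100000):
    print("%7d files: %.2f microseconds per stat" % (count, bench(count) * 1e6))

Keep in mind the numbers will partly reflect the OS cache, so it's only good for relative comparisons between directory sizes on the same box.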
I think that is the only way to handle it, at least on a small system without large drives, running ext3.
Thanks for all the input.