[CentOS] lots of small files in a folder on Linux centos

25 Jul 2011


      On Mon, 25 Jul 2011, Marc Deop wrote:
...
It's more than twice as fast as the previous sh script.
In part this is /bin/sh v. /bin/bash, and using 'bashisms' 
matters; but yes, I did not seek to optimize a teaching 
throwaway script
...
1- md5sum the file we need
... actually the NAME of the file, to make it explicit we are
    not looking at content [also a reasonable approach if one is
    looking to find and de-duplicate a filestore]
...
2- look for the first letter of the hash
... actually this may be more than a single character of the
    hash --- with ca. 3000 files, and 16 possible values per
    hex character, a single character should leave us with
    about 200 files per subdirectory (3000 / 16).  The filesystem
    should be doing some sort of index as well -- as I recall, a
    B-tree in the case of extN, but I've not expressly looked.
    The PHP case was mentioned, however, and its directory
    searching is less optimal.
We have a customer with a similar problem with a naively 
written set of home-brewed PHP code, and are helping them work 
through similar issues
...
3- get into the directory
4- now we look for our file
... this is probably a single operation to suck the sub-directory
    listing into an array in PHP, and use an associative
    match.
But you are right, we are moving increasingly away from a 
CentOS issue to a more general coding style issue.
-- Russ herrold
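
The four steps discussed above (hash the file NAME, take a prefix of the
hash, enter that subdirectory, look for the file) can be sketched roughly
as follows. This is only an illustration in Python, not the script from
the thread: the function names (bucket_for, store, exists), the
prefix_len parameter, and the on-disk layout root/<prefix>/<name> are all
assumptions made for the example.

```python
import hashlib
import os
import tempfile


def bucket_for(name: str, prefix_len: int = 1) -> str:
    """Steps 1-2: MD5 the file NAME (not its content), keep a hex prefix."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def store(root: str, name: str, data: bytes, prefix_len: int = 1) -> str:
    """Write data to root/<bucket>/<name> (hypothetical layout); return the path."""
    subdir = os.path.join(root, bucket_for(name, prefix_len))
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, name)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def exists(root: str, name: str, prefix_len: int = 1) -> bool:
    """Steps 3-4: go to the bucket, read its listing once, test membership."""
    subdir = os.path.join(root, bucket_for(name, prefix_len))
    try:
        # One directory read into a set, then a hashed ("associative") lookup.
        listing = set(os.listdir(subdir))
    except FileNotFoundError:
        return False
    return name in listing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Roughly the scale mentioned in the thread: ca. 3000 small files.
        for i in range(3000):
            store(root, f"file{i:04d}.txt", b"x")
        sizes = [len(os.listdir(os.path.join(root, d))) for d in os.listdir(root)]
        print(len(sizes), "buckets, largest holds", max(sizes), "files")
        print(exists(root, "file0042.txt"), exists(root, "missing.txt"))
```

With prefix_len=1 there are at most 16 subdirectories; prefix_len=2 would
give up to 256 and correspondingly fewer files in each. When only a single
name is being checked, a direct os.path.exists() on the computed path
avoids reading the listing at all; the set-based version mirrors the
"listing into an array" approach described for PHP.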
