[CentOS] GFS and Small Files

JohnS jses27 at gmail.com
Sun May 3 15:26:58 UTC 2009


On Sun, 2009-05-03 at 06:09 -0700, Nifty Cluster Mitch wrote:
> On Wed, Apr 29, 2009 at 07:01:17PM +0800, Hairul Ikmal Mohamad Fuzi wrote:
> > 
> > Hi all,
> > 
> > We are running CentOS 5.2 64-bit as our file server.
> > Currently, we use GFS (with CLVM underneath it) as our filesystem
> > (for our multiple 2TB SAN volume exports) since we plan to add more
> > file servers (serving the same contents) later on.
> > 
> > The issue we are facing at the moment is that commands such as 'ls'
> > respond very slowly (e.g. 3-4 minutes for the output of ls to be
> > printed, or in certain cases 20 minutes or so). This is especially
> > true in directories containing a "large number" of small files
> > (e.g. 90000+ files of 1-4 KB each). The thing is, most of our system
> > users generate these small files frequently as part of their
> > workflow.
> > 
> > We tried emulating the same scenario (90000+ small files) on an ext3
> > partition and it gives almost the same result.
> 
> This is likely related to the size of the "ls" process growing.
> To sort by date etc., "ls" pulls all the metadata into memory
> then reports.
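
A rough sketch of that difference, in Python rather than ls itself. The first function hands back names as the directory is read; the second does what a listing sorted by date has to do, which is stat every entry and hold all of it in memory before anything can be printed. Both function names are made up for illustration.

```python
import os

def stream_names(path):
    """Yield entry names as they are read, with no stat() call and no sorting."""
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name

def names_sorted_by_mtime(path):
    """Stat every entry and collect all results before returning anything,
    roughly what a listing sorted by date has to do."""
    with os.scandir(path) as entries:
        records = [(entry.stat(follow_symlinks=False).st_mtime, entry.name)
                   for entry in entries]
    records.sort()  # cannot start until every entry has been read and stat'ed
    return [name for _, name in records]
```

On the command line, `ls -f` (or `ls -U` with GNU ls) skips the sort in a similar way, so it is a quick test of whether the sorting and stat calls are what you are waiting on.
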
> 
> > 
> > I believe most of the CLVM/GFS settings are the default
> > parameters. Additionally, we would prefer to stick to GFS (or at least
> > ext3) as it is part of the CentOS / RHEL distribution rather than changing
> > to other small-file 'friendly' filesystems (such as XFS, ReiserFS).
> > 
> > I'm exploring whether there is any way we can tune the GFS parameters
> > to make the system more responsive.
> 
> With 'gobs' of files you may find that find, xargs and stat are the tools
> of choice.
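
The point of find piped into xargs is that names are handled in batches instead of as one giant list. Here is a minimal sketch of the same idea in Python, assuming all you want is a file count and a total size. The function names and the batch size of 1000 are only illustrative.

```python
import os
from itertools import islice

def batched_entries(path, batch_size=1000):
    """Yield lists of at most batch_size directory entries, in read order."""
    with os.scandir(path) as entries:
        while True:
            batch = list(islice(entries, batch_size))
            if not batch:
                return
            yield batch

def count_and_total_size(path, batch_size=1000):
    """Count regular files and sum their sizes one batch at a time."""
    count = 0
    total = 0
    for batch in batched_entries(path, batch_size):
        for entry in batch:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
                count += 1
    return count, total
```

Memory use stays bounded by the batch size, so it should not grow with the directory the way a sorted listing does.
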
>   
> > I have read that we can apply the 'dir_index' option to an ext3 partition to
> > speed things up, but I'm not so sure about GFS.
> 
> Do look at "ls" with strace, top or a debugger.
---
I don't know if this will help you, but if there are more than 100,000
files you can use consistent UPPERCASE or lowercase file naming. That
speeds things up greatly under Samba with over 200,000 images. Ahh, I see
you have 90,000. This is also workable in Samba 4 with a GFS cluster.
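
To see how far a directory already is from single-case naming, something like the following would list the names that are not all lowercase. It is only a sketch: the function name is hypothetical, it renames nothing, and it does not touch the Samba configuration.

```python
import os

def names_not_lowercase(path):
    """Yield names in path that would change if converted to lowercase."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name != entry.name.lower():
                yield entry.name
```

If it yields nothing, the directory is already all lowercase.
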

JohnStanley



