[CentOS] GFS and Small Files

Sun May 3 13:09:30 UTC 2009
Nifty Cluster Mitch <niftycluster at niftyegg.com>

On Wed, Apr 29, 2009 at 07:01:17PM +0800, Hairul Ikmal Mohamad Fuzi wrote:
> 
> Hi all,
> 
> We are running CentOS 5.2 64bit as our file server.
> Currently, we use GFS (with CLVM underneath it) as our filesystem
> (for our multiple 2TB SAN volume exports), since we plan to add more
> file servers (serving the same contents) later on.
> 
> The issue we are facing at the moment is that commands such as 'ls'
> give a very slow response (e.g. 3-4 minutes for the output of ls to
> be printed, or in certain cases 20 minutes or so). This is especially
> true in directories containing a "large number" of small files
> (e.g. 90000+ files of 1-4kb). The thing is, most of our system users
> generate these small files frequently as part of their workflow.
> 
> We tried emulating the same scenario (90000+ small files) on an ext3
> partition and it gives almost the same result.

This is likely related to the size of the "ls" process growing.
To sort by date etc., "ls" stats every entry and pulls all of the
metadata into memory before it reports anything.
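
As a quick sanity check you can skip the sort and the per-file stat
entirely.  Something like this (untested here, plain GNU coreutils,
and the directory path is only an example) should come back almost
immediately even on a huge directory:

    # -f disables sorting (and implies -aU), so no per-file stat
    ls -f /gfs/bigdir | wc -l

If that is fast while a plain "ls" or "ls -l" is slow, the time is
going into sorting and per-file metadata rather than the readdir
itself.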

> 
> I believe most of the CLVM/GFS settings are using the default
> parameters. Additionally, we would prefer to stick to GFS (or at
> least ext3), as it is part of the CentOS / RHEL distribution, rather
> than changing to other small-file 'friendly' filesystems (such as
> XFS or ReiserFS).
> 
> I'm exploring whether there is any way we can tune the GFS parameters
> to make the system more responsive.

With 'gobs' of files, you may find that find, xargs and stat are the
tools of choice.
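
For example (a sketch only -- adjust the path and the stat format to
taste), this streams results instead of building one giant sorted
listing in memory:

    # print name, size and mtime for each regular file, unsorted
    find /gfs/bigdir -maxdepth 1 -type f -print0 \
        | xargs -0 stat -c '%n %s %y'

xargs keeps the argument lists to a sane size, and you can narrow the
find expression (-newer, -name, ...) so stat only touches the files
you actually care about.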
  
> I have read that we can apply the 'dir_index' option to an ext3
> partition to speed things up, but I'm not so sure about GFS.
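
For the ext3 side of that question: dir_index can be turned on after
the fact (a sketch only -- the device name below is hypothetical, and
the e2fsck pass needs the filesystem unmounted):

    tune2fs -O dir_index /dev/sdXN     # enable hashed directories
    e2fsck -fD /dev/sdXN               # rebuild/optimize existing dirs

Existing directories are only indexed after the e2fsck -D pass; new
directories pick it up automatically.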

Do look at "ls" with strace, top or a debugger.
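
For instance (the path is again just an example), a syscall summary
makes it obvious where the time is going:

    # -c prints per-syscall time and call counts instead of a trace
    strace -c ls -l /gfs/bigdir > /dev/null

On a directory with 90000+ entries you would expect the getdents and
lstat counts to dominate.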
 

-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?