Hi all,
We are running CentOS 5.2 64-bit as our file server. Currently we use GFS (with CLVM underneath) as the filesystem for our multiple 2 TB SAN volume exports, since we plan to add more file servers (serving the same content) later on.
The issue we are facing at the moment is that commands such as 'ls' give a very slow response (e.g. 3-4 minutes for the output of ls to be printed, or in certain cases 20 minutes or so). This is especially true in directories containing a large number of small files (e.g. 90,000+ files of 1-4 KB each). The thing is, most of our users generate these small files frequently as part of their workflow.
We tried emulating the same scenario (90,000+ small files) on an ext3 partition and it gives almost the same result.
I believe most of the CLVM/GFS settings are at their default parameters. Additionally, we would prefer to stick with GFS (or at least ext3), as it is part of the CentOS / RHEL distribution, rather than changing to other small-file-friendly filesystems (such as XFS or ReiserFS).
I'm exploring whether there is any way we can tune the GFS parameters to make the system more responsive. I have read that we can apply the 'dir_index' option to an ext3 partition to speed things up, but I'm not so sure about GFS.
Below is the output of "gfs_tool gettune /export/gfs":
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 0
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000 (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 0
TIA.
.ikmal
Hi, independently of the results you have seen, it is generally reasonable to tune a GFS filesystem as described in http://kbase.redhat.com/faq/docs/DOC-6533, especially mounting with noatime and running 'gfs_tool settune <fs> glock_purge 50'.
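A minimal sketch of those two changes (the device name and fstab fields are placeholders for whatever the real volume uses; only the noatime option and the glock_purge value come from the suggestion above):

```shell
# /etc/fstab: mount the GFS volume with noatime so plain reads do not
# rewrite inode access times (device and mount point are placeholders)
# /dev/mapper/vg_san-lv_gfs  /export/gfs  gfs  defaults,noatime  0 0

# Tell GFS to purge 50% of unused glocks, per the kbase article above
gfs_tool settune /export/gfs glock_purge 50
```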
Regards, Marc.

On Wednesday 29 April 2009 13:01:17 Hairul Ikmal Mohamad Fuzi wrote:
<snip original post>
On Wed, 2009-04-29 at 19:01 +0800, Hairul Ikmal Mohamad Fuzi wrote:
<snip>
The issue we are facing at the moment is that commands such as 'ls' give a very slow response (e.g. 3-4 minutes for the output of ls to be printed, or in certain cases 20 minutes or so). This is especially true in directories containing a large number of small files (e.g. 90,000+ files of 1-4 KB each). The thing is, most of our users generate these small files frequently as part of their workflow.
One thing to keep in mind is that ls must sort the file list. If the system load is high and memory is short, you may be getting into a swap situation. I suggest trying the test when the system is lightly loaded to see if the results differ. This might be especially significant if you have a large number of concurrent users doing lots of things.
As well (I believe this is not FUD), 64-bit systems inherently use more memory, so memory-shortage problems could be exacerbated.
<snip>
You might want to check whether swap is heavily used (if you are not root, use the full path: /sbin/swapon -s). That might be helpful, or not.
<snip sig stuff>
HTH
On Apr 29, 2009, at 8:35, William L. Maltby wrote:
One thing to keep in mind is that ls must sort the file list. <snip>
If sorting is the issue, what happens if you do an "ls -1U" (one file per line and unsorted)? Is that a lot faster?
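A quick way to check (a sketch: the scratch directory and the 20,000-file count are made up, and far smaller than the real 90,000):

```shell
# Create a scratch directory full of empty files
dir=$(mktemp -d)
( cd "$dir" && seq 1 20000 | xargs touch )

# Default-style ls: sorts the list, and with color enabled it also
# stats every entry to pick the right color
time ls --color=always "$dir" > /dev/null

# -1U: one name per line, unsorted; with color off there is no
# per-file stat either
time ls -1U --color=never "$dir" > /dev/null

rm -rf "$dir"
```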
Alfred
Hi,
On Wed, Apr 29, 2009 at 08:35, William L. Maltby CentOS4Bill@triad.rr.com wrote:
One thing to keep in mind is that ls must sort the file list.
Not only sorting: usually "ls" also ends up trying to find out whether each file is a directory, which means a "stat" syscall for each of the files.
This is always expensive on remote filesystems (e.g., NFS) and I would expect it to be also the case in GFS.
Besides the "-U" option of "ls", you might want to look at other options that keep "ls" from "stat"-ing each file. I'm not sure exactly which ones, but one I can think of is disabling color (since color output triggers a "stat" to find out whether the file is a directory).
In general, having directories with a huge number of files tends to be a bad idea; you will most likely hit performance bottlenecks with particular filesystems or tools. If possible, try to change the application to create two or three levels of directories using a hash of the filename, so that each leaf directory holds only a small number of files.
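As a rough sketch of that layout (the filename, the 4-character hash width, and the use of md5sum are all illustrative choices, not anything from the original setup):

```shell
# Work in a scratch directory for the demonstration
cd "$(mktemp -d)"

# Place each file under two levels of directories derived from a hash
# of its name, e.g. ab/cd/report-12345.dat instead of one flat directory
store() {
    name=$1
    hash=$(printf '%s' "$name" | md5sum | cut -c1-4)
    d1=$(printf '%s' "$hash" | cut -c1-2)
    d2=$(printf '%s' "$hash" | cut -c3-4)
    mkdir -p "$d1/$d2"
    : > "$d1/$d2/$name"    # the small file lives here now
    echo "$d1/$d2/$name"
}

store report-12345.dat
```

Two hex characters per level gives 256 buckets at each level, so even 90,000 files average out to only a couple per leaf directory.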
HTH, Filipe
Filipe Brandenburger wrote:
In general, having directories with a huge number of files tends to be a bad idea. <snip>
This is particularly true of directories with a lot of activity. Whenever an open for writing happens, the directory has to be searched to see whether the file already exists, and if it doesn't, the file must be created; the search/create must be atomic, so the directory is locked while it completes.
On Wed, Apr 29, 2009 at 07:01:17PM +0800, Hairul Ikmal Mohamad Fuzi wrote:
<snip>
The issue we are facing at the moment is that commands such as 'ls' give a very slow response (e.g. 3-4 minutes for the output of ls to be printed, or in certain cases 20 minutes or so). This is especially true in directories containing a large number of small files (e.g. 90,000+ files of 1-4 KB each). The thing is, most of our users generate these small files frequently as part of their workflow.
We tried emulating the same scenario (90,000+ small files) on an ext3 partition and it gives almost the same result.
This is likely related to the size of the "ls" process growing. To sort by date etc., "ls" pulls all the metadata into memory and then reports.
I believe most of the CLVM/GFS settings are at their default parameters. Additionally, we would prefer to stick with GFS (or at least ext3), as it is part of the CentOS / RHEL distribution, rather than changing to other small-file-friendly filesystems (such as XFS or ReiserFS).
I'm exploring whether there is any way we can tune the GFS parameters to make the system more responsive.
With 'gobs' of files you may find that find, xargs and stat are the tools of choice.
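For instance (a sketch: the directory argument defaults to the current directory here, and on the real system would be something like the big GFS export), to find the five most recently modified files without one big sorting, stat-ing ls process:

```shell
# One streaming pass: find emits names, stat prints mtime + name,
# and external sort/tail pick the five newest entries
dir=${1:-.}
find "$dir" -maxdepth 1 -type f -print0 \
    | xargs -0 stat --format '%Y %n' \
    | sort -n | tail -n 5
```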
I have read that we can apply the 'dir_index' option to an ext3 partition to speed things up, but I'm not so sure about GFS.
Do look at "ls" with strace, top, or a debugger.
On Sun, 2009-05-03 at 06:09 -0700, Nifty Cluster Mitch wrote:
<snip quoted thread>
I don't know if this will help you, but if there are more than 100,000 files you can use all-UPPERCASE or all-lowercase file naming. That speeds things up greatly under Samba; we saw it with over 200,000 images. Ah, I see you have 90,000. This is also workable in Samba 4 with a GFS cluster.
JohnStanley