[CentOS] ext3 heavy file fragmentation with NFS write

Wed Mar 4 21:00:09 UTC 2009
Nifty Cluster Mitch <niftycluster at niftyegg.com>

On Fri, Feb 27, 2009 at 08:31:01AM +0100, Andrzej Szymański wrote:
> Does anybody know how to avoid the file fragmentation when a file is 
> created over NFSv3?
> A file created locally is OK:
> dd bs=32k if=/dev/zero of=test count=32x1024 conv=fsync
> filefrag test
> test: 10 extents found, perfection would be 9 extents
> When I create the file in the same dir, but from another machine, 
> mounted over NFS:
> filefrag test
> test: 4833 extents found, perfection would be 9 extents
> With such a file a sequential read is quite slow (~76MB vs >200MB on my 
> raid card).
> I can just suspect that this is a problem of block allocation when the 
> same file is appended by different processes (8 NFS threads).
> I've tried mounting ext3 with -o reservation and switch to NFS over TCP, 
> with no improvement.
> Both systems are Centos 5.2 with kernel 2.6.18-92.1.22.el5
> The ext3 is mounted with rw,nosuid,nodev,usrquota,grpquota,acl
> NFS export: rw,sync,no_root_squash
> 8 NFS threads.
> Remotely mounted with options 
> rw,intr,nfsvers=3,proto=udp,rsize=32768,wsize=32768
> I would be very grateful for any help.
> Andrzej

First, watch out for comparing sparse files with real files.

	dd bs=32k if=/dev/zero of=test count=32x1024 conv=fsync

Note that /dev/zero combined with dd may or may not build a sparse
file, and sparse file block allocation is very different.
Having been bitten by sparse-file filesystem tricks, I would build up
a large file of binary data and dd that into test instead.
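
One rough way to check whether a test file ended up sparse is to
compare its allocated blocks with its apparent size.  The sketch below
assumes GNU coreutils stat; the function names and the src.bin file
name are made up for illustration, and the check is only a heuristic
(a compressing filesystem could also report fewer blocks than bytes).

```shell
# allocated_less_than_size SIZE BLOCKS BLOCK_SIZE
# Succeeds when BLOCKS * BLOCK_SIZE is smaller than SIZE,
# which suggests the file contains holes.
allocated_less_than_size() {
    [ $(( $2 * $3 )) -lt "$1" ]
}

# is_sparse FILE  (hypothetical helper name)
# Assumes GNU stat: %s = size in bytes, %b = allocated blocks,
# %B = size in bytes of each block counted by %b.
is_sparse() {
    info=$(stat -c '%s %b %B' "$1") || return 2
    set -- $info
    allocated_less_than_size "$1" "$2" "$3"
}

# Usage sketch (not run here): build 1 GiB of non-zero data once,
# then dd it into the test file and check the result.
#   dd if=/dev/urandom of=src.bin bs=32k count=32768
#   dd if=src.bin of=test bs=32k conv=fsync
#   is_sparse test && echo "test is sparse"
```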

Also a local filesystem can have a very different free list
than your NFS file system's underlying FS.  You need to do
the comparison on the exact same filesystem with the only
difference being that one case is local and the other NFS.
If I run your dd on my /tmp I get 18 extents, while on /var/tmp
I get 582 extents.  Both are local to this system.  Since two local
filesystems can differ that much, you must test on exactly the same
FS, with the only difference being whether the file was created
locally or over NFS.
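
To make that comparison less error-prone, something like the
following could pull the extent count out of filefrag's output so the
two runs can be compared as numbers.  This is only a sketch: the
function name is mine, and /export/data and /mnt/data are placeholder
paths for the server-side directory and the client's NFS mount of it.

```shell
# extent_count: read filefrag output on stdin and print only the
# number of extents, e.g. "test: 4833 extents found, ..." -> 4833.
extent_count() {
    sed -n 's/.*: \([0-9][0-9]*\) extents* found.*/\1/p'
}

# Usage sketch (not run here; paths are placeholders):
#   on the server, directly on the exported ext3 filesystem:
#     dd bs=32k if=src.bin of=/export/data/test.local conv=fsync
#   on the client, writing into the same directory over NFS:
#     dd bs=32k if=src.bin of=/mnt/data/test.nfs conv=fsync
#   back on the server:
#     filefrag /export/data/test.local | extent_count
#     filefrag /export/data/test.nfs   | extent_count
```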

All in all, the extent count by itself is mostly a don't-care: extents
are not exactly equivalent to disk seeks and other disk I/O issues.

Some of this can be improved only by rebuilding the filesystem; mkfs
has a lot of flags and choices.  You might also need to switch
filesystems (xfs, ext2, ext3, ext4, jfs, reiser...).

To some extent, if you make a fresh local copy of a badly fragmented
file, you can improve its layout on the disk/filesystem.  This should
only be considered for very long-lived, very large files.  Making a
copy and comparing the original and the copy with filefrag can tell
you if this is worth doing.  Backup and restore can help.  As a
filesystem gets full, fragmentation will get worse and worse.  If you
are more than 60% full, do not bother.
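
The rule of thumb above could be scripted roughly as follows.  It is a
sketch under my own assumptions: the function names are invented, the
60% figure is just the threshold mentioned above, and the extent
counts are expected to come from filefrag runs on the original and on
the copy.

```shell
# use_percent: read `df -P DIR` output on stdin and print the
# Capacity column of the data line as a bare integer (72% -> 72).
use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# recopy_worthwhile USE_PCT ORIG_EXTENTS COPY_EXTENTS
# Succeeds only when the filesystem is at most 60% full and the
# fresh copy has fewer extents than the original.
recopy_worthwhile() {
    [ "$1" -le 60 ] && [ "$3" -lt "$2" ]
}

# Usage sketch (not run here; bigfile is a placeholder name):
#   cp bigfile bigfile.copy
#   pct=$(df -P . | use_percent)
#   recopy_worthwhile "$pct" 4833 12 && mv bigfile.copy bigfile
# where 4833 and 12 stand for the counts filefrag printed for
# bigfile and bigfile.copy.
```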

	T o m  M i t c h e l l 
	Found me a new hat, now what?