CentOS 6 NFS mmap I/O bug? - Discuss

22 Oct 2012


      I'm working with a company that occasionally runs into an issue
with their app running on CentOS 6 against an NFS mount.  The problem
is essentially that, on a single CentOS 6 client, a stat() call
sometimes returns the wrong file size.
The problem seems to happen specifically after mmap and ftruncate
calls.  The former environment for the application was CentOS 4, where
it has run for many years with no apparent issues.  Here's strace
output showing the problem:
  13:10:38.270134 stat("trunk_file14", {st_mode=S_IFREG|0600, st_size=36528128, ...}) = 0 <0.000019>
  13:10:38.270204 open("trunk_file14", O_RDWR|O_SYNC) = 3 <0.000440>
  13:10:38.270688 mmap(NULL, 36528128, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f57bade3000 <0.003543>
  13:10:38.275585 msync(0x7f57bade3000, 36527128, MS_SYNC) = 0 <0.000534>
  13:10:38.276176 munmap(0x7f57bade3000, 36528128) = 0 <0.000026>
  13:10:38.276240 ftruncate(3, 36527128)  = 0 <0.000463>
  13:10:38.276744 close(3)                = 0 <0.000020>
  13:10:38.276794 stat("trunk_file14", {st_mode=S_IFREG|0600, st_size=36528128, ...}) = 0 <0.000020>
Note that the last stat(), following the close(), returns the size of
the file prior to the ftruncate (36528128; it should be 36527128).
Normally this final stat() returns the correct file size, but every
now and then we get this.  After waiting a few seconds, another stat()
of the file shows the expected file size, which is 36527128 in this
case.
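For anyone who wants to try the sequence, here is a minimal sketch in C
of the calls shown in the trace.  It is not the attached program: the
function name check_truncate_size and its parameters are made up for
illustration, and the memmove() that shifts the data down is only an
assumed stand-in for whatever the application does to the mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original application).
 * Replays stat/open/mmap/memmove/msync/munmap/ftruncate/close/stat.
 * Returns 0 if the final stat() reports the truncated size,
 *         1 if it reports any other (e.g. stale) size,
 *        -1 if a system call fails. */
int check_truncate_size(const char *path, off_t initial_size, off_t shrink_by)
{
    assert(initial_size > 0 && shrink_by > 0 && shrink_by < initial_size);

    off_t new_size = initial_size - shrink_by;
    struct stat st;

    /* Create (or reset) the file at its initial size. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, initial_size) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* From here on, mirror the order of calls in the trace. */
    if (stat(path, &st) != 0)
        return -1;

    fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Assumed workload: shift the tail of the file down by shrink_by. */
    memmove(map, map + shrink_by, (size_t)new_size);

    /* Flush only the part of the mapping that will survive. */
    if (msync(map, (size_t)new_size, MS_SYNC) != 0) {
        munmap(map, (size_t)st.st_size);
        close(fd);
        return -1;
    }
    if (munmap(map, (size_t)st.st_size) != 0) {
        close(fd);
        return -1;
    }
    if (ftruncate(fd, new_size) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* This is the stat() that occasionally shows the old size. */
    if (stat(path, &st) != 0)
        return -1;

    return st.st_size == new_size ? 0 : 1;
}
```

Calling check_truncate_size("trunk_file14", 36528128, 1000) in a loop
from a directory on the NFS mount and counting the calls that return 1
would correspond to the sizes in the trace.  On a local filesystem it
should always return 0.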
The NFS client is using mount options "tcp,nfsvers=3".  The process is
also single threaded, and we don't expect the I/O to appear
synchronous across more than a single NFS client.  We suspect that the
CentOS 6 NFS client is not caching properly.
Additional testing is under way, but so far it also appears that the
problem may only happen when the memory map calls are in place (mmap(),
memmove(), and msync()).  We don't expect the files to
We have reproduced this issue with both CentOS 6.0 and 6.3 on multiple
NFS servers from different vendors.
Any help is appreciated.  I've attached a small C program I've hacked
together to reproduce the problem in case anyone is interested.
Thanks,
Tom