I'm working with a company that occasionally runs into an issue with its application, which runs on CentOS 6 against an NFS mount. Essentially, on a single CentOS 6 client, a stat() call sometimes returns the wrong file size.
The problem specifically seems to happen after mmap and ftruncate calls. The application's former environment was CentOS 4, where it ran for many years with no apparent issues. Here's strace output showing the problem:
13:10:38.270134 stat("trunk_file14", {st_mode=S_IFREG|0600, st_size=36528128, ...}) = 0 <0.000019>
13:10:38.270204 open("trunk_file14", O_RDWR|O_SYNC) = 3 <0.000440>
13:10:38.270688 mmap(NULL, 36528128, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f57bade3000 <0.003543>
13:10:38.275585 msync(0x7f57bade3000, 36527128, MS_SYNC) = 0 <0.000534>
13:10:38.276176 munmap(0x7f57bade3000, 36528128) = 0 <0.000026>
13:10:38.276240 ftruncate(3, 36527128) = 0 <0.000463>
13:10:38.276744 close(3) = 0 <0.000020>
13:10:38.276794 stat("trunk_file14", {st_mode=S_IFREG|0600, st_size=36528128, ...}) = 0 <0.000020>
Note that the last stat(), issued after the close(), reports the file size from before the ftruncate (36528128, when it should be 36527128). Normally this final stat() returns the correct size, but every now and then we get the stale value instead. After a few seconds, another stat() on the file shows the expected size, which is 36527128 in this case.
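For anyone who wants to try the sequence without the full application, the calls in the trace can be sketched in C roughly as follows. This is only a minimal illustration: the helper name shrink_and_stat and its signature are made up for this sketch, and the memmove() step is an assumed stand-in for the application's real work (memmove() is not a syscall, so it does not appear in the strace output).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are assumptions): replay the
 * traced sequence on `path`, shrinking the file to `new_size` bytes.
 * Returns the st_size reported by the final stat(), or -1 if a call fails.
 */
off_t shrink_and_stat(const char *path, off_t new_size)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    off_t old_size = st.st_size;
    assert(new_size > 0 && new_size <= old_size);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    char *map = mmap(NULL, (size_t)old_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Assumed workload: copy the last new_size bytes to the start of the file. */
    memmove(map, map + (old_size - new_size), (size_t)new_size);

    /* Same order as the trace: msync, munmap, ftruncate, close, stat. */
    int failed = msync(map, (size_t)new_size, MS_SYNC) != 0;
    failed |= munmap(map, (size_t)old_size) != 0;
    failed |= ftruncate(fd, new_size) != 0;
    failed |= close(fd) != 0;
    if (failed || stat(path, &st) != 0)
        return -1;

    return st.st_size;
}
```

On a local filesystem the return value should always equal new_size. Calling this in a loop against a file on the NFS mount and comparing the return value with new_size is one way to look for the stale size described above.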
The NFS client is using the mount options "tcp,nfsvers=3". The process is single-threaded, and we only expect the I/O to appear synchronous from the point of view of a single NFS client, not across multiple clients. We suspect that the CentOS 6 NFS client is not caching properly.
Additional testing is happening now, but it also appears that the problem may only occur when the memory-mapping calls are in place (mmap(), memmove(), and msync()). We don't expect the files to
We have reproduced this issue with both CentOS 6.0 and 6.3 on multiple NFS servers from different vendors.
Any help is appreciated. I've attached a small C program I've hacked together to reproduce the problem in case anyone is interested.
Thanks, Tom