We have a number of identical NFS clients mounting a server using
NFSv4.1. The server and clients are all running CentOS 7.5 (kernel
3.10.0-862.14.4.el7.x86_64).
However, on some clients, the NFS performance 'degrades' with time ...
A simple test - a Python script that just imports a module (Python and
its modules are installed on the NFS share) - can be an order of
magnitude or more slower on some clients. Very little data is
transferred; it is the rate of stat'ing and opening files on the NFS
server that is 'slow'.
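
For anyone wanting to reproduce the comparison, a minimal sketch of this
kind of test is below. It is not the exact script we ran: the function
name time_stat_and_open and the limit parameter are made up for
illustration, and it assumes Python 3 (for time.perf_counter). It walks a
directory tree, stats and opens each file without reading any data, and
reports how long that took.

```python
import os
import time


def time_stat_and_open(root, limit=1000):
    """Stat and open up to `limit` files under `root`.

    Returns (files_touched, elapsed_seconds). No file data is read, so
    the elapsed time is dominated by metadata and open/close round trips.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)
                with open(path, "rb"):
                    pass  # open and close only, transfer no data
            except OSError:
                continue  # skip unreadable or vanished files
            count += 1
            if count >= limit:
                return count, time.perf_counter() - start
    return count, time.perf_counter() - start
```

Calling time_stat_and_open() on the same directory of the NFS share from
a 'fast' and a 'slow' client should give directly comparable numbers.
Client-side caching can make a second run on the same client look faster
than the first, so compare like with like.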
Running a tcpdump on a 'slow' client shows that the NFS traffic it
generates is again an order of magnitude or more greater than that
generated by a 'fast' client.
The majority of the extra NFS traffic in the slow case appears to be a
large number of NFS 'TEST_STATEID' calls made by the client, which are
not there in the tcpdump on the fast client.
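
As a possible cross-check that does not need tcpdump, the per-operation
counters in /proc/self/mountstats can be compared between clients. The
sketch below is only an illustration: per_op_counts is a made-up helper
name, and it assumes the client kernel lists a TEST_STATEID line under
"per-op statistics" for NFSv4.1 mounts, with the first number on each
line being the operation count.

```python
import re

# "  OPNAME: ops trans timeouts ..." - only the first counter is kept
OP_LINE = re.compile(r"^\s*([A-Z][A-Z0-9_]*):\s+(\d+)(?:\s+\d+)*\s*$")
DEVICE_LINE = re.compile(r"^device \S+ mounted on (\S+) with fstype (\S+)")


def per_op_counts(mountstats_text):
    """Return {mount_point: {op_name: op_count}} for NFS mounts only."""
    counts = {}
    current = None   # dict for the NFS mount being parsed, if any
    in_ops = False   # True once "per-op statistics" has been seen
    for line in mountstats_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            is_nfs = m.group(2).startswith("nfs")
            current = counts.setdefault(m.group(1), {}) if is_nfs else None
            in_ops = False
            continue
        if current is None:
            continue
        if line.strip() == "per-op statistics":
            in_ops = True
            continue
        if in_ops:
            m = OP_LINE.match(line)
            if m:
                current[m.group(1)] = int(m.group(2))
    return counts
```

Reading /proc/self/mountstats before and after the import test, passing
each snapshot to per_op_counts() and subtracting the TEST_STATEID
entries, would give the number of such calls made during the test on
that client.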
The issue can be 'fixed' in the short term by rebooting the affected
client. After a reboot, running the same tcpdump shows no
TEST_STATEID calls. However, after a while (several days), the
performance might degrade again.
I've found a number of reports of excessive TEST_STATEID calls, but
most seem to relate to NFSv4 client hangs, which is not what is
happening here: things are working, but much more slowly than they
should be ...
Has anyone come across this issue, and does anyone have any
fixes/workarounds?
Thanks
James Pearson