[CentOS] NFS/RDMA connection closed

Thu Aug 2 20:11:16 UTC 2018
admin at genome.arizona.edu <admin at genome.arizona.edu>

Hi, I also forgot to add the following information, which was discussed 
on the NFS mailing list with Chuck Lever and leads us to believe there 
is a software bug in the kernel, not necessarily a server overload.

On the NFS server, we also mount some other NFS shares from other NFS 
servers, over 1GbE:
150.x.x.116:/wing on /wing type nfs (rw,addr=150.x.x.116) on /opt/ftproot type nfs 
150.x.x.202:/archive on /archive type nfs 

This hangup/bug seems to occur when we are reading/writing to these 
other shares from the NFS server and the NFS server is also busy 
processing our work from the cluster using the RDMA exports.  There used 
to be two other NFS mounts, which were used to send/write backups to, 
and were scheduled every night at 8PM.  I noticed the RDMA errors from 
my original post were all showing up shortly after 8PM.  So we decided 
to get rid of these NFS mounts and convert the backup to transfer via 
SSH instead.  The RDMA errors no longer appear after 8PM when the 
backup runs, but they still show up whenever we read from or write to 
the other NFS mounts above, which we still need.
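
For anyone wanting to check the same kind of timing correlation in their 
own logs, here is a rough sketch that tallies matching log lines by hour 
of day.  It assumes traditional syslog timestamps ("Aug  2 20:03:11 ...") 
and uses a case-insensitive match on "rdma" as a stand-in for the real 
error text; the function name, the pattern and the sample lines are all 
made up for illustration and are not taken from the actual server logs.

```python
import re
from collections import Counter

# Traditional syslog timestamp, e.g. "Aug  2 20:03:11" (assumed log format).
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s")


def errors_per_hour(lines, pattern=r"rdma"):
    """Count lines matching `pattern` (case-insensitive), grouped by hour of day.

    `pattern` is a hypothetical default; replace it with the real error text.
    """
    wanted = re.compile(pattern, re.IGNORECASE)
    counts = Counter()
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp and wanted.search(line):
            counts[int(stamp.group(1))] += 1
    return counts


# Invented sample lines, for illustration only; not real log output.
sample = [
    "Aug  1 20:03:11 nfsserver kernel: placeholder rdma error message",
    "Aug  1 20:04:52 nfsserver kernel: placeholder rdma error message",
    "Aug  2 14:30:05 nfsserver kernel: placeholder rdma error message",
    "Aug  2 14:31:00 nfsserver kernel: unrelated message",
]

for hour, count in sorted(errors_per_hour(sample).items()):
    print(f"{hour:02d}:00  {count}")
```

To run it against a real log, pass an open file instead of the sample 
list, e.g. errors_per_hour(open("/var/log/messages")).  A cluster of 
counts in one hour would point at a scheduled job, while counts spread 
across the day would fit better with the ad hoc reads and writes to the 
other mounts.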

It seems we should be able to use these different mounts and exports 
without issue, leading us to believe there is a software bug somewhere.

Are there any other suggested solutions to this problem?  Perhaps some 
system, network and/or filesystem tuning?  Any comments on adding the 
"inode64,nobarrier" XFS mount options?  Any extra information I can 
gather to help with a bug report?  Debug info or whatnot?