[CentOS] NFS mount on Centos 7 crashing

Fri Sep 22 11:58:17 UTC 2017
Nikolaos Milas <nmilas at noa.gr>

On 2/6/2017 1:46 PM, Nikolaos Milas wrote:

> After a bit of search, I found the associated reports:
>
> https://bugs.centos.org/view.php?id=13351
> https://bugzilla.redhat.com/show_bug.cgi?id=1454876
>
> No solution yet, but -as a workaround- it seems that -at least- nfs 
> problems are indeed solved with downgrading.

I have been working fine with CentOS 7.3, since I downgraded to 
rpcbind-0.2.0-38.el7.x86_64.

Today, I decided to upgrade to 7.4 (which, among several hundred 
updates, includes rpcbind-0.2.0-42.el7.x86_64); since then I have 
started having similar NFS issues again: NFS communication hangs. In 
/var/log/messages:

-----------------------------------------------------------------------------------------
...
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered named UNIX socket transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered udp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp transport module.
Sep 22 11:03:21 hesperia1 kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Sep 22 11:03:21 hesperia1 systemd-udevd: starting version 219
Sep 22 11:03:21 hesperia1 systemd: Started Configure read-only root support.
Sep 22 11:03:21 hesperia1 kernel: Installing knfsd (copyright (C) 1996 okir at monad.swb.de).
Sep 22 11:03:21 hesperia1 systemd: Mounted NFSD configuration filesystem.
...
Sep 22 11:03:27 hesperia1 systemd: Mounting /mnt/dd2500-1...
Sep 22 11:03:27 hesperia1 systemd: Starting Notify NFS peers of a restart...
Sep 22 11:03:27 hesperia1 sm-notify[948]: Version 1.3.0 starting
Sep 22 11:03:27 hesperia1 systemd: Started Notify NFS peers of a restart.
Sep 22 11:03:27 hesperia1 systemd: Started OpenSSH server daemon.
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Loaded
Sep 22 11:03:27 hesperia1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Sep 22 11:03:27 hesperia1 systemd: Mounted /mnt/dd2500-1.
Sep 22 11:03:27 hesperia1 systemd: Reached target Remote File Systems.
Sep 22 11:03:27 hesperia1 systemd: Starting Remote File Systems.
...
Sep 22 11:11:16 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
Sep 22 11:20:44 hesperia1 kernel: nfs: server 10.201.40.34 not responding, still trying
...
-----------------------------------------------------------------------------------------
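
For anyone who wants to compare timings, the timeout entries can be pulled out of the log with something like the sketch below. The function name nfs_timeouts is made up for illustration, and it assumes the default syslog layout in which each message is a single line, as it appears in /var/log/messages itself (not wrapped by a mail client).

```shell
# Hypothetical helper: print "Mon DD HH:MM:SS server" for every NFS timeout
# message in a syslog-style file (default: /var/log/messages).
# Assumed line layout:
#   Mon DD HH:MM:SS host kernel: nfs: server ADDR not responding, still trying
nfs_timeouts() {
    awk '/kernel: nfs: server .* not responding/ { print $1, $2, $3, $8 }' \
        "${1:-/var/log/messages}"
}
```

Calling nfs_timeouts with no argument reads /var/log/messages (usually as root); a different file can be passed as the first argument.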

I tried downgrading to rpcbind-0.2.0-38.el7.x86_64 but this time it 
didn't help.

I mount either directly:

   mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 \
       -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2

or through /etc/fstab:

   10.201.40.34:/data/col1/hesperia-mount   /hesperiamount2   nfs   auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800   0
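
To test whether a mount like this has stalled without also freezing the shell, a bounded probe along the following lines may help. This is only a sketch: probe_mount is a made-up helper name, it assumes GNU coreutils timeout and stat are available, and it relies on a process blocked on NFS I/O normally being killable with SIGKILL.

```shell
# Hypothetical helper: check whether a directory answers a stat() call
# within a time limit (default 5 seconds).
# Usage: probe_mount DIR [SECONDS]
probe_mount() {
    probe_dir="$1"
    probe_secs="${2:-5}"
    # SIGKILL is used because a process waiting on a hard NFS mount
    # typically ignores ordinary signals until the server answers.
    if timeout -s KILL "$probe_secs" stat -t -- "$probe_dir" >/dev/null 2>&1; then
        echo "responsive: $probe_dir"
    else
        echo "unresponsive or missing: $probe_dir"
        return 1
    fi
}
```

For example, probe_mount /hesperiamount2 10 would wait at most ten seconds before reporting the mount point as unresponsive.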

The box may even hang during reboot, which has never happened in the past.

It needs a hard reboot (via VM admin console) to boot again.

I have confirmed the above behavior multiple times.

Please advise me on how to resolve this situation. We are very much 
dependent on NFS mounts.

Is it a known bug? (As far as I could search, I didn't come up with 
anything.)

The earlier bug report appears resolved: 
https://bugzilla.redhat.com/show_bug.cgi?id=1454876

Can I safely/easily revert to 7.3?

Thanks in advance,
Nick