[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?

Thu May 21 02:55:29 UTC 2020
Orion Poplawski <orion at nwra.com>

On 5/18/20 5:13 AM, hw wrote:
> Hi,
> 
> after trying sshfs to mount a remote file system on a server with the result
> that sshfs will sooner or later get stuck and require a reboot of the client,
> I'm fed up with it and am looking for alternatives.
> 
> So next I would like to use NFS over a VPN connection instead.  To minimize
> the instances of the NFS mount getting stuck, it might be helpful to use
> autofs.
> 
> What happens when the mount is stuck because the connection is down and autofs
> figures the idle timeout has expired and tries to unmount the remote file
> system?

Nothing good, and bad things happen before this.

> What happens when I put the client to sleep by suspending to RAM?  Will autofs
> automatically unmount first, or will the server have to deal with a client
> that has apparently gone away and might re-appear later in unexpected ways?

This is the mechanism that I use to try to mitigate this on our systems:

This triggers on suspend type events:

# cat /etc/systemd/system/suspend.target.wants/offnet.service
[Unit]
Description=Unmount all NFS mounts before disconnecting from network
Before=systemd-hibernate.service
Before=systemd-shutdown.service
Before=systemd-suspend.service

[Service]
ExecStart=/usr/local/sbin/offnet
Type=oneshot

[Install]
WantedBy=hibernate.target
WantedBy=shutdown.target
WantedBy=suspend.target

----

This triggers when you bring down a vpn connection with NetworkManager:

# cat /etc/NetworkManager/dispatcher.d/pre-down.d/autofs
#!/bin/bash

if [ -x /usr/bin/logger ]; then
   LOGGER="/usr/bin/logger -s -p user.notice -t $0"
else
   LOGGER=echo
fi

[ -z "${DEVICE_IP_IFACE}" ] && exit

# Unmount NFS and shutdown autofs if we are shutting down the last
# ethernet device or exiting vpn
if [ "$(/usr/bin/nmcli --terse --fields 'device,type' c show --active |
        grep -v "^${DEVICE_IP_IFACE}:" | grep -c :802-)" -eq 0 -o \
      "${DEVICE_IP_IFACE}" = tun0 ]; then
   $LOGGER "Unmounting NFS/CIFS directories"
   /usr/local/sbin/offnet
   $LOGGER "Performing autofs pre-down stop"
   systemctl stop autofs.service
fi
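
The interface test above can be tried without touching the network by
feeding it canned nmcli output. This is only a sketch:
count_other_lan_devices is a hypothetical helper name, not part of the
dispatcher script, and the sample lines assume the usual "device:type"
form of nmcli's terse output.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): count active
# connections of an 802-* type (ethernet, wireless) on devices other
# than the one going down. Expects terse "device:type" lines on stdin.
count_other_lan_devices() {
    local iface=$1
    # grep exits non-zero when nothing matches; "|| true" keeps that from
    # aborting callers that run under "set -e". The count is still printed.
    grep -v "^${iface}:" | grep -c ':802-' || true
}

# Sample input made up for illustration; prints 1 (only wlan0 remains).
printf '%s\n' 'eth0:802-3-ethernet' 'wlan0:802-11-wireless' 'tun0:tun' |
    count_other_lan_devices eth0
```

A result of 0 means the device going down was the last wired or wireless
link, which is the case where the script unmounts and stops autofs.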

----

# cat /usr/local/sbin/offnet
#!/bin/bash
. /etc/init.d/functions

# __umount_loop awk_program fstab_file first_msg retry_msg retry_umount_args
# awk_program should process fstab_file and return a list of fstab-encoded
# paths; it doesn't have to handle comments in fstab_file.
__umount_loop() {
         local remaining sig=
         local retry=3 count

         remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
         while [ -n "$remaining" -a "$retry" -gt 0 ]; do
                 if [ "$retry" -eq 3 ]; then
                         action "$3" umount $remaining
                 else
                         action "$4" umount $5 $remaining
                 fi
                 count=4
                 remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
                 while [ "$count" -gt 0 ]; do
                         [ -z "$remaining" ] && break
                         count=$(($count-1))
                         usleep 500000
                         remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
                 done
                 [ -z "$remaining" ] && break
                 kill $sig $(/sbin/fuser -m $remaining 2>/dev/null |
                         sed -e "s/\b$$\b//g") > /dev/null
                 sleep 3
                 retry=$(($retry -1))
                 sig=-9
         done
}

__umount_loop '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" {print $2}' \
     /proc/mounts \
     $"Unmounting NFS filesystems: " \
     $"Unmounting NFS filesystems (retry): " \
     "-f -l"

__umount_loop '$3 ~ /^cifs/ && $2 != "/" {print $2}' \
     /proc/mounts \
     $"Unmounting CIFS filesystems: " \
     $"Unmounting CIFS filesystems (retry): " \
     "-f -l"

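Before letting offnet loose, it may help to see which mount points the
NFS pass would select. The following is a minimal dry-run sketch that
reuses the same awk filter and reverse sort but only prints the list.
list_nfs_mounts is a hypothetical name, and the optional file argument is
an addition for testing, not something the script above has.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of offnet): print the mount points
# the NFS pass would try to unmount, deepest paths first, without
# unmounting anything. Reads /proc/mounts unless another file is given.
list_nfs_mounts() {
    local mounts_file=${1:-/proc/mounts}
    LC_ALL=C awk '/^#/ {next} $3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" {print $2}' \
        "$mounts_file" | sort -r
}

# Show the current candidates when run on a Linux system.
if [ -r /proc/mounts ]; then
    list_nfs_mounts
fi
```

The reverse sort puts nested mounts such as /mnt/a/b ahead of /mnt/a, so
children are unmounted before their parents.
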
> Is there a way to tell NFS to retry an operation _now_ after the connection
> went down and came back, rather than having to wait for a possibly rather long
> time?

Not that I'm aware of.

> Is there a better alternative for mounting remote file systems over unreliable
> connections?

I would second the recommendation for SMBv3/CIFS as a fault-tolerant
remote file system.

-- 
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                 https://www.nwra.com/