[CentOS] systemd and 'Stale file handle' errors?

Fri May 14 10:44:52 UTC 2021
Simon Matter <simon.matter at invoca.ch>

> I have a CentOS 7 system where I needed to restart chronyd - but the
> systemctl restart failed with the error:
>
>  systemd[1]: Starting NTP client/server...
>  systemd[43578]: Failed at step NAMESPACE spawning /usr/sbin/chronyd: Stale file handle
>  systemd[1]: chronyd.service: control process exited, code=exited status=226
>
> It turns out there are a couple of stale NFS file handles from FUSE mounts
> (related to gvfsd) on subdirectories under an NFS-mounted home directory
> server, but the home directory for the user in this case no longer exists
> (the user has left).
>
> However, I have no idea why these 'Stale file handles' prevent a service
> from being started by systemd.
>
> In this case, chronyd has nothing to do with NFS-mounted user home
> directories, so it shouldn't really care.
>
> I have tried everything I can think of to clear these stale mounts, but
> with no luck.
>
> Does anyone know why systemd complains about unrelated 'Stale file
> handles', and is there any way I can tell systemctl to start a service
> regardless of these 'errors'?
>
> Rebooting the host will be a last resort (the system is used by many
> users), but in the meantime I've started the /usr/sbin/chronyd binary
> directly by hand, and it runs fine.
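
For anyone trying to pin down the offending mounts, here is a minimal
sketch in Python of one way to list the mount points that currently return
'Stale file handle' (ESTALE). It may help narrow down what the NAMESPACE
step is tripping over, but that link is an assumption, not something
confirmed in this thread. The function names are made up for illustration,
the script assumes /proc/self/mounts is readable, and stat() on an
unresponsive NFS mount may block instead of failing.

```python
import errno
import os
import re


def decode_mount_field(field):
    """Undo the octal escapes (e.g. \\040 for a space) used in /proc mounts files."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mount_points(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/self/mounts-style text."""
    pairs = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            pairs.append((decode_mount_field(fields[1]), fields[2]))
    return pairs


def find_stale(mount_points, stat=os.stat):
    """Return the (mount_point, fs_type) pairs whose stat() fails with ESTALE."""
    stale = []
    for path, fs_type in mount_points:
        try:
            stat(path)
        except OSError as exc:
            # Other errors (EACCES, ENOENT, ...) are ignored on purpose.
            if exc.errno == errno.ESTALE:
                stale.append((path, fs_type))
    return stale


if __name__ == "__main__":
    with open("/proc/self/mounts") as fh:
        for path, fs_type in find_stale(parse_mount_points(fh.read())):
            print(f"stale: {path} ({fs_type})")
```

Run as root so that other users' FUSE mounts are not skipped with a
permission error. Anything it prints is a candidate for a lazy unmount
(umount -l), though whether that clears the error here is untested.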

We have been running large multi-user systems with desktop sessions on Red
Hat based systems for decades, but it has become increasingly painful after
EL6, with the introduction of systemd in EL7. It may have improved the user
experience on developers' laptops, but for our use case things are worse
today...

Regards,
Simon