[CentOS-devel] Vagrant centos/7 v1811.01 breaks nfs at boot

Fri Jan 11 21:19:21 UTC 2019
John-Paul Robinson <jprorama at gmail.com>

Hi folks,

It seems that the latest Vagrant box for centos/7 is breaking NFS mounts
at boot. My Vagrant project has an NFS server and a client node. The
client has two NFS mounts in fstab.
https://gitlab.rc.uab.edu/jpr/ohpc_vagrant

Using the image from the prior release (1804.02, kernel 
3.10.0-862.2.3.el7.x86_64, CentOS Linux release 7.5.1804 (Core) ) the 
mounts complete successfully.  The newest releases (1811.02 and 1809.01) 
fail to mount the drives at boot.

I believe the change in disk layout is altering the boot timing of
services: the older image uses XFS on a volume manager, while the newer
images use ext4 directly on the device. On the failing nodes, the
messages log shows that the network is still unavailable at the point at
which the NFS mounts are attempted. The wait service either doesn't work
correctly or doesn't wait long enough for the network to come up first.
It seems that booting directly from sda1 with ext4 simply initializes
too fast.

Here's the top of the systemd blame info for the box that succeeds in
its NFS mounts at boot (v1804.02). Note that the boot times are
dominated by the NetworkManager wait service:

[vagrant@ood ~]$ systemd-analyze blame
           2.970s NetworkManager-wait-online.service
           1.511s tuned.service
           1.272s postfix.service
            594ms httpd24-httpd.service
            538ms lvm2-monitor.service
            474ms opt-ohpc-pub.mount
            471ms home.mount
            450ms auditd.service
            444ms dev-mapper-VolGroup00\x2dLogVol00.device
            380ms boot.mount
            343ms network.service
            216ms munge.service
            206ms NetworkManager.service
            177ms chronyd.service
            157ms polkit.service
            149ms sshd.service
            148ms systemd-logind.service
            134ms slurmd.service
            114ms gssproxy.service
            112ms lvm2-pvscan@8:3.service
            112ms rpc-statd.service
            112ms systemd-udev-trigger.service
            111ms rsyslog.service
            110ms rhel-readonly.service
            105ms rhel-dmesg.service
             91ms systemd-vconsole-setup.service
             84ms systemd-tmpfiles-setup-dev.service
             74ms systemd-tmpfiles-clean.service
             72ms dev-mapper-VolGroup00\x2dLogVol01.swap
             66ms kmod-static-nodes.service
             65ms rhel-domainname.service
             65ms rpc-statd-notify.service
             ...


Here's the top of the systemd blame info for the box that fails to
mount NFS (v1811.02). Here NetworkManager barely waits:

[vagrant@ood ~]$ systemd-analyze blame
           1.811s dev-sda1.device
           1.737s tuned.service
           1.594s postfix.service
            785ms httpd24-httpd.service
            378ms systemd-vconsole-setup.service
            295ms slurmd.service
            291ms network.service
            286ms home.mount
            266ms auditd.service
            253ms opt-ohpc-pub.mount
            250ms NetworkManager-wait-online.service
            208ms systemd-udev-trigger.service
            200ms polkit.service
            188ms systemd-tmpfiles-setup-dev.service
            180ms sshd.service
            177ms chronyd.service
            175ms rhel-readonly.service
            146ms rhel-dmesg.service
            145ms munge.service
            144ms gssproxy.service
            141ms rpcbind.service
            139ms swapfile.swap
            112ms rhel-domainname.service
            102ms systemd-udevd.service
            101ms systemd-journald.service
             91ms rpc-statd.service
             78ms rsyslog.service
             70ms var-lib-nfs-rpc_pipefs.mount
             69ms systemd-tmpfiles-setup.service
             61ms rpc-statd-notify.service
             58ms systemd-journal-flush.service
             58ms systemd-sysctl.service
             ...
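
To compare one unit across the two listings above without reading them by
eye, a small filter along these lines could pull its time out of the blame
output and normalize it to milliseconds. This is only a rough sketch: the
function name blame_ms is made up for illustration, and it assumes the
blame format shown above (time fields followed by the unit name).

```shell
# Print the startup time, in milliseconds, of one unit from
# `systemd-analyze blame` output read on stdin.
# blame_ms is a hypothetical helper name, not part of systemd.
blame_ms() {
  awk -v unit="$1" '
    $NF == unit {
      ms = 0
      # Every field before the unit name is a time component,
      # e.g. "250ms", "2.970s", or "1min" followed by "2.5s".
      for (i = 1; i < NF; i++) {
        if ($i ~ /ms$/)       ms += $i + 0
        else if ($i ~ /min$/) ms += ($i + 0) * 60000
        else if ($i ~ /s$/)   ms += ($i + 0) * 1000
      }
      printf "%.0f\n", ms
    }'
}
```

On a running box this would be used as
`systemd-analyze blame | blame_ms NetworkManager-wait-online.service`.
Fed the two listings above, it gives 2970 for v1804.02 and 250 for v1811.02.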

I have the boot SVG graphs, which tell a similar story about the
sequencing of service startup.
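
For anyone wanting to reproduce this kind of evidence on their own box, a
sketch like the following collects the blame list, the SVG timeline, and the
ordering chain for one mount. The function name and output directory are
made up for illustration, and home.mount is only assumed to be one of the
NFS mounts (it appears in the blame output above); substitute the unit for
your own mount point.

```shell
# Collect boot-timing evidence into a directory (default: ./boot-timing).
# collect_boot_timing is a hypothetical helper name.
collect_boot_timing() {
  outdir="${1:-boot-timing}"
  mkdir -p "$outdir"
  # Per-unit startup times, slowest first
  systemd-analyze blame > "$outdir/blame.txt"
  # Full boot sequence rendered as an SVG timeline
  systemd-analyze plot > "$outdir/boot.svg"
  # Chain of units the mount waited on (home.mount is an assumption)
  systemd-analyze critical-chain home.mount > "$outdir/critical-chain.txt"
}
```

Running `collect_boot_timing` on both the working and the failing box leaves
two directories that can be diffed or attached to a report.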

Is there a way to add delay to the network manager wait or otherwise 
influence the boot configuration to ensure the NFS drives mount 
correctly at boot?

Thanks,

John-Paul