[CentOS] Network services start before network is up since migrating to 7.2

Thu Dec 17 13:12:48 UTC 2015
James Hogarth <james.hogarth at gmail.com>

On 17 December 2015 at 11:12, Sylvain CANOINE <sylvain.canoine at tv5monde.org>
wrote:

> Hello all,
>
> I updated two of my servers to CentOS 7.2 (1511) two days ago, and since,
> on one of them, the network services are started (and fail to start) before
> the network interfaces are online.
>
> Parts of "journalctl" after the last reboot :
>
> déc. 17 10:21:44 myserver kernel: NET: Registered protocol family 40
> déc. 17 10:21:45 myserver sshd[700]: error: Bind to port 22 on
> 172.20.XX.XX failed: Cannot assign requested address.
> déc. 17 10:21:45 myserver sshd[700]: fatal: Cannot bind any address.
> déc. 17 10:21:45 myserver systemd[1]: sshd.service: main process exited,
> code=exited, status=255/n/a
> déc. 17 10:21:45 myserver systemd[1]: Unit sshd.service entered failed
> state.
> déc. 17 10:21:45 myserver systemd[1]: sshd.service failed.
> déc. 17 10:21:45 myserver sssd[729]: Starting up
> déc. 17 10:21:45 myserver kernel: nf_conntrack version 0.5.0 (16384
> buckets, 65536 max)
> déc. 17 10:21:34 myserver systemd[1]: Time has been changed
> déc. 17 10:21:35 myserver iptables.init[699]: iptables: Applying firewall
> rules: [ OK ]
> déc. 17 10:21:35 myserver systemd[1]: Started IPv4 firewall with iptables.
> déc. 17 10:21:35 myserver systemd[1]: Starting LSB: Bring up/down
> networking...
> déc. 17 10:21:35 myserver network[790]: Activation de l'interface loopback
> : [ OK ]
> déc. 17 10:21:36 myserver httpd[686]: (99)Cannot assign requested address:
> AH00072: make_sock: could not bind to address 172.19.XX.XX:443
> déc. 17 10:21:36 myserver httpd[686]: no listening sockets available,
> shutting down
> déc. 17 10:21:36 myserver httpd[686]: AH00015: Unable to open logs
> déc. 17 10:21:36 myserver systemd[1]: httpd.service: main process exited,
> code=exited, status=1/FAILURE
> déc. 17 10:21:36 myserver kernel: vmxnet3 0000:03:00.0 ens160: intr type
> 3, mode 0, 2 vectors allocated
> déc. 17 10:21:36 myserver kernel: vmxnet3 0000:03:00.0 ens160: NIC Link is
> Up 10000 Mbps
> déc. 17 10:21:36 myserver kill[924]: kill: cannot find process ""
> déc. 17 10:21:36 myserver systemd[1]: httpd.service: control process
> exited, code=exited status=1
> déc. 17 10:21:36 myserver systemd[1]: Failed to start The Apache HTTP
> Server.
> déc. 17 10:21:36 myserver systemd[1]: Unit httpd.service entered failed
> state.
> déc. 17 10:21:36 myserver systemd[1]: httpd.service failed.
> déc. 17 10:21:36 myserver postfix/postfix-script[959]: starting the
> Postfix mail system
> déc. 17 10:21:36 myserver postfix/master[961]: daemon started -- version
> 2.10.1, configuration /etc/postfix
> déc. 17 10:21:36 myserver systemd[1]: Started Postfix Mail Transport Agent.
> déc. 17 10:21:36 myserver snmpd[704]: Turning on AgentX master support.
> déc. 17 10:21:36 myserver snmpd[704]: Error opening specified endpoint
> "udp:172.19.XX.XX:161"
> déc. 17 10:21:36 myserver snmpd[704]: Server Exiting with code 1
> déc. 17 10:21:36 myserver systemd[1]: snmpd.service: main process exited,
> code=exited, status=1/FAILURE
> déc. 17 10:21:36 myserver systemd[1]: Failed to start Simple Network
> Management Protocol (SNMP) Daemon..
> déc. 17 10:21:36 myserver systemd[1]: Unit snmpd.service entered failed
> state.
> déc. 17 10:21:36 myserver systemd[1]: snmpd.service failed.
> (...)
> déc. 17 10:21:38 myserver network[790]: Activation de l'interface ens160 :
> [ OK ]
> déc. 17 10:21:38 myserver kernel: vmxnet3 0000:0b:00.0 ens192: intr type
> 3, mode 0, 2 vectors allocated
> déc. 17 10:21:38 myserver kernel: vmxnet3 0000:0b:00.0 ens192: NIC Link is
> Up 10000 Mbps
> déc. 17 10:21:39 myserver ntpd[694]: Listen normally on 1 ens160
> 172.19.XX.XX UDP 123
> déc. 17 10:21:39 myserver ntpd[694]: new interface(s) found: waking up
> resolver
> déc. 17 10:21:40 myserver ntpd[694]: 0.0.0.0 c61c 0c clock_step +11.002914
> s
> déc. 17 10:21:51 myserver ntpd[694]: 0.0.0.0 c614 04 freq_mode
> déc. 17 10:21:51 myserver systemd[1]: Time has been changed
> déc. 17 10:21:51 myserver network[790]: Activation de l'interface ens192 :
> [ OK ]
> déc. 17 10:21:51 myserver systemd[1]: Started LSB: Bring up/down
> networking.
> déc. 17 10:21:51 myserver systemd[1]: Reached target Network is Online.
> déc. 17 10:21:51 myserver systemd[1]: Starting Network is Online.
> déc. 17 10:21:51 myserver systemd[1]: Reached target Multi-User System.
> déc. 17 10:21:51 myserver systemd[1]: Starting Multi-User System.
> déc. 17 10:21:51 myserver systemd[1]: Starting Update UTMP about System
> Runlevel Changes...
> déc. 17 10:21:51 myserver systemd[1]: Started Stop Read-Ahead Data
> Collection 10s After Completed Startup.
> déc. 17 10:21:51 myserver systemd[1]: Started Update UTMP about System
> Runlevel Changes.
> déc. 17 10:21:51 myserver systemd[1]: Startup finished in 650ms (kernel) +
> 2.623s (initrd) + 13.647s (userspace) = 16.922s.
>
>
Well it looks like you are using the network service rather than the
recommended NetworkManager ...

The network service is not blocking  the flow so it executes and systemd
carries on ...

>From the point of view of the system as soon as /etc/init.d/network start
has been called the service is running as a state... as you can see from
your logs lots of other services also start before the network interface
itself is up.

There's a few of different ways of accomplishing what you want ...

Keep in mind that you must not edit files in /usr/lib/systemd/ if you want
to maintain your sanity for future updates... use overrides in
/etc/systemd/system/foo.service.d

The real reason httpd/sshd/snmpd failed there is that unlike the default
configuration of these you aren't listening on all addresses (:: or
0.0.0.0) but on a specific 172.X address ... which isn't present until the
network adaptor is up and configured.

So how to solve this...

1) Have the services bind on :: (or 0.0.0.0) rather than a specific IP like
the default configuration so that they are not dependent on the network
being up with a specific IP on the interface
2) Set the sysctl ip_nonlocal_bind so that the services can bind to IPs not
yet on the system (if it's using a systemd socket you can override
with FreeBind for that socket rather than set this globally)
3) Provide overrides for each service to order it after
network-online.target (which is effectively when the non-local IP address
can be found on the interface) as per the systemd.special man page
documenting this.

Look at man systemd.special for more detail on this ...

Incidentally I just tried a quick test in a VM and it would appear
NetworkManager.service completed with an IP on the network interface before
network.target was considered reached ... you may want to test this on your
system to see if it's a race condition or it actually works out that way
for you as a systemctl cat NetworkManager indicates it should be before
network and it looks like it may block progress until it's on dbus ...