On 17 December 2015 at 11:12, Sylvain CANOINE sylvain.canoine@tv5monde.org wrote:
Hello all,
I updated two of my servers to CentOS 7.2 (1511) two days ago, and since, on one of them, the network services are started (and fail to start) before the network interfaces are online.
Parts of "journalctl" after the last reboot :
déc. 17 10:21:44 myserver kernel: NET: Registered protocol family 40 déc. 17 10:21:45 myserver sshd[700]: error: Bind to port 22 on 172.20.XX.XX failed: Cannot assign requested address. déc. 17 10:21:45 myserver sshd[700]: fatal: Cannot bind any address. déc. 17 10:21:45 myserver systemd[1]: sshd.service: main process exited, code=exited, status=255/n/a déc. 17 10:21:45 myserver systemd[1]: Unit sshd.service entered failed state. déc. 17 10:21:45 myserver systemd[1]: sshd.service failed. déc. 17 10:21:45 myserver sssd[729]: Starting up déc. 17 10:21:45 myserver kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max) déc. 17 10:21:34 myserver systemd[1]: Time has been changed déc. 17 10:21:35 myserver iptables.init[699]: iptables: Applying firewall rules: [ OK ] déc. 17 10:21:35 myserver systemd[1]: Started IPv4 firewall with iptables. déc. 17 10:21:35 myserver systemd[1]: Starting LSB: Bring up/down networking... déc. 17 10:21:35 myserver network[790]: Activation de l'interface loopback : [ OK ] déc. 17 10:21:36 myserver httpd[686]: (99)Cannot assign requested address: AH00072: make_sock: could not bind to address 172.19.XX.XX:443 déc. 17 10:21:36 myserver httpd[686]: no listening sockets available, shutting down déc. 17 10:21:36 myserver httpd[686]: AH00015: Unable to open logs déc. 17 10:21:36 myserver systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE déc. 17 10:21:36 myserver kernel: vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 2 vectors allocated déc. 17 10:21:36 myserver kernel: vmxnet3 0000:03:00.0 ens160: NIC Link is Up 10000 Mbps déc. 17 10:21:36 myserver kill[924]: kill: cannot find process "" déc. 17 10:21:36 myserver systemd[1]: httpd.service: control process exited, code=exited status=1 déc. 17 10:21:36 myserver systemd[1]: Failed to start The Apache HTTP Server. déc. 17 10:21:36 myserver systemd[1]: Unit httpd.service entered failed state. déc. 17 10:21:36 myserver systemd[1]: httpd.service failed. déc. 17 10:21:36 myserver postfix/postfix-script[959]: starting the Postfix mail system déc. 17 10:21:36 myserver postfix/master[961]: daemon started -- version 2.10.1, configuration /etc/postfix déc. 17 10:21:36 myserver systemd[1]: Started Postfix Mail Transport Agent. déc. 17 10:21:36 myserver snmpd[704]: Turning on AgentX master support. déc. 17 10:21:36 myserver snmpd[704]: Error opening specified endpoint "udp:172.19.XX.XX:161" déc. 17 10:21:36 myserver snmpd[704]: Server Exiting with code 1 déc. 17 10:21:36 myserver systemd[1]: snmpd.service: main process exited, code=exited, status=1/FAILURE déc. 17 10:21:36 myserver systemd[1]: Failed to start Simple Network Management Protocol (SNMP) Daemon.. déc. 17 10:21:36 myserver systemd[1]: Unit snmpd.service entered failed state. déc. 17 10:21:36 myserver systemd[1]: snmpd.service failed. (...) déc. 17 10:21:38 myserver network[790]: Activation de l'interface ens160 : [ OK ] déc. 17 10:21:38 myserver kernel: vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 2 vectors allocated déc. 17 10:21:38 myserver kernel: vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps déc. 17 10:21:39 myserver ntpd[694]: Listen normally on 1 ens160 172.19.XX.XX UDP 123 déc. 17 10:21:39 myserver ntpd[694]: new interface(s) found: waking up resolver déc. 17 10:21:40 myserver ntpd[694]: 0.0.0.0 c61c 0c clock_step +11.002914 s déc. 17 10:21:51 myserver ntpd[694]: 0.0.0.0 c614 04 freq_mode déc. 17 10:21:51 myserver systemd[1]: Time has been changed déc. 17 10:21:51 myserver network[790]: Activation de l'interface ens192 : [ OK ] déc. 17 10:21:51 myserver systemd[1]: Started LSB: Bring up/down networking. déc. 17 10:21:51 myserver systemd[1]: Reached target Network is Online. déc. 17 10:21:51 myserver systemd[1]: Starting Network is Online. déc. 17 10:21:51 myserver systemd[1]: Reached target Multi-User System. déc. 17 10:21:51 myserver systemd[1]: Starting Multi-User System. déc. 17 10:21:51 myserver systemd[1]: Starting Update UTMP about System Runlevel Changes... déc. 17 10:21:51 myserver systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup. déc. 17 10:21:51 myserver systemd[1]: Started Update UTMP about System Runlevel Changes. déc. 17 10:21:51 myserver systemd[1]: Startup finished in 650ms (kernel) + 2.623s (initrd) + 13.647s (userspace) = 16.922s.
Well it looks like you are using the network service rather than the recommended NetworkManager ...
The network service is not blocking the flow so it executes and systemd carries on ...
From the point of view of the system as soon as /etc/init.d/network start
has been called the service is running as a state... as you can see from your logs lots of other services also start before the network interface itself is up.
There's a few of different ways of accomplishing what you want ...
Keep in mind that you must not edit files in /usr/lib/systemd/ if you want to maintain your sanity for future updates... use overrides in /etc/systemd/system/foo.service.d
The real reason httpd/sshd/snmpd failed there is that unlike the default configuration of these you aren't listening on all addresses (:: or 0.0.0.0) but on a specific 172.X address ... which isn't present until the network adaptor is up and configured.
So how to solve this...
1) Have the services bind on :: (or 0.0.0.0) rather than a specific IP like the default configuration so that they are not dependent on the network being up with a specific IP on the interface 2) Set the sysctl ip_nonlocal_bind so that the services can bind to IPs not yet on the system (if it's using a systemd socket you can override with FreeBind for that socket rather than set this globally) 3) Provide overrides for each service to order it after network-online.target (which is effectively when the non-local IP address can be found on the interface) as per the systemd.special man page documenting this.
Look at man systemd.special for more detail on this ...
Incidentally I just tried a quick test in a VM and it would appear NetworkManager.service completed with an IP on the network interface before network.target was considered reached ... you may want to test this on your system to see if it's a race condition or it actually works out that way for you as a systemctl cat NetworkManager indicates it should be before network and it looks like it may block progress until it's on dbus ...