[CentOS] CentOS 7, systemd, NetworkMangler, oh, my

Mon Feb 13 18:29:30 UTC 2017
m.roth at 5-cent.us <m.roth at 5-cent.us>

James Hogarth wrote:
> On 13 February 2017 at 15:35,  <m.roth at 5-cent.us> wrote:
>> My manager tells me a system in the datacenter is down. I go down there,
>> and plug in a monitor-on-a-stick and keyboard. It's up, but no network.
>> I
>> try systemctl restart NetworkManager several times, and ip a shows *no*
>> change.
>>
>> Finally, I do an ifdown, followed by an ifup, and everything's
>> wonderful.
>>
>> My manager thinks that the NM daemon thinks everything's fine, and
>> there've been no changes, so it does nothing. He suggests that it might
>> have to be stopped, then started, rather than restarted.
>>
>> This is completely unacceptable behavior, since it leave the system with
>> no network connection. Pre-systemd, as we all know, restart *RESTARTED*
>> the damn thing.
>>
>> Is there some Magic (#insert "pixie-dust-sparkles") incantation, either
>> restarting NetworkManager, or using nm-cli, to force it to perform the
>> expected actions?
>>
>
>
> I'd be interested in the journal from the NetworkManager restart as
> that's not the way it behaves ... it uses the netlink API to get state
> and not it's own internal tracker of state (ie doing an ip link down
> will reflect in nmcli output) ... a restart of NetworkManager should
> not ignore interfaces but rather bring the system to the on disk
> configured state ... and a quick check it doesn't override ExecRestart
> in the unit file to do a reload or similar instead ...
>
> And indeed a quick test in a VM shows nmcli device status correctly
> changing between connected and unavailable when doing ip link set eth0
> down/up
>
> Do note that on a NM based system ifup and ifdown are effectively
> aliases to nmcli conn down and nmcli conn up
>
> nmcli conn down "connection name" will make it disconnected
> nmcli conn up "connection mame" will bring it back to connected
>
> there is a slight interesting difference between using nmcli and ip
> link set though ...
>
> with ip link set down <interface> the interface is marked
> administratively down (as if you've pulled the cable) but nmcli conn
> down "connection name" will unconfigure the interface but leave it in
> an UP state ... just without an IP address etc
>
> anyway that's just an interesting diversion on behavioural differences
>
> NM won't change an interface state without some sort of event though
> (manual or virtual cable pulled etc), and if you have a case where it
> *has* done that then you have found a bug that would be great to get
> reported
>
> TL;DR: cannot reproduce, need logs to determine what happened without
> a working crystal ball

>From journalctl, I see this happening when I do systemctl restart
NetworkManager (much edited)
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7755] manager: (em1): new Ethernet device
(/org/freedesktop/NetworkManager/Devi
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7791] ifcfg-rh: add connection in-memory
(79d3ed9d-cc41-498c-9169-44320e332f68,
Feb 13 09:47:52 <servername> systemd[1]: Started Hostname Service.
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7797] device (em1): state change: unmanaged -> unavailable
(reason 'connection-
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7805] device (em1): state change: unavailable -> disconnected
(reason 'connecti
<...>
eb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7986] device (em1): state change: disconnected -> prepare
(reason 'none') [30 4
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.7999] policy: set 'em1' (em1) as default for IPv6 routing and
DNS
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.8027] device (em1): state change: prepare -> config (reason
'none') [40 50 0]
Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> 
[1486997272.8034] device (em1): state change: config -> ip-config (reason
'none') [50 70 0]
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3594] device (em1): state change: ip-config -> ip-check
(reason 'none') [70 80
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3661] device (em1): state change: ip-check -> secondaries
(reason 'none') [80 9
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3666] device (em1): state change: secondaries -> activated
(reason 'none') [90
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3667] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3670] manager: NetworkManager state is now CONNECTED_SITE
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3670] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2
'connectivity-change': new request (6 scripts)
Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2
'connectivity-change': start running ordered scripts...
Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> 
[1486997273.3697] device (em1): Activation: successful, device activated.

Note there is no IP address being obtained. Now, when I run ifdown/ifup:

Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6804] device (em1): Activation: starting connection 'em1'
(c432eaa1-023b-4f1f-a
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6809] audit: op="connection-activate"
uuid="c432eaa1-023b-4f1f-a7b5-4605ec07195
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6810] device (em1): state change: disconnected -> prepare
(reason 'none') [30 4
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6811] manager: NetworkManager state is now CONNECTING
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6816] device (em1): state change: prepare -> config (reason
'none') [40 50 0]
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6858] device (em1): state change: config -> ip-config (reason
'none') [50 70 0]
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6869] dhcp4 (em1): activation: beginning transaction (timeout
in 45 seconds)
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> 
[1486997297.6900] dhcp4 (em1): dhclient started with pid 67715
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPDISCOVER on em1 to
255.255.255.255 port 67 interval 6 (xid=0x745ba623)
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPREQUEST on em1 to
255.255.255.255 port 67 (xid=0x745ba623)
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPOFFER from <DHCP server>
Feb 13 09:48:17 <servername> dhclient[67715]: DHCPACK from <DHCP server>
(xid=0x745ba623)

And it then gets an IP address.

And looking at /var/log/messages, it *appears* that the restart never
invokes the dhclient script, while ifup does.

       mark