James Hogarth wrote:
On 13 February 2017 at 15:35, m.roth@5-cent.us wrote:
My manager tells me a system in the datacenter is down. I go down there, and plug in a monitor-on-a-stick and keyboard. It's up, but no network. I try systemctl restart NetworkManager several times, and ip a shows *no* change.
Finally, I do an ifdown, followed by an ifup, and everything's wonderful.
My manager thinks that the NM daemon thinks everything's fine, and there've been no changes, so it does nothing. He suggests that it might have to be stopped, then started, rather than restarted.
This is completely unacceptable behavior, since it leave the system with no network connection. Pre-systemd, as we all know, restart *RESTARTED* the damn thing.
Is there some Magic (#insert "pixie-dust-sparkles") incantation, either restarting NetworkManager, or using nm-cli, to force it to perform the expected actions?
I'd be interested in the journal from the NetworkManager restart as that's not the way it behaves ... it uses the netlink API to get state and not it's own internal tracker of state (ie doing an ip link down will reflect in nmcli output) ... a restart of NetworkManager should not ignore interfaces but rather bring the system to the on disk configured state ... and a quick check it doesn't override ExecRestart in the unit file to do a reload or similar instead ...
And indeed a quick test in a VM shows nmcli device status correctly changing between connected and unavailable when doing ip link set eth0 down/up
Do note that on a NM based system ifup and ifdown are effectively aliases to nmcli conn down and nmcli conn up
nmcli conn down "connection name" will make it disconnected nmcli conn up "connection mame" will bring it back to connected
there is a slight interesting difference between using nmcli and ip link set though ...
with ip link set down <interface> the interface is marked administratively down (as if you've pulled the cable) but nmcli conn down "connection name" will unconfigure the interface but leave it in an UP state ... just without an IP address etc
anyway that's just an interesting diversion on behavioural differences
NM won't change an interface state without some sort of event though (manual or virtual cable pulled etc), and if you have a case where it *has* done that then you have found a bug that would be great to get reported
TL;DR: cannot reproduce, need logs to determine what happened without a working crystal ball
From journalctl, I see this happening when I do systemctl restart
NetworkManager (much edited) Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7755] manager: (em1): new Ethernet device (/org/freedesktop/NetworkManager/Devi Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7791] ifcfg-rh: add connection in-memory (79d3ed9d-cc41-498c-9169-44320e332f68, Feb 13 09:47:52 <servername> systemd[1]: Started Hostname Service. Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7797] device (em1): state change: unmanaged -> unavailable (reason 'connection- Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7805] device (em1): state change: unavailable -> disconnected (reason 'connecti <...> eb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7986] device (em1): state change: disconnected -> prepare (reason 'none') [30 4 Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.7999] policy: set 'em1' (em1) as default for IPv6 routing and DNS Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.8027] device (em1): state change: prepare -> config (reason 'none') [40 50 0] Feb 13 09:47:52 <servername> NetworkManager[67312]: <info> [1486997272.8034] device (em1): state change: config -> ip-config (reason 'none') [50 70 0] Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3594] device (em1): state change: ip-config -> ip-check (reason 'none') [70 80 Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3661] device (em1): state change: ip-check -> secondaries (reason 'none') [80 9 Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3666] device (em1): state change: secondaries -> activated (reason 'none') [90 Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3667] manager: NetworkManager state is now CONNECTED_GLOBAL Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3670] manager: NetworkManager state is now CONNECTED_SITE Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3670] manager: NetworkManager state is now CONNECTED_GLOBAL Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2 'connectivity-change': new request (6 scripts) Feb 13 09:47:53 <servername> nm-dispatcher[67317]: req:2 'connectivity-change': start running ordered scripts... Feb 13 09:47:53 <servername> NetworkManager[67312]: <info> [1486997273.3697] device (em1): Activation: successful, device activated.
Note there is no IP address being obtained. Now, when I run ifdown/ifup:
Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6804] device (em1): Activation: starting connection 'em1' (c432eaa1-023b-4f1f-a Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6809] audit: op="connection-activate" uuid="c432eaa1-023b-4f1f-a7b5-4605ec07195 Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6810] device (em1): state change: disconnected -> prepare (reason 'none') [30 4 Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6811] manager: NetworkManager state is now CONNECTING Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6816] device (em1): state change: prepare -> config (reason 'none') [40 50 0] Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6858] device (em1): state change: config -> ip-config (reason 'none') [50 70 0] Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6869] dhcp4 (em1): activation: beginning transaction (timeout in 45 seconds) Feb 13 09:48:17 <servername> NetworkManager[67312]: <info> [1486997297.6900] dhcp4 (em1): dhclient started with pid 67715 Feb 13 09:48:17 <servername> dhclient[67715]: DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 6 (xid=0x745ba623) Feb 13 09:48:17 <servername> dhclient[67715]: DHCPREQUEST on em1 to 255.255.255.255 port 67 (xid=0x745ba623) Feb 13 09:48:17 <servername> dhclient[67715]: DHCPOFFER from <DHCP server> Feb 13 09:48:17 <servername> dhclient[67715]: DHCPACK from <DHCP server> (xid=0x745ba623)
And it then gets an IP address.
And looking at /var/log/messages, it *appears* that the restart never invokes the dhclient script, while ifup does.
mark