I did not hear back on this posting so I figured I was addressing the wrong audience.
Maybe someone on the host-side better understands how the 4.12 kernel is interacting with KVM.
Thanks,
-Philip
Begin forwarded message:
From: Philip Prindeville <philipp_subx@redfish-solutions.com>
Subject: Network interface regression on F26 VM after 4.13/4.12 kernel update
Date: October 26, 2017 at 4:16:53 PM MDT
To: devel@lists.fedoraproject.org
Reply-To: Development discussions related to Fedora <devel@lists.fedoraproject.org>

I was running F25 (4.10) on a VM inside KVM/Qemu/libvirt on CentOS 7.3 (updated).
Then I upgraded it (via dnf system-upgrade) to F26 and 4.11 and it was still working well, as I recall.
Then I upgraded it again to 4.13 and now I’m seeing flakiness in the network: the NIC will randomly come up and go down.
Right now I’m seeing:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:29:01:5b brd ff:ff:ff:ff:ff:ff
$
My messages file shows:
Oct 26 14:25:51 son-of-builder kernel: igbvf 0000:00:03.0: Link is Down
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.0757] device (ens3): state change: activated -> unavailable (reason 'carrier-changed', internal state 'managed')
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=filter family=2 entries=86
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.0932] dhcp4 (ens3): canceled DHCP transaction, DHCP client pid 8008
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=nat family=2 entries=52
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=mangle family=2 entries=40
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=raw family=2 entries=29
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.0933] dhcp4 (ens3): state changed bound -> done
Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Withdrawing address record for 192.168.1.56 on ens3.
Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Leaving mDNS multicast group on interface ens3.IPv4 with address 192.168.1.56.
Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Interface ens3.IPv4 no longer relevant for mDNS.
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:4 'connectivity-change': new request (5 scripts)
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:4 'connectivity-change': start running ordered scripts...
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=filter family=10 entries=87
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=nat family=10 entries=52
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=mangle family=10 entries=40
Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=raw family=10 entries=30
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.1140] manager: NetworkManager state is now CONNECTED_LOCAL
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.1145] manager: NetworkManager state is now DISCONNECTED
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info> [1509049556.1256] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device)
Oct 26 14:25:56 son-of-builder systemd-hostnamed[8026]: Changed host name to 'localhost.localdomain'
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:5 'down' [ens3]: new request (5 scripts)
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:6 'hostname': new request (5 scripts)
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:5 'down' [ens3]: start running ordered scripts...
Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:6 'hostname': start running ordered scripts...
Oct 26 14:26:06 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 14:26:21 son-of-builder NetworkManager[824]: <info> [1509049581.0808] connectivity: (ens3) timed out
Oct 26 14:26:21 son-of-builder dbus-daemon[761]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.10' (uid=0 pid=824 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
Oct 26 14:26:21 son-of-builder systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 26 14:26:21 son-of-builder dbus-daemon[761]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 26 14:26:21 son-of-builder systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 26 14:26:21 son-of-builder audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 14:26:21 son-of-builder nm-dispatcher[8169]: req:1 'connectivity-change': new request (5 scripts)
Oct 26 14:26:21 son-of-builder nm-dispatcher[8169]: req:1 'connectivity-change': start running ordered scripts...
Oct 26 14:26:26 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 26 14:26:31 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Before this, the network configuration was rock-solid. I’m running on a Xeon D-1548 SoC with an on-chip X552/557 (ixgbe.ko) and an off-chip i350 (igb.ko) quad-NIC. In this case, I’m using the first port of the i350.
The VM’s XML is unchanged from when things were working reliably:
<interface type='network'>
  <mac address='52:54:00:29:01:5b'/>
  <source network='hostdev-net0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
I’ve also tried “e1000”, “igb”, and “rtl8139” as the model type, with no appreciable difference since this problem started.
I just did a quick check: reinstalling 4.11.10-300 seems to restore functionality.
It was also broken when I tried 4.12.11-300.
-Philip