[CentOS-virt] Fwd: Network interface regression on F26 VM after 4.13/4.12 kernel update

Fri Oct 27 18:50:00 UTC 2017
Philip Prindeville <philipp_subx at redfish-solutions.com>

I did not hear back on this posting, so I figured I was addressing the wrong audience.

Maybe someone on the host side better understands how the 4.12 kernel is interacting with KVM.

Thanks,

-Philip


> Begin forwarded message:
> 
> From: Philip Prindeville <philipp_subx at redfish-solutions.com>
> Subject: Network interface regression on F26 VM after 4.13/4.12 kernel update
> Date: October 26, 2017 at 4:16:53 PM MDT
> To: devel at lists.fedoraproject.org
> Reply-To: Development discussions related to Fedora <devel at lists.fedoraproject.org>
> 
> I was running F25 (kernel 4.10) in a VM under KVM/QEMU/libvirt on an updated CentOS 7.3 host.
> 
> Then I upgraded it (via dnf system-upgrade) to F26 with kernel 4.11, and as I recall it was still working well.
> 
> Then I updated the kernel to 4.13, and now the network is flaky: the NIC randomly goes down and comes back up.
> 
> Right now I’m seeing:
> 
> $ ip link show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
>    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: ens3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
>    link/ether 52:54:00:29:01:5b brd ff:ff:ff:ff:ff:ff
> $ 
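The interface above is administratively UP but reports NO-CARRIER. To check for that state repeatedly while the link flaps, one option is to parse the `ip link show` output. A minimal Python sketch, assuming the one-header-line-per-interface layout shown above; the sample text is copied from this post, and on a live guest it would come from running `ip link show` (`parse_links` and `lost_carrier` are hypothetical helper names):

```python
import re

# Output captured from the guest above.
IP_LINK_OUTPUT = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:29:01:5b brd ff:ff:ff:ff:ff:ff
"""

# Header lines look like "<index>: <name>: <FLAG,FLAG,...> ... state <STATE> ...";
# the indented link/ether continuation lines do not match.
HEADER_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:]+):\s+<(?P<flags>[^>]*)>.*\bstate\s+(?P<state>\S+)"
)


def parse_links(text):
    """Return {ifname: (set_of_flags, state)} for each interface header line."""
    links = {}
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            flags = set(m.group("flags").split(",")) if m.group("flags") else set()
            links[m.group("name")] = (flags, m.group("state"))
    return links


def lost_carrier(links):
    """Names of interfaces that are administratively UP but report NO-CARRIER."""
    return sorted(
        name for name, (flags, _) in links.items()
        if "UP" in flags and "NO-CARRIER" in flags
    )


links = parse_links(IP_LINK_OUTPUT)
print(lost_carrier(links))  # ['ens3']
```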
> 
> My messages file shows:
> 
> Oct 26 14:25:51 son-of-builder kernel: igbvf 0000:00:03.0: Link is Down
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.0757] device (ens3): state change: activated -> unavailable (reason 'carrier-changed', internal state 'managed')
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=filter family=2 entries=86
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.0932] dhcp4 (ens3): canceled DHCP transaction, DHCP client pid 8008
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=nat family=2 entries=52
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=mangle family=2 entries=40
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=raw family=2 entries=29
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.0933] dhcp4 (ens3): state changed bound -> done
> Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Withdrawing address record for 192.168.1.56 on ens3.
> Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Leaving mDNS multicast group on interface ens3.IPv4 with address 192.168.1.56.
> Oct 26 14:25:56 son-of-builder avahi-daemon[756]: Interface ens3.IPv4 no longer relevant for mDNS.
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:4 'connectivity-change': new request (5 scripts)
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:4 'connectivity-change': start running ordered scripts...
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=filter family=10 entries=87
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=nat family=10 entries=52
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=mangle family=10 entries=40
> Oct 26 14:25:56 son-of-builder audit: NETFILTER_CFG table=raw family=10 entries=30
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.1140] manager: NetworkManager state is now CONNECTED_LOCAL
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.1145] manager: NetworkManager state is now DISCONNECTED
> Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.1256] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device)
> Oct 26 14:25:56 son-of-builder systemd-hostnamed[8026]: Changed host name to 'localhost.localdomain'
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:5 'down' [ens3]: new request (5 scripts)
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:6 'hostname': new request (5 scripts)
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:5 'down' [ens3]: start running ordered scripts...
> Oct 26 14:25:56 son-of-builder nm-dispatcher[8018]: req:6 'hostname': start running ordered scripts...
> Oct 26 14:26:06 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Oct 26 14:26:21 son-of-builder NetworkManager[824]: <info>  [1509049581.0808] connectivity: (ens3) timed out
> Oct 26 14:26:21 son-of-builder dbus-daemon[761]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.10' (uid=0 pid=824 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")
> Oct 26 14:26:21 son-of-builder systemd[1]: Starting Network Manager Script Dispatcher Service...
> Oct 26 14:26:21 son-of-builder dbus-daemon[761]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> Oct 26 14:26:21 son-of-builder systemd[1]: Started Network Manager Script Dispatcher Service.
> Oct 26 14:26:21 son-of-builder audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Oct 26 14:26:21 son-of-builder nm-dispatcher[8169]: req:1 'connectivity-change': new request (5 scripts)
> Oct 26 14:26:21 son-of-builder nm-dispatcher[8169]: req:1 'connectivity-change': start running ordered scripts...
> Oct 26 14:26:26 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> Oct 26 14:26:31 son-of-builder audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
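In this excerpt the kernel reports the carrier loss at 14:25:51 and NetworkManager reacts five seconds later. To count how often the link flaps over a longer stretch of the messages file, the relevant lines could be pulled out with a script. A rough Python sketch, assuming the classic syslog layout shown above; the year is an assumption (these syslog timestamps omit it), the regular expressions only cover the message shapes quoted here, and `link_events` is a hypothetical helper name:

```python
import re
from datetime import datetime

# Three lines copied from the messages file above; in practice this would be
# the contents of the whole file.
LOG = """\
Oct 26 14:25:51 son-of-builder kernel: igbvf 0000:00:03.0: Link is Down
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.0757] device (ens3): state change: activated -> unavailable (reason 'carrier-changed', internal state 'managed')
Oct 26 14:25:56 son-of-builder NetworkManager[824]: <info>  [1509049556.1145] manager: NetworkManager state is now DISCONNECTED
"""

# "Mon DD HH:MM:SS host tag: message"
PREFIX_RE = re.compile(r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ [^:]+: (?P<msg>.*)$")
# Kernel driver message, e.g. "igbvf 0000:00:03.0: Link is Down"
LINK_RE = re.compile(r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): Link is (?P<state>Up|Down)")
# NetworkManager device transition, e.g. "device (ens3): state change: activated -> unavailable"
NM_RE = re.compile(r"device \((?P<dev>[^)]+)\): state change: (?P<old>\S+) -> (?P<new>\S+)")


def link_events(text, year=2017):
    """Return (timestamp, source, details) tuples for kernel link and NM device lines."""
    events = []
    for line in text.splitlines():
        m = PREFIX_RE.match(line)
        if not m:
            continue
        # Classic syslog timestamps carry no year, so one has to be supplied.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        msg = m.group("msg")
        link = LINK_RE.search(msg)
        nm = NM_RE.search(msg)
        if link:
            events.append((ts, "kernel", (link.group("pci"), link.group("state"))))
        elif nm:
            events.append((ts, "NetworkManager", (nm.group("dev"), nm.group("old"), nm.group("new"))))
    return events


events = link_events(LOG)
carrier_lost = next(ts for ts, src, d in events if src == "kernel" and d[1] == "Down")
nm_reacted = next(ts for ts, src, d in events if src == "NetworkManager" and d[2] == "unavailable")
print(len(events), (nm_reacted - carrier_lost).total_seconds())  # 2 5.0
```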
> 
> 
> Before this, the network configuration was rock-solid. I’m running on a Xeon D-1548 SoC with an on-chip X552/X557 (ixgbe.ko) and an off-chip i350 (igb.ko) quad-port NIC. In this case, I’m using the first port of the i350.
> 
> The VM’s XML is unchanged from when things were working reliably:
> 
>    <interface type='network'>
>      <mac address='52:54:00:29:01:5b'/>
>      <source network='hostdev-net0'/>
>      <model type='virtio'/>
>      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>    </interface>
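The PCI address in this XML (domain 0x0000, bus 0x00, slot 0x03, function 0x0) appears to be the same device the guest kernel logs as `igbvf 0000:00:03.0`, which suggests the `Link is Down` message refers to this interface. A small Python sketch, standard library only, that converts libvirt's `<address type='pci'>` attributes into the Linux `DDDD:BB:SS.F` form so the two can be compared (`guest_pci_address` is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

# The <interface> element from the VM definition above.
INTERFACE_XML = """
<interface type='network'>
  <mac address='52:54:00:29:01:5b'/>
  <source network='hostdev-net0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
"""


def guest_pci_address(xml_text):
    """Format the interface's <address type='pci'> as a Linux PCI address (DDDD:BB:SS.F)."""
    iface = ET.fromstring(xml_text.strip())
    addr = iface.find("address[@type='pci']")
    if addr is None:
        raise ValueError("no <address type='pci'> element")
    # The attributes are hex strings such as '0x03'; int(..., 16) accepts the 0x prefix.
    domain, bus, slot, function = (
        int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function")
    )
    return f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}"


LOG_LINE = "igbvf 0000:00:03.0: Link is Down"  # kernel line quoted above
print(guest_pci_address(INTERFACE_XML))              # 0000:00:03.0
print(guest_pci_address(INTERFACE_XML) in LOG_LINE)  # True
```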
> 
> Since this problem started, I’ve also tried “e1000”, “igb”, and “rtl8139” as the model type, with no appreciable difference.
> 
> Just did a quick check: reinstalling 4.11.10-300 seems to restore functionality.
> 
> It was broken when I tried 4.12.11-300 as well.
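Taken together, the kernels tried so far bracket the regression between 4.11.10-300 (working) and 4.12.11-300 (broken). A minimal Python sketch for keeping track of those results, assuming the behavior changes only once as the version increases; `RESULTS`, `version_key`, and `regression_window` are illustrative names, and 4.10, 4.11, and 4.13 are listed without full release strings because the report gives none:

```python
# Kernels mentioned in this report and whether the guest network was stable.
RESULTS = [
    ("4.10", True),          # F25: working
    ("4.11", True),          # F26 after dnf system-upgrade: working, as recalled
    ("4.11.10-300", True),   # reinstalled: network restored
    ("4.12.11-300", False),  # broken
    ("4.13", False),         # broken
]


def version_key(version):
    """Turn '4.12.11-300' into ((4, 12, 11), 300) so versions sort numerically."""
    base, _, release = version.partition("-")
    numbers = tuple(int(part) for part in base.split("."))
    return numbers, (int(release) if release else 0)


def regression_window(results):
    """Return (last_good, first_bad), assuming good -> bad happens only once."""
    last_good = None
    for version, ok in sorted(results, key=lambda r: version_key(r[0])):
        if ok:
            last_good = version
        else:
            return last_good, version
    return last_good, None


print(regression_window(RESULTS))  # ('4.11.10-300', '4.12.11-300')
```

Builds between those two bounds would be the natural next candidates to test.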
> 
> -Philip