[CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6

Fri Apr 17 19:20:06 UTC 2015
Nathan March <nathan at gt.net>

Hi All,

I've tracked this down. We rate limit our VMs with a mix of ebtables and tc.

Running these commands (replace vif1.0 with the correct vif for your VM) will reproduce this:

ebtables -A FORWARD -i vif1.0 -j mark --set-mark 990 --mark-target CONTINUE

tc qdisc add dev bond0 root handle 1: htb default 2 
tc class add dev bond0 parent 1: classid 1:0 htb rate 10000mbit 

tc class add dev bond0 parent 1: classid 1:990 htb rate 10000mbit
tc filter add dev bond0 protocol ip parent 1:0 prio 990 handle 990 fw flowid 1:990
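
To check whether tc itself is dropping packets once the rules are in place, the per-qdisc and per-class counters from "tc -s" can be totalled. This is only a rough sketch: the helper name sum_tc_drops is made up for illustration, and it assumes the usual "Sent ... (dropped N, overlimits N requeues N)" statistics line that tc prints.

```shell
#!/bin/sh
# Hypothetical helper: sum every "dropped N" counter found in
# `tc -s` output read from stdin and print the total.
sum_tc_drops() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "(dropped") {
                gsub(/,/, "", $(i + 1))   # strip the trailing comma
                total += $(i + 1)
            }
    } END { print total + 0 }'
}

# Usage (needs root; bond0 is the device used in the commands above):
#   tc -s qdisc show dev bond0 | sum_tc_drops
#   tc -s class show dev bond0 | sum_tc_drops
```

Running it before and after adding the filter rule would show whether the drop counters on bond0 move at all.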

Note that the rate limits applied here are 10 Gbit/s and I'm testing on a 1 Gbit/s network, so tc shouldn't be doing anything except letting the packets through. These same commands worked fine on Gentoo with Xen 4.1 / kernel 3.2.57, but they do not work on CentOS with Xen 4.4.1 / kernel 3.10.68.

The easiest way to reproduce this is to generate a large file, scp it to a remote host, and on the remote host run:
tshark -Y "tcp.analysis.duplicate_ack_num"
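
For the sending side, something along these lines should do. It is a sketch only: the function names, the file path, the size and the destination user@remote.example are placeholders, not what I actually ran.

```shell
#!/bin/sh
# Hypothetical helper: create a file of the given size (in MiB)
# filled with random data.
make_test_file() {
    path=$1
    size_mb=$2
    dd if=/dev/urandom of="$path" bs=1048576 count="$size_mb" 2>/dev/null
}

# Hypothetical helper: copy the file to the remote host over and over
# until interrupted (Ctrl-C) or until scp fails.
scp_loop() {
    file=$1
    dest=$2
    while :; do
        scp "$file" "$dest" || return 1
    done
}

# Usage (placeholder path, size and destination):
#   make_test_file /tmp/bigfile 1024
#   scp_loop /tmp/bigfile user@remote.example:/tmp/
```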

If you run the transfer in a loop with tshark in another window, you can see the Dup ACKs begin immediately after adding the last filter rule:

25790294 1752.756733 xxx.xxx.xxx.13 -> xxx.xxx.xxx.205 TCP 78 [TCP Dup ACK 25790286#4] ssh > 51515 [ACK] Seq=15994 Ack=50769840 Win=1544704 Len=0 TSval=738150929 TSecr=4294944346 SLE=50785768 SRE=50790596
25790296 1752.756742 xxx.xxx.xxx.13 -> xxx.xxx.xxx.205 TCP 78 [TCP Dup ACK 25790286#5] ssh > 51515 [ACK] Seq=15994 Ack=50769840 Win=1544704 Len=0 TSval=738150929 TSecr=4294944346 SLE=50785768 SRE=50792044

- Nathan