[CentOS] UDP de-fragmentation problem

Thu Apr 7 15:19:46 UTC 2016
Volker <volker at openbios.org>

Hi all.

I have a strange problem regarding UDP fragmentation on CentOS 7:
applications are unable to receive UDP packets which have undergone
fragmentation UNLESS the netfilter modules are loaded.

The problem arose with an application which runs fine on openSUSE but
does not work on CentOS 7. The application processes UDP data, and on
CentOS only small packets are received and processed, i.e. packets below
the fragmentation size limit of about 1500 bytes. UDP packets which have
undergone fragmentation are not received by the application.
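For reference, the "about 1500 bytes" limit can be put into numbers. This
is only a rough sketch: it assumes an Ethernet MTU of 1500, a 20-byte IPv4
header without options and the 8-byte UDP header. The function name is
made up for illustration.

```python
import math


def ipv4_fragment_count(udp_payload_len, mtu=1500, ip_header_len=20, udp_header_len=8):
    """Return how many IPv4 fragments a UDP datagram needs at the given MTU."""
    # The UDP header is part of the IP payload and travels in the first fragment.
    ip_payload_len = udp_payload_len + udp_header_len
    # Fragment offsets are counted in 8-byte units, so every fragment except
    # the last carries a multiple of 8 bytes of IP payload.
    per_fragment = ((mtu - ip_header_len) // 8) * 8
    return max(1, math.ceil(ip_payload_len / per_fragment))


for size in (1000, 1472, 1473, 10000):
    print(size, "bytes of UDP payload ->", ipv4_fragment_count(size), "fragment(s)")
```

Under those assumptions a payload of up to 1472 bytes fits into a single
packet, 1473 bytes already needs two fragments, and a 10000-byte datagram
is split into 7 fragments.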

The application in question uses Qt, which opens the UDP socket in
non-blocking mode - apparently that's an issue because reading from the
socket in blocking mode does not cause the problem.

By chance I found that once the netfilter kernel modules
(nf_nat, iptable_nat, ...) are loaded, the problem disappears and
UDP packets of all sizes are correctly delivered and processed.
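To compare the failing and the working state, it may help to record which
netfilter-related modules are loaded in each. A small sketch that filters
the contents of /proc/modules (first column is the module name); the
prefix list is my own guess at what counts as "netfilter-related" and is
not exhaustive.

```python
def netfilter_modules(proc_modules_text):
    """Return netfilter-looking module names found in /proc/modules content."""
    # Assumed prefixes; adjust to taste, this list is not authoritative.
    prefixes = ("nf_", "iptable_", "ip_tables", "x_tables")
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0].startswith(prefixes):
            names.append(fields[0])
    return sorted(names)
```

Calling it as netfilter_modules(open("/proc/modules").read()) before and
after loading the modules gives two lists that can be diffed.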

NOTES:
- I'm not using netfilter. My iptables are empty, firewalld is not running.

- Other networking applications (at least TCP ones) are working fine:
web browsing, ssh, NFS etc., even DNS.

- Does not happen on openSUSE, regardless of whether the netfilter
modules are loaded or not.

- Does not happen on openSUSE on the same machine. Does happen on
different machines on CentOS 7. So it's not hardware dependent.

- There is AFAIK nothing special about my CentOS 7 installation. Out of
the box install, simple network config, latest updates applied.

- Rebuilding the application on CentOS 7 with the CentOS-supplied gcc,
libs etc. does not make the problem go away.

- I have broken the application down to a small Qt test program which
opens a UDP socket, binds it and waits on it.
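For anyone who wants to try this without Qt, here is a rough Python
equivalent of what the test program does at the socket level. It is a
sketch, not the Qt code: the function names are made up, and the call
sequence (non-blocking socket, SO_BROADCAST, bind to port 10001, select,
1-byte MSG_PEEK, then a 65536-byte read) is modelled on the syscalls the
test program issues.

```python
import select
import socket


def open_receiver(port=10001, host="0.0.0.0"):
    """Create a non-blocking UDP socket bound to host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)  # same effect as fcntl(F_SETFL, O_NONBLOCK)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind((host, port))
    return sock


def receive_one(sock, timeout=None):
    """Wait for one datagram; return (data, sender) or None on timeout.

    Raises BlockingIOError if select() reports the socket readable but the
    following recvfrom() has nothing to deliver (the EAGAIN case).
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    # Peek at the first byte without consuming the datagram.
    sock.recvfrom(1, socket.MSG_PEEK)
    # Now read the whole datagram.
    return sock.recvfrom(65536)
```

Calling receive_one(open_receiver()) in a loop and catching
BlockingIOError should make the failing case visible: the exception is
raised exactly when select() wakes up but the read returns EAGAIN.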

This is strace output of the problem, where a 10000-byte UDP packet
is sent to the application and triggers the select(), then the
recvfrom(7...) fails with EAGAIN:
[...]
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 7
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
bind(7, {sa_family=AF_INET, sin_port=htons(10001), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getsockname(7, {sa_family=AF_INET, sin_port=htons(10001), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
getpeername(7, 0x7ffdf3073470, [16])    = -1 ENOTCONN (Transport endpoint is not connected)
getsockopt(7, SOL_SOCKET, SO_TYPE, [2], [4]) = 0
select(8, [3 7], [], [], NULL)          = 1 (in [7])
recvfrom(7, 0x7ffdf3072e1b, 1, 2, 0x7ffdf3072e20, 0x7ffdf3072e1c) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3 7], [], [], NULL
[...]

And after the netfilter modules are loaded:
[...]
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 7
fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(7, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
bind(7, {sa_family=AF_INET, sin_port=htons(10001), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getsockname(7, {sa_family=AF_INET, sin_port=htons(10001), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
getpeername(7, 0x7ffc5939e8c0, [16])    = -1 ENOTCONN (Transport endpoint is not connected)
getsockopt(7, SOL_SOCKET, SO_TYPE, [2], [4]) = 0
select(8, [3 7], [], [], NULL)          = 1 (in [7])
recvfrom(7, "x", 1, MSG_PEEK, {sa_family=AF_INET, sin_port=htons(60921), sin_addr=inet_addr("10.77.32.30")}, [16]) = 1
recvfrom(7, "x", 1, MSG_PEEK, {sa_family=AF_INET, sin_port=htons(60921), sin_addr=inet_addr("10.77.32.30")}, [16]) = 1
recvfrom(7, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(60921), sin_addr=inet_addr("10.77.32.30")}, [16]) = 10000
recvfrom(7, 0x7ffc5939e0bb, 1, 2, 0x7ffc5939e0c0, 0x7ffc5939e0bc) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3 7], [], [], NULL
[...]
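The sending side can be approximated with a few lines of Python, in case
someone wants to reproduce the traces. This is only a sketch of a
sender, not the tool actually used here: the function name is made up
and the target host has to be filled in. The port (10001), size (10000)
and payload of 'x' bytes are taken from the traces. Note that flag value
2 in the failing recvfrom() is MSG_PEEK on Linux, so both traces attempt
the same peek.

```python
import socket


def send_test_datagram(host, port=10001, size=10000):
    """Send one UDP datagram of `size` bytes of 'x'; return the byte count sent."""
    payload = b"x" * size
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # One sendto() call produces one datagram; the kernel fragments it
        # if it does not fit into the MTU of the outgoing interface.
        return sock.sendto(payload, (host, port))
```

It has to be run from a different machine than the receiver, since a
datagram sent over the loopback interface would normally not be
fragmented at this size.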


Any help? Is this a bug?

Regards
.....Volker