I just replaced a dead system disk on my KVM host that was running an ancient fedora 13. Since centos 7 was available, I decided to go with it to get some long term stability.
The problem is that NFS mounts inside the virtual machines don't work for spit when talking to older NFS servers that must speak UDP.
Is there something about UDP traffic that requires tweaks I don't know about for centos 7 to serve as a gateway machine? I've got the ip forwarding settings and other sysctl stuff that was set in the old fedora 13 system.
I've got the bridges defined the same way as the old f13 system.
I've got TCP stream connections working flawlessly, it is just the UDP traffic that seems to barf.
Does this strike a familiar note with anyone?
When I run wireshark on the KVM host machine, I see NFS packets retransmitting a lot and I also see ICMP messages about Destination Unreachable, Fragmentation Needed. (I don't know what any of it means though :-).
This is an Intel motherboard with these ethernets:

04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
In article 20140814120002.16440e86@tomh, Tom Horsley horsley1953@gmail.com wrote:
I just replaced a dead system disk on my KVM host that was running an ancient fedora 13. Since centos 7 was available, I decided to go with it to get some long term stability.
The problem is that NFS mounts inside the virtual machines don't work for spit when talking to older NFS servers that must speak UDP.
Is there something about UDP traffic that requires tweaks I don't know about for centos 7 to serve as a gateway machine? I've got the ip forwarding settings and other sysctl stuff that was set in the old fedora 13 system.
I've got the bridges defined the same way as the old f13 system.
I've got TCP stream connections working flawlessly, it is just the UDP traffic that seems to barf.
Does this strike a familiar note with anyone?
When I run wireshark on the KVM host machine, I see NFS packets retransmitting a lot and I also see ICMP messages about Destination Unreachable, Fragmentation Needed. (I don't know what any of it means though :-).
This means that either the host or one of the guests is trying to send packets with a larger MTU than part of the path to the destination will allow.
If you look inside the ICMP packet in wireshark, it will tell you who sent it and what MTU they said was acceptable.
For TCP, the protocol stack is able to adapt by reducing its MSS dynamically in response to those ICMPs and retry. I don't think UDP is able to do that.
Also examine the MTU settings for your network interfaces on both the host and the guests, using ifconfig -a.
Cheers Tony
If you look inside the ICMP packet in wireshark, it will tell you who sent it and what MTU they said was acceptable.
Well, I'm definitely drowning in network confusion here :-).
Everyone's MTU is the default 1500, I checked all systems in the path.
The wireshark display says 1516 in the Length column for the NFS packet that always shows up before the ICMP errors. If I expand the "IP V4" line in the packet, it says "Total Length: 1500" for that READDIRPLUS Reply which says 1516 for the capture length. It also has the "Don't fragment" flag set.
It looks like the 16 byte extra is confusing it, but I have no idea why that is different than the IPv4 length info.
On Thu, Aug 14, 2014 at 1:19 PM, Tom Horsley horsley1953@gmail.com wrote:
If you look inside the ICMP packet in wireshark, it will tell you who sent it and what MTU they said was acceptable.
Well, I'm definitely drowning in network confusion here :-).
Everyone's MTU is the default 1500, I checked all systems in the path.
The wireshark display says 1516 in the Length column for the NFS packet that always shows up before the ICMP errors. If I expand the "IP V4" line in the packet, it says "Total Length: 1500" for that READDIRPLUS Reply which says 1516 for the capture length. It also has the "Don't fragment" flag set.
It looks like the 16 byte extra is confusing it, but I have no idea why that is different than the IPv4 length info.
I thought NFS defaulted to writing 8192 blocks and let the network stack fragment as needed, so having DF set doesn't make much sense. Also, some firewalling schemes have issues with fragments, especially if they arrive out of order - not sure about the new stuff in C7.
On Thu, 14 Aug 2014 13:35:48 -0500 Les Mikesell wrote:
I thought NFS defaulted to writing 8192 blocks and let the network stack fragment as needed
I think it is those fragments I'm looking at in wireshark.
I just did another experiment - If I mount the same NFS filesystem on the centos 7 host, and do the same "ls" command, it works perfectly and the wireshark trace shows the same 1516 capture length for the NFS readdir messages.
Somehow it is just the idea of forwarding the UDP packets to the virtual machine that the host objects to. The exact same size packets destined for it to use directly have no problems.
On Thu, Aug 14, 2014 at 1:53 PM, Tom Horsley horsley1953@gmail.com wrote:
I thought NFS defaulted to writing 8192 blocks and let the network stack fragment as needed
I think it is those fragments I'm looking at in wireshark.
I just did another experiment - If I mount the same NFS filesystem on the centos 7 host, and do the same "ls" command, it works perfectly and the wireshark trace shows the same 1516 capture length for the NFS readdir messages.
Somehow it is just the idea of forwarding the UDP packets to the virtual machine that the host objects to. The exact same size packets destined for it to use directly have no problems.
Seems like a horrible thing to do, but does it fix it if you mount with rsize=1500, wsize=1500 - or maybe 1484?
Are you just bridging to the NIC interface? I don't see why that would need to change the packets at all. What happens if you ping with a large -s value through the bridge (host or external box to guest)?
On Thu, 14 Aug 2014 14:09:44 -0500 Les Mikesell wrote:
Seems like a horrible thing to do, but does it fix it if you mount with rsize=1500, wsize=1500 - or maybe 1484?
I already tried that - no change :-).
Are you just bridging to the NIC interface? I don't see why that would need to change the packets at all. What happens if you ping with a large -s value through the bridge (host or external box to guest)?
There are two NICs. The one with the bridge is also running a subnet with the virtual machines and one real machine on the NIC. The other NIC is connected to the wider world of our local LAN where the NFS servers reside, so the host has to operate as a gateway for the traffic from the LAN to the virtual machine subnet.
I did just try the ping experiment, and on the outer NFS server, if I try to ping the virtual machine with a big size, I get the error about the packet fragmentation:
dino> ping -c 1 -s 1500 ubu14d04x
PING ubu14d04x.ccur.kvm (192.168.118.52) from 10.134.30.46 : 1500(1528) bytes of data.
From godzilla (10.134.30.124) icmp_seq=1 Frag needed and DF set (mtu = 1500)
But weirdly, I don't get that from every machine I try out here on the LAN, some can ping it just fine, others get the error.
Whatever I discover just makes me more confused :-).
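For what it's worth, the numbers in that ping output can be checked by hand. A minimal sketch, assuming a plain 20-byte IPv4 header with no options and the 8-byte ICMP echo header; the function names are made up for illustration:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_packet_size(payload):
    """IP total length of an echo request carrying `payload` data bytes."""
    return payload + ICMP_HEADER + IPV4_HEADER


def max_unfragmented_payload(mtu):
    """Largest `ping -s` value that still fits in one packet for this MTU."""
    return mtu - ICMP_HEADER - IPV4_HEADER


# The "1500(1528)" in the ping output: 1500 data bytes, 1528 bytes of IP packet.
print(ping_packet_size(1500))
# Largest -s value that should get through a 1500 MTU path in one piece.
print(max_unfragmented_payload(1500))
```

So a ping with -s 1500 can never cross a 1500 MTU link in one piece. If the sender sets DF, a "Frag needed" reply is what you would expect for that size, and a ping with -s 1472 would be the more telling test.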
On Thu, Aug 14, 2014 at 2:48 PM, Tom Horsley horsley1953@gmail.com wrote:
Seems like a horrible thing to do, but does it fix it if you mount with rsize=1500, wsize=1500 - or maybe 1484?
I already tried that - no change :-).
It just seems very wrong for the NFS device to be sending 1516 bytes - and to set DF on the packet. What OS is it and what does it say about its own MTU? Physically, ethernet will accommodate 1518-1522 to allow VLAN tagging but you shouldn't have that without knowing about it (and your switch ports configured to trunk).
Are you just bridging to the NIC interface? I don't see why that would need to change the packets at all. What happens if you ping with a large -s value through the bridge (host or external box to guest)?
There are two NICs. The one with the bridge is also running a subnet with the virtual machines and one real machine on the NIC. The other NIC is connected to the wider world of our local LAN where the NFS servers reside, so the host has to operate as a gateway for the traffic from the LAN to the virtual machine subnet.
I think dropping the packet is actually the correct thing in that scenario. It should not forward something larger than the next interface's MTU and if the DF bit is set it can't fragment there. If you have IPs to spare on the NFS subnet, you might get away with bridging there and adding a virtual NIC to the guest(s) that need access.
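The rule described above can be sketched roughly as follows. This is only an illustration of standard IPv4 forwarding behaviour, not the kernel's actual code; `forward_decision` and its return strings are hypothetical names.

```python
def forward_decision(total_length, dont_fragment, next_hop_mtu):
    """Decide what a router does with an IPv4 packet it has to forward.

    total_length is the IP "Total Length" field, not the captured frame size.
    """
    if total_length <= next_hop_mtu:
        # Fits on the outgoing interface: pass it along unchanged.
        return "forward"
    if dont_fragment:
        # Too big and DF set: drop it and report back with ICMP type 3 code 4.
        return "drop, send ICMP frag needed (mtu = %d)" % next_hop_mtu
    # Too big but DF clear: the router may split it up itself.
    return "fragment and forward"


print(forward_decision(1500, True, 1500))
print(forward_decision(1528, True, 1500))
print(forward_decision(1528, False, 1500))
```

Note that the comparison is against the IP total length. By this logic a packet whose Total Length is exactly 1500 should still pass a 1500 MTU interface, DF or not.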
In article 20140814141900.777d6f0c@tomh, Tom Horsley horsley1953@gmail.com wrote:
If you look inside the ICMP packet in wireshark, it will tell you who sent it and what MTU they said was acceptable.
Well, I'm definitely drowning in network confusion here :-).
Everyone's MTU is the default 1500, I checked all systems in the path.
The wireshark display says 1516 in the Length column for the NFS packet that always shows up before the ICMP errors. If I expand the "IP V4" line in the packet, it says "Total Length: 1500" for that READDIRPLUS Reply which says 1516 for the capture length. It also has the "Don't fragment" flag set.
It looks like the 16 byte extra is confusing it, but I have no idea why that is different than the IPv4 length info.
The 1516 is the total length of the ethernet frame, and is normal for a 1500 MTU. The 16 bytes is the link-layer header.
When looking at the ICMP Frag-needed packet in Wireshark, look particularly at (a) its source and destination addresses, (b) the "MTU of next hop" field (in expansion of ICMP), and (c) the source and destination addresses of the packet it was complaining about.
Here's an example from one of my recent traces:
Frame 235: 72 bytes on wire (576 bits), 72 bytes captured (576 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 10.30.0.245 (10.30.0.245), Dst: 172.22.21.48 (172.22.21.48)
(a)                               ^^^^^^^^^^^^^^^^^^^^^^^^^       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Internet Control Message Protocol
    Type: 3 (Destination unreachable)
    Code: 4 (Fragmentation needed)
    Checksum: 0x81df [correct]
    MTU of next hop: 1476
(b)                  ^^^^
    Internet Protocol Version 4, Src: 172.22.21.48 (172.22.21.48), Dst: 172.27.60.31 (172.27.60.31)
(c)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Transmission Control Protocol, Src Port: ssh (22), Dst Port: 56199 (56199)
Cheers Tony
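The 1516 versus 1500 difference can be worked out the same way. A rough sketch, assuming the trace was taken as a Linux cooked capture (16-byte pseudo-header, as in the example trace above) rather than on a plain Ethernet interface (14-byte header); the names here are illustrative only:

```python
# Link-layer header sizes in bytes, as seen by the capture tool.
LINK_HEADER = {
    "ethernet": 14,      # dst MAC (6) + src MAC (6) + EtherType (2)
    "linux_cooked": 16,  # pseudo-header used for Linux cooked captures
}


def captured_length(ip_total_length, link_type):
    """Frame length the capture reports for an IP packet of the given size."""
    return ip_total_length + LINK_HEADER[link_type]


print(captured_length(1500, "linux_cooked"))
print(captured_length(1500, "ethernet"))
```

Either way the IP packet itself is 1500 bytes, which is what gets compared against the MTU. The extra bytes in the Length column never count toward it.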
I think I have my answer: The kernel is busted (or something isn't loaded that I need, but don't know about :-).
I copied my Fedora 20 desktop 3.15.8-200.fc20.x86_64 kernel and /lib/module files to the centos7 KVM host, rebuilt grub.cfg, and rebooted into the 3.15.8-200 kernel, and with no other changes the UDP packet forwarding is now working perfectly.
I guess it is time to make yet another bugzilla account and submit a bug...
On Fri, Aug 15, 2014 at 4:50 AM, Tom Horsley horsley1953@gmail.com wrote:
I think I have my answer: The kernel is busted (or something isn't loaded that I need, but don't know about :-).
I copied my Fedora 20 desktop 3.15.8-200.fc20.x86_64 kernel and /lib/module files to the centos7 KVM host, rebuilt grub.cfg, and rebooted into the 3.15.8-200 kernel, and with no other changes the UDP packet forwarding is now working perfectly.
It is much easier if you use ELRepo's kernel-ml (http://elrepo.org/tiki/kernel-ml).
I guess it is time to make yet another bugzilla account and submit a bug...
Yes, good idea.
Akemi
It is much easier if you use ELRepo's kernel-ml (http://elrepo.org/tiki/kernel-ml).
Does look like a better long term solution, fedora was just a hack for testing :-).
I guess it is time to make yet another bugzilla account and submit a bug...
Yes, good idea.
And here it is: http://bugs.centos.org/view.php?id=7505
Nope. The kernel is not busted.
You just need to add a few rules to your firewall in order to tell it to forward the packets appropriately. You do need the "net.ipv4.ip_forward = 1" line in /etc/sysctl.conf, and you also need to set /proc/sys/net/ipv4/ip_forward to 1 if you have not rebooted after adding that line, but firewall rules are still required to make it work.
Unfortunately the specific firewall rules you require will depend upon the release level of the distribution you use. IPTables has changed a bit over the years and so the specific rules and their syntax have changed as well. Here is what I use now with CentOS 6.5+ on my own network.
# Generated by iptables-save v1.4.7 on Fri Aug 15 09:11:28 2014
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [825:47118]
:fail2ban-SSH - [0:0]
-A INPUT -p tcp -m tcp --dport 22 -j fail2ban-SSH
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth+ -j ACCEPT
-A INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -p icmp -j ACCEPT
-A FORWARD -i lo -j ACCEPT
-A FORWARD -i eth0 -j ACCEPT
-A FORWARD -i eth1 -j ACCEPT
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A fail2ban-SSH -j RETURN
COMMIT
# Completed on Fri Aug 15 09:11:28 2014
# Generated by iptables-save v1.4.7 on Fri Aug 15 09:11:28 2014
*nat
:PREROUTING ACCEPT [80965:6238336]
:POSTROUTING ACCEPT [37811:2251658]
:OUTPUT ACCEPT [838:63592]
-A PREROUTING -d 24.199.159.56/29 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.53:80
-A PREROUTING -d 24.199.159.56/29 -p tcp -m tcp --dport 25 -j DNAT --to-destination 192.168.0.53:25
-A POSTROUTING -s 192.168.0.0/24 -j MASQUERADE
COMMIT
# Completed on Fri Aug 15 09:11:28 2014
The FORWARD rules in the filter table allow forwarding from your internal networks on eth0 and eth1 to the outside world. The Destination NATing PREROUTING rules allow incoming packets for SMTP and HTTP to be routed to the appropriate server on my inside network.
I hope this helps.
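A quick way to check the sysctl.conf half of that, sketched in Python. This only parses text in the usual "key = value" format; the function name and the sample text are made up for illustration, and it does not look at iptables at all.

```python
def ip_forward_enabled(conf_text):
    """Return True if the last net.ipv4.ip_forward setting in the text is 1."""
    enabled = False
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "net.ipv4.ip_forward":
            # Later lines override earlier ones, so keep the last value seen.
            enabled = value.strip() == "1"
    return enabled


# Hypothetical sample contents, not taken from any real host.
sample = """
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
"""
print(ip_forward_enabled(sample))
```

To test a real host, pass in the contents of /etc/sysctl.conf. The value the running kernel is actually using is the one in /proc/sys/net/ipv4/ip_forward.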
On 08/15/2014 07:50 AM, Tom Horsley wrote:
I think I have my answer: The kernel is busted (or something isn't loaded that I need, but don't know about :-).
I copied my Fedora 20 desktop 3.15.8-200.fc20.x86_64 kernel and /lib/module files to the centos7 KVM host, rebuilt grub.cfg, and rebooted into the 3.15.8-200 kernel, and with no other changes the UDP packet forwarding is now working perfectly.
I guess it is time to make yet another bugzilla account and submit a bug...
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
On Fri, 15 Aug 2014 09:19:29 -0400 David Both wrote:
I hope this helps.
Nah, all the forwarding rules were in place. They all worked before I switched to centos7, and they all worked after I booted the fedora kernel. No sysctl or iptables changes were made when switching from centos to fedora kernel, yet the forwarding started working after booting fedora.
I suspect if I backed up to the kernel centos 6.5 uses that would work as well. I betcha someone has a < that should be a <= somewhere in an MTU size check in the centos7 kernel :-).