I have been trying to figure out why my domU NIC becomes unreachable (I could not even ping it) at various times, normally when the server was trying to update clamav from the various busy mirrors at 4am. There also seemed to be some latency when connecting, which I chalked up to it being a virtual machine.
When I checked my logs, I found thousands of:
Nov 17 04:07:52 nomad kernel: Neighbour table overflow.
and applications reporting errors such as:
Nov 17 04:08:05 nomad freshclam[4085]: nonblock_connect: connect(): fd=5 errno=105: No buffer space available
I am running a routed (not bridged) configuration.
What I figured out is that each CentOS 5.4 domU is maintaining an ARP table. That table is filling up, which causes the network to be unreachable until entries are purged from the cache. Since this is a routed configuration, the ARP table should really only consist of two or three entries: my domU, my dom0, and the gateway.
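For anyone who wants to check whether their own domU is in the same state, here is a minimal sketch that counts ARP cache entries. It assumes the usual /proc/net/arp layout (one header line, then one line per entry); the function name count_arp_entries is made up for illustration. The kernel limit that normally triggers "Neighbour table overflow" is gc_thresh3, so comparing the two numbers should show how close the table is to filling up.

```shell
#!/bin/sh
# Count entries in /proc/net/arp-style input read from stdin.
# The first line is a column header and is skipped; blank lines are ignored.
count_arp_entries() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage on a live system (left commented out so the sketch has no side effects):
#   count_arp_entries < /proc/net/arp
#   cat /proc/sys/net/ipv4/neigh/default/gc_thresh3
```

In a routed setup like the one described, the count would be expected to stay at a handful of entries; a count that climbs toward gc_thresh3 suggests the domU is ARPing for every remote address.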
It appears the network scripts under CentOS are ignoring the GATEWAY entry. I end up with routes of:
169.254.0.0 * 255.255.0.0 U 0 0 0 eth0
default * 0.0.0.0 U 0 0 0 eth0
The default route should point to the specific IP address in my /etc/sysconfig/network file. When I manually add the route, the ARP table issue is fixed. The network stack no longer tries to query an ARP entry for every IP address.
I found this bug at Xen which was closed as INVALID saying 'Centos is broken'. That was from 2006. http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=596
Any ideas on what is broken and what the correct fix is? Right now, I just added
/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw x.x.x.x
to my /etc/rc.local which seems like a hack solution.
I usually specify the default gateway in /etc/sysconfig/network-scripts/ifcfg-eth0 and it works just fine.
-- Pasi
Pasi Kärkkäinen wrote:
I usually specify the default gateway in /etc/sysconfig/network-scripts/ifcfg-eth0 and it works just fine.
Actually, I tried putting the GATEWAY in the specific ifcfg-eth0, as well as in the global /etc/sysconfig/network, and it seems to be ignored. Of course things 'appear' to work just fine, but the route that is set up is
default * 0.0.0.0 U 0 0 0 eth0
INSTEAD OF:
default router.example.com 0.0.0.0 UG 0 0 0 eth0
The former seems to cause all arp entries to be queried and cached. The latter works correctly. Both 'appear to work'. Does the route on your domU look like the second entry?
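A quick way to tell the two cases apart is to look for the G flag on the default route. The following is only a sketch: it assumes the standard column order of `route` / `route -n` output (Destination, Gateway, Genmask, Flags, ...), and the function name default_route_has_gateway is invented for this example.

```shell
#!/bin/sh
# Read `route` or `route -n` output on stdin.
# Exit status 0 if a default route carries the G (gateway) flag, 1 otherwise.
default_route_has_gateway() {
    awk '
        ($1 == "default" || $1 == "0.0.0.0") && $4 ~ /G/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live system (commented out here):
#   /sbin/route -n | default_route_has_gateway || echo "default route has no gateway"
```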
Yes, the routing table is correct for my domUs. I have never noticed/seen GATEWAY getting ignored..
Maybe your netmask is wrong, so the GATEWAY IP is unreachable?
-- Pasi
Ken, I think Pasi's on to something there; I bet the GATEWAY line in ifcfg-eth0 is mistyped or has a syntax error. In the interim, however, a better hack might be to move the route statement from rc.local, which only runs at boot, to /etc/sysconfig/network-scripts/route-eth0. That will allow the network service to restart, or the eth0 interface to go down and up, without losing the default route.
-Chris
CentOS-virt mailing list CentOS-virt@centos.org http://lists.centos.org/mailman/listinfo/centos-virt
Christopher Hunt wrote:
Ken, I think Pasi's on to something there; I bet the GATEWAY line in ifcfg-eth0 is mistyped or has a syntax error. In the interim, however, a better hack might be to move the route statement from rc.local, which only runs at boot, to /etc/sysconfig/network-scripts/route-eth0. That will allow the network service to restart, or the eth0 interface to go down and up, without losing the default route.
Thanks for this tip. I removed GATEWAY and GATEWAYDEV from /etc/sysconfig/network, which got rid of the initial (incorrect) route, and added
default via 192.168.144.6 dev eth0 onlink
to /etc/sysconfig/network-scripts/route-eth0.
Note the use of the 'onlink' option. It appears to be working fine without the arp traffic.
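As a rough sketch of how that file could be generated for other domUs: the helper below simply prints a route line in the same form as the one above. The function name onlink_default_route is hypothetical, and the sketch assumes (as the working setup above suggests) that each line of route-eth0 is handed to `ip route add` when the interface comes up.

```shell
#!/bin/sh
# Print a route-<iface> line for a default route through an off-subnet gateway.
# $1 = gateway IP address, $2 = interface name
onlink_default_route() {
    printf 'default via %s dev %s onlink\n' "$1" "$2"
}

# Usage (commented out because it overwrites a system file):
#   onlink_default_route 192.168.144.6 eth0 > /etc/sysconfig/network-scripts/route-eth0
# One-off equivalent without touching the config file (needs root):
#   /sbin/ip route add default via 192.168.144.6 dev eth0 onlink
```

The 'onlink' keyword tells the kernel to treat the gateway as directly reachable on eth0 even though it is outside the domU's own subnet.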
I guess I thought this was a more common configuration when using Xen in a routed configuration. I'm not sure if the network scripts allow a mechanism to add the 'onlink' option so that GATEWAY/GATEWAYDEV specified are actually used. Grepping through the network-scripts does not yield anything.
If other people are using Xen in the same way I am, you will need to make this change, or else there will be unnecessary ARP cache activity for every single IP address, which might fill up your ARP table.
Pasi Kärkkäinen wrote:
Yes, the routing table is correct for my domUs. I have never noticed/seen GATEWAY getting ignored..
Maybe your netmask is wrong, so the GATEWAY IP is unreachable?
Well, the netmask for the domU is 255.255.255.255 since the domU is allocated a single official IP address and packets are routed to it from the dom0.
It is a bit confusing, but [IP addresses changed for privacy]:
My dom0 is 192.168.144.6/30
My domU is 192.168.139.4/32 (single host)
Another domU is 192.168.139.128/29 (a few hosts)
There is a router in the ISP that routes the domU's via my dom0 "router".
The domU gateway is 192.168.144.6, which is outside the subnet mask of the domU. But that should be legal, right? Is this just some CentOS/RHEL network script that is not flexible enough? Obviously I can add the
/sbin/route add -net 0.0.0.0 netmask 0.0.0.0 gw 192.168.144.6
which works. Since these IP addresses and netmasks are officially assigned to me, I don't think it is proper to be using IP addresses outside the range allocated to me. Does this make sense?
Ken Bass wrote:
The domU gateway is 192.168.144.6 which is outside the subnet mask of the domU. But that should be legal right? Is this just some Centos/RHEL network script that is not flexible enough? Obviously I can add the
Gateway needs to be in the same subnet. And for everything to work correctly both router and client should have the same netmask for the subnet. So you should assign the same netmask to the DomU as the 192.168.144.6 has.
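To make the subnet point concrete, here is a small sketch that tests whether an address falls inside a given network. The function names ip_to_int and in_subnet are invented for this example, and it assumes a shell with 64-bit arithmetic (bash, or dash on a 64-bit system). With the addresses from this thread, the gateway 192.168.144.6 is not inside the domU's 192.168.139.4/32, which would explain why a plain GATEWAY entry is not usable without an on-link hint.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDRESS NETWORK PREFIXLEN
# Exit status 0 if ADDRESS lies inside NETWORK/PREFIXLEN, 1 otherwise.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    if [ "$3" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    fi
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Is the gateway reachable within the domU's own /32?
if in_subnet 192.168.144.6 192.168.139.4 32; then
    echo "gateway is on-link"
else
    echo "gateway is off-link"
fi
```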
-vpk
I was slightly confused about this thread until I realized you were using static IP config on your VMs...
Why do people do that? I have an extra step of picking up the HW address (or setting the HW address when creating the VM) and putting it into my DHCP configuration, but then I have all of my hosts in a single file and I can change the network configuration of my whole network in a single place.
I realize that my DHCP server becomes a single point of failure, but with a reasonably long lease time the DHCP server going down won't affect any workstations for as much as several hours (as long as nothing reboots). Also, there are ways of having fault tolerance with DHCP; the easiest would be to have a non-running VM with a copy of the data.