[CentOS] Arp problems I think ...

Wed Dec 5 22:58:21 UTC 2007
Alan Bunch <Alan.Bunch at udfc.com>

Please bear with me as I know I have included a lot of detail.

Description
Red Hat AS 3
Kernel 2.4.21-47.0.1.ELsmp
eth0      HWaddr 00:07:E9:11:30:76
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0

eth0:1    HWaddr 00:07:E9:11:30:76
          inet addr:192.168.3.1  Bcast:192.168.3.255  Mask:255.255.255.0
          
This interface is up but with no IP address
eth1      HWaddr 00:07:E9:11:30:77

eth1.101  10.10.1.1  Bcast:10.10.1.255  Mask:255.255.255.0  vlan 101
eth1.102  10.10.2.1  Bcast:10.10.2.255  Mask:255.255.255.0  vlan 102
eth1.103  10.10.3.1  Bcast:10.10.3.255  Mask:255.255.255.0  vlan 103
eth1.104  10.10.4.1  Bcast:10.10.4.255  Mask:255.255.255.0  vlan 104
eth1.105  10.10.5.1  Bcast:10.10.5.255  Mask:255.255.255.0  vlan 105
eth1.106  10.10.6.1  Bcast:10.10.6.255  Mask:255.255.255.0  vlan 106
eth1.107  10.10.7.1  Bcast:10.10.7.255  Mask:255.255.255.0  vlan 107
eth1.108  10.10.8.1  Bcast:10.10.8.255  Mask:255.255.255.0  vlan 108
eth1.109  10.10.9.1  Bcast:10.10.9.255  Mask:255.255.255.0  vlan 109
eth2      Link encap:Ethernet  HWaddr 00:06:5B:FE:56:C2
          BROADCAST MULTICAST  MTU:1500  Metric:1      
eth3      Link encap:Ethernet  HWaddr 00:06:5B:FE:56:C3
          BROADCAST MULTICAST  MTU:1500  Metric:1


This machine routes between the VLANs.  When I ping 10.10.5.105 I 
get Host Unreachable.  Here is the tcpdump output on the router during 
that ping:

tcpdump -i eth1.105
14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1

Next I connect to the device over a serial port and ping 10.10.5.1 
(the router) from it.  Here is the tcpdump output on the router:

tcpdump -i eth1.105 -n
tcpdump: listening on eth1.105
14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105
14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77
14:12:06.707715 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:06.707762 10.10.5.1 > 10.10.5.105: icmp: echo reply
14:12:07.723100 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:07.723136 10.10.5.1 > 10.10.5.105: icmp: echo reply
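
To make the asymmetry between the two captures explicit, here is a
minimal Python sketch that tallies ARP requests and replies from the
tcpdump text shown above.  The function name summarize_arp and the
regular expressions are illustrative assumptions; they match only the
old-style tcpdump line format seen in this post, not tcpdump output in
general.

```python
import re
from collections import Counter

# ARP lines copied from the two captures in this post.
CAPTURE = """\
14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1
14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105
14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77
"""

# Assumed line shapes, taken from the captures above.
REQUEST = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def summarize_arp(text):
    """Map each requested IP to (request count, MAC from a reply or None)."""
    requests = Counter()
    replies = {}
    for line in text.splitlines():
        match = REQUEST.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY.search(line)
        if match:
            replies[match.group(1)] = match.group(2)
    return {ip: (count, replies.get(ip)) for ip, count in requests.items()}


for ip, (count, mac) in summarize_arp(CAPTURE).items():
    status = "answered by " + mac if mac else "NO REPLY"
    print(f"{ip}: {count} request(s), {status}")
```

On the lines above this reports three unanswered requests for
10.10.5.105 and one answered request for 10.10.5.1: the router answers
the device's ARP request, but no reply to the router's own requests
shows up on eth1.105.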

Now of course I can ping 10.10.5.105 (the suspect device) from 
10.10.5.1 (the router):

ping 10.10.5.1
PING 10.10.5.1 (10.10.5.1) 56(84) bytes of data.
64 bytes from 10.10.5.1: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 10.10.5.1: icmp_seq=1 ttl=64 time=0.039 ms         

This works until the ARP entry ages out.  Then I am back to not being 
able to ping the device.

If I manually insert an ARP table entry on the router, all is well.  
There is no filtering in the switches.  The switches are SMC 6826 for 
the 10/100 ports and SMC 8724 for the core and gigabit Ethernet.
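
For reference, the manual entry can be expressed as a net-tools arp
command.  The sketch below only builds and prints that command; the
helper name static_arp_command is made up for illustration, and the MAC
address is a placeholder, since the device's real hardware address does
not appear in this post.

```python
import re

# Accepts colon-separated MACs, including tcpdump's short form (0:7:e9:...).
MAC_RE = re.compile(r"^[0-9A-Fa-f]{1,2}(:[0-9A-Fa-f]{1,2}){5}$")


def static_arp_command(ip, mac, iface):
    """Return the argv list for a static ARP entry bound to one interface."""
    if not MAC_RE.match(mac):
        raise ValueError("not a colon-separated MAC address: %r" % mac)
    # net-tools syntax: arp -i <interface> -s <host> <hw_addr>
    return ["arp", "-i", iface, "-s", ip, mac]


# PLACEHOLDER MAC: substitute the device's real hardware address.
cmd = static_arp_command("10.10.5.105", "00:11:22:33:44:55", "eth1.105")
print(" ".join(cmd))
```

Running the printed command as root would add the entry.  A static
entry does not age out, which is why it hides the symptom without
explaining why the ARP requests go unanswered.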

I see similar symptoms in several other places.  For some devices on 
the VLANs I see the DHCP discover message, then the DHCP offer, and 
then the device sends another DHCP discover.  This repeats a few 
times, then the device waits and starts the process over.

I suspect the problem lies in the handling of ARP requests, but I 
don't know enough about how Linux handles them, or how to control 
that handling, to find a solution.

Any ideas?

Alan