Please bear with me as I know I have included a lot of detail.
Description
Redhat AS 3
Kernel 2.4.21-47.0.1.ELsmp

eth0 HWaddr 00:07:E9:11:30:76 inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
eth0:1 HWaddr 00:07:E9:11:30:76 inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0

This interface is up but with no IP address:
eth1 HWaddr 00:07:E9:11:30:77

eth1.101 10.10.1.1 Bcast:10.10.1.255 Mask:255.255.255.0 vlan 101
eth1.102 10.10.2.1 Bcast:10.10.2.255 Mask:255.255.255.0 vlan 102
eth1.103 10.10.3.1 Bcast:10.10.3.255 Mask:255.255.255.0 vlan 103
eth1.104 10.10.4.1 Bcast:10.10.4.255 Mask:255.255.255.0 vlan 104
eth1.105 10.10.5.1 Bcast:10.10.5.255 Mask:255.255.255.0 vlan 105
eth1.106 10.10.6.1 Bcast:10.10.6.255 Mask:255.255.255.0 vlan 106
eth1.107 10.10.7.1 Bcast:10.10.7.255 Mask:255.255.255.0 vlan 107
eth1.108 10.10.8.1 Bcast:10.10.8.255 Mask:255.255.255.0 vlan 108
eth1.109 10.10.9.1 Bcast:10.10.9.255 Mask:255.255.255.0 vlan 109

eth2 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C2
     BROADCAST MULTICAST MTU:1500 Metric:1
eth3 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C3
     BROADCAST MULTICAST MTU:1500 Metric:1
This machine is routing between the vlans. When I ping 10.10.5.105 I get Host Unreachable. Here is tcpdump from the ping on the router.
tcpdump -i eth1.105
14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1
Ok now I go to the device via a serial port and ping back to 10.10.5.1 ( the router ) and here is the tcpdump output
tcpdump -i eth1.105 -n
tcpdump: listening on eth1.105
14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105
14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77
14:12:06.707715 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:06.707762 10.10.5.1 > 10.10.5.105: icmp: echo reply
14:12:07.723100 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:07.723136 10.10.5.1 > 10.10.5.105: icmp: echo reply
Now of course I can ping 10.10.5.10 (the suspect device) from 10.10.5.1 ( the router )
ping 10.10.5.1
PING 10.10.5.1 (10.10.5.1) 56(84) bytes of data.
64 bytes from 10.10.5.1: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 10.10.5.1: icmp_seq=1 ttl=64 time=0.039 ms
This is fine until the arp entry ages out. Then I am back to not being able to ping the device.
If I manually insert an arp table entry all is well. No filtering in the switches. Switches are SMC 6826 for the 10/100 and 8724 for the core and gig e.
I have several similar symptoms in various places. For some devices on vlans I see the dhcp discover messages and the dhcp offer, and then the device sends another dhcp discover. This repeats a few times, then the device just waits and starts the process over.
I feel that the problem lies in the handling of arp requests but I guess I just don't know enough about how linux handles them or how to control them to find a solution.
Any ideas ?
Alan
In article <47572D0D.4050107@udfc.com>, Alan Bunch <Alan.Bunch@udfc.com> wrote:
Please bear with me as I know I have included a lot of detail.
Description
Redhat AS 3
Kernel 2.4.21-47.0.1.ELsmp

eth0 HWaddr 00:07:E9:11:30:76 inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
eth0:1 HWaddr 00:07:E9:11:30:76 inet addr:192.168.3.1 Bcast:192.168.3.255 Mask:255.255.255.0

This interface is up but with no IP address:
eth1 HWaddr 00:07:E9:11:30:77

eth1.101 10.10.1.1 Bcast:10.10.1.255 Mask:255.255.255.0 vlan 101
eth1.102 10.10.2.1 Bcast:10.10.2.255 Mask:255.255.255.0 vlan 102
eth1.103 10.10.3.1 Bcast:10.10.3.255 Mask:255.255.255.0 vlan 103
eth1.104 10.10.4.1 Bcast:10.10.4.255 Mask:255.255.255.0 vlan 104
eth1.105 10.10.5.1 Bcast:10.10.5.255 Mask:255.255.255.0 vlan 105
eth1.106 10.10.6.1 Bcast:10.10.6.255 Mask:255.255.255.0 vlan 106
eth1.107 10.10.7.1 Bcast:10.10.7.255 Mask:255.255.255.0 vlan 107
eth1.108 10.10.8.1 Bcast:10.10.8.255 Mask:255.255.255.0 vlan 108
eth1.109 10.10.9.1 Bcast:10.10.9.255 Mask:255.255.255.0 vlan 109

eth2 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C2
     BROADCAST MULTICAST MTU:1500 Metric:1
eth3 Link encap:Ethernet HWaddr 00:06:5B:FE:56:C3
     BROADCAST MULTICAST MTU:1500 Metric:1
I assume the above machine is "the router". I don't know whether leaving eth1 without an IP address could be the source of any problems...
This machine is routing between the vlans. When I ping 10.10.5.105 I get Host Unreachable.
Presumably, 10.10.5.105 is what you refer to below as "the device".
Here is tcpdump from the ping on the router.
tcpdump -i eth1.105
14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1
14:10:30.250143 arp who-has 10.10.5.105 tell 10.10.5.1
This suggests either: a) the arp request is not being heard/understood by the device, or b) the arp reply is not being heard by the router, or even perhaps c) the lack of -n is causing tcpdump not to display some packets while it is trying to resolve addresses to hostnames.
Try again using this: "tcpdump -n -e -i any" - this will include the ethernet address and will monitor all interfaces instead of just 105 (in case a packet is going to the wrong interface).
Also, what is the routing table shown by "netstat -rn"?
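If the capture gets long, a small script can pick out which ARP requests never got a reply, which helps separate cases (a) and (b) from a display quirk. This is only a rough sketch: the function name unanswered_arp is made up for illustration, and the patterns assume the older tcpdump text format quoted in this thread ("arp who-has X tell Y" and "arp reply X is-at MAC"), so they may need adjusting for other tcpdump versions.

```python
import re

# Patterns for the older tcpdump ARP text format seen in this thread.
# Assumption: other tcpdump versions may print these lines differently.
WHO_HAS = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_arp(lines):
    """Return target IPs that were asked for but never answered."""
    asked = []        # targets in the order they were first requested
    answered = set()  # targets for which some reply was seen
    for line in lines:
        m = WHO_HAS.search(line)
        if m:
            if m.group(1) not in asked:
                asked.append(m.group(1))
            continue
        m = REPLY.search(line)
        if m:
            answered.add(m.group(1))
    return [ip for ip in asked if ip not in answered]


# Sample lines copied from the captures quoted in this thread.
capture = [
    "14:10:28.254008 arp who-has 10.10.5.105 tell 10.10.5.1",
    "14:10:29.250067 arp who-has 10.10.5.105 tell 10.10.5.1",
    "14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105",
    "14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77",
]
print(unanswered_arp(capture))  # prints ['10.10.5.105']
```

Because the patterns are searched anywhere in the line, the same script should also cope with the extra ethernet header fields that -e puts in front.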
Ok now I go to the device via a serial port and ping back to 10.10.5.1 ( the router ) and here is the tcpdump output
tcpdump -i eth1.105 -n
tcpdump: listening on eth1.105
14:12:06.706722 arp who-has 10.10.5.1 tell 10.10.5.105
14:12:06.706798 arp reply 10.10.5.1 is-at 0:7:e9:11:30:77
14:12:06.707715 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:06.707762 10.10.5.1 > 10.10.5.105: icmp: echo reply
14:12:07.723100 10.10.5.105 > 10.10.5.1: icmp: echo request (DF)
14:12:07.723136 10.10.5.1 > 10.10.5.105: icmp: echo reply
OK...
Now of course I can ping 10.10.5.10 (the suspect device) from 10.10.5.1 ( the router )
Is 10.10.5.10 a typo for 10.10.5.105?
What kind of unit is the "suspect device"? Can you display "ifconfig -a" and "netstat -rn" or the equivalent on it?
ping 10.10.5.1
PING 10.10.5.1 (10.10.5.1) 56(84) bytes of data.
64 bytes from 10.10.5.1: icmp_seq=0 ttl=64 time=0.068 ms
64 bytes from 10.10.5.1: icmp_seq=1 ttl=64 time=0.039 ms
But this appears to be pinging TO the router, not pinging FROM the router!
This is fine until the arp entry ages out. Then I am back to not being able to ping the device.
If I manually insert an arp table entry all is well. No filtering in the switches. Switches are SMC 6826 for the 10/100 and 8724 for the core and gig e.
I have several similar symptoms in various places. For some devices on vlans I see the dhcp discover messages and the dhcp offer, and then the device sends another dhcp discover. This repeats a few times, then the device just waits and starts the process over.
I feel that the problem lies in the handling of arp requests but I guess I just don't know enough about how linux handles them or how to control them to find a solution.
Any ideas ?
Sounds more like a general broadcast issue to me, since DHCP discovers and offers are sent as broadcasts and therefore don't need ARP first. If you use -e in tcpdump you will see whether the broadcast ethernet address is being used (ff:ff:ff:ff:ff:ff) or not. If ARP and DHCP packets are not using the broadcast ethernet address, then something is not right with the netmask or the broadcast address.
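For what it is worth, here is a quick way to sanity-check both points. It is a sketch rather than a finished tool: the helper names expected_broadcast and is_ethernet_broadcast are invented for this example. The first computes the broadcast address that an address/netmask pair implies (to compare against the Bcast value in ifconfig), and the second tests whether a destination address printed by tcpdump -e is the ethernet broadcast address.

```python
import ipaddress


def expected_broadcast(addr, netmask):
    """IP broadcast address implied by an interface address and netmask."""
    iface = ipaddress.ip_interface(f"{addr}/{netmask}")
    return str(iface.network.broadcast_address)


def is_ethernet_broadcast(mac):
    """True if a colon-separated MAC equals ff:ff:ff:ff:ff:ff.

    Octets are compared numerically, so unpadded forms such as
    0:7:e9:11:30:77 (as older tcpdump prints them) are handled too.
    """
    octets = mac.lower().split(":")
    return len(octets) == 6 and all(int(o, 16) == 0xFF for o in octets)


# Values taken from the eth1.105 configuration and capture in this thread.
print(expected_broadcast("10.10.5.1", "255.255.255.0"))  # prints 10.10.5.255
print(is_ethernet_broadcast("ff:ff:ff:ff:ff:ff"))        # prints True
print(is_ethernet_broadcast("0:7:e9:11:30:77"))          # prints False
```

For eth1.105 the computed value matches the Bcast shown on the router, so the thing to compare next would be the equivalent settings on the device itself.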
Interesting problem - let us know how you get on.
Cheers Tony