I am having an extremely strange issue with a BL460c1 (G1) blade in a c7000 enclosure. I could not for the life of me get the machine to ping the gateway or any other blade in the same enclosure (yes, the subnet mask was correct and quadruple-checked), although pinging the local IP works. I was almost convinced that it was a network or hardware issue, so I asked someone to install Windows on that blade to confirm that it would fail there as well. To my surprise, it worked fine in Windows after installing the network driver and simply setting the IP address (the same IP I was trying to configure CentOS with).
The interfaces come up fine (eth0 & eth1); the cards are:
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
07:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
NetworkManager is disabled. There is no bonding configured. The network firmware is the latest available from the HP website (bootcode 4.4.1, CLP 1.3.6). The network is supposed to be simple; if I do a:
ifconfig eth0 172.12.34.112 netmask 255.255.255.0
I am supposed to be able to:
ping 172.12.34.1
However, I get:
PING 172.12.34.1 (172.12.34.1) 56(84) bytes of data.
From 172.12.34.112 icmp_seq=2 Destination Host Unreachable
From 172.12.34.112 icmp_seq=3 Destination Host Unreachable
From 172.12.34.112 icmp_seq=4 Destination Host Unreachable
From 172.12.34.112 icmp_seq=5 Destination Host Unreachable
I cannot even ping blades in the same enclosure. Yet in Windows Server 2008, after assigning the same IP, pinging the same addresses works.
I tried:
- the disable_msi=1 parameter of the bnx2 driver
- a fresh install of CentOS 6.3
- a fresh install of RHEL 6.2
- a live CD of CentOS 5.5
- Clonezilla-ubuntu and systemrescuecd (kernel 3.2) live CDs
- CentOS 6.3 with the kernel(-headers,-firmware,-devel) updated to the latest (RPMs were copied with a USB/iLO)
- the bnx2 network driver available from the HP website
- the bnx2 network driver available from Broadcom
Other than a check_ncic warning with the stock 6.3 driver (it does not appear with the other drivers), bnx2 is not logging anything problematic in dmesg or /var/log/messages. With tcpdump, strangely, I see random traffic destined to different IPs (probably from the same enclosure), but those IPs do not respond to ping either.
However, I got a feeling of deja vu in the midst of all this. I recall setting up a RHEL machine somewhere else 2-3 years ago and hitting the same issue of the network working with Windows but not Linux. It turned out that autonegotiation was disabled on the gigabit network (my few attempts at playing with ethtool did not work), and searching online led me to people saying it ought to be enabled anyway, as a standard requires it for gigabit. When the network guys enabled autoneg, it started working in Linux.
I am wondering if I'm facing the same issue here, as I see ethtool saying:
Advertised auto-negotiation: No
Speed: 1000Mb/s
I tried some ethtool settings (setting autoneg off and forcing 1000, other random options, etc.) but it didn't help; possibly I was trying the wrong things. I am not very familiar with HP blades, but it seems I cannot enable autoneg for this blade (?). I do not have direct access to the hardware, though (everything was done through iLO).
Have any of you faced issues with a gigabit network with autoneg disabled? Or any other ideas? This is the only Linux machine in the entire network.
Here is some extra info:
ethtool eth0:
Settings for eth0:
    Supported ports: [ FIBRE ]
    Supported link modes: 1000baseT/Full 2500baseX/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 1000baseT/Full 2500baseX/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 1000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 2
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Link detected: yes
-----------------------
modinfo bnx2:
filename: /lib/modules/2.6.32-279.11.1.el6.x86_64/kernel/drivers/net/bnx2.ko
...
version: 2.2.1
license: GPL
description: Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
...
vermagic: 2.6.32-279.11.1.el6.x86_64 SMP mod_unload modversions
...
-----------------------
route -n:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.12.34.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
0.0.0.0         172.12.34.1     0.0.0.0         UG    0      0        0 eth0
---------------------
ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:19:BB:34:FA:70
          inet addr:172.12.34.112  Bcast:172.12.34.255  Mask:255.255.255.0
          inet6 addr: fe80::219:bbff:fe34:fa70/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6800 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4616962 (4.4 MiB)  TX bytes:4827 (4.7 KiB)
          Interrupt:16 Memory:f6000000-f6012800

eth1      Link encap:Ethernet  HWaddr 00:19:BB:34:FA:78
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:16 Memory:fa000000-fa012800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          ....
traceroute 172.12.34.1:
traceroute to 172.12.34.1 (172.12.34.1), 30 hops max, 60 byte packets
 1  172.12.34.112 (172.12.34.112)  3000.686 ms !H  3000.673 ms !H  3000.661 ms !H
....
traceroute 172.12.34.112
traceroute to 172.12.34.112 (172.12.34.112), 30 hops max, 60 byte packets
 1  172.12.34.112 (172.12.34.112)  0.041 ms  0.011 ms  0.009 ms
-xrx
Apparently your device driver doesn't like your forced setup, as you can see from the traceroute. You could try installing another driver to see whether it works for you.
Also, check the switch side: see what the switch reports for the port you are connected to, such as dropped packets or other errors.
------------ Banyan He Blog: http://www.rootong.com Email: banyan@rootong.com
On 2012-10-28 1:54 AM, xrx wrote:
I am having an extremely strange issue with a BL460c1 (G1) blade in a c7000 enclosure. I could not for the life of me get the machine to ping the gateway or any other blade in the same enclosure (yes, the subnet mask was correct and quadruple-checked), although pinging the local IP works. I was almost convinced that it was a network or hardware issue, so I asked someone to install Windows on that blade to confirm that it would fail there as well. To my surprise, it worked fine in Windows after installing the network driver and simply setting the IP address (the same IP I was trying to configure CentOS with).
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos .
On 10/27/12 21:54, xrx wrote:
I am having an extremely strange issue with a BL460c1 (G1) blade in a c7000 enclosure. I could not for the life of me get the machine to ping the gateway or any other blade in the same enclosure (yes, the subnet mask was correct and quadruple-checked), although pinging the local IP works. I was almost convinced that it was a network or hardware issue, so I asked someone to install Windows on that blade to confirm that it would fail there as well. To my surprise, it worked fine in Windows after installing the network driver and simply setting the IP address (the same IP I was trying to configure CentOS with).
I finally solved it, although a mystery remains. After getting a hint from differences in the network traffic between Windows and Linux, it turns out that if I specify the VLAN manually in CentOS, everything pings and works fine.
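For anyone wanting to try the same thing, here is a rough sketch of what "specifying the VLAN manually" can look like with iproute2. The VLAN ID 100 is a hypothetical placeholder (the real ID is site-specific), and vlan_setup and DRY_RUN are names made up for this example. By default it only prints the commands; applying them needs root, and any address already set on the untagged eth0 would have to be removed first.

```shell
#!/bin/sh
# Sketch only: create a tagged sub-interface on top of a physical NIC.
# ASSUMPTIONS: VLAN ID 100 is a placeholder, not the ID used on this network;
# vlan_setup and DRY_RUN are hypothetical names introduced for this example.
# DRY_RUN=1 (the default) prints the commands instead of running them.

vlan_setup() {
    iface=$1   # parent interface, e.g. eth0
    vid=$2     # VLAN ID to tag frames with
    cidr=$3    # address/prefix to put on the VLAN interface

    run=
    [ "${DRY_RUN:-1}" = 1 ] && run=echo

    $run modprobe 8021q                                              # VLAN support module
    $run ip link add link "$iface" name "$iface.$vid" type vlan id "$vid"
    $run ip addr add "$cidr" dev "$iface.$vid"
    $run ip link set dev "$iface.$vid" up
}

# Print the commands for the address used in this thread.
vlan_setup eth0 100 172.12.34.112/24
```

Running it with DRY_RUN=0 as root would execute the four commands instead of printing them; the default route via 172.12.34.1 would still need to be added afterwards.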
However, the strange part is that Windows did not require specifying the VLAN. I grudgingly installed Windows myself to verify that it works out of the box (the settings show the VLAN is set to 0 or disabled). I did a Wireshark comparison between Windows Server 2008 R2 and CentOS on the same blade. What's unusual is that although both send pretty much identical ARP requests, down to the MAC addresses, the reply to the Windows one is normal with no mention of the VLAN, while the reply to the CentOS one has an 802.1Q Virtual LAN header with the ID (which I guess CentOS does not understand when its network is not configured for VLANs, and so it repeats the ARP requests several times).
Anyway, I'm glad it's working fine now; hope this helps anyone in a similar situation at some point.
-xrx
On 10/29/2012 01:47 AM, xrx wrote:
I finally solved it, although a mystery remains.
Linux, apparently, does not currently support 802.1Q priority tags by default. A patch was suggested to add such support, but I can't tell from the following thread whether it made it to general release, or when it did if so.
http://comments.gmane.org/gmane.linux.network/163762
For now, I expect that you'd need to manually configure a "0" VLAN on interfaces attached to networks where priority tagged packets are used. If that patch was accepted, it may be sufficient to simply load the 8021q kernel module.
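To make the distinction between a priority tag and a real VLAN tag concrete, here is a small sketch that decodes the 802.1Q tag from the start of an Ethernet frame given as a hex string. A tagged frame carries 0x8100 where the EtherType normally sits, followed by two bytes holding a 3-bit priority and a 12-bit VLAN ID; an ID of 0 means the tag carries priority only. The function name dot1q_info and the sample frames are made up for illustration (the source MAC 02:00:00:00:00:01 is a placeholder), not taken from a capture.

```shell
#!/bin/sh
# Sketch only: classify an Ethernet frame header as untagged, priority-tagged
# (802.1Q tag with VLAN ID 0) or VLAN-tagged. Input is the frame as a hex
# string without separators. dot1q_info is a hypothetical helper name.

dot1q_info() {
    hex=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    ethertype=$(printf '%s' "$hex" | cut -c25-28)   # bytes 12-13, after both MACs
    if [ "$ethertype" != "8100" ]; then
        echo "untagged ethertype=0x$ethertype"
        return 0
    fi
    tci_hex=$(printf '%s' "$hex" | cut -c29-32)     # bytes 14-15: tag control info
    tci=$((0x$tci_hex))
    pcp=$(( tci >> 13 ))                            # top 3 bits: priority
    vid=$(( tci & 0xfff ))                          # low 12 bits: VLAN ID
    if [ "$vid" -eq 0 ]; then
        echo "priority-tagged pcp=$pcp vid=0"
    else
        echo "vlan-tagged pcp=$pcp vid=$vid"
    fi
}

# Constructed headers: destination 00:19:bb:34:fa:70, placeholder source MAC.
macs=0019bb34fa70020000000001
dot1q_info "${macs}0806"            # plain ARP frame, no tag
dot1q_info "${macs}810000000806"    # tag with VLAN ID 0 (priority tag)
dot1q_info "${macs}810000640806"    # tag with VLAN ID 100 (0x064)
```

The bytes can be read off a Wireshark or tcpdump hex dump of the ARP reply; whether the replies seen on this blade carried ID 0 or a real VLAN ID is not something the thread settles.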