Dear All,
I have an HP xw4400 with the following Ethernet controller, as reported by lspci:
Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express (rev 02)
This machine was running CentOS 5.2 without any problem. After I updated it with yum update on 8 April, it reports itself as CentOS 5.3, and since then it intermittently stops communicating. Each time, I see the following messages in /var/log/messages:
Apr 18 10:30:07 kernel: tg3: eth0: Link is down.
Apr 18 10:30:10 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:30:10 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:30:24 kernel: tg3: eth0: Link is down.
Apr 18 10:30:27 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:30:27 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:30:29 kernel: tg3: eth0: Link is down.
Apr 18 10:30:32 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:30:32 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:30:46 kernel: tg3: eth0: Link is down.
Apr 18 10:30:49 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:30:49 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:30:50 kernel: tg3: eth0: Link is down.
Apr 18 10:30:52 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:30:52 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:36:46 kernel: tg3: eth0: Link is down.
Apr 18 10:36:49 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Apr 18 10:36:49 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Apr 18 10:37:43 kernel: tg3: eth0: Link is down.
This recurs at intervals on the order of a few minutes. The kernel now running is 2.6.18-128.1.6.el5. Can anybody help?
Regards, Mangesh
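For anyone who wants to quantify the flapping rather than eyeball it, here is a minimal Python sketch that pulls the "Link is up/down" entries out of log lines like the ones above and pairs each drop with the next recovery. The names link_events and outages are made up for illustration, the year is an assumption (syslog timestamps carry none), and the pattern is only fitted to the tg3 and skge lines quoted in this thread, so other drivers may need a different expression.

```python
import re
from datetime import datetime

# Fitted to the lines quoted in this thread, e.g.
#   "Apr 18 10:30:07 kernel: tg3: eth0: Link is down."
#   "Apr 21 10:18:32 kernel: skge eth1: Link is down."
# A hostname between the timestamp and "kernel:" is tolerated.
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?:\S+\s+)?kernel:\s+"
    r"(?P<driver>\w+):?\s+(?P<iface>eth\d+):\s+Link is (?P<state>up|down)"
)


def link_events(lines, year=2009):
    """Return (timestamp, interface, state) for every link up/down line.

    The year is an assumption supplied by the caller, because syslog
    timestamps do not include one.
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            when = datetime.strptime(
                f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, m.group("iface"), m.group("state")))
    return events


def outages(events):
    """Pair each 'down' with the next 'up' on the same interface.

    Returns a list of (interface, time_of_drop, seconds_down).
    A trailing 'down' with no matching 'up' is left out.
    """
    pending = {}
    result = []
    for when, iface, state in events:
        if state == "down":
            pending[iface] = when
        elif iface in pending:
            start = pending.pop(iface)
            result.append((iface, start, (when - start).total_seconds()))
    return result


SAMPLE = [
    "Apr 18 10:30:07 kernel: tg3: eth0: Link is down.",
    "Apr 18 10:30:10 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.",
    "Apr 18 10:30:10 kernel: tg3: eth0: Flow control is off for TX and off for RX.",
    "Apr 18 10:30:24 kernel: tg3: eth0: Link is down.",
    "Apr 18 10:30:27 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.",
]

for iface, start, secs in outages(link_events(SAMPLE)):
    print(f"{iface} down at {start:%H:%M:%S} for {secs:.0f}s")
```

Feeding it the lines of /var/log/messages instead of SAMPLE would give a list of drops that can be compared between kernels or cards, or lined up against other events on the network.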
2009/4/21 Mangesh S. Umbarje mangesh@gmrt.ncra.tifr.res.in
> After updating the machine with yum update on 8 April [...] this machine stops communicating intermittently
> [...]
> The kernel it is running now is 2.6.18-128.1.6.el5. Can anybody help.
Have you tried running with the old kernel from before the update to verify that it doesn't occur with that version?
It is possible that the update is just a coincidence and you are actually facing link problems due to the card, the cable, the switch or any of the connections between...
d
I have checked the physical connectivity, which is perfectly fine. This machine is critical and we need to keep it running as much as possible, so I went back to the older kernel, 2.6.18-92.1.17.el5. Along with this kernel, I also added a second Ethernet card, a D-Link, which shows up as
05:09.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11)
and it uses the skge driver.
So I could test the machine with this new card and kernel-2.6.18-92.1.17.el5 together. I still get the network breaks, but their frequency is reduced by a factor of ten. The corresponding messages in /var/log/messages are as follows:
Apr 21 10:18:32 kernel: skge eth1: Link is down.
Apr 21 10:18:36 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
Apr 21 10:18:55 kernel: skge eth1: Link is down.
Apr 21 10:18:57 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
Apr 21 10:18:58 kernel: skge eth1: Link is down.
Apr 21 10:19:01 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
Regards, Mangesh
On Tue, 21 Apr 2009, D Tucny wrote:
> Have you tried running with the old kernel from before the update to verify that it doesn't occur with that version?
>
> It is possible that the update is just a coincidence and you are actually facing link problems due to the card, the cable, the switch or any of the connections between...
2009/4/21 Mangesh S. Umbarje mangesh@gmrt.ncra.tifr.res.in
> I have checked the physical connectivity which is perfectly fine.
> [...]
> So I could test the machine with this new card and kernel-2.6.18-92.1.17.el5 together. Still I do get the network breaks but the frequency reduced to factor of ten.
So, with a preupdate kernel and a different NIC, which uses different drivers, you're still getting the problems...
I'm more convinced of a physical problem... Have you tried a different switch port? Does the cabling run past a laser printer? Does the cabling run parallel to some electrical cabling with intermittent high load, such as to cooling or heating systems? If you run the port at 100Mbps do you still get the same problem? How did you confirm the physical connectivity was fine?
d
When the date was Tuesday 21 April 2009, Mangesh S. Umbarje wrote:
> I have checked the physical connectivity which is perfectly fine.
> [...]
> So I could test the machine with this new card and kernel-2.6.18-92.1.17.el5 together. Still I do get the network breaks but the frequency reduced to factor of ten.
Try disabling auto-negotiation for the NIC using ethtool. Manually set the speed to 100 and then to 1000 Mbps and see what happens.
Does the machine have a static IP, or does it use DHCP? It would also be a good idea to capture some traffic and check whether there is any correlation between the traffic and the link state.
The machine has a static IP. If I try to change the speed, autoneg, or both, the network stops completely. It starts working again only after I run service network restart.
Regards, Mangesh
On Tue, 21 Apr 2009, Michael Iatrou wrote:
> Try to disable auto-negotiation for the NIC, using ethtool. Set manually the speed to 100 and then 1000 Mbps and see what happens.
>
> The machine has static IP or it uses DHCP? It's a good idea to do some traffic capture and check if there is any correlation between the traffic and the link state.
2009/4/22 Mangesh S. Umbarje mangesh@gmrt.ncra.tifr.res.in
> The machine has a static IP. If I try to change speed or autoneg or both the network completely stops. It starts working only after I issue service network restart.
You can't have a gigabit connection with autoneg off, and to set the speed manually you'd need to set it on both sides of the connection...
d
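To make that last point concrete, here is a rough Python sketch that only builds the ethtool command line for a given test and does not run it. The helper name ethtool_args is made up for illustration. The advertise masks are the ones listed in the ethtool(8) man page, and whether the ethtool shipped with a given CentOS release accepts advertise is an assumption worth checking first. For gigabit it keeps auto-negotiation on and restricts what is advertised, since 1000BASE-T needs negotiation to bring the link up at all; for 10/100 it can force the setting, in which case the switch port has to be forced to match.

```python
# Advertisement masks as listed in the ethtool(8) man page.
ADVERTISE = {
    (10, "half"): 0x001,
    (10, "full"): 0x002,
    (100, "half"): 0x004,
    (100, "full"): 0x008,
    (1000, "half"): 0x010,
    (1000, "full"): 0x020,
}


def ethtool_args(iface, speed, duplex="full", autoneg=True):
    """Build (but do not run) an 'ethtool -s' command as an argv list.

    With autoneg=True the link still negotiates, but only the requested
    mode is advertised. With autoneg=False the mode is forced, which is
    refused here for 1000 Mbps.
    """
    if (speed, duplex) not in ADVERTISE:
        raise ValueError(f"unsupported mode: {speed}/{duplex}")
    if autoneg:
        mask = ADVERTISE[(speed, duplex)]
        return ["ethtool", "-s", iface,
                "autoneg", "on", "advertise", f"0x{mask:03x}"]
    if speed >= 1000:
        raise ValueError("gigabit over copper needs autoneg on; "
                         "restrict the advertised modes instead")
    return ["ethtool", "-s", iface,
            "speed", str(speed), "duplex", duplex, "autoneg", "off"]


# Print the commands one would try, in order, on eth0.
print(" ".join(ethtool_args("eth0", 100, autoneg=False)))
print(" ".join(ethtool_args("eth0", 1000)))
```

The printed commands need root to run, and since changing these settings dropped the network in this thread, they are best tried from the console rather than over the link being tested.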