[CentOS] tg3 BCM5755 intermittently stops working after upgrade to 5.3.

Tue Apr 21 08:52:44 UTC 2009
Mangesh S. Umbarje <mangesh at gmrt.ncra.tifr.res.in>

           I have checked the physical connectivity, and it is perfectly 
fine. This machine is critical and we need to keep it running as much 
as possible, so I went back to the older kernel 2.6.18-92.1.17.el5. Along 
with this kernel, I also added a D-Link Ethernet card, which lspci reports 
as

05:09.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11) (rev 11)

and it uses the skge driver.

           So I was able to test the machine with this new card and 
kernel-2.6.18-92.1.17.el5 together. I still get network breaks, but 
their frequency has dropped by about a factor of ten. The corresponding 
messages in /var/log/messages are as follows.

Apr 21 10:18:32 kernel: skge eth1: Link is down.
Apr 21 10:18:36 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
Apr 21 10:18:55 kernel: skge eth1: Link is down.
Apr 21 10:18:57 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
Apr 21 10:18:58 kernel: skge eth1: Link is down.
Apr 21 10:19:01 kernel: skge eth1: Link is up at 1000 Mbps, full duplex, flow control none
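
To put a number on how often the link flaps, a short script can tally
these messages. The following is only a sketch, assuming syslog lines
shaped like the tg3 and skge messages in this thread. The name
summarize_flaps and its year argument are hypothetical, chosen for
illustration; the year has to be assumed because syslog timestamps
omit it.

```python
import re
from collections import Counter
from datetime import datetime

# Matches both message formats seen in this thread:
#   "kernel: tg3: eth0: Link is down."
#   "kernel: skge eth1: Link is down."
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+.*?kernel:\s+"
    r"(?P<driver>\w+):?\s+(?P<iface>\w+):\s+Link is (?P<state>up|down)"
)


def summarize_flaps(lines, year=2009):
    """Return (link-down count, total seconds down), each keyed by interface."""
    downs = Counter()
    down_seconds = Counter()
    went_down_at = {}  # interface -> time of the most recent unmatched "down"
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        # Assumption: all lines belong to the given year.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        iface = m["iface"]
        if m["state"] == "down":
            downs[iface] += 1
            went_down_at[iface] = ts
        elif iface in went_down_at:
            # Pair this "up" with the preceding "down" on the same interface.
            outage = ts - went_down_at.pop(iface)
            down_seconds[iface] += int(outage.total_seconds())
    return downs, down_seconds
```

Passing an open file handle for /var/log/messages as lines should work,
since the function only iterates over it. For the six skge lines above
it gives three link-down events and nine seconds of downtime on eth1.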

Regards,
Mangesh


On Tue, 21 Apr 2009, D Tucny wrote:

> 2009/4/21 Mangesh S. Umbarje <mangesh at gmrt.ncra.tifr.res.in>
>       Dear All,
>
>                   I have an HP xw4400 with the following Ethernet controller,
>       as reported by lspci
>
>       Broadcom Corporation NetXtreme BCM5755 Gigabit Ethernet PCI Express (rev 02)
>
>                   This machine was running CentOS 5.2 without any problem. Since
>       I updated it with yum update on 8 April, after which it reports itself
>       as CentOS 5.3, the machine intermittently stops communicating, and I
>       see the following corresponding messages in /var/log/messages
>
>       Apr 18 10:30:07 kernel: tg3: eth0: Link is down.
>       Apr 18 10:30:10 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
>       Apr 18 10:30:10 kernel: tg3: eth0: Flow control is off for TX and off for RX.
>       Apr 18 10:30:24 kernel: tg3: eth0: Link is down.
> 
> 
> etc...
>  
>
>              This happens at intervals of a few minutes. The
>       kernel it is running now is 2.6.18-128.1.6.el5. Can anybody help?
> 
> 
> Have you tried running with the old kernel from before the update to verify that it doesn't occur with that version?
> 
> It is possible that the update is just a coincidence and you are actually facing link problems due to the card, the
> cable, the switch or any of the connections between...
> 
> d
> 
>