[CentOS] Experiencing continual eth0 link up/down on a 10G Chelsio NIC (cxgb3 driver)

Mon Feb 8 22:18:09 UTC 2010
Jobst Schmalenbach <jobst at barrett.com.au>

IMHO a link failure is never ONLY a LOCAL problem: BOTH sides are involved,
and one of the people who answered this question has already
explained the (auto)negotiation side of it.

Before that, check:

 * the other side
 * if the other side is a switch, try a different port
 * the cable (make sure the connectors sit properly and are clean)
 * lock the speed and duplex (e.g. 100 Mb/s full duplex) on both sides.
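
As a rough sketch, the checks above could be scripted along these lines.
The function names are hypothetical, eth0 is simply the interface from this
thread, and the speed passed to lock_link is only an example; whether a given
NIC accepts forced settings depends on the driver and the port type.

```shell
#!/bin/sh
# Hypothetical helpers for watching a flapping link; nothing runs on load.

# Count "link down" events for one interface in a kernel log read from stdin.
# Matches lines of the form "eth0: link down", as quoted in this thread.
count_link_down() {
    iface="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence || true
    grep -c "${iface}: link down" || true
}

# Show the current speed, duplex and link state (needs ethtool, usually root).
show_link() {
    ethtool "$1"
}

# Force speed (in Mb/s) and full duplex with autonegotiation switched off.
# The other end of the link must be set the same way.
lock_link() {
    ethtool -s "$1" speed "$2" duplex full autoneg off
}
```

Typical use would be "dmesg | count_link_down eth0" or
"count_link_down eth0 < /var/log/messages" to see how often the link drops,
then "show_link eth0" on this side before comparing with the switch port.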

Jobst



On Tue, Feb 09, 2010 at 12:13:53AM +0530, Arun Khan (knura9 at gmail.com) wrote:
> File Server OS: CentOS 5.3 (x86_64)
> Kernel: CentOS  Plus kernel (need XFS fs drivers)
> 
> The file server has a Chelsio T310 10GBASE-CX4 RNIC (rev 3) PCI
> Express x8 MSI-X (eth0), driver and firmware is stock from the CentOS
> Plus kernel.
> 
> Using ethtool  I have verified driver association with the 3 NICs on
> the system (eth1 and eth2 are not connected to any switch)
> 
> Driver for eth0
> driver: cxgb3
> version: 1.1.3-ko
> firmware-version: T 7.4.0 TP 1.1.0
> 
> Driver for eth1
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 1.0-0
> 
> Driver for eth2
> driver: e1000e
> version: 1.0.2-k2
> firmware-version: 1.0-0
> 
> 
> Over the last 3-4 weeks, I have noticed that the eth0 link keeps going up
> and down, confirmed by "dmesg" output as well as in /var/log/messages
> (dmesg sample shown below).
> 
> eth0: link down
> eth0: link up, 10Gbps, full-duplex
> eth0: link down
> eth0: link up, 10Gbps, full-duplex
> eth0: link down
> eth0: link up, 10Gbps, full-duplex
> 
> The kernel RPM verification shows no errors
> 
> # uname --kernel-release
> 2.6.18-164.2.1.el5.plus
> 
> # rpm --verify kernel-2.6.18-164.2.1.el5.plus
> 
> The hardware vendor tells me that the card either fails completely
> (kaput) or works - there is no grey area.  He is of the opinion that
> the problem is with the driver.
> 
> Verification of the kernel rpm tells me that all files including the
> cxgb3 driver file md5sum are OK.
> 
> I would like to hear from anyone with the same NIC or another rev.
> using the same driver.
>     Are you seeing similar link up/down in your system?
>     How did you solve the problem?
> 
> TIA
> -- Arun Khan

-- 
I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... "Are you sure?" ... YES ... Phew ... I'm out!

  | |0| |   Jobst Schmalenbach, jobst at barrett.com.au, General Manager
  | | |0|   Barrett Consulting Group P/L & The Meditation Room P/L
  |0|0|0|   +61 3 9532 7677, POBox 277, Caulfield South, 3162, Australia