We have an HP DL360 server with dual on-board Tigon3 ethernet cards. We are using eth0; eth1 is unused at the moment. Sometimes when the network interface is under heavy load, for example during large file transfers over rsync or NFS, the network interface stops working and we lose all connectivity to the server. The only fix at this point is to jump on the console and restart the network interface. I have not found anything in the log files to indicate what is causing this. Has anyone else experienced something similar? Or perhaps you know how I could troubleshoot this?
On Thu, Oct 09, 2008 at 07:44:41AM -0500, Sean Carolan wrote:
We have an HP DL360 server with dual on-board Tigon3 ethernet cards.
...
how I could troubleshoot this?
why don't you start with the kernel version and architecture?
-> uname -a
-> /var/log/messages relevant lines?
-> /sbin/ifconfig -a
-> ethtool eth0 and ethtool eth1
Tru
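The commands requested above can be gathered in one pass so the whole report can be pasted into a reply. This is only a rough sketch: the function names (run, collect_diag) and the IFACES default are made up for illustration, and the grep pattern for /var/log/messages is a guess at what counts as "relevant".

```shell
#!/bin/sh
# Collect the diagnostics asked for in this thread and print them on stdout.
# IFACES is an assumption; set it to the interfaces present on your machine.
IFACES="${IFACES:-eth0 eth1}"

run() {
    # Print the command line as a header, then its output.
    # A missing command or a non-zero exit is noted instead of aborting.
    echo "### $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exited with status $?)"
    else
        echo "(command not found: $1)"
    fi
    echo
}

collect_diag() {
    run uname -a
    run /sbin/ifconfig -a
    for i in $IFACES; do
        run ethtool "$i"
    done
    # Log lines mentioning the tg3 driver or an ethN interface, if readable.
    if [ -r /var/log/messages ]; then
        echo "### tg3/ethN lines in /var/log/messages (last 50)"
        grep -E 'tg3|eth[0-9]' /var/log/messages | tail -n 50
    fi
}
```

Source the file and run something like `collect_diag > /tmp/nic-report.txt`, then trim the report before posting.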
why don't you start with the kernel version and architecture? -> uname -a
This server is running centos 3.9 Linux server.domain.com 2.4.21-57.ELsmp #1 SMP Wed May 7 06:10:55 EDT 2008 i686 i686 i386 GNU/Linux
-> /var/log/messages relevant lines?
There was nothing out of the ordinary in /var/log/messages. The logging just stops after the network card drops offline. dmesg also shows nothing out of the ordinary when the driver is loaded. The network card works fine until it is under heavy load.
-> /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet addr:10.100.1.200  Bcast:10.100.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56261 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5969478 (5.6 Mb)  TX bytes:3305868 (3.1 Mb)
          Interrupt:26
(MAC address was changed by me)
-> ethtool eth0 and ethtool eth1
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes

Settings for eth1:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown! (0)
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: no
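Since nothing useful reaches the logs when the interface hangs, one option is to capture the interface state on local disk at the moment connectivity is lost. The following is a sketch under assumptions, not something anyone in this thread ran: the function names are invented, the peer address and log path are whatever you pass in, and `ethtool -S` (driver statistics) may not be supported by an older tg3 driver, in which case its error message simply ends up in the log.

```shell
#!/bin/sh
# Return 0 if `ethtool <iface>` output on stdin reports a link.
link_detected() {
    grep -q 'Link detected: yes'
}

# Send one ping with a 2 second deadline; $1 = a host on the local subnet.
peer_reachable() {
    ping -c 1 -w 2 "$1" >/dev/null 2>&1
}

# Append a timestamped snapshot of the interface state to a local file.
# $1 = interface, $2 = log file
snapshot() {
    {
        date
        if ethtool "$1" | link_detected; then
            echo "link: yes"
        else
            echo "link: no"
        fi
        ethtool -S "$1"                 # driver counters, if supported
        /sbin/ifconfig "$1"
        grep "$1" /proc/interrupts      # is the IRQ count still moving?
        dmesg | tail -n 30
    } >>"$2" 2>&1
}

# Poll forever; take a snapshot whenever the peer stops answering.
# $1 = interface, $2 = peer address, $3 = log file, $4 = interval (seconds)
watch_iface() {
    while :; do
        if ! peer_reachable "$2"; then
            snapshot "$1" "$3"
        fi
        sleep "$4"
    done
}
```

Started from the console as `watch_iface eth0 <gateway-ip> /var/tmp/eth0-hang.log 10 &`, this leaves a record to compare against the healthy ethtool output above. Comparing the /proc/interrupts line across two snapshots shows whether the card is still raising interrupts while it is unreachable.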
Hi,
please leave the attribution when you reply ;)
On Thu, Oct 09, 2008 at 08:27:34AM -0500, Sean Carolan wrote:
This server is running centos 3.9 Linux server.domain.com 2.4.21-57.ELsmp #1 SMP Wed May 7 06:10:55 EDT 2008 i686 i686 i386 GNU/Linux
3.9 32 bits SMP latest kernel version.
There was nothing out of the ordinary in /var/log/messages. The logging just stops after the network card drops offline. dmesg also shows nothing out of the ordinary when the driver is loaded. The network card works fine until it is under heavy load.
-> /sbin/ifconfig -a
eth0      Link encap:Ethernet  HWaddr FF:FF:FF:FF:FF:FF
          inet addr:10.100.1.200  Bcast:10.100.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56261 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5969478 (5.6 Mb)  TX bytes:3305868 (3.1 Mb)
          Interrupt:26
Not much. My compute node running the 64-bit version is showing:

... 2.4.21-57.ELsmp #1 SMP Wed May 7 05:32:23 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

/etc/modules.conf:
...
alias eth1 tg3
...

eth1      Link encap:Ethernet  HWaddr 00:E0:81:xx:xx:xx
          inet addr:157.99.90.xxx  Bcast:157.99.90.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1016657124 errors:0 dropped:0 overruns:0 frame:0
          TX packets:831373335 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:211598527896 (201796.0 Mb)  TX bytes:217466051363 (207391.7 Mb)
          Interrupt:24

[tru@aaricia ~]$ uptime
 16:12:45 up 126 days, 19:17, 1 user, load average: 4.07, 3.88, 3.12
[tru@aaricia ~]$ lspci
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
02:09.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
<snip ethtool results: look fine >
check for a loose network cable?
Tru
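Both ifconfig outputs quoted above show zero in every error field. A cabling problem would often (though not always) show up there, so it can be worth re-checking the counters right after a hang. A minimal sketch, assuming the old net-tools ifconfig format shown in this thread; nonzero_counters is a made-up helper name.

```shell
#!/bin/sh
# Read ifconfig output on stdin and print every error-type counter that is
# not zero, one "name:value" per line. Exits non-zero if all counters are 0.
# Note: the RX/TX direction is not kept; look at the full output for that.
nonzero_counters() {
    tr -s ' \t' '\n' |
        grep -E '^(errors|dropped|overruns|frame|carrier|collisions):[1-9]'
}
```

For example, `/sbin/ifconfig eth0 | nonzero_counters` prints nothing for the eth0 output posted earlier, since every counter there is 0.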
on 10-9-2008 6:27 AM Sean Carolan spake the following:
why don't you start with the kernel version and architecture? -> uname -a
This server is running centos 3.9 Linux server.domain.com 2.4.21-57.ELsmp #1 SMP Wed May 7 06:10:55 EDT 2008 i686 i686 i386 GNU/Linux
-> /var/log/messages relevant lines?
There was nothing out of the ordinary in /var/log/messages. The logging just stops after the network card drops offline. dmesg also shows nothing out of the ordinary when the driver is loaded. The network card works fine until it is under heavy load.
Since you are running CentOS 3 I am assuming this server has been in production for some time. Did these symptoms just start? Did you do any updates before this started happening? Did you install this equipment, or did you assume admin duties from someone else? Maybe it had the HP net driver installed, and a kernel update broke that.
There was nothing out of the ordinary in /var/log/messages. The logging just stops after the network card drops offline. dmesg also shows nothing out of the ordinary when the driver is loaded. The network card works fine until it is under heavy load.
Since you are running CentOS 3 I am assuming this server has been in production for some time. Did these symptoms just start? Did you do any updates before this started happening? Did you install this equipment, or did you assume admin duties from someone else? Maybe it had the HP net driver installed, and a kernel update broke that.
It appears to be a faulty NIC card. We moved the connection over to the other card and have not had any problems since. Sometimes it's just plain "broken"! :)
Sean Carolan wrote:
We have an HP DL360 server with dual on-board Tigon3 ethernet cards. We are using eth0, eth1 is unused at the moment. Sometimes when the network interface is under heavy load, for example moving large file transfers over rsync or NFS, the network interface stops working and we lose all connection to the server. The only solution at this point is to jump on the console and restart the network interface. I have not found anything in the log files to indicate what is causing this. Has anyone else experienced something similar? Or perhaps you know how I could troubleshoot this?
Upgrade the driver? Back in my RHEL3 days (I noticed you were running CentOS 3.9), we didn't even bother using the Broadcom NICs and instead installed e1000 on all the systems due to driver issues with the Broadcom chips (this was back in 2003-2005, DL360G2-G3).
Get the latest drivers at www.broadcom.com
nate
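Before upgrading the driver as suggested above, it helps to record which tg3 version is currently loaded, so there is something to compare against afterwards. `ethtool -i` reports the driver name and version for an interface. The small filter below is only a sketch, and driver_version is a made-up helper name.

```shell
#!/bin/sh
# Read `ethtool -i <iface>` output on stdin and print "<driver> <version>".
driver_version() {
    awk -F': ' '
        $1 == "driver"  { d = $2 }
        $1 == "version" { v = $2 }
        END             { print d, v }
    '
}
```

Usage would be `ethtool -i eth0 | driver_version`, run once before and once after the upgrade. `modinfo tg3` shows details of the module file on disk as a cross-check.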