[CentOS] Problems with Intel Ethernet and module e1000e

Fri Sep 23 10:54:20 UTC 2011
Volker Poplawski <volker at openbios.org>

Hi all,

I'm facing a serious problem with the e1000e kernel module for Intel 
82574L gigabit NICs on CentOS 6.

The device eth0 suddenly stops working, i.e. there is no more networking. 
When I run ifconfig from the console I get:

eth0      Link encap:Ethernet  HWaddr 00:xx:xx:xx:xx:EA
           inet6 addr: fe80::225:90ff:fe50:8fea/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:37984 errors:54245436935850 dropped:9040906155975 overruns:0 frame:36163624623900
           TX packets:20884 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:4431149 (4.2 MiB)  TX bytes:4628666 (4.4 MiB)
           Memory:fb900000-fb920000

The reported byte counts and RX/TX packet counts are reasonable. However, 
the incredibly large number of errors is not. Also, the errors don't pile 
up gradually; they simply appear at the moment the device stops working.
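
As a quick sanity check on those counters, here is a minimal Python sketch 
that parses the RX line from the ifconfig output above and tests whether each 
counter is an exact multiple of 0xFFFFFFFF. The function names are made up for 
illustration. The interpretation is only an assumption, not something 
confirmed here: exact multiples would be consistent with the driver 
repeatedly reading all-ones from 32-bit statistics registers, rather than 
with real receive errors.

```python
import re

# RX counter line exactly as printed by ifconfig in this post.
SAMPLE = """RX packets:37984 errors:54245436935850 dropped:9040906155975
overruns:0 frame:36163624623900"""

ALL_ONES_32 = 0xFFFFFFFF  # a 32-bit register that reads back as all ones


def parse_rx_counters(text):
    """Return the 'name:value' pairs of the RX line as a dict of ints."""
    flat = " ".join(text.split())  # undo any line wrapping
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", flat)}


def all_ones_multiples(counters):
    """Map each non-zero counter that is an exact multiple of 0xFFFFFFFF
    to its factor. (Hypothetical helper, for illustration only.)"""
    result = {}
    for name, value in counters.items():
        factor, remainder = divmod(value, ALL_ONES_32)
        if value and remainder == 0:
            result[name] = factor
    return result


counters = parse_rx_counters(SAMPLE)
print(all_ones_multiples(counters))
# -> {'errors': 12630, 'dropped': 2105, 'frame': 8420}
```

All three error counters divide evenly, while the packet count does not, 
which fits the observation that the errors appear all at once.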

In /var/log/messages I get:

Sep 23 12:21:09 wader2 kernel: ------------[ cut here ]------------
Sep 23 12:21:09 wader2 kernel: WARNING: at net/sched/sch_generic.c:261 
dev_watchdog+0x26d/0x280() (Not tainted)
Sep 23 12:21:09 wader2 kernel: Hardware name: X9SCL/X9SCM
Sep 23 12:21:09 wader2 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit 
queue 0 timed out
Sep 23 12:21:09 wader2 kernel: Modules linked in: tun ebtable_nat 
ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat 
sunrpc bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 kvm_intel kvm 
serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support sg e1000e ext4 
mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas dm_mod [last unloaded: 
scsi_wait_scan]
Sep 23 12:21:09 wader2 kernel: Pid: 0, comm: swapper Not tainted 
2.6.32-71.29.1.el6.x86_64 #1
Sep 23 12:21:09 wader2 kernel: Call Trace:
Sep 23 12:21:09 wader2 kernel: <IRQ>  [<ffffffff8106b947>] 
warn_slowpath_common+0x87/0xc0
Sep 23 12:21:09 wader2 kernel: [<ffffffff8106ba36>] 
warn_slowpath_fmt+0x46/0x50
Sep 23 12:21:09 wader2 kernel: [<ffffffff8142a07d>] dev_watchdog+0x26d/0x280
Sep 23 12:21:09 wader2 kernel: [<ffffffff8107d3c5>] ? 
internal_add_timer+0xb5/0x110
Sep 23 12:21:09 wader2 kernel: [<ffffffff81429e10>] ? dev_watchdog+0x0/0x280
Sep 23 12:21:09 wader2 kernel: [<ffffffff8107dfc7>] 
run_timer_softirq+0x197/0x340
Sep 23 12:21:09 wader2 kernel: [<ffffffff810a0e90>] ? 
tick_sched_timer+0x0/0xc0
Sep 23 12:21:09 wader2 kernel: [<ffffffff8102f52d>] ? 
lapic_next_event+0x1d/0x30
Sep 23 12:21:09 wader2 kernel: [<ffffffff81073d67>] __do_softirq+0xb7/0x1e0
Sep 23 12:21:09 wader2 kernel: [<ffffffff81095c50>] ? 
hrtimer_interrupt+0x140/0x250
Sep 23 12:21:09 wader2 kernel: [<ffffffff810142cc>] call_softirq+0x1c/0x30
Sep 23 12:21:09 wader2 kernel: [<ffffffff81015f35>] do_softirq+0x65/0xa0
Sep 23 12:21:09 wader2 kernel: [<ffffffff81073b65>] irq_exit+0x85/0x90
Sep 23 12:21:09 wader2 kernel: [<ffffffff814d0a31>] 
smp_apic_timer_interrupt+0x71/0x9c
Sep 23 12:21:09 wader2 kernel: [<ffffffff81013c93>] 
apic_timer_interrupt+0x13/0x20
Sep 23 12:21:09 wader2 kernel: <EOI>  [<ffffffff812dac0f>] ? 
acpi_idle_enter_bm+0x28f/0x2c3
Sep 23 12:21:09 wader2 kernel: [<ffffffff812dac08>] ? 
acpi_idle_enter_bm+0x288/0x2c3
Sep 23 12:21:09 wader2 kernel: [<ffffffff813df687>] 
cpuidle_idle_call+0xa7/0x140
Sep 23 12:21:09 wader2 kernel: [<ffffffff81011e96>] cpu_idle+0xb6/0x110
Sep 23 12:21:09 wader2 kernel: [<ffffffff814b1a0a>] rest_init+0x7a/0x80
Sep 23 12:21:09 wader2 kernel: [<ffffffff818c3f19>] start_kernel+0x413/0x41f
Sep 23 12:21:09 wader2 kernel: [<ffffffff818c333a>] 
x86_64_start_reservations+0x125/0x129
Sep 23 12:21:09 wader2 kernel: [<ffffffff818c3438>] 
x86_64_start_kernel+0xfa/0x109
Sep 23 12:21:09 wader2 kernel: ---[ end trace 69b6c5e494cffe4d ]---
Sep 23 12:21:10 wader2 kernel: 0000:04:00.0: eth0: Error reading PHY 
register
Sep 23 12:21:10 wader2 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps 
Full Duplex, Flow Control: RX/TX


The last line falsely reports the link as 1000 Mbit, but it is actually 
100 Mbit. ethtool reports the same incorrect speed.

Bringing the interface down with ifconfig eth0 down and back up with 
ifconfig eth0 up does not help. A reboot gets the interface back to normal, 
but the problem returns after some minutes, hours, or a day.
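
To pin down how often the failure recurs, the watchdog messages could be 
pulled out of the log with something like the following Python sketch. It 
assumes the syslog line format quoted above (unwrapped onto one line); the 
function names and the year argument are hypothetical, added only for 
illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# The watchdog line from the log above, joined back onto a single line.
SAMPLE_LINE = ("Sep 23 12:21:09 wader2 kernel: NETDEV WATCHDOG: eth0 (e1000e): "
               "transmit queue 0 timed out")

WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"NETDEV WATCHDOG: (?P<dev>\S+) \((?P<driver>[^)]+)\)"
)


def watchdog_events(lines, year):
    """Yield (datetime, device, driver) for each NETDEV WATCHDOG line."""
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            # The caller supplies the year because syslog omits it.
            stamp = datetime.strptime(f"{year} {match['stamp']}",
                                      "%Y %b %d %H:%M:%S")
            yield stamp, match["dev"], match["driver"]


def intervals(events):
    """Return the time gaps between consecutive events."""
    stamps = [stamp for stamp, _dev, _driver in events]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


events = list(watchdog_events([SAMPLE_LINE], year=2011))
print(events)
```

Passing the open /var/log/messages file object instead of the one-line 
sample, and then calling intervals() on the result, would give the gaps 
between successive timeouts.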



Any ideas?
Regards
.......Volker