[CentOS] kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

Wed May 21 17:55:40 UTC 2014
Steve Clark <sclark at netwolves.com>

Hi,

Does anybody know how to fix this?

May 20 12:16:15 wolfpac kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
May 20 12:16:15 wolfpac kernel: Modules linked in: pf_ring(U) af_key iptable_nat ipt_LOG iptable_filter ip_tables nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ip6t_LOG xt_limit ip6table_filter ip6_tables bridge stp llc nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack ip6_tunnel tunnel6 ip_gre ipv6 ext3 jbd plcm_drv(U) sled_drv(U) wd_drv(U) ppdev parport_pc parport r8169 mii microcode serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support shpchp igb ixgbe dca ptp(T) pps_core mdio ext4 jbd2 mbcache sd_mod crc_t10dif ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
May 20 12:16:15 wolfpac kernel: Pid: 0, comm: swapper Tainted: G           ---------------  T 2.6.32-358.23.2.el6.centos.plus.x86_64 #1
May 20 12:16:15 wolfpac kernel: Call Trace:
May 20 12:16:15 wolfpac kernel: <IRQ> [<ffffffff8106e3e7>] ? warn_slowpath_common+0x87/0xc0
May 20 12:16:15 wolfpac kernel: [<ffffffff8106e4d6>] ? warn_slowpath_fmt+0x46/0x50
May 20 12:16:15 wolfpac kernel: [<ffffffff8146f35d>] ? dev_watchdog+0x26d/0x280
May 20 12:16:15 wolfpac kernel: [<ffffffff81012c09>] ? sched_clock+0x9/0x10
May 20 12:16:15 wolfpac kernel: [<ffffffff8146f0f0>] ? dev_watchdog+0x0/0x280
May 20 12:16:15 wolfpac kernel: [<ffffffff81081937>] ? run_timer_softirq+0x197/0x340
May 20 12:16:15 wolfpac kernel: [<ffffffff810a8060>] ? tick_sched_timer+0x0/0xc0
May 20 12:16:15 wolfpac kernel: [<ffffffff8102ea2d>] ? lapic_next_event+0x1d/0x30
May 20 12:16:15 wolfpac kernel: [<ffffffff810770b1>] ? __do_softirq+0xc1/0x1e0
May 20 12:16:15 wolfpac kernel: [<ffffffff8109b87b>] ? hrtimer_interrupt+0x14b/0x260
May 20 12:16:15 wolfpac kernel: [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30
May 20 12:16:15 wolfpac kernel: [<ffffffff8100de05>] ? do_softirq+0x65/0xa0
May 20 12:16:15 wolfpac kernel: [<ffffffff81076e95>] ? irq_exit+0x85/0x90
May 20 12:16:15 wolfpac kernel: [<ffffffff8151ec20>] ? smp_apic_timer_interrupt+0x70/0x9b
May 20 12:16:15 wolfpac kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
May 20 12:16:15 wolfpac kernel: <EOI> [<ffffffff812da8fe>] ? intel_idle+0xde/0x170
May 20 12:16:15 wolfpac kernel: [<ffffffff812da8e1>] ? intel_idle+0xc1/0x170
May 20 12:16:15 wolfpac kernel: [<ffffffff8141c2a7>] ? cpuidle_idle_call+0xa7/0x140
May 20 12:16:15 wolfpac kernel: [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
May 20 12:16:15 wolfpac kernel: [<ffffffff8150e9c0>] ? start_secondary+0x2ac/0x2ef
May 20 12:16:15 wolfpac kernel: ---[ end trace 2426f74a18da7744 ]---
May 20 12:16:15 wolfpac kernel: r8169 0000:05:00.0: eth0: link up
May 20 12:16:24 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:16:36 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:16:48 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:17:00 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:17:09 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:17:21 wolfpac flash_the_led.pl: Both ping sites failed flash red-green
May 20 12:17:33 wolfpac kernel: r8169 0000:05:00.0: eth0: link up

05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
         Subsystem: Realtek Semiconductor Co., Ltd. Device 0123
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 64 (8000ns min, 16000ns max), Cache Line Size: 32 bytes
         Interrupt: pin A routed to IRQ 16
         Region 0: I/O ports at b000 [size=256]
         Region 1: Memory at f7820000 (32-bit, non-prefetchable) [size=256]
         Expansion ROM at dff00000 [disabled] [size=128K]
         Capabilities: [dc] Power Management version 2
                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
         Kernel driver in use: r8169
         Kernel modules: r8169

Even though the log says the link is up, the link is dead.
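
For anyone wanting to spot these events automatically, here is a rough sketch that scans syslog lines for the transmit-timeout warning shown above. It assumes Python 3 is available; the function name find_tx_timeouts is made up for illustration, and the pattern is taken from the log lines in this message only.

```python
import re

# Pattern for the kernel's transmit-timeout warning, as it appears in the log above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(lines):
    """Return (iface, driver, queue) for every transmit-timeout line found."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits
```

It could be fed an open log file, for example find_tx_timeouts(open("/var/log/messages")), where the path is the usual CentOS syslog location and may differ on other systems.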

This is a remote unit and this NIC is the management port, so it is a real pain when this happens.
So far it has happened twice, and we have had to call someone and have them power cycle the system.

Based on what I have found on the net, this seems to happen a lot with this NIC.
We have upgraded to the latest stock CentOS kernel and added the following
to the kernel command line in grub.
pcie_aspm=off
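
After a reboot it is worth confirming that the parameter actually reached the running kernel. A minimal sketch, assuming Python 3 and the standard /proc/cmdline file; the helper names has_kernel_param and read_cmdline are hypothetical:

```python
def has_kernel_param(cmdline, name, value=None):
    """True if `name` (or `name=value` when value is given) is on the command line."""
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and (value is None or (sep and val == value)):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On the affected box, has_kernel_param(read_cmdline(), "pcie_aspm", "off") should return True if the grub change took effect.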

I've also taken the draconian measure of adding a ping of the default gateway to the watchdog.conf
file, so the system reboots if it happens again.
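
The watchdog daemon's ping option takes an IP address, so the gateway address has to be known first. A sketch of looking it up from /proc/net/route, assuming Python 3 and an x86 (little-endian) machine; default_gateway is an illustrative name, not part of any tool:

```python
import socket
import struct

RTF_GATEWAY = 0x0002  # route flag: the entry goes through a gateway


def default_gateway(route_table):
    """Return the IPv4 default gateway from /proc/net/route text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        dest, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        if dest == "00000000" and flags & RTF_GATEWAY:
            # Assumption: little-endian host, so the hex bytes appear reversed.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None
```

The address it returns for open("/proc/net/route").read() is the one that would go on the ping line in watchdog.conf.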

I have looked at the driver version in the latest long term kernel (3.10.40-1.el6.elrepo)
and it shows the same version as this kernel. From modinfo r8169:
version:        2.3LK-NAPI
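
A matching version string does not by itself show that the two drivers are the same code, since in-tree drivers often keep the string unchanged across kernel releases. modinfo normally also prints a srcversion field, which may be a better indicator. A sketch for comparing the two outputs, assuming Python 3; parse_modinfo and same_build are hypothetical helper names:

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict; repeated keys keep the first value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key not in info:
            info[key] = value.strip()
    return info


def same_build(a, b):
    """Compare two parsed modinfo dicts on both version and srcversion."""
    return all(a.get(k) == b.get(k) for k in ("version", "srcversion"))
```

Run modinfo r8169 under each kernel, save the output, and pass both parsed results to same_build; a False result with equal version strings would suggest the driver source did change.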

Thanks,
Steve


-- 
Stephen Clark
*NetWolves Managed Services, LLC.*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark at netwolves.com
http://www.netwolves.com