Failing Network card - Discuss

20 Jun 2012


Everyone,
Most of the time I am in over my head when troubleshooting problems.
However, after reading manuals and man pages and getting advice from this
list, I have been able to work my way through difficulties, and in the
end I usually have a better understanding of what 'is going on'. I can
only hope this approach will work on this problem too.
I have been chasing a problem with a PCI-e TrendNet (TEG-ECTX) gigabit
card. I added the card to a machine with a new CentOS 6.2 install and
named it 'eth4'; it works well for 6 to 12 hours and then fails.
When it fails, the connection speed drops from 1000 to 100 Mb/s and no
data flows in or out. Shutting down and rebooting does not solve the
problem, but shutting down and then removing the power does.
I wrote a Perl script that pings another machine through the eth4
interface every 60 seconds, so that I could match the time of failure
against the entries in the message log. I think the failure of eth4 is
correlated with the log entry below.
Unfortunately, I am way over my head on this one. If any of you can
help, I would surely appreciate your thoughts.
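The Perl script itself isn't included here. As a rough illustration of the same idea, below is a hypothetical Python sketch, not the script described above. It pings a peer through eth4 once a minute, prints a timestamp whenever the result changes, and records the negotiated link speed so that a drop from 1000 to 100 shows up next to the failure. It assumes the iputils `ping` shipped with CentOS (`-c`, `-W`, `-I`), the sysfs file `/sys/class/net/<iface>/speed`, and Python 3.6 or later (CentOS 6 ships Python 2.6, so a newer interpreter would be needed). The helper `watchdog_times` pulls NETDEV WATCHDOG timestamps out of the log for comparison. All function names are made up for illustration.

```python
import re
import subprocess
import time
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Optional

# Captures the syslog timestamp and interface of a transmit-timeout warning, e.g.
# "Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*NETDEV WATCHDOG: (\S+)")


def ping_once(host: str, iface: str, timeout_s: int = 5) -> bool:
    """Send one ICMP echo request out of `iface`; True if a reply arrived.

    Assumes iputils ping: -c count, -W reply timeout in seconds, -I interface.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", iface, host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_speed(iface: str, sysfs: Path = Path("/sys/class/net")) -> Optional[int]:
    """Negotiated speed in Mb/s from sysfs, or None if it can't be read (e.g. link down)."""
    try:
        return int((sysfs / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None


def watchdog_times(lines: Iterable[str], iface: str) -> List[str]:
    """Return the syslog timestamps of NETDEV WATCHDOG warnings for `iface`."""
    times = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m and m.group(2) == iface:
            times.append(m.group(1))
    return times


def monitor(host: str, iface: str = "eth4", interval_s: int = 60) -> None:
    """Ping `host` every `interval_s` seconds; print a line on each up/down change."""
    was_up = True
    while True:
        up = ping_once(host, iface)
        if up != was_up:
            stamp = datetime.now().strftime("%b %d %H:%M:%S")
            speed = link_speed(iface)
            speed_txt = f"{speed} Mb/s" if speed is not None else "unknown"
            state = "recovered" if up else "FAILED"
            print(f"{stamp} {iface} {state}, speed {speed_txt}", flush=True)
            was_up = up
        time.sleep(interval_s)
```

For example, `monitor("192.168.1.10")` (a placeholder address) could run in one terminal. Afterwards, `watchdog_times(open("/var/log/messages"), "eth4")` gives the timeout timestamps to line up against the FAILED lines. Printing only on state changes keeps the output short over a 12-hour run.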
Some additional information that may be useful: this is the second
TrendNet card I have used. The first card had the same symptoms, so I
concluded it was bad and bought another one, but the second card shows
exactly the same symptoms.
Before I buy a third card from a different manufacturer, I thought I
would post this to see what some of you think. This is the first PCI-e
card I have used; are there known problems with PCI-e slots as opposed
to PCI? Could the motherboard be the problem, and would it be worthwhile
to move eth4 to a different slot on the motherboard?
Any ideas?
Greg Ennis
P.S.  Here is the relevant entry from /var/log/messages.
Jun 20 03:08:38 Mail kernel: ------------[ cut here ]------------
Jun 20 03:08:38 Mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Jun 20 03:08:38 Mail kernel: Hardware name: p7-1220
Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
Jun 20 03:08:38 Mail kernel: Modules linked in: ipt_REDIRECT ipt_LOG
xt_limit ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat
xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc bnx2fc cnic uio fcoe
libfcoe libfc 8021q scsi_transport_fc garp stp llc scsi_tgt
cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput sg btusb
bluetooth rfkill microcode snd_hda_codec_realtek snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd
soundcore snd_page_alloc i2c_piix4 r8169 mii ext4 mbcache jbd2 sr_mod
cdrom sd_mod crc_t10dif usb_storage sdhci_pci sdhci mmc_core ahci radeon
ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 20 03:08:38 Mail kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.23.1.el6.centos.plus.x86_64 #1
Jun 20 03:08:38 Mail kernel: Call Trace:
Jun 20 03:08:38 Mail kernel: <IRQ>  [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81451c0d>] ? dev_watchdog+0x26d/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff810efbf3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff8107cab7>] ? run_timer_softirq+0x197/0x340
Jun 20 03:08:38 Mail kernel: [<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
Jun 20 03:08:38 Mail kernel: [<ffffffff810958b0>] ? hrtimer_interrupt+0x140/0x250
Jun 20 03:08:38 Mail kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Jun 20 03:08:38 Mail kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Jun 20 03:08:38 Mail kernel: [<ffffffff81072075>] ? irq_exit+0x85/0x90
Jun 20 03:08:38 Mail kernel: [<ffffffff814fc550>] ? smp_apic_timer_interrupt+0x70/0x9b
Jun 20 03:08:38 Mail kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Jun 20 03:08:38 Mail kernel: <EOI>  [<ffffffff812f5f9c>] ? acpi_idle_enter_simple+0x114/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff812f5f98>] ? acpi_idle_enter_simple+0x110/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff814014a7>] ? cpuidle_idle_call+0xa7/0x140
Jun 20 03:08:38 Mail kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Jun 20 03:08:38 Mail kernel: [<ffffffff814ed686>] ? start_secondary+0x202/0x245
Jun 20 03:08:38 Mail kernel: ---[ end trace 24f15998c117ac8f ]---
Jun 20 03:08:38 Mail kernel: r8169 0000:01:00.0: eth4: link up
Jun 20 03:08:39 Mail abrtd: Directory 'oops-2012-06-20-03:08:39-2420-0' creation detected
Jun 20 03:08:39 Mail abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jun 20 03:08:39 Mail abrtd: Can't open file '/var/spool/abrt/oops-2012-06-20-03:08:39-2420-0/uid': No such file or directory