Everyone,
Most of the time I am in over my head when troubleshooting problems. However, after reading manuals and man pages and getting advice from this list, I have been able to work my way through difficulties, and in the end I usually have a better understanding of what 'is going on'. I can only hope this method will work on this problem too.
I have been chasing a problem with a PCIe TrendNet (TEG-ECTX) gigabit card. After adding the card to a machine with a new CentOS 6.2 install and naming it 'eth4', it works well for 6 to 12 hours and then fails. When it fails, the link speed drops from 1000 to 100 and no data flows in or out. A shutdown and reboot does not clear the failure, but shutting down and then removing the power does.
I wrote a Perl script that exercises the eth4 interface by pinging another machine every 60 seconds, so that I could compare the time of failure with the entries in the message log. I think there is a correlation between the failure of eth4 and the entry below. Unfortunately, I am way over my head on this one. If any of you can help, I would surely appreciate your thoughts.
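For anyone who wants to reproduce this kind of monitoring, here is a minimal sketch of such a monitor. It is written in Python 3 rather than Perl and is not the script actually used. It assumes the Linux iputils `ping` (for the `-I` and `-W` options) and a kernel that exposes the negotiated speed in `/sys/class/net/<iface>/speed`, which may not be true of every kernel. The function names are made up for illustration.

```python
import datetime
import subprocess
import time


def read_link_speed(iface):
    """Return the negotiated speed in Mb/s from sysfs, or None if unreadable."""
    try:
        with open("/sys/class/net/%s/speed" % iface) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or attribute not provided by this kernel.
        return None


def ping_once(iface, target, timeout_s=5):
    """Send one ICMP echo out of `iface`; return True if a reply arrived."""
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), "-I", iface, target]
    try:
        rc = subprocess.call(cmd, stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
    except OSError:
        return False  # ping binary not found or not executable
    return rc == 0


def format_record(when, iface, ok, speed):
    """Build one timestamped line that can be compared with the syslog times."""
    return "%s %s ping=%s speed=%s" % (
        when.isoformat(sep=" "),
        iface,
        "ok" if ok else "FAIL",
        speed if speed is not None else "unknown",
    )


def monitor(iface, target, interval_s=60, rounds=None):
    """Ping `target` every `interval_s` seconds; run forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        now = datetime.datetime.now()
        print(format_record(now, iface, ping_once(iface, target),
                            read_link_speed(iface)), flush=True)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

Calling `monitor("eth4", "192.0.2.1")` (the address is a placeholder) prints one line per minute. Recording the speed next to each ping result should show whether the drop from 1000 to 100 happens before, at, or after the first failed ping.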
Some additional information that may be useful: this is the second TrendNet card I have used. The first card had the same symptoms, so I deduced that it was bad and purchased another one. The symptoms are the same with the second card.
Before I purchase a third card from a different manufacturer, I thought I would post this to see what some of you think. This is the first PCIe card I have used; are there problems with PCIe interfaces as opposed to PCI? Do you think the motherboard could be the problem, and would moving eth4 to a different slot on the motherboard be worthwhile?
Any ideas?
Greg Ennis
P.S. Here is the relevant log entry from the /var/log/messages file.
Jun 20 03:08:38 Mail kernel: ------------[ cut here ]------------
Jun 20 03:08:38 Mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Jun 20 03:08:38 Mail kernel: Hardware name: p7-1220
Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
Jun 20 03:08:38 Mail kernel: Modules linked in: ipt_REDIRECT ipt_LOG xt_limit ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp stp llc scsi_tgt cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput sg btusb bluetooth rfkill microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 r8169 mii ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif usb_storage sdhci_pci sdhci mmc_core ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 20 03:08:38 Mail kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.23.1.el6.centos.plus.x86_64 #1
Jun 20 03:08:38 Mail kernel: Call Trace:
Jun 20 03:08:38 Mail kernel: <IRQ> [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81451c0d>] ? dev_watchdog+0x26d/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff810efbf3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff8107cab7>] ? run_timer_softirq+0x197/0x340
Jun 20 03:08:38 Mail kernel: [<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
Jun 20 03:08:38 Mail kernel: [<ffffffff810958b0>] ? hrtimer_interrupt+0x140/0x250
Jun 20 03:08:38 Mail kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Jun 20 03:08:38 Mail kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Jun 20 03:08:38 Mail kernel: [<ffffffff81072075>] ? irq_exit+0x85/0x90
Jun 20 03:08:38 Mail kernel: [<ffffffff814fc550>] ? smp_apic_timer_interrupt+0x70/0x9b
Jun 20 03:08:38 Mail kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Jun 20 03:08:38 Mail kernel: <EOI> [<ffffffff812f5f9c>] ? acpi_idle_enter_simple+0x114/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff812f5f98>] ? acpi_idle_enter_simple+0x110/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff814014a7>] ? cpuidle_idle_call+0xa7/0x140
Jun 20 03:08:38 Mail kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Jun 20 03:08:38 Mail kernel: [<ffffffff814ed686>] ? start_secondary+0x202/0x245
Jun 20 03:08:38 Mail kernel: ---[ end trace 24f15998c117ac8f ]---
Jun 20 03:08:38 Mail kernel: r8169 0000:01:00.0: eth4: link up
Jun 20 03:08:39 Mail abrtd: Directory 'oops-2012-06-20-03:08:39-2420-0' creation detected
Jun 20 03:08:39 Mail abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jun 20 03:08:39 Mail abrtd: Can't open file '/var/spool/abrt/oops-2012-06-20-03:08:39-2420-0/uid': No such file or directory
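To line the failures up against the ping times, the "NETDEV WATCHDOG" entries can be pulled out of the log on their own. The following is a rough Python sketch that assumes the standard syslog layout shown in the entry above, with one entry per line in the file on disk. The function name and the shape of the returned tuples are illustrative only.

```python
import re

# Matches lines such as:
#   Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"NETDEV WATCHDOG:\s+(?P<iface>\S+)\s+\((?P<driver>[^)]+)\):\s+"
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return (timestamp, interface, driver, queue) for each watchdog timeout."""
    hits = []
    for line in lines:
        m = WATCHDOG_RE.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("iface"),
                         m.group("driver"), int(m.group("queue"))))
    return hits
```

Passing an open file works, since a file object iterates over its lines: `find_watchdog_timeouts(open("/var/log/messages"))`. Reading that file normally requires root.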