Everyone,
Most of the time I am over my head in trying to troubleshoot problems. However, after reading manuals, man pages, and getting advice from this list I have been able to work my way through difficulties, and at the end, I usually have a better understanding of what 'is going on'. I can only hope this method will work on this problem too.
I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit card. After adding the card to a machine with a new Centos 6.2 install and naming it 'eth4' it works well for 6 to 12 hours and then fails. The failure is characterized by dropping its connection speed from 1000 to 100 while not allowing any data to flow in or out. When this happens a shutdown and reboot does not solve the problem, but shutting down and then removing the power does solve the problem.
I wrote a Perl script that pings another machine through the eth4 interface every 60 seconds, so that I could relate the message log entries to the time of failure. I think there is a correlation between the failure of eth4 and the entry below. Unfortunately, I am way over my head on this one. If any of you can help I would surely appreciate your thoughts.
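For anyone who wants to repeat the test, here is a minimal sketch of such a monitor. It is written in Python rather than Perl and is not Greg's script. The log file name, the 2-second timeout, and the address in the usage note are assumptions made for illustration. It relies on the Linux iputils `ping` options `-c`, `-W` and `-I`.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, iface="eth4", runner=subprocess.run):
    """Send one ping out of `iface`; return True if the host answered."""
    # -c 1: one packet, -W 2: wait up to 2 s, -I: interface to send from
    result = runner(
        ["ping", "-c", "1", "-W", "2", "-I", iface, host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def log_line(when, host, ok):
    """Format a result line with a syslog-style timestamp."""
    status = "ok" if ok else "FAILED"
    return "%s ping %s %s" % (when.strftime("%b %d %H:%M:%S"), host, status)


def monitor(host, interval=60, logfile="eth4-ping.log"):
    """Ping `host` every `interval` seconds and append each result to `logfile`."""
    while True:
        ok = ping_once(host)
        with open(logfile, "a") as fh:
            fh.write(log_line(datetime.now(), host, ok) + "\n")
        time.sleep(interval)
```

Calling `monitor("192.168.1.10")` (a made-up address) would run until interrupted. The first FAILED line can then be compared with the timestamps in the system log.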
Some additional information that may be useful. The TrendNet card is the second TrendNet card I have used. The first card had the same symptoms, and I deduced the card was bad, and purchased another one. The symptoms are the same with the second card.
Before I purchase a third card from a different manufacturer I thought I would post this to see what some of you think. This is the first pci-e card I have used; are there problems with the pci-e interfaces as opposed to pci? Do you think the motherboard could be the problem, and would moving eth4 to a different slot on the motherboard be worthwhile?
Any ideas ???
Greg Ennis
P.S. Here is the relevant log entry from the /var/log/messages file.
Jun 20 03:08:38 Mail kernel: ------------[ cut here ]------------
Jun 20 03:08:38 Mail kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Jun 20 03:08:38 Mail kernel: Hardware name: p7-1220
Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out
Jun 20 03:08:38 Mail kernel: Modules linked in: ipt_REDIRECT ipt_LOG xt_limit ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp stp llc scsi_tgt cpufreq_ondemand powernow_k8 freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput sg btusb bluetooth rfkill microcode snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 r8169 mii ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif usb_storage sdhci_pci sdhci mmc_core ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 20 03:08:38 Mail kernel: Pid: 0, comm: swapper Not tainted 2.6.32-220.23.1.el6.centos.plus.x86_64 #1
Jun 20 03:08:38 Mail kernel: Call Trace:
Jun 20 03:08:38 Mail kernel: <IRQ> [<ffffffff81069c97>] ? warn_slowpath_common+0x87/0xc0
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81069d86>] ? warn_slowpath_fmt+0x46/0x50
Jun 20 03:08:38 Mail kernel: [<ffffffff81451c0d>] ? dev_watchdog+0x26d/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff810efbf3>] ? trace_nowake_buffer_unlock_commit+0x43/0x60
Jun 20 03:08:38 Mail kernel: [<ffffffff814519a0>] ? dev_watchdog+0x0/0x280
Jun 20 03:08:38 Mail kernel: [<ffffffff8107cab7>] ? run_timer_softirq+0x197/0x340
Jun 20 03:08:38 Mail kernel: [<ffffffff81072291>] ? __do_softirq+0xc1/0x1d0
Jun 20 03:08:38 Mail kernel: [<ffffffff810958b0>] ? hrtimer_interrupt+0x140/0x250
Jun 20 03:08:38 Mail kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Jun 20 03:08:38 Mail kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Jun 20 03:08:38 Mail kernel: [<ffffffff81072075>] ? irq_exit+0x85/0x90
Jun 20 03:08:38 Mail kernel: [<ffffffff814fc550>] ? smp_apic_timer_interrupt+0x70/0x9b
Jun 20 03:08:38 Mail kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Jun 20 03:08:38 Mail kernel: <EOI> [<ffffffff812f5f9c>] ? acpi_idle_enter_simple+0x114/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff812f5f98>] ? acpi_idle_enter_simple+0x110/0x14b
Jun 20 03:08:38 Mail kernel: [<ffffffff814014a7>] ? cpuidle_idle_call+0xa7/0x140
Jun 20 03:08:38 Mail kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Jun 20 03:08:38 Mail kernel: [<ffffffff814ed686>] ? start_secondary+0x202/0x245
Jun 20 03:08:38 Mail kernel: ---[ end trace 24f15998c117ac8f ]---
Jun 20 03:08:38 Mail kernel: r8169 0000:01:00.0: eth4: link up
Jun 20 03:08:39 Mail abrtd: Directory 'oops-2012-06-20-03:08:39-2420-0' creation detected
Jun 20 03:08:39 Mail abrt-dump-oops: Reported 1 kernel oopses to Abrt
Jun 20 03:08:39 Mail abrtd: Can't open file '/var/spool/abrt/oops-2012-06-20-03:08:39-2420-0/uid': No such file or directory
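To line these warnings up with the ping failures, the watchdog entries can be pulled out of the log automatically. The following is only a sketch. It assumes the log file holds one entry per line, as /var/log/messages normally does, and the function name and regular expression are my own, matched against the NETDEV WATCHDOG line shown above.

```python
import re

# Matches e.g.
# "Jun 20 03:08:38 Mail kernel: NETDEV WATCHDOG: eth4 (r8169): transmit queue 0 timed out"
WATCHDOG = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"NETDEV WATCHDOG:\s(?P<iface>\S+)\s\((?P<driver>[^)]+)\):\s"
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return one dict per transmit-timeout warning found in `lines`."""
    hits = []
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            hits.append(match.groupdict())
    return hits
```

Running it over `open("/var/log/messages")` as root gives a list of timestamps, interfaces and drivers. Each timestamp can then be checked against the first failed ping.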
Gregory P. Ennis wrote: <snip>
I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit card. After adding the card to a machine with a new Centos 6.2 install and naming it 'eth4' it works well for 6 to 12 hours and then fails. The failure is characterized by dropping its connection speed from 1000 to 100 while not allowing any data to flow in or out. When this happens a shutdown and reboot does not solve the problem, but shutting down and then removing the power does solve the problem.
<snip>
Some additional information that may be useful. The TrendNet card is the second TrendNet card I have used. The first card had the same symptoms, and I deduced the card was bad, and purchased another one. The symptoms are the same with the second card.
<snip>
Several questions: do you have another machine on the same network? Does *it* show the problem, around the same time?
And, finally, did you buy both TrendNet cards from the same vendor? Are their MACs close? If so, it could be the vendor got a bad batch, either OEM's fault, or the gorilla who un/loaded it during shipping.
mark
---------------------------------------------------------------------
Mark,
I have several machines on that network, and only one machine is having the problem. The machine is being used as a mail server, web server, and gateway for the network. After this problem surfaced with the failure of the eth4 card (internal network), I created a gateway out of one of the other machines that is working without incident.
I did purchase both TrendNet Cards from Fry's. Fry's was good about taking the first one back without question, but now that the second one has failed, I thought it best to look deeper. I don't have the previous card's MAC address, but my first thought was that this was a bad card too. Both the first and second cards did not appear to have any damage on the boxes or the card itself. Before I tried to get a third card from a different manufacturer I wanted to post things here to see if there was an obvious problem I am missing.
Thanks for your help!!!
Greg
On 6/20/2012 10:27 AM, Gregory P. Ennis wrote:
<snip>
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
If you are having to fully 'cold boot' the system before it will work again I can't help but wonder if it is a conflict between special motherboard functions/settings and the card. I've seen this with some high end video cards under Winders. I am totally speculating here and have nothing to draw from, but wake on lan functions and such.... just leaves me wondering. Do you have a different machine/motherboard around where it wouldn't be hard to set up this testing? Maybe Googling a bit on motherboard model and eth card model might give a helpful return?
John Hinton wrote:
<snip>
Interesting questions. Is wake-on-lan enabled? (Try turning it off, so the card is always on.) Also, if it's 6.2, check that udev rule.
mark
------------------------------------------------------------------------
John,
That is a good idea !!!
I have appended the output of 'ethtool eth4' below. Is there a way to change the wake setting from the command line, as opposed to changing the BIOS setting at boot?
Greg
Settings for eth4:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Half 1000baseT/Full
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: pumbg
        Current message level: 0x00000033 (51)
        Link detected: yes
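On the command-line question: ethtool itself accepts `ethtool -s <interface> wol d`, where `d` disables Wake-on-LAN and the letters p, u, m, b and g select individual wake triggers. The sketch below reads the current setting out of output like the above and builds that command. The helper names are hypothetical, and it has not been checked whether this particular card and driver honour the setting.

```python
import re


def parse_ethtool(text):
    """Pull the link speed and Wake-on flags out of `ethtool <iface>` output."""
    patterns = (
        ("speed", r"Speed:\s*(\S+)"),
        ("supports_wol", r"Supports Wake-on:\s*(\S+)"),
        # The negative lookbehind skips the "Supports Wake-on:" line.
        ("wol", r"(?<!Supports )Wake-on:\s*(\S+)"),
    )
    fields = {}
    for key, pattern in patterns:
        match = re.search(pattern, text)
        fields[key] = match.group(1) if match else None
    return fields


def disable_wol_command(iface):
    """Argument list asking the driver to turn Wake-on-LAN off ('d' = disable)."""
    return ["ethtool", "-s", iface, "wol", "d"]
```

Run as root, `subprocess.run(disable_wol_command("eth4"), check=True)` would apply the change. A setting made this way normally does not survive a reboot. On CentOS 6 the usual place to make it permanent is, as far as I know, an ETHTOOL_OPTS line in the interface's ifcfg file. The same parser also reports whether the speed has dropped from 1000Mb/s to 100Mb/s, the failure symptom described at the start of the thread.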
On 6/20/2012 11:13 AM, Gregory P. Ennis wrote:
<snip>
I always disable wake on lan on the motherboard and so far have never had an issue. To me this 'feature' should never be on by default, but most of my experience has shown the opposite. I suppose there is a good use for this, but I sure don't have one. At the mb bios level, it just seems like another security exposure to worry about, with little info available on its potential. I have no experience with disabling wake on lan on the cards themselves. If this is a mailserver, it seems it should never go to sleep... so if there is a switch to turn off wake on lan in the motherboard bios, I'd turn it off first and see if the issue goes away. If not, I'd hit the card manufacturer's site to find this info, as it would be specific to each card. Or, it might be easier to just try a different manufacturer?
-----------------------------------------------------
John,
Well, this gives me something to change; I'll let you know what happens. I will not be able to do this until much later in the day.
Thanks a bunch for your help!!!
Greg
Greg,
Gregory P. Ennis wrote:
Gregory P. Ennis wrote:
<snip>
Some additional information that may be useful. The TrendNet card is the second TrendNet card I have used. The first card had the same symptoms, and I deduced the card was bad, and purchased another one. The symptoms are the same with the second card.
<snip> Several questions: do you have another machine on the same network? Does *it* show the problem, around the same time?
And, finally, did you buy both TrendNet cards from the same vendor? Are their MACs close? If so, it could be the vendor got a bad batch, either OEM's fault, or the gorilla who un/loaded it during shipping.
I have several machines on that network, and only one machine is having the problem. The machine is being used as a mail server, web server, and gateway for the network. After this problem surfaced with the failure of the eth4 card (internal network), I created a gateway out of one of the other machines that is working without incident.
Good deal.
I did purchase both TrendNet Cards from Fry's. Fry's was good about taking the first one back without question, but now that the second one has failed, I thought it best to look deeper. I don't have the previous card's MAC address, but my first thought was that this was a bad card
Ah, but you should have it in your logs, or - if you're running 6.2 - possibly in /etc/udev/rules.d/70-persistent-net.rules.
too. Both the first and second cards did not appear to have any damage on the boxes or the card itself. Before I tried to get a third card
<snip>
In that case, sounds like the OEM had a q/c problem.
mark
Mark,
That's interesting. Here are the log entries for the previous card as well as the eth4 that is currently installed.
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:f6:81", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"

# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:fc:6e", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
Looks like addresses are close.
Greg
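How close the two addresses are can be put into numbers. This is a small sketch with function names of my own. It treats the first three bytes as the vendor prefix (OUI) and the last three as a per-device counter, which is the usual convention. It says nothing certain about how TrendNet actually assigns addresses.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def mac_distance(a, b):
    """Return (same vendor prefix?, difference between the low 24 bits)."""
    ia, ib = mac_to_int(a), mac_to_int(b)
    same_oui = (ia >> 24) == (ib >> 24)
    return same_oui, abs((ia & 0xFFFFFF) - (ib & 0xFFFFFF))


print(mac_distance("00:e0:b3:10:f6:81", "00:e0:b3:10:fc:6e"))
# -> (True, 1517)
```

So the two cards share a vendor prefix and sit about 1,500 addresses apart. That is consistent with a common production run but does not prove one.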
Greg,
Gregory P. Ennis wrote:
<snip>
>> Some additional information that may be useful. The TrendNet card is
>> the second TrendNet card I have used. The first card had the same
>> symptoms, and I deduced the card was bad, and purchased another one.
>> The symptoms are the same with the second card.
> <snip>
Ah, but you should have it in your logs, or - if you're running 6.2 - possibly in /etc/udev/rules.d/70-persistent-net.rules.
too. Both the first and second cards did not appear to have any damage on the boxes or the card itself. Before I tried to get a third card
<snip>
In that case, sounds like the OEM had a q/c problem.
That's interesting. Here are the log entries for the previous card as well as the eth4 that is currently installed.
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:f6:81", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"

# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:fc:6e", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
Looks like addresses are close.
So-so; not *that* close. I have some servers with two on-board NICs whose MAC addresses end in things like fe:ab, fe:ac, fe:36, fe:37. Still....
Actually, I missed the beginning of this thread. Are there no on-board NICs? I've not seen a m/b in a long time without that; even Raspberry Pi has one.
mark
Mark,
There is an on board nic with the m/b. Here is the mac entry of it.
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="38:60:77:ed:41:a0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
Both nic's apparently have the same chipset. "lspci | grep net" outputs:
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev ff)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Greg
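One detail in that lspci output may be worth a second look. The device at 01:00.0 is eth4 according to the kernel log earlier in the thread, and it reports "rev ff" while the on-board controller reports "rev 06". A revision of ff is often what lspci prints when reads from the device's configuration space return all ones, which would mean the card has stopped answering on the bus. That is a common interpretation, not something this thread confirms. The sketch below uses helper names of my own and assumes one device per line, as lspci prints. It flags such devices.

```python
import re

# slot, free-text description, optional "(rev xx)" suffix
LSPCI_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s"
    r"(?P<desc>.*?)"
    r"(?:\s\(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def parse_lspci(text):
    """Return a list of {'slot', 'desc', 'rev'} dicts from lspci output."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


def possibly_unresponsive(devices):
    """Slots whose revision reads as ff (treated here as a warning sign only)."""
    return [dev["slot"] for dev in devices if dev["rev"] == "ff"]
```

Feeding it the two lines above returns only the slot of the add-in card. Checking again right after a cold boot, while the card is still working, would show whether the revision changes when the failure occurs.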
Gregory P. Ennis wrote:
<snip>
>> Some additional information that may be useful. The TrendNet card is
>> the second TrendNet card I have used. The first card had the same
>> symptoms, and I deduced the card was bad, and purchased another one.
>> The symptoms are the same with the second card.
<snip>
Looks like addresses are close.
So-so; not *that* close. I have some servers with two on-board NICs whose MAC addresses end in things like fe:ab, fe:ac, fe:36, fe:37. Still....
Actually, I missed the beginning of this thread. Are there no on-board NICs? I've not seen a m/b in a long time without that; even Raspberry Pi has one.
There is an on board nic with the m/b. Here is the mac entry of it.
<snip> Are those in use? If not, why not use them?
mark "I must be missing something"
----------------------------------------------------------------
Mark,
I have the m/b nic set as the external (open to the internet) card. The pci-e nic was set for the internal network card. I had this machine set to be a gateway for the rest of the internal machines. I only have two nics on this system, eth0 and eth4. The reason it is labeled eth4 is related to some installation problems I had during the installation of the pci-e card. Once I got eth4 to work, I have been too lazy to go back and modify things to relabel it as eth1. Now that it is failing, I am glad I left it alone.
Greg
On 6/20/2012 11:09 AM, Gregory P. Ennis wrote:
That's interesting. Here are the log entries for the previous card as well as the eth4 that is currently installed.
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:f6:81", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:fc:6e", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
Have you deleted the udev entries for the old card you pulled out? That could be an issue (I'm not sure) if you are using the same slot. Sometimes you get bad batches, though, and one failure can mean many more.
If both cards had the same issue, then I doubt udev or any of that is at fault. Having to unplug power to the machine is odd, but it would support the bad-card idea.
Instead of pulling the plug, try unplugging the network cable first and then rebooting, and see if that has an effect.
I would just return it and get a different type of card... or try an extra one you have lying around.
All I know is that with computers it comes down to two things: 1) it's broken, return it; 2) it's something really silly, usually a misconfiguration or error, something simple but overlooked.
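As a rough way to spot the leftover udev entry mentioned above, here is a small Python sketch that reads rules in the format quoted in this thread and reports interface names whose MAC address is no longer installed. The helper names (`parse_rules`, `stale_entries`) and the set of installed MACs are assumptions for illustration. The sample keeps each comment on its own line, as the rules file normally does. It only reports; any change to the rules file would still be made by hand.

```python
import re

# The two rules quoted in the thread: the first card (eth3) and the current one (eth4).
RULES_TEXT = """\
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:f6:81", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# PCI device 0x10ec:0x8168 (r8169)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:e0:b3:10:fc:6e", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
"""

RULE_RE = re.compile(
    r'ATTR\{address\}=="(?P<mac>[0-9a-fA-F:]{17})".*?NAME="(?P<name>[^"]+)"'
)


def parse_rules(text):
    """Map interface name -> MAC address for every non-comment rule line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE_RE.search(line)
        if match:
            mapping[match.group("name")] = match.group("mac").lower()
    return mapping


def stale_entries(mapping, present_macs):
    """Names whose MAC is not among the cards currently installed."""
    present = {mac.lower() for mac in present_macs}
    return sorted(name for name, mac in mapping.items() if mac not in present)


if __name__ == "__main__":
    # Assumption: only the second TrendNet card is still in the machine.
    installed = {"00:e0:b3:10:fc:6e"}
    print(stale_entries(parse_rules(RULES_TEXT), installed))
```

With the sample data, the rule for the first card is the one reported as stale.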
On 6/20/2012 9:34 AM, Gregory P. Ennis wrote:
I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit card. After adding the card to a machine with a new Centos 6.2 install and naming it 'eth4' it works well for 6 to 12 hours and then fails.
Try moving the network card to a new slot, especially if you can swap the network card with another card which is known to work. Also, try swapping the card into a spare server.
If the problem follows the network card, then the card is probably bad. If a known-good card misbehaves in the slot where you previously had the network card, then the slot may be bad as well.
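The swap test above boils down to two observations and two possible conclusions. Here it is as a tiny Python sketch; the function name and the wording of the results are made up for illustration, and real hardware can of course fail in ways this does not cover (power, driver, cabling).

```python
def diagnose(card_fails_elsewhere, known_good_fails_in_slot):
    """Turn the two swap-test observations into a list of likely culprits.

    card_fails_elsewhere: the suspect card also fails in another slot or server.
    known_good_fails_in_slot: a known-good card fails in the original slot.
    """
    findings = []
    if card_fails_elsewhere:
        findings.append("card is probably bad")
    if known_good_fails_in_slot:
        findings.append("slot may be bad")
    if not findings:
        findings.append("inconclusive: look at driver, power, or cabling")
    return findings


if __name__ == "__main__":
    # Example: the problem follows the card, and the old slot works with another card.
    print(diagnose(card_fails_elsewhere=True, known_good_fails_in_slot=False))
```

Since the failure takes 6 to 12 hours to show up, each observation needs at least that long before it counts as a pass.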
On Wed, Jun 20, 2012 at 12:18 PM, Chris Beattie cbeattie@geninfo.com wrote:
On 6/20/2012 9:34 AM, Gregory P. Ennis wrote:
I have been chasing a problem with a pci-e TrendNet(TEG-ECTX) gigabit card. After adding the card to a machine with a new Centos 6.2 install and naming it 'eth4' it works well for 6 to 12 hours and then fails.
Try moving the network card to a new slot, especially if you can swap the network card with another card which is known to work. Also, try swapping the card into a spare server.
If the problem follows the network card, then the card is probably bad. If a known-good card misbehaves in the slot where you previously had the network card, then the slot may be bad as well.
Or it could mean that the PCI-e slots are not providing enough power for this card, or the slots are specialized to run only certain types of cards. What motherboard does the OP have?
On 06/20/12 11:17 AM, Dale Dellutri wrote:
Or it could mean that the PCI-e slots are not providing enough power for this card, or the slots are specialized to run only certain types of cards. What motherboard does the OP have?
More likely, it means once again Fry's is selling junk that belongs in a scrap pile.
-------------------------------------------------------------------------
John,
I am being persuaded that you are right. I'll have to look at the mother board to answer Dale's question; the machine and the nic card came from Fry's. I have had pretty good luck with Fry's in the past, but this has turned out to be a real pain.
What chip set, or what pci-e nic card would you recommend?
Greg
On 06/20/12 12:21 PM, Gregory P. Ennis wrote:
I am being persuaded that you are right. I'll have to look at the mother board to answer Dale's question; the machine and the nic card came from Fry's. I have had pretty good luck with Fry's in the past, but this has turned out to be a real pain.
What chip set, or what pci-e nic card would you recommend?
I've had good luck with Intel ethernet chips/cards, especially the server oriented ones.
One of my recent servers has these NICs...
03:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
03:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
Another has...
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Both machines have recent Xeon 5600 class CPUs. The first is an HP DL380g6, the second is a whitebox server with a SuperMicro X8DTE-F motherboard.
John R Pierce wrote:
I've had good luck with Intel ethernet chips/cards, especially the server oriented ones.
<snip> We've got a lot of Broadcom ones. IIRC, Realtek tends towards consumer grade, not server grade.
mark