Hi all,
Our intranet's WAN interface just stopped working yesterday, and I can't figure it out.
The machine has a Gigabyte motherboard, with on-board RTL-8110SC/8169SC Gigabit Ethernet and a D-Link PCI NIC for the LAN side. I can get into the LAN side without an issue, but can't see the WAN side at all.
[root@intranet ~]# ifconfig eth0 up
eth0: unknown interface: No such device
lspci -v
01:01.0 Ethernet controller: D-Link System Inc DGE-528T Gigabit Ethernet Adapter (rev 10)
        Subsystem: D-Link System Inc DGE-528T Gigabit Ethernet Adapter
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 185
        I/O ports at a000 [size=256]
        Memory at e1000000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at 80000000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2

01:05.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
        Subsystem: Giga-byte Technology GA-MA69G-S3H Motherboard
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 201
        I/O ports at a400 [size=256]
        Memory at e1001000 (32-bit, non-prefetchable) [size=256]
        [virtual] Expansion ROM at 80020000 [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
scanpci -v
pci bus 0x0001 cardnum 0x01 function 0x00: vendor 0x1186 device 0x4300
 D-Link System Inc DGE-528T Gigabit Ethernet Adapter
 STATUS 0x02b0 COMMAND 0x0017
 CLASS 0x02 0x00 0x00 REVISION 0x10
 BIST 0x00 HEADER 0x00 LATENCY 0x40 CACHE 0x08
 BASE0 0x0000a001 addr 0x0000a000 I/O
 BASE1 0xe1000000 addr 0xe1000000 MEM
 MAX_LAT 0x40 MIN_GNT 0x20 INT_PIN 0x01 INT_LINE 0x0b

pci bus 0x0001 cardnum 0x05 function 0x00: vendor 0x10ec device 0x8167
 Realtek Semiconductor Co., Ltd. Device unknown
 CardVendor 0x1458 card 0xe000 (Card unknown)
 STATUS 0x02b0 COMMAND 0x0017
 CLASS 0x02 0x00 0x00 REVISION 0x10
 BIST 0x00 HEADER 0x00 LATENCY 0x40 CACHE 0x08
 BASE0 0x0000a401 addr 0x0000a400 I/O
 BASE1 0xe1001000 addr 0xe1001000 MEM
 MAX_LAT 0x40 MIN_GNT 0x20 INT_PIN 0x01 INT_LINE 0x0a
[root@intranet ~]# lsmod | grep 8169
r8169                  77125  0
mii                    38849  1 r8169
So the driver is there, or so it seems. rmmod r8169 dropped both NICs and I had to reboot the (headless) server to get the D-Link working again. Removing and re-adding the eth0 interface with system-config-network doesn't actually re-create the NIC either.
So, how do I re-create it? A Google search didn't reveal much, other than using the r8169 module, which is already loaded.
No changes were made to the server in a long time, that I know of, and it's running kernel 2.6.18-194.11.4.el5
[root@intranet ~]# uname -a
Linux intranet.lan 2.6.18-194.11.4.el5 #1 SMP Tue Sep 21 05:04:09 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intranet ~]# cat /etc/redhat-release
CentOS release 5.5 (Final)
Here's something interesting though:
[root@intranet ~]# ifconfig -a | more
__tmp1613210867 Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:2883 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2198 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:276073 (269.6 KiB)  TX bytes:315508 (308.1 KiB)
          Interrupt:201 Base address:0x2000

eth1      Link encap:Ethernet  HWaddr 00:1C:F0:6E:B8:B4
          inet addr:192.168.2.250  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:f0ff:fe6e:b8b4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2883 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2198 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:276073 (269.6 KiB)  TX bytes:315508 (308.1 KiB)
          Interrupt:185

lo        Link encap:Local Loopback
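A side note on the ifconfig -a output above: the oddly named __tmp1613210867 interface reports Interrupt:201, the same IRQ that lspci -v lists for the Realtek controller, and its FE:FF:FF:FF:FF:FF address has the locally-administered bit set, so it is not a vendor-assigned MAC. That hints (but does not prove, since IRQs can be shared) that the kernel still sees the on-board NIC and only the eth0 name is missing. A minimal sketch for pairing interfaces with IRQs, assuming the old net-tools output layout shown above; the function name iface_irqs is made up for this example:

```shell
#!/bin/sh
# Print "interface IRQ" pairs from `ifconfig -a`-style text on stdin.
# Assumption: the interface name starts in column 1 and a later,
# indented line of the same block carries "Interrupt:<n>".
iface_irqs() {
    awk '
        /^[^ \t]/ { name = $1 }            # first line of an interface block
        match($0, /Interrupt:[0-9]+/) {
            # "Interrupt:" is 10 characters long; print only the number
            print name, substr($0, RSTART + 10, RLENGTH - 10)
        }
    '
}
```

Typical use would be `ifconfig -a | iface_irqs`, then comparing the numbers with `lspci -v | grep -i irq`.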
On Sunday 09 January 2011 13:33, Rudi Ahlers wrote:
Our intranet's WAN interface just stopped working yesterday, and I can't figure it out.
Look in /etc/sysconfig/network-scripts. There you should see ifcfg-eth# files. If ifcfg-eth0 isn't there, copy ifcfg-eth1 to ifcfg-eth0 and then edit ifcfg-eth0 with the settings needed for your WAN link.
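The copy step could be sketched roughly as below. This is only an illustration of the suggestion, not a tested fix: the function name make_eth0_cfg is hypothetical, and dropping the HWADDR line is an assumption made here so that the copy does not carry over eth1's MAC address. The IP settings still have to be edited by hand afterwards.

```shell
#!/bin/sh
# Create ifcfg-eth0 from ifcfg-eth1 if it does not exist yet.
# $1 is the directory holding the ifcfg files (defaults to the stock
# CentOS location); an existing ifcfg-eth0 is never overwritten.
make_eth0_cfg() {
    dir=${1:-/etc/sysconfig/network-scripts}
    if [ -e "$dir/ifcfg-eth0" ]; then
        return 0                      # already there, leave it alone
    fi
    if [ ! -r "$dir/ifcfg-eth1" ]; then
        return 1                      # nothing to copy from
    fi
    # Rename the device and drop eth1's HWADDR line (assumption: the
    # copied MAC would be wrong for eth0, so it is safer to omit it).
    sed -e 's/^DEVICE=.*/DEVICE=eth0/' -e '/^HWADDR=/d' \
        "$dir/ifcfg-eth1" > "$dir/ifcfg-eth0"
}
```

After running `make_eth0_cfg` as root, ifcfg-eth0 would still need the WAN address, netmask and gateway filled in before restarting the network service.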
On Sun, Jan 9, 2011 at 11:13 PM, Robert Spangler mlists@zoominternet.net wrote:
Look in /etc/sysconfig/network-scripts. There you should see ifcfg-eth# files. If ifcfg-eth0 isn't there, copy ifcfg-eth1 to ifcfg-eth0 and then edit ifcfg-eth0 with the settings needed for your WAN link.
--
Regards Robert
The device file exists, but it's as if the network card itself doesn't exist.
[root@intranet ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth0 && /etc/init.d/network restart
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
Disabling IPv4 packet forwarding: net.ipv4.ip_forward = 0 [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: r8169 device eth0 does not seem to be present, delaying initialization. [FAILED]
Bringing up interface eth1: [ OK ]
On 10/01/11 05:41, Rudi Ahlers wrote:
The device file exists, but it's as if the network card itself doesn't exist.
My immediate hunch is ... and I'm sorry to say it ... but your NIC is often referred to as Realcrap NICs - unfortunately that's not without a reason.
However, check what lspci says. If you don't see your NIC there, it is most likely a hardware issue (or caused by BIOS changes). If you see it, then look closely in dmesg for anything related to loading the kernel module for this NIC. See if that spits out any error messages. You may also try to reload your NIC's kernel module (modprobe -r <module> && modprobe <module>).
Another thing is to figure out what you did before it stopped working. If you want to say "I did nothing" and that means you rebooted your box, upgraded packages or other things which might sound safe and innocent, it might just as well be connected.
The only times I've experienced issues and where I really did nothing, it was related to physical hardware issues. But those times where I did "nothing" (rebooting, upgrading, innocent configuration changes) and got troubles ... it was always connected to that I did the "nothing" thing. Sometimes even disabling "useless features" in BIOS turned out to disable quite a useful feature after all.
So no rock is too small to be turned around now. Go carefully through all your changes you did before it stopped working.
kind regards,
David Sommerseth
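The three checks suggested above (lspci, dmesg, module reload) could be wrapped up roughly like this. It is a sketch under assumptions: the function names are invented for the example, and r8169 is used as the default module only because lsmod shows it loaded on this box. Since rmmod r8169 took down both NICs earlier in the thread, the reload step is best run from the local console rather than over the LAN.

```shell
#!/bin/sh
# Step 1: is any Ethernet controller visible on the PCI bus?
# Prints the matching lspci lines; exit status is non-zero if none.
nic_on_bus() {
    lspci | grep -i 'ethernet controller'
}

# Step 2: kernel log lines that mention the driver module.
# $1 is the module name (default: r8169).
driver_messages() {
    dmesg | grep -i -e "${1:-r8169}"
}

# Step 3: unload and reload the driver module (needs root).
# Warning: every NIC bound to this module loses its link meanwhile.
reload_driver() {
    mod=${1:-r8169}
    modprobe -r "$mod" && modprobe "$mod"
}
```

A possible sequence at the console would be `nic_on_bus`, then `reload_driver`, then `driver_messages | tail` to see what the driver logged while loading.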
On Mon, Jan 10, 2011 at 10:05 AM, David Sommerseth dazo@users.sourceforge.net wrote:
My immediate hunch is ... and I'm sorry to say it ... but your NIC is often referred to as Realcrap NICs - unfortunately that's not without a reason.
Thank you for the discrimination, but it's not appreciated. This is not a multi-million dollar enterprise cluster, so please don't see it as such. It's an in-house development server and really doesn't justify thousands of dollars' worth of hardware. The NIC was working fine for about 2 years now without a hiccup, out of the box when we first installed CentOS. Something went wrong, I just don't know how to actually fix it without re-installing CentOS :)
However, check what lspci says. If you don't see your NIC there, it is most likely a hardware issue (or caused by BIOS changes). If you see it, then look closely in dmesg for anything related to loading the kernel module for this NIC. See if that spits out any error messages. You may also try to reload your NIC's kernel module (modprobe -r <module> && modprobe <module>).
Another thing is to figure out what you did before it stopped working. If you want to say "I did nothing" and that means you rebooted your box, upgraded packages or other things which might sound safe and innocent, it might just as well be connected.
The kernel and CentOS itself were upgraded sometime last year, when CentOS 5.5 was released, and it has been running fine since then. I really did nothing. We were working on a client's stuff, in fact, accessing data over SMB from the server. Would that have caused an issue? The network just dropped, and I hooked a KVM onto the server to see what was up. eth0 was still using IP 192.168.1.250 (configured when we installed it). I then restarted the network scripts (/etc/init.d/network restart) and eth0 didn't come back up. So it could either be a faulty network card (yes, expensive cards can also fail) or the driver doesn't load properly anymore. BUT, I don't know where to start fixing the problem.
The only times I've experienced issues and where I really did nothing, it was related to physical hardware issues. But those times where I did "nothing" (rebooting, upgrading, innocent configuration changes) and got troubles ... it was always connected to that I did the "nothing" thing. Sometimes even disabling "useless features" in BIOS turned out to disable quite a useful feature after all.
So are you saying a spook accessed the BIOS of a machine which was running for about 3 years, without any hardware changes? I don't, ever, change BIOS settings once a machine is set up. Why should I? Besides, the machine doesn't have a monitor or keyboard, and I need to take the one off my desk, walk over to the server room, plug it in and then access it that way. I don't know about you, but I don't do this randomly every day; it's a waste of time.
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
I love realtek - the resources they use tend not to conflict with other cards or hardware, they don't use much cpu time, the drivers are mature, and they don't cost much. What could be better? There does seem to be at least one onboard realtek chipset that can have driver issues, but I use the 8169 without problems.
But hardware does fail. And any brand of nic can fail in odd ways. I'm guessing you've swapped it out?
Bios settings can change if the on-board battery is dead and the system loses power. (It can set to defaults) But bios settings rarely affect nics - you're more likely to see boot problems from a change in drive boot sequence.
I don't suppose you have a vpn on your lan? I noticed you use the 192.168.1.x address range, which is one of the most common ranges in the world. If someone connects to your vpn from home or workplace, and if they use the same range, and if there's a bridge, addresses are going to conflict.
If you delete your ifcfg-eth0 or ifcfg-eth1 files, centos will recreate them if it sees the nics at boot. But it tends to enable eth0 and disable eth1 or higher. You should have backups of your originals for that reason...
I bet you wish you had a tcp/ip based kvm switch system about now...
On Mon, Jan 10, 2011 at 11:49 AM, compdoc compdoc@hotrodpc.com wrote:
I love realtek - the resources they use tend not to conflict with other cards or hardware, they don't use much cpu time, the drivers are mature, and they don't cost much. What could be better? There does seem to be at least one onboard realtek chipset that can have driver issues, but I use the 8169 without problems.
But hardware does fail. And any brand of nic can fail in odd ways. I'm guessing you've swapped it out?
Yes, the NIC might have failed, but how do I tell? lspci still shows it as active.
Bios settings can change if the on-board battery is dead and the system loses power. (It can set to defaults) But bios settings rarely affect nics - you're more likely to see boot problems from a change in drive boot sequence.
I already checked, BIOS settings didn't change :)
I don't suppose you have a vpn on your lan? I noticed you use the 192.168.1.x address range, which is one of the most common ranges in the world. If someone connects to your vpn from home or workplace, and if they use the same range, and if there's a bridge, addresses are going to conflict.
This is purely because the ADSL router in the office is on the 192.168.1.0 subnet, so it's less hassle to get things back up again when it needs to be swapped out. No VPN.
If you delete your ifcfg-eth0 or ifcfg-eth1 files, centos will recreate them if it sees the nics at boot. But it tends to enable eth0 and disable eth1 or higher. You should have backups of your originals for that reason...
I've already tried that, but eth0 doesn't automatically get detected.
I bet you wish you had a tcp/ip based kvm switch system about now...
Yes, I suppose I could take one from a client server, or open a sealed one, but it's not really necessary. For now I put in another D-Link and got the server up that way, but I would prefer to use the onboard one, since I had to take everything out of the 1U chassis, which doesn't support more than 1 additional NIC.
Thank you for the discrimination, but it's not appreciated. This is not a multi-million dollar enterprise cluster, so please don't see it as such. It's an in-house development server and really doesn't justify thousands of dollars' worth of hardware. The NIC was working fine for about 2 years now without a hiccup, out of the box when we first installed CentOS. Something went wrong, I just don't know how to actually fix it without re-installing CentOS :)
I would boot the server from a LiveCD or two and test network connectivity. If it works from any one of these LiveCDs, then the network card works and it could be a configuration issue in your installed CentOS. If it doesn't work on any of these LiveCDs, which all use different drivers, then it might be the card. Also, since this should be easy to do, switch network cables (in place, keeping their existing switch ports) with another system that is running fine. If the other system starts exhibiting strange issues and this system magically starts working fine, it could be a cable or switch configuration issue. Although most people don't have these sitting around, you could connect a USB NIC to the machine and see if the problem occurs with the installed CentOS but using a different NIC, without cracking the case.
Hope this helps, Barry
On 1/10/11 3:12 AM, Rudi Ahlers wrote:
My immediate hunch is ... and I'm sorry to say it ... but your NIC is often referred to as Realcrap NICs - unfortunately that's not without a reason.
Thank you for the discrimination, but it's not appreciated. This is not a multi-million dollar enterprise cluster, so please don't see it as such. It's an in-house development server and really doesn't justify thousands of dollars' worth of hardware. The NIC was working fine for about 2 years now without a hiccup, out of the box when we first installed CentOS. Something went wrong, I just don't know how to actually fix it without re-installing CentOS :)
A quick check would be to boot a live-cd distro or the centos install disk in rescue mode. If the nic comes up that way it's something in your software or configs; if it doesn't, it's hardware.
So are you saying a spook accessed the BIOS of a machine which was running for about 3 years, without any hardware changes? I don't, ever, change BIOS settings once a machine is set up.
Stuff like that happens. We've had a bunch of IBM servers that after running several years would start crashing randomly - and would be fixed with a bios update.
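For the live-CD or rescue-mode check suggested above, one low-dependency way to see which interfaces the booted kernel has created, and with which MAC address, is to read sysfs directly. This is a sketch that assumes /sys is mounted and exposes /sys/class/net/<interface>/address, which may not hold in every rescue environment; the function name list_nics is made up for this example.

```shell
#!/bin/sh
# Print "interface MAC" for every network interface the kernel knows.
# $1 is the sysfs directory to scan (default: /sys/class/net).
list_nics() {
    root=${1:-/sys/class/net}
    for d in "$root"/*; do
        [ -r "$d/address" ] || continue    # skip entries without a MAC file
        printf '%s %s\n' "${d##*/}" "$(cat "$d/address")"
    done
}
```

If the on-board NIC shows up here with a sensible MAC under the live CD but not under the installed system, that would point towards software or configuration rather than hardware.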
I like your analogy, David: "no rock is too small to be turned around now". I think you put it in the right context. It is always a good idea to check dmesg at boot and make sure those modules are loaded, as David mentioned. Try to start from the beginning to troubleshoot the problem. My two cents.