[CentOS] Weird CentOS 5.3 problem

Tue May 19 18:44:22 UTC 2009
Randall Martin <wolf at clemson.edu>



> From: Robert Heller <heller at deepsoft.com>
> Organization: Deepwoods Software
> Reply-To: CentOS mailing list <centos at centos.org>
> Date: Tue, 19 May 2009 09:46:15 -0400
> To: CentOS mailing list <centos at centos.org>
> Cc: <centos at centos.org>
> Subject: Re: [CentOS] Weird CentOS 5.3 problem
> 
> At Tue, 19 May 2009 09:04:43 -0400 CentOS mailing list <centos at centos.org>
> wrote:
> 
>> 
>> 
>> 
>> I reimaged a compute node on our cluster with the latest 5.3 updates (we
>> were previously running 5.2), but we kept the kernel at 2.6.18-92.1.10.el5
>> until I can find time to rebuild some of our kernel modules.  After the
>> image install finishes and the system reboots, the eth0 ethernet interface
>> disappears.  If I do an ifconfig -a, I see what should be eth0, but it's
>> listed as __tmp2081258173.
>> 
>> [root@node0770 ~]# ifconfig -a
>> __tmp2081258173 Link encap:Ethernet  HWaddr 00:1E:68:86:67:04
>>           BROADCAST MULTICAST  MTU:1500  Metric:1
>>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>           Interrupt:66
>> 
>> The dmesg output isn't very helpful:
>> 
>> [root@node0770 ~]# dmesg|grep eth0
>> eth0: forcedeth.c: subsystem: 0108e:534b bound to 0000:00:08.0
>> 
>> 
>> If I remove our lustre modules that were built for the 2.6.18-92.1.10.el5
>> kernel and reboot, the eth0 interface reappears.  Another piece to this
>> puzzle is that this problem only seems to happen on our Sun X2200s.  Our
>> Dell 1950s work just fine after putting on the 5.3 updates.  Anyone know
>> what could cause this behavior?
> 
> Check /etc/modprobe.conf (and
> /etc/sysconfig/network-scripts/ifcfg-eth0) -- if you are doing a
> disk-to-disk backup type of install, the alias for eth0 is very likely
> wrong (and the HW address in /etc/sysconfig/network-scripts/ifcfg-eth0
> is also wrong).  You may have to manually update these two files on the
> 'new' machine, since it likely has a different NIC, requiring a
> different driver.  It will also have a different MAC (HW) address.
> In the old days, kudzu would detect this and pop up during the
> boot process.
> 
> What does lspci display?
> 


Here is our modprobe.conf.  We add the last two lines ourselves, for Lustre:

alias eth0 tg3
alias eth1 tg3
alias eth2 forcedeth
alias eth3 forcedeth
alias scsi_hostadapter sata_nv
options lnet networks="tcp0(eth0)"
options ksocklnd enable_irq_affinity=0
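
For comparing these aliases against what the kernel reports, here is a
minimal sketch that prints just the ethN alias lines as "interface driver"
pairs.  The function name eth_aliases is hypothetical, and it assumes the
CentOS 5 style "alias ethN driver" lines shown above.

```shell
# Print "interface driver" pairs for the ethN aliases in a
# modprobe.conf-style file (defaults to /etc/modprobe.conf).
# eth_aliases is a hypothetical helper name, not a standard tool.
eth_aliases() {
    conf="${1:-/etc/modprobe.conf}"
    # Match lines of the form: alias ethN <driver>
    awk '$1 == "alias" && $2 ~ /^eth[0-9]+$/ { print $2, $3 }' "$conf"
}

# Usage: eth_aliases              (reads /etc/modprobe.conf)
#        eth_aliases /path/to/copy-of-modprobe.conf
```

With the file above this would list eth0 and eth1 as tg3, whereas the dmesg
line quoted earlier shows forcedeth claiming eth0 at 0000:00:08.0.  That may
or may not be related to the problem.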


The /etc/sysconfig/network-scripts/ifcfg-eth0 has the correct settings for
this host.  We actually generate this file during the post-install.  Here's
what it looks like:

DEVICE=eth0
BOOTPROTO=none
STARTMODE=onboot
ONBOOT=yes
USERCTL=no
TYPE=Ethernet
IPV6INIT=no
IPADDR=192.168.3.91
BROADCAST=192.168.255.255
NETMASK=255.255.0.0
GATEWAY=192.168.100.1


Here's the lspci output:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron,
Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000
02:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC
05:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev b5)
06:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit
Ethernet (rev a3)
06:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5715 Gigabit
Ethernet (rev a3)

We tried upgrading to the latest tg3 Ethernet driver, but the symptoms did
not change.
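
To cross-check which driver the kernel has actually bound to each interface
(including one stuck with a temporary name), something like the sketch below
could be run on the node.  It assumes sysfs exposes /sys/class/net/<name>/address
and a device/driver symlink for PCI NICs; the function name list_nic_drivers
is hypothetical.

```shell
# Print "name mac driver" for every network interface found under a
# sysfs-style directory (defaults to /sys/class/net).
# list_nic_drivers is a hypothetical helper name.
list_nic_drivers() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Skip anything that is not an interface directory.
        [ -e "$dev/address" ] || continue
        name=$(basename "$dev")
        mac=$(cat "$dev/address")
        # The driver symlink points at the bound kernel driver, if any.
        if [ -L "$dev/device/driver" ]; then
            driver=$(basename "$(readlink "$dev/device/driver")")
        else
            driver="-"
        fi
        printf '%s %s %s\n' "$name" "$mac" "$driver"
    done
}

# Usage: list_nic_drivers
```

Interfaces without a backing device, such as lo, are printed with "-" in the
driver column.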

-Randy