[CentOS] PXE problem after CentOS reboot

James Pearson james-p at moving-picture.com
Mon Jan 7 12:19:33 UTC 2008


Andrey Slepuhin wrote:
> Dear folks,
> 
> We are installing a large diskless cluster using CentOS 5.1. The 
> hardware is pretty new - Supermicro X7DWT boards with Harpertown CPUs. 
> Unfortunately we have some PXE-related problems described by the 
> following scenario:
> 1) Set up DHCP, TFTP and NFS on a server, prepare PXE kernel and initrd 
> - fine.
> 2) Start up the node using PXE for the first time - fine.
> 3) Reboot the node - PXE boot fails on every subsequent attempt. We see
> that the server receives the DHCP requests and answers them, but the
> node never responds to the offers, so no DHCP ack is ever sent. The
> typical DHCP log is:
> Jan  5 09:14:34 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:34 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:36 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:36 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:40 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:40 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:48 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
> Jan  5 09:14:48 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
> 4) Restarting the DHCP server, resetting the node, or powering the node
> off and on does not help.
> 5) The only thing that enables the system to boot over PXE again is to
> run the "bmc reset cold" command on the node using ipmitool - yes, we
> have an IPMI card sharing the same Ethernet interface. After that we can
> boot CentOS again.
> 6) Once Linux is loaded, if we reboot the node using "bmc power cycle"
> instead of reboot or shutdown, the node boots the next time without
> problems.
> 7) There are no problems with a second GbE interface (without IPMI)
> 8) So our guess is that Linux, on reboot, leaves the Ethernet device in
> some state that causes brain damage for the IPMI+PXE combination. We
> tried several e1000 driver options and also tried the latest Intel
> driver - nothing helps.
> Do you have any idea what goes wrong? Any help will be much appreciated. 

I don't, but we avoid sharing the IPMI interface with PXE and the OS,
i.e. we set things up so that IPMI uses the first interface, the node
PXE-boots off the second, and the OS uses the second interface only.

We do this because we had problems using IPMI Serial-Over-LAN (SOL) and
console redirection over SOL together with PXE: the PXE boot would reset
the NIC and break the SOL connection, so we gave up and decided to
separate IPMI from PXE and the OS.
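
For anyone chasing the same symptom across many nodes, the DISCOVER/OFFER
loop in the quoted log can be picked out automatically. The following is
only a rough sketch in Python, assuming the dhcpd syslog line format shown
in the quoted message; the function name stalled_clients and the
min_offers threshold are made up for illustration and are not part of any
existing tool.

```python
import re
from collections import defaultdict

# Matches dhcpd syslog lines of the form seen in the quoted log, e.g.
#   ... dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
#   ... dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
# Captures the message type and the first MAC address on the line.
DHCP_LINE = re.compile(
    r"dhcpd(?:\[\d+\])?: (DHCP[A-Z]+) .*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def stalled_clients(lines, min_offers=3):
    """Return the sorted MACs that were sent at least min_offers DHCPOFFERs
    but never answered with a DHCPREQUEST (hypothetical helper)."""
    offers = defaultdict(int)
    requested = set()
    for line in lines:
        match = DHCP_LINE.search(line)
        if not match:
            continue  # not a dhcpd message we care about
        msg = match.group(1).upper()
        mac = match.group(2).lower()
        if msg == "DHCPOFFER":
            offers[mac] += 1
        elif msg == "DHCPREQUEST":
            requested.add(mac)
    return sorted(
        mac for mac, count in offers.items()
        if count >= min_offers and mac not in requested
    )
```

It can be fed any iterable of log lines, for example an open file handle
on whichever syslog file dhcpd writes to on the server (the location
depends on the syslog configuration). A node that shows up in the result
is receiving offers but never requesting the lease, which is the pattern
described in the quoted message.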

James Pearson


