[CentOS] Kickstarts failing 30% of time on Dell 620 blades

Fri May 9 20:06:21 UTC 2014
m.roth at 5-cent.us <m.roth at 5-cent.us>

Dan Hyatt wrote:
>
> I have a large set of Dell 620 blades fully populated with memory and
> dual socket CPUs, CentOS 6.4 image.
>
> I have a kickstart that I am using to PXE boot 36 blades.
> I have two internal drives which are raid1 (two disks formed into one,
> no redundancy), not SAN attached.
> In the first set, 9 successfully completed. 7 more built correctly after
> trying another PXE boot. 2 just won't PXE boot.
<snip>
> Any idea why this would happen with identical hardware, identical
> kickstart/image, inside the same blade chassis?
> Any idea what to test?

Nasty thoughts: look at one that's gone to grub, and from the grub command
line, try "root (<tab>" to see which disks grub can find. Then try
"kernel /vm<tab>" to see whether it can find the kernel on that disk.

I'm just wondering if either they're not pointing to the same UUID, or if
they're looking at /dev/sda, and some of them have enumerated it so that
it's /dev/sdb, or whatever. Also, I wonder about the possibility of a race
issue, if they're all trying to come up at the same time.
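
To make the UUID vs. /dev/sdX check less manual, here's a rough sketch in
Python. It assumes you can get at a failed blade's grub.conf and the output of
blkid (rescue mode, say). The function names are mine, made up for
illustration, and the parsing only covers the usual "root=..." kernel argument
and the usual blkid line format, so treat it as a starting point.

```python
import re


def root_args(grub_conf_text):
    """Return the root= value from every kernel line in a grub.conf."""
    roots = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel"):
            match = re.search(r"\broot=(\S+)", stripped)
            if match:
                roots.append(match.group(1))
    return roots


def parse_blkid(blkid_text):
    """Map device -> UUID from lines like '/dev/sda1: UUID="..." TYPE="ext4"'."""
    uuids = {}
    for line in blkid_text.splitlines():
        device, sep, rest = line.partition(":")
        match = re.search(r'\bUUID="([^"]+)"', rest)
        if sep and match:
            uuids[device.strip()] = match.group(1)
    return uuids


def resolve_root(root, uuids):
    """Return the device that root= points at, or None if nothing matches."""
    if root.startswith("UUID="):
        wanted = root[len("UUID="):]
        for device, uuid in uuids.items():
            if uuid == wanted:
                return device
        return None
    # A plain device path only resolves if blkid actually saw that device.
    return root if root in uuids else None


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real blade.
    grub_conf = (
        "default=0\n"
        "title CentOS\n"
        "    root (hd0,0)\n"
        "    kernel /vmlinuz ro root=UUID=aaaa-1111 rhgb quiet\n"
    )
    blkid_out = '/dev/sdb1: UUID="aaaa-1111" TYPE="ext4"\n'
    for root in root_args(grub_conf):
        print(root, "->", resolve_root(root, parse_blkid(blkid_out)))
```

If a blade that failed reports None here, or resolves to a different device
than a blade that built fine, that would point at the enumeration problem
rather than at PXE itself.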

        mark