[CentOS] kickstart problems
Romeo Ninov
rninov at gmail.com
Thu Sep 4 07:52:16 UTC 2008
Paolo Supino wrote:
>
> On Thu, Sep 4, 2008 at 8:27 AM, Romeo Ninov <rninov at gmail.com> wrote:
>
>
>
> Paolo Supino wrote:
>
>
>
> On Wed, Sep 3, 2008 at 5:52 PM, Marco Fretz <mailinglist at blah.li> wrote:
>
> hi,
>
> we had the same problem with newer HP PCs and servers (Broadcom NICs).
> PXE works well on Broadcom, the install does not. It doesn't matter
> whether you're using kickstart or a manual install.
>
> The problem was in CentOS 4.2. After updating the install environment
> to 4.5 the problem was gone... so it was a driver issue! The install
> kernel is not exactly the normal Linux kernel, I think.
>
> If anaconda just says that it cannot find the install image, etc.,
> the system has no connectivity at that time.
>
> hope this is helpful...
>
> bests
> marco
>
> Paolo Supino wrote:
> >
> >
> > On Tue, Sep 2, 2008 at 3:07 PM, Romeo Ninov <rninov at gmail.com> wrote:
> >
> >
> >
> > Paolo Supino wrote:
> >
> >
> >
> > On Tue, Sep 2, 2008 at 2:17 PM, Romeo Ninov <rninov at gmail.com> wrote:
> >
> >
> >
> > Paolo Supino wrote:
> >
> >
> >
> > On Tue, Sep 2, 2008 at 8:14 AM, nate <centos at linuxpowered.net> wrote:
> >
> > Paolo Supino wrote:
> > > Hi Nate
> > >
> >
> > > 3: After the error comes up I get the HTTP setup configuration
> > > screen with the source website (as an IP address) and CentOS
> > > directory as I entered them in the PXE configuration file and as
> > > they appear in the kickstart configuration file, and all I have to
> > > do is press the 'OK' button to continue the installation to a
> > > successful completion.
> >
> > If that's the case the next most likely culprit is
> >
> > > url --url http://192.168.11.1/source
> >
> > Just because the PXE boot loader can download the kickstart config
> > does not mean that the installation process will work with that NIC.
> >
> > Also, I've had lots of Broadcom systems not work with kickstart over
> > the years; it's not uncommon for newer systems to have newer revs of
> > the chipsets and for those revs not to be supported by the installer.
> >
> > But it sounds like in your case it does work, so I would look at the
> > url above, as it likely is the cause of the problem. Check the http
> > access logs on the server for 404s and similar errors.
> >
> > nate
> >
> >
> > _______________________________________________
> > CentOS mailing list
> > CentOS at centos.org
> > http://lists.centos.org/mailman/listinfo/centos
> >
> >
> >
> > Hi Nate
> >
> > After figuring out what I was doing wrong (see previous reply ...) I
> > started going through each of my systems in order to boot them and
> > install CentOS 5.2 on each. For the most part it works, but only for
> > the most part, because once every few boots (not machine specific)
> > anaconda stops and either asks me which interface it needs to
> > configure or fails to load 'stage2.img' from the web server on
> > 192.168.11.1 ... All cables are good cables. The network switch is a
> > Cisco 3750G with no configuration, and all the NICs are Broadcom with
> > firmware 3.8.9. Can you throw a guess where the problem might be
> > lying (I hate inconsistencies)?
> >
> >
> > Have you checked the Apache logs for anything? Check the server
> > messages as well.
> >
> >
> >
> > Hi Romeo
> >
> > Yes I did, and nothing shows up in either access_log or error_log :-(
> > I just had a node that stopped to ask me for IP configuration (twice),
> > and only the second time (checked on the server using tcpdump) did it
> > actually try to contact the server to retrieve its network
> > configuration; it then continued and successfully retrieved
> > 'stage2.img' from the web server :-(
> >
> > Paolo, what about the DHCP or bootp servers? Check their logs and
> > flush the ARP cache on the server(s).
> >
> >
> >
> > Hi Romeo
> >
> > The more systems I boot the more I'm starting to feel that it's a
> > hardware-related problem ... I just booted a system in which the ELOM
> > says that NIC0 has one MAC address, but when I booted the system I saw
> > a different MAC address altogether on the network ...
> > I'm checking at the lowest level: on the wire (using tcpdump), so if
> > nothing shows in the capture I'm sure I won't find anything in the
> > logs :-(
> >
> >
> >
> >
> > --
> > TIA
> > Paolo
> >
> >
> >
>
>
>
>
> Hi Marco
>
> Thanx for the email. I've been debugging this problem for a
> few days. A few installs before I posted the first email in
> this thread I started sniffing the network interface on the
> server (dhcp, tftp, http are all on the same computer) and I
> noticed that no communication reaches the server between the
> PXE load and the retrieval error (and I think I wrote about it
> in my original post). Some people suggested that it might be
> that Linux gets confused about the interfaces (the Sun X2200 M2
> has 4 NICs), which I find hard to believe (the Linux kernel is old
> enough and probably got rid of this kind of bug a long time
> ago). In some of the failures the kernel loaded, retrieved the
> kickstart configuration file and then failed to retrieve
> 'stage2.img' (again nothing appeared on the wire). I have a
> sneaky feeling that the kickstart process assumes a lot of
> basic facts and doesn't do any/enough sanity checking. Right
> now I need to get this cluster up and running (I'm already 2
> weeks behind schedule). After it's up I will try to debug the
> process.
> The situation got me so aggravated that I was contemplating
> resurrecting my old private distro (not going to do that), which
> does things in a much simpler way.
>
>
> Paolo
> Unfortunately CentOS/RHEL really do have a problem in the process of
> loading modules, especially in the case of two identical NICs: they
> change places in a random way. I personally mitigate the problem
> this way: in /etc/modprobe.conf I put the entry for the first NIC in
> first place and the one for the second NIC in last place in the file,
> and after a reboot I always have the NIC->eth? relation in place.
>
>
>
>
> Hi Marco
>
> I didn't finish testing the way Nate asked me to, so right now I
> don't have any conclusive answers about what exactly is going on, but
> in my original email (the one that started this thread) I wrote that
> what I see happening is this:
> anaconda prints an error message that it failed to retrieve
> 'stage2.img' from the HTTP server. I press 'OK' in the error message
> screen. The screen that comes after it is the HTTP setup screen with
> the information given by the 'ks' directive from pxelinux already in
> place, so that the only thing left for me to do is press the 'OK'
> button. When I press the 'OK' button anaconda successfully retrieves
> 'stage2.img' from the HTTP server and goes on to successfully finish
> the unattended install (take a look at my original post). The only
> explanation that makes sense is that the network configuration hadn't
> finished (yet) before anaconda tried to retrieve 'stage2.img'.
> Along the way I tried changing the configuration several times and I
> got all possible failures (or at least it feels like it): failed to
> retrieve the kickstart config file, failed to retrieve the 'stage2.img'
> file no matter how many times I pressed the 'OK' button in the HTTP
> setup screen, and probably a few more scenarios that I'm trying very
> hard to forget ;-)
> One thing I noticed is that anaconda reconfigures the network
> interface after the kernel has already configured it and successfully
> retrieved the kickstart config file from the web server (proved by
> sniffing the network). The question that goes through my mind when I
> see it is: why is it doing that??? It makes me feel that something is
> wrong in the assumptions of the install process ...
> Maybe you're right about the module loading issue (though it
> doesn't explain what I wrote in the original post): I resurrected my
> old distro (a heavily modified Slackware) to test the issue, and what I
> found is that with a kernel without modules (all needed drivers are
> statically compiled in) and no initrd to mess things up, the issue
> simply didn't happen (tested 10 times).
> On the other hand, if you were right about it then a RHEL/CentOS/Fedora
> installation would be unsuitable for any multihomed configuration,
> because it would map the eth devices differently (albeit only once in a
> while), which means one would have to switch the cables because of
> network device remapping!!! And that isn't something users and
> corporations that use RHEL (and there are many of those) would be
> willing to live with :-)
>
Paolo, this problem occurs only in RHEL/CentOS/other RH-based distros and
not in Slack, SuSE, Debian, etc. I have not gone deeper into the problem,
but that is the reality. BTW: you can play with the MAC address in the
ifcfg files, but this is applicable only on an already installed machine.
About your remark on corporations and RH: you are right, but how often are
servers restarted? :-)
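
A minimal sketch of the ifcfg idea, under stated assumptions: on an
installed RHEL/CentOS machine the HWADDR= line in
/etc/sysconfig/network-scripts/ifcfg-ethN ties a device name to a MAC
address. The Python below parses the text of such files and reports
every device whose pinned MAC differs from the one actually observed.
The function names and the MAC addresses are made up for illustration,
they are not taken from the machines in this thread.

```python
def parse_ifcfg(text):
    """Return the KEY=value pairs of one ifcfg file as a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values may be wrapped in single or double quotes.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def find_mismatches(ifcfg_texts, observed_macs):
    """Compare the HWADDR pinned in each ifcfg text with the observed MAC.

    ifcfg_texts   -- list of ifcfg file contents (strings)
    observed_macs -- dict mapping device name to the MAC seen on the system
    Returns a list of (device, pinned_mac, observed_mac) tuples that disagree.
    """
    mismatches = []
    for text in ifcfg_texts:
        cfg = parse_ifcfg(text)
        device = cfg.get("DEVICE")
        pinned = cfg.get("HWADDR")
        if device is None or pinned is None:
            continue  # nothing pinned in this file, nothing to compare
        observed = observed_macs.get(device)
        if observed is None or observed.lower() != pinned.lower():
            mismatches.append((device, pinned, observed))
    return mismatches


if __name__ == "__main__":
    # Hypothetical file contents and MAC addresses, for illustration only.
    eth0 = "DEVICE=eth0\nHWADDR=00:11:22:33:44:55\nONBOOT=yes\n"
    eth1 = 'DEVICE=eth1\nHWADDR="00:11:22:33:44:66"\nONBOOT=yes\n'
    swapped = {"eth0": "00:11:22:33:44:66", "eth1": "00:11:22:33:44:55"}
    for device, pinned, observed in find_mismatches([eth0, eth1], swapped):
        print(device, "pinned", pinned, "but observed", observed)
```

On a real machine the observed MACs could be read from
/sys/class/net/ethN/address. As said above, this only helps once the
machine is installed; it does nothing for the anaconda stage.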