[CentOS] kickstart problems

Thu Sep 4 07:52:16 UTC 2008
Romeo Ninov <rninov at gmail.com>


Paolo Supino wrote:
>
> On Thu, Sep 4, 2008 at 8:27 AM, Romeo Ninov <rninov at gmail.com> wrote:
>
>     Paolo Supino wrote:
>
>         On Wed, Sep 3, 2008 at 5:52 PM, Marco Fretz <mailinglist at blah.li> wrote:
>
>            Hi,
>
>            we had the same problem with newer HP PCs and servers (Broadcom
>            NICs). PXE works well on Broadcom, but the install does not,
>            whether you use kickstart or a manual install.
>
>            The problem was in CentOS 4.2. After updating the install
>            environment to 4.5 the problem was gone... so it was a driver
>            issue! I think the install kernel is not exactly the normal Linux
>            kernel.
>
>            If anaconda just says that it cannot find the install image, etc.,
>            the system has no connectivity at that point.
>
>            Hope this is helpful...
>
>            bests
>             marco
>
>            Paolo Supino wrote:
>            >
>            > On Tue, Sep 2, 2008 at 3:07 PM, Romeo Ninov <rninov at gmail.com> wrote:
>            >
>            >     Paolo Supino wrote:
>            >
>            >         On Tue, Sep 2, 2008 at 2:17 PM, Romeo Ninov <rninov at gmail.com> wrote:
>            >
>            >            Paolo Supino wrote:
>            >
>            >                On Tue, Sep 2, 2008 at 8:14 AM, nate <centos at linuxpowered.net> wrote:
>            >
>            >                   Paolo Supino wrote:
>            >                   > Hi Nate
>            >                   >
>            >
>            >                   > 3: After the error comes up I get the HTTP
>            >                   > setup configuration screen with the source
>            >                   > website (as an IP address) and the CentOS
>            >                   > directory, as I entered them in the PXE
>            >                   > configuration file and as they appear in the
>            >                   > kickstart configuration file, and all I have
>            >                   > to do is press the 'OK' button to continue
>            >                   > the installation to a successful completion.
>            >
>            >                   If that's the case, the next most likely culprit is
>            >
>            >                   > url --url http://192.168.11.1/source
>            >
>            >
>            >                   Just because the PXE boot loader can download the
>            >                   kickstart config does not mean that the
>            >                   installation process will work with that NIC.
>            >
>            >                   Also, I've had lots of Broadcom systems fail to
>            >                   work with kickstart over the years; it's not
>            >                   uncommon for newer systems to have newer revisions
>            >                   of the chipsets, and for those revisions not to be
>            >                   supported by the installer.
>            >
>            >                   But it sounds like in your case it does work, so I
>            >                   would look at the URL above, as it is the likely
>            >                   cause of the problem. Check the HTTP access logs on
>            >                   the server for 404s and similar errors.
>            >
>            >                   nate
>            >
>            >                   _______________________________________________
>            >                   CentOS mailing list
>            >                   CentOS at centos.org
>            >                   http://lists.centos.org/mailman/listinfo/centos
>            >
>            >
>            >
>            >                Hi Nate
>            >
>            >                After figuring out what I was doing wrong (see
>            >                previous reply ...), I started going through each
>            >                of my systems to boot them and install CentOS 5.2 on
>            >                each. It works, but only for the most part: once
>            >                every few boots (not machine-specific) anaconda
>            >                stops and either asks me which interface it needs
>            >                to configure or fails to load 'stage2.img' from
>            >                the web server on 192.168.11.1... All cables are
>            >                good cables. The network switch is a Cisco 3750G
>            >                (with no configuration) and all the NICs are
>            >                Broadcom with firmware 3.8.9. Can you hazard a
>            >                guess where the problem might lie (I hate
>            >                inconsistencies)?
>            >
>            >
>            >            Have you checked the Apache logs for anything? Also
>            >            check the server's messages log.
>            >
>            >
>            >         Hi Romeo
>            >
>            >         Yes I did, and nothing shows up in either access_log or
>            >         error_log :-(
>            >         I just had a node that stopped and asked me for its IP
>            >         configuration (twice), and only the second time
>            >         (checked on the server using tcpdump) did it actually
>            >         contact the server to retrieve its network
>            >         configuration; it then continued and successfully
>            >         retrieved 'stage2.img' from the web server :-(
>            >
>            >     Paolo, what about the DHCP or BOOTP servers? Check their
>            >     logs, and flush the ARP cache on the server(s).
>            >
>            >
>            >
>            > Hi Romeo
>            >
>            >   The more systems I boot, the more I'm starting to feel that
>            > it's a hardware-related problem... I just booted a system whose
>            > ELOM says that NIC0 has one MAC address, but when I booted the
>            > system I saw a different MAC address altogether on the
>            > network...
>            >   I'm checking at the lowest level, on the wire (using
>            > tcpdump), so if nothing shows up in the capture I'm sure I
>            > won't find anything in the logs :-(
>            >
>            >
>            >
>            >
>            > --
>            > TIA
>            > Paolo
>
>
>
>         Hi Marco
>
>          Thanks for the email. I've been debugging this problem for a
>         few days. A few installs before I posted the first email in this
>         thread, I started sniffing the network interface on the server
>         (DHCP, TFTP and HTTP all run on the same computer), and I noticed
>         that no traffic reaches the server between the PXE load and the
>         retrieval error (I think I wrote about it in my original post).
>         Some people suggested that Linux might get its interfaces confused
>         (the Sun X2200 M2 has 4 NICs), which I find hard to believe (the
>         Linux kernel is mature enough that it probably got rid of this kind
>         of bug a long time ago). In some of the failures the kernel loaded,
>         retrieved the kickstart configuration file and then failed to
>         retrieve 'stage2.img' (again, nothing appeared on the wire). I have
>         a sneaking feeling that the kickstart process assumes a lot of
>         basic facts and doesn't do any (or enough) sanity checking. Right
>         now I need to get this cluster up and running (I'm already 2 weeks
>         behind schedule). After it's up I will try to debug the process.
>          The situation got me so aggravated that I was contemplating
>         resurrecting my old private distro, which does things in a much
>         simpler way (I'm not going to do that).
>
>
>     Paolo
>     Unfortunately, CentOS/RHEL really do have a problem in the
>     module-loading process, especially with two identical NICs: their
>     order changes at random. I personally use this workaround: in
>     /etc/modprobe.conf, put the modprobe entry for the 1st NIC at the top
>     of the file and the entry for the second NIC at the bottom; after a
>     reboot I always get the same NIC->eth? mapping.
>
>
>
>
> Hi Marco
>
>   I haven't finished testing the way Nate asked me to, so right now I
> don't have any conclusive answers about what exactly is going on, but
> in my original email (the one that started this thread) I wrote that
> what I see happening is:
> anaconda prints an error message saying that it failed to retrieve
> 'stage2.img' from the HTTP server. I press 'OK' on the error message
> screen. The next screen is the HTTP setup screen, with the information
> given by the 'ks' directive from pxelinux already in place, so the
> only thing left for me to do is press the 'OK' button. When I press
> 'OK', anaconda successfully retrieves 'stage2.img' from the HTTP
> server and goes on to finish the unattended install successfully
> (take a look at my original post). The only explanation that makes
> sense is that the network configuration hadn't finished (yet) when
> anaconda tried to retrieve 'stage2.img'.
>   Along the way I changed the configuration several times and got
> every possible failure (or at least it feels like it): failure to
> retrieve the kickstart config file, failure to retrieve 'stage2.img'
> no matter how many times I pressed the 'OK' button on the HTTP setup
> screen, and probably a few more scenarios that I'm trying very hard to
> forget ;-)
>   One thing I noticed is that anaconda reconfigures the network
> interface after the kernel has already configured it and successfully
> retrieved the kickstart config file from the web server (confirmed by
> sniffing the network). The question that comes to mind when I see it
> is: why is it doing that??? It makes me feel that something is wrong
> in the assumptions and the install process...
>   Maybe you're right about the module-loading issue (though it doesn't
> explain what I wrote in the original post): I resurrected my old
> distro (a heavily modified Slackware) to test the issue, and what I
> found is that with a module-less kernel (all needed drivers compiled
> in statically) and no initrd to mess things up, the issue simply
> didn't happen (tested 10 times).
>   On the other hand, if you were right about it, then RHEL/CentOS/Fedora
> installations would be unsuitable for any multihomed configuration,
> because they would map eth devices differently (albeit only once in a
> while), which means one would have to switch the cables because of
> network device remapping!!! That isn't something users and corporations
> that use RHEL (and there are many of them) would be willing to live
> with :-)
>
Paolo, this problem occurs only in RHEL/CentOS and other Red Hat-based
distros, not in Slackware, SuSE, Debian, etc. I haven't gone deeper into
the problem, but that is the reality. BTW: you can set the MAC address
in the ifcfg files, but this applies only to an already installed
machine. About your remark on corporations and Red Hat: you are right,
but how often are servers restarted? :-)
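As a rough sketch of that ifcfg approach on an already installed box: it assumes the usual /etc/sysconfig/network-scripts/ifcfg-ethN layout with DEVICE, HWADDR, BOOTPROTO and ONBOOT keys, the kernel's /sys/class/net/<iface>/address files, and a modern Python 3 on the machine running it. The helper names are made up for illustration. It prints one MAC-pinned config per ethN interface so they can be reviewed before anything is written.

```python
import os

SYSFS_NET = "/sys/class/net"  # sysfs directory listing the network interfaces


def read_macs(sysfs_net=SYSFS_NET):
    """Return {iface: MAC} for every ethN interface found under sysfs (hypothetical helper)."""
    macs = {}
    if not os.path.isdir(sysfs_net):
        return macs
    for iface in sorted(os.listdir(sysfs_net)):
        if not iface.startswith("eth"):
            continue  # skip lo, bonds, bridges, etc.
        with open(os.path.join(sysfs_net, iface, "address")) as f:
            macs[iface] = f.read().strip().upper()
    return macs


def ifcfg_text(iface, mac, bootproto="dhcp"):
    """Build the body of ifcfg-<iface>, tying that name to one NIC's MAC (hypothetical helper)."""
    lines = [
        f"DEVICE={iface}",
        f"HWADDR={mac.upper()}",  # ties this config file to a single physical NIC
        f"BOOTPROTO={bootproto}",
        "ONBOOT=yes",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Only print the suggested files; copying them into place is left to the admin.
    for iface, mac in read_macs().items():
        print(f"# /etc/sysconfig/network-scripts/ifcfg-{iface}")
        print(ifcfg_text(iface, mac))
```

Comparing the printed HWADDR values with what the ELOM reports is also a quick way to spot the MAC mismatch mentioned earlier in the thread.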
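For the access-log check nate suggested, a small filter like the following lists every request the web server answered with an error status. It is a sketch that assumes Apache's default "common" or "combined" LogFormat and the stock CentOS log path /var/log/httpd/access_log; the function names are hypothetical.

```python
import re
import sys

# Prefix shared by Apache's "common" and "combined" LogFormat definitions:
# host ident user [time] "request" status ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def failed_requests(lines, min_status=400):
    """Yield (host, time, request, status) for requests answered with status >= min_status."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines written in some other format
        status = int(m.group("status"))
        if status >= min_status:
            yield m.group("host"), m.group("time"), m.group("request"), status


def main(argv):
    path = argv[1] if len(argv) > 1 else "/var/log/httpd/access_log"
    try:
        log = open(path, errors="replace")
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return 1
    with log:
        for host, when, request, status in failed_requests(log):
            print(f"{status} {host} [{when}] {request}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

If it prints nothing while an install is failing, as Paolo found, the request for 'stage2.img' most likely never reached Apache at all, which points back at the network side (the tcpdump check) rather than at the URL.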