[CentOS] kickstart problems

Thu Sep 4 07:35:02 UTC 2008
Paolo Supino <paolo.supino at gmail.com>

On Thu, Sep 4, 2008 at 8:27 AM, Romeo Ninov <rninov at gmail.com> wrote:

>
>
> Paolo Supino wrote:
>
>>
>>
>> On Wed, Sep 3, 2008 at 5:52 PM, Marco Fretz <mailinglist at blah.li> wrote:
>>
>>    hi,
>>
>>    we had the same problem with newer HP pcs and servers (broadcom nics).
>>    pxe works well on broadcom, the install not. doesn't matter if you're
>>    using kickstart or manual install.
>>
>>    the problem was in centos 4.2. after updating the install environment
>>    to 4.5 the problem was gone... so it was a driver issue! the install
>>    kernel is not exactly the normal linux kernel i think.
>>
>>    if anaconda just says that it cannot find install image, etc. the
>>    system has no connectivity at this time.
>>
>>    hope this is helpful...
>>
>>    bests
>>     marco
>>
>>    Paolo Supino wrote:
>>    >
>>    >
>>    > On Tue, Sep 2, 2008 at 3:07 PM, Romeo Ninov <rninov at gmail.com> wrote:
>>    >
>>    >
>>    >
>>    >     Paolo Supino wrote:
>>    >
>>    >
>>    >
>>    >         On Tue, Sep 2, 2008 at 2:17 PM, Romeo Ninov <rninov at gmail.com> wrote:
>>    >
>>    >
>>    >
>>    >            Paolo Supino wrote:
>>    >
>>    >
>>    >
>>    >                On Tue, Sep 2, 2008 at 8:14 AM, nate <centos at linuxpowered.net> wrote:
>>    >
>>    >                   Paolo Supino wrote:
>>    >                   > Hi Nate
>>    >                   >
>>    >
>>    >                   > 3: After the error comes up I get the HTTP setup
>>    >                   > configuration screen with the source website (in IP)
>>    >                   > and CentOS directory as I entered them in the PXE
>>    >                   > configuration file and as it appears in the kickstart
>>    >                   > configuration file and all I have to do is press the
>>    >                   > 'OK' button to continue the installation to a
>>    >                   > successful completion.
>>    >
>>    >                   If that's the case the next most likely culprit is
>>    >
>>    >                   > url --url http://192.168.11.1/source
>>    >
>>    >
>>    >                   Just because the PXE boot loader can download the
>>    >                   kickstart config does not mean that the installation
>>    >                   process will work with that NIC.
>>    >
>>    >                   Also I've had lots of broadcom systems not work with
>>    >                   kickstart over the years, it's not uncommon for newer
>>    >                   systems to have newer revs of the chipsets and those
>>    >                   revs not being supported by the installer.
>>    >
>>    >                   But it sounds like in your case it does work, so I
>>    >                   would look at the url above, as it likely is the cause
>>    >                   of the problem. Check the http access logs on the
>>    >                   server for 404s and similar errors.
>>    >
>>    >                   nate
>>    >
>>    >                   _______________________________________________
>>    >                   CentOS mailing list
>>    >                   CentOS at centos.org
>>    >
>>    >
>>    >                   http://lists.centos.org/mailman/listinfo/centos
>>    >
>>    >
>>    >
>>    >                Hi Nate
>>    >
>>    >                 After figuring out what I was doing wrong (see previous
>>    >                reply ...) I started going through each of my systems in
>>    >                order to boot them and install CentOS 5.2 on each. For the
>>    >                most part it works, but only for the most part, because
>>    >                once in a few boots (not machine specific) anaconda stops
>>    >                and either asks me what interface it needs to configure or
>>    >                fails to load 'stage2.img' from the web server on
>>    >                192.168.11.1 ... All cables are good cables. The network
>>    >                switch is a Cisco 3750G (with no configuration) and all
>>    >                the NICs are Broadcom with firmware 3.8.9. Can you throw a
>>    >                guess where the problem might be lying (I hate
>>    >                inconsistencies)?
>>    >
>>    >
>>    >            Have you checked the Apache logs for anything? Also check the
>>    >            server messages.
>>    >
>>    >
>>    >
>>    >         Hi Romeo
>>    >
>>    >          Yes I did, and nothing shows up in either access_log or
>>    >         error_log :-(
>>    >         I just had a node that stopped and asked me for IP configuration
>>    >         (twice), and only on the second time (checked on the server using
>>    >         tcpdump) did it actually try to contact the server to retrieve
>>    >         its network configuration. It then continued and successfully
>>    >         retrieved 'stage2.img' from the web server :-(
>>    >
>>    >     Paolo, what about the DHCP or bootp servers? Check the logs and
>>    >     flush the ARP cache on the server(s).
>>    >
>>    >
>>    >
>>    > Hi Romeo
>>    >
>>    >   The more systems I boot the more I'm starting to feel that it's a
>>    > hardware-related problem ... I just booted a system in which the ELOM
>>    > says that NIC0 has one MAC address, but when I booted the system I saw
>>    > a different MAC address altogether on the network ...
>>    >   I'm checking at the lowest level: on the wire (using tcpdump), so if
>>    > nothing shows in the capture I'm sure I won't find anything in the
>>    > logs :-(
>>    >
>>    >
>>    >
>>    >
>>    > --
>>    > TIA
>>    > Paolo
>>    >
>>    >
>>    >
>>
>>
>>
>>
>> Hi Marco
>>
>>  Thanx for the email. I've been debugging this problem for a few days, and
>> a few installs before I posted the first email in this thread I started
>> sniffing the network interface on the server (dhcp, tftp, http are all on
>> the same computer). I noticed that no communication reaches the server
>> between the PXE load and the retrieval error (and I think I wrote about it
>> in my original post). Some people suggested that Linux might be getting
>> confused about the interfaces (the Sun X2200 M2 has 4 NICs), which I find
>> hard to believe (the Linux kernel is old enough and probably got rid of
>> these kinds of bugs a long time ago). In some of the failures the kernel
>> loaded, retrieved the kickstart configuration file and then failed to
>> retrieve 'stage2.img' (again nothing appeared on the wire). I have a sneaky
>> feeling that the kickstart process assumes a lot of basic facts and doesn't
>> do any/enough sanity checking. Right now I need to get this cluster up and
>> running (I'm already 2 weeks behind schedule). After it's up I will try to
>> debug the process.
>>  The situation got me so aggravated that I was contemplating resurrecting
>> my old private distro (not going to do that) that does things in a much
>> simpler way.
>>
>>
>>  Paolo
> Unfortunately CentOS/RHEL really do have a problem with the order in which
> modules are loaded, especially with two identical NICs: their order changes
> at random. I personally mitigate the problem this way: in /etc/modprobe.conf
> I put the modprobe entry for the first NIC at the top of the file and the one
> for the second NIC at the bottom, and after a reboot the NIC->eth? mapping
> always stays in place.
>
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>


Hi Marco

  I didn't finish testing the way Nate asked me to, so right now I don't have
any conclusive answers about what exactly is going on. In my original email
(the one that started this thread) I wrote that what I see happening is:
anaconda prints an error message that it failed to retrieve 'stage2.img' from
the HTTP server. I press 'OK' in the error message screen. The screen that
comes after it is the HTTP setup screen with the information given by the
'ks' directive from pxelinux already in place, so the only thing left for me
to do is press the 'OK' button. When I press the 'OK' button anaconda
successfully retrieves 'stage2.img' from the HTTP server and goes on to
finish the unattended install successfully (take a look at my original
post). The only explanation that makes sense is that the network
configuration hadn't finished (yet) before anaconda tried to retrieve
'stage2.img'.
  Along the way I changed the configuration various times and I got all
possible failures (or at least it feels like it): failed to retrieve the
kickstart config file, failed to retrieve the 'stage2.img' file no matter how
many times I pressed the 'OK' button in the HTTP setup screen, and probably
a few more scenarios that I'm trying very hard to forget ;-)
  One thing I noticed is that anaconda reconfigures the network interface
after the kernel has already configured it and successfully retrieved the
kickstart config file from the web server (proved by sniffing the network).
The question that goes through my mind when I see it is: why is it doing
that??? It makes me feel that something is wrong in the assumptions of the
install process ...
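
  One way to keep anaconda on the same NIC that PXE booted from can be
sketched as below. This is an assumption about the cause, not something
confirmed in this thread. pxelinux's IPAPPEND 2 option appends BOOTIF=<MAC>
to the kernel command line, and anaconda's ksdevice=bootif option asks the
installer to use that interface. The kernel, initrd and ks.cfg paths are
hypothetical, and whether the CentOS 5.2 installer honors ksdevice=bootif
should be checked against its documentation.

```text
# pxelinux.cfg/default (sketch; kernel, initrd and ks.cfg paths are hypothetical)
DEFAULT ks
PROMPT 0

LABEL ks
    KERNEL centos5/vmlinuz
    # IPAPPEND 2 adds BOOTIF=<MAC of the NIC that PXE booted from>
    IPAPPEND 2
    # ksdevice=bootif asks anaconda to configure that same NIC
    APPEND initrd=centos5/initrd.img ks=http://192.168.11.1/ks.cfg ksdevice=bootif
```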
  Maybe you're right about the module loading issue (though it doesn't
explain what I wrote in the original post): I resurrected my old distro (a
heavily modified Slackware) to test the issue, and what I found is that with
a module-free kernel (all needed drivers statically compiled in) and no
initrd to mess things up, the issue simply didn't happen (tested 10 times).
  On the other hand, if you were right about it then RHEL/CentOS/Fedora
installation would be unsuitable for any multihomed configuration, because it
would map the eth devices differently (albeit once in a while), which means
one would have to switch the cables because of network device remapping!!!
That isn't something users and corporations that use RHEL (and there are
many of those) would be willing to live with :-)
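
  For the installed system (this does not affect the installer itself), a
minimal sketch of pinning a device name to a specific card is to set HWADDR
in that interface's ifcfg file, so the name follows the MAC address rather
than the module load order. The MAC address below is a placeholder.

```text
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch; MAC address is a placeholder)
DEVICE=eth0
HWADDR=00:11:22:33:44:55
BOOTPROTO=dhcp
ONBOOT=yes
```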








--
ttyl
Paolo