a quick and dirty hack to 'fix' the problem in a large scale-- RE: [CentOS] Nic order detection

Sun Jan 13 03:32:39 UTC 2008
Guolin Cheng <guolin at alexa.com>

Michael,

 There is no point arguing about which 'official' way is best; it is
just like the old war between vi and Emacs. I may be stupid, but any
method that fixes the users' problem is the best one. I've tried the
official 'rename' and udev ways before, but I finally gave up and ended
up with the two ways I've mentioned. The second one especially worked
perfectly when I rerolled my CentOS 5.0 and 5.1 initrd.img files for
custom Kickstart installation on a really large scale.

Good luck and happy new year.

--Guolin




-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On
Behalf Of Michael D. Kralka
Sent: Saturday, January 12, 2008 5:41 AM
To: CentOS mailing list
Subject: Re: a quick and dirty hack to 'fix' the problem in a large
scale-- RE: [CentOS] Nic order detection

Guolin Cheng wrote:
> Les and Michael,

I am going to bite my tongue and not ask you to refrain from top
posting.

As your subject suggests, you are proposing a quick and dirty hack to
deal with interface assignment to physical NICs. Why bother with a quick
and dirty hack when a sensible solution exists within the distribution?
I see this as bad advice and hope no one follows it.

> There are a few ways to work around the NIC detection issue. Each has
> its own advantages and limits.
> 
> The first method is: suppose you or your team have full control of the
> running kernel on your hundreds/thousands of boxes; you can then build
> some NIC drivers statically into the kernel -- these statically built
> NIC drivers will be detected as eth0 without glitches -- and leave the
> other NIC types on the same box as dynamically loaded kernel modules.
> It works well if you know all the types of primary network NIC
> (typically e100, tg3, etc.) and you have already standardized the 2nd
> NIC on the boxes to one or two brands like e1000.

Although this may "work", I have just signed up for a lifetime of
chasing kernel versions. Every time RHEL/CentOS releases a new kernel to
fix a bug or security vulnerability, I must recompile the kernel. How
does this make sense if I have hundreds/thousands of boxes to keep up
to date? I'd rather "yum update" on all the boxes (which is easy to do).

> The second method is: suppose you or your team cannot control
> rebuilding of the kernel, or at least do not have full control, but
> you really know the primary/secondary NIC combinations on all the
> Linux boxes in your kingdom. Then you can try the following hack:
> 
>  You can try to add/change lines in the
> /lib/modules/`uname -r`/modules.dep file according to your NIC
> combinations -- always load the drivers according to your predefined
> order. For example:
> 
> .../e1000.ko: .../tg3.ko .../3c59x.ko .../e100.ko .../forcedeth.ko
> .../forcedeth.ko: .../tg3.ko

Although this may "work", it is another accident waiting to happen. This
is a generated file and it is almost never a good idea to modify a
generated file; one will get burned. I install a shiny new module that
is not delivered as part of the kernel (drbd perhaps), and the
post-install script runs "depmod -a" (a sensible thing to do); now I
have just blown away the manual changes. Or every time I install a new
kernel (whether I am foolishly[1] building my own or using the
distribution kernels), I have to remember to make this change. The worst
part about this is that the effects will not be visible until the next
time the server is rebooted (say, 6 months later when there is a power
failure); the network interface assignment will be wrong. Good luck
hunting down that problem in a pinch!

[1]  Don't get me wrong, there is a time and a place for building custom
kernels; this is just not one of them.
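
To make the failure mode above concrete, here is a minimal Python
sketch of a check that reports which hand-added dependency entries are
no longer present in a modules.dep-style file, for example after
"depmod -a" has regenerated it. The function name missing_edges and the
shortened module names are hypothetical, chosen only for illustration;
a real modules.dep uses full module paths, so the expected pairs would
have to match those paths exactly.

```python
def missing_edges(modules_dep_text, expected_edges):
    """Return the (module, dependency) pairs that are absent from the text.

    modules_dep_text: contents of a modules.dep-style file, one
                      "module: dep dep ..." entry per line.
    expected_edges:   pairs that were added by hand and should still exist.
    """
    present = set()
    for line in modules_dep_text.splitlines():
        module, sep, rest = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        for dep in rest.split():
            present.add((module.strip(), dep))
    return [edge for edge in expected_edges if edge not in present]


# Hypothetical short names stand in for the elided paths in the example.
wanted = [("e1000.ko", "tg3.ko"), ("forcedeth.ko", "tg3.ko")]

hand_edited = "e1000.ko: tg3.ko e100.ko\nforcedeth.ko: tg3.ko\n"
regenerated = "e1000.ko:\nforcedeth.ko:\n"

print(missing_edges(hand_edited, wanted))   # nothing missing
print(missing_edges(regenerated, wanted))   # both hand edits are gone
```

Such a check only detects the loss; it does not prevent it, and it has
to be run against every installed kernel's modules.dep to be of any
use.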

> The above means that to load the module on the left, the system will
> first load the modules on the right! So tg3|3c59x|e100|forcedeth
> always load before e1000, and tg3 loads before forcedeth. The same
> idea can be applied to all the NIC combination types you have, and it
> can be set only once and applied to all your Linux boxes if you set it
> up correctly. The side effect is that you waste a few hundred
> kilobytes of memory, but who cares?

The problem is not the wasted memory, it's the fragility of its design.
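
For readers unfamiliar with the mechanism being discussed, the
"modules on the right load before the module on the left" rule can be
sketched in Python as a small dependency walk. This is only an
illustration of the rule as described in the quoted text, not a
reimplementation of modprobe: the function names are hypothetical, the
module names are shortened stand-ins for the elided paths, and the
relative order among the right-hand modules is an assumption of the
sketch.

```python
def parse_modules_dep(text):
    """Map each module to the list of modules named after its colon."""
    deps = {}
    for line in text.splitlines():
        module, sep, rest = line.partition(":")
        if sep:
            deps[module.strip()] = rest.split()
    return deps


def load_order(deps, target, order=None, visiting=None):
    """Return a list in which every dependency precedes its dependent."""
    order = [] if order is None else order
    visiting = set() if visiting is None else visiting
    if target in order or target in visiting:
        return order  # already placed, or a cycle: do not recurse again
    visiting.add(target)
    for dep in deps.get(target, []):
        load_order(deps, dep, order, visiting)
    order.append(target)
    return order


example = (
    "e1000.ko: tg3.ko 3c59x.ko e100.ko forcedeth.ko\n"
    "forcedeth.ko: tg3.ko\n"
)
deps = parse_modules_dep(example)
print(load_order(deps, "e1000.ko"))
# ['tg3.ko', '3c59x.ko', 'e100.ko', 'forcedeth.ko', 'e1000.ko']
```

The walk shows why every listed driver ends up loaded even on a box
that only has an e1000 card, which is where the wasted memory comes
from.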

> There are also other tricks I have tried before; some work and some
> do not. But I think the above should probably work for most general
> cases.

Why resort to "tricks" when there is a perfectly good solution supported
by the distribution? I've learned that it never pays to be clever. When
resorting to neat little tricks to get things to work, they get
forgotten, or worse when someone else must look into a problem, they
spend most of the time trying to understand the clever way things are
set up. When stability is a main concern, boring is always better.

Cheers,
Michael
