[CentOS] Re: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

Sun Sep 11 18:32:52 UTC 2005
Bryan J. Smith <b.j.smith at ieee.org>

On Sun, 2005-09-11 at 13:40 -0400, Edward Diener wrote:
> My thought does not involve having to know the exact sequence of events at any 
> level when an error occurs but rather having that error recognized at the 
> earliest possible time and propagating that error to the code which can put out 
> an intelligent, even very technical if necessary, message to end-user. That 
> message would then give the end-user at least a fighting chance of either 
> understanding what went wrong, or reporting the message so that a developer of 
> the install could explain it to the end-user.

You're thinking like an application/installer developer, not an
OS/installer developer.  There are _worlds_ of difference.

Remember why the OS exists: to provide a _standard_set_ of interfaces to
applications.  That means there are "known, standardized interfaces"
when you install an application, or when an application is running.

When the OS installer itself is running, you have _no_ standardization
in hardware.  The OS _provides_ the standardization for the hardware.
So until that OS is set up, the installer can do little to explain why
something failed.

> I grant that OS code, especially in an installer, is almost certainly far more 
> complicated than any application code. Still there are techniques for reporting 
> errors from the point in which they are discovered. The modern way is exception 
> handling, but even if one uses the older error code technique, the error should 
> translate into a more narrow possibility than the vague and generalized message 
> which I received.

Again, you're _still_ thinking like an application/installer developer.
Monolithic kernel device drivers do _not_ have exception handling, they
can_not_ call some "user-space system function" to throw an exception --
they throw far more "low-level" interrupts.  There are no "rich
exception handling features" for them to use.

Otherwise device drivers would be slow and bloated.  We're talking the
kernel program and OS itself -- not some selection of programs that can
take advantage of the exception facilities of a pre-emptive OS.  In
other words, there is no "pre-emptive OS" for the OS itself.  ;->
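To make the "error code" style concrete, here is a minimal sketch, written in Python for brevity rather than kernel C.  The function names and the ID set are hypothetical and exist only for illustration.  The point is that a low-level probe hands back nothing but an integer, so the layer above has very little to build a detailed message from.

```python
import errno
import os


def probe_controller(known_ids, pci_id):
    """Hypothetical driver-style probe: returns 0 or a negative errno.

    It never raises; the integer is all the caller gets.
    """
    if pci_id not in known_ids:
        return -errno.ENODEV
    return 0


def install_step(known_ids, pci_id):
    """Hypothetical installer step that can only translate the code it was given."""
    rc = probe_controller(known_ids, pci_id)
    if rc < 0:
        # os.strerror() yields a generic system message, nothing device-specific.
        return "storage setup failed: " + os.strerror(-rc)
    return "storage setup ok"
```

Calling `install_step({(0x8086, 0x100E)}, (0x1234, 0x5678))` produces a generic "storage setup failed" message, because the only fact that reached the installer is the error number.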

So the common way to "find out what's wrong" in a kernel is to run a
_separate_ "host system" (be it a 2nd, physical system, or run the
"target" as a virtual system on the "host").

> Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere 
> off of the home page at http://www.centos.org/.

CentOS is a 100% 1:1 rebuild of Red Hat Enterprise Linux.  File them at
Red Hat.  You do _not_ need to pay Red Hat one dime.

> I agree with all you say above. My argument is not that any OS must install on 
> my computer, or anyone else's computer, but that if it does not the end-user 
> should be given a decent indication why not. Even though OS code is generally 
> more complicated than application code I see no difference in its ability to 
> give intelligent error messages, or even error message numbers which translate 
> to an intelligent reason.

And I'm telling you it's impossible.

As someone who has built numerous embedded systems -- the "system-level"
developer standpoint is 100% different than the "application-level" (or
even library-level) developer standpoint.  Again, in a nutshell, there
is no "pre-emptive OS" to service the OS itself.

Which is why, in the embedded world, we use failsafe firmwares with a
failsafe mode, a "host" platform for remote debugging, etc...  In the
commodity PC world -- if the OS/installer screws up, it's very, very
difficult to get anything useable like application exception handling.

> I was spoiled by FC3 which installed and worked nearly flawlessly on my machine.

Stick with Fedora Core 3.  Alan Cox just came out the other day on how
many issues have been introduced on Fedora Core 4 by various changes,
and why he's staying with Fedora Core 3.

My History ...

I did _not_ upgrade to Red Hat Linux 5.0, waited for Red Hat Linux 5.2.
I did _not_ upgrade to Red Hat Linux 6.0, waited for Red Hat Linux 6.1.
I did _not_ upgrade to Red Hat Linux 7.0, waited for Red Hat Linux 7.2.
I did _not_ upgrade to Red Hat Linux 8.0, waited for Red Hat Linux 9.
I did _not_ upgrade to Fedora Core 2, waited for Fedora Core 3.
I am _not_ upgrading to Fedora Core 4, waiting for Fedora Core 5.

> You make a good point but it is annoying that what worked on a previous release 
> could not even install on the next one. C'est la vie.

Because things changed _radically_.  Anytime you do that, you introduce
issues.  But if Red Hat didn't adopt GLibC 2.0 (RHL5.0), GCC 2.96/3
(RHL7.0), 

Frankly, I continue to blame this on Red Hat.  I have explained,
point-by-point, why they need to go back to the old revisioning so people
_know_ when there is a ".0" release versus a ".1+" revision.

Last year I covered this _in_depth_ (through Fedora Core 3):  
  http://www.geocities.com/thebs413/RH-Distribution-FAQ-3.html  

> My question was a reply to your statement of
> "In fact, the main problem isn't Linux, but the increase in "superstore-
> designed hardware."  And that means cheap, poorly tested, Windows
> version _specific_ drivers, and absolutely _no_ public specifications."
> You seemed to be implying that poor Windows version _specific_ drivers was one 
> of the reasons why Linux has trouble creating device drivers for hardware. So I 
> asked the above as if to say that I find it hard to believe that Linux depends 
> on disassembling Windows driver code in order to get at the internal hardware 
> specs for a device.

Yes, that's _exactly_ it in many, many cases!

Furthermore, a lot of "superstore hardware" is now 99.9% software.  And
you can't grow those software subsystems overnight.  There are major 3rd
parties who sell software RAID, software modem, software audio, software
MAC (network), etc... software that make a _killing_ because hardware
vendors license it, and change a few things.

Eventually Linux creates replacement, unified subsystems, but it's not
always easy to figure out how to get them to interface to the specific
and endless variants of hardware out there.  E.g., there might be
literally 1,000 products with 1,000 _different_ Windows drivers, all
from the same codebase (but slightly different), and if the _single_
Linux driver can support 300 of those variants, that's a "good driver."

Not to mention the fact that when a new piece of "superstore hardware"
comes out, the Windows drivers have been written for it _prior_ to
release.  Linux driver writers typically have to wait until the
hardware is on the shelf before they can start.  Hence the lag time.

This is _not_ going to get better.  It's one of the fallacies in the
Linux world.  Only sheer volume matters, which is why most server hardware
(non-superstore) has good Linux drivers: Linux is about 30+% of
new server shipments from OEMs.

> I do not doubt the difficulty but I have done some very difficult programming 
> myself so I do not doubt the solution also.

Have you done kernel-level and device-driver development?  It's not
about "difficult programming."  It's about system-level development.

E.g., have you written a boot-loader?

> Thanks for pointing this out to me.
> Microsoft does have a very good record in supporting hardware devices, 

Not Microsoft, but hardware vendors.
Microsoft does _not_ write device drivers.
If they did, Windows would have 1/100th the driver support of Linux.

> and their install programs are very solid.

I completely _disagree_.  Especially for those of us who _only_
supported Windows NT 3.5, 3.51 and 4.0 when Windows 95 and 98 were
popular.

Hardware is developed for specific Windows releases.  If you have
hardware that is not supported by your Windows version, you can find all
sorts of issues.

And the installer can get into a self-rebooting loop that it _never_
exits.  With Linux installers, I have a known quantity that I can
control.  Windows installers are a _joke_.

> If Linux is to compete against Windows

Linux does _not_ compete against Windows.
Windows is a distribution avenue for Microsoft.
Linux is an open solution for companies that want to avoid such.

> it must take the same attitude

What attitude?  Microsoft's attitude?
That we (like Microsoft) expect vendors to write device drivers?
If so, then there will be _few_ device drivers for Linux.

> that it needs to be as good, or at least needs to be more informative
> when an error occurs.

You can't give a hardware detection error in an installer if the
installer doesn't know how to detect the hardware.  And the only way an
installer knows what the heck a piece of hardware does is if there is a
driver telling it so!  It's a chicken-egg issue.

So no, that's not it at all.  What Linux needs is for vendors to create
driver disks and include them with their products -- even if only for
select Linux distributions (just RHEL/RHD and NLD will do).  But they
won't.  That's the problem.

It has *0* to do with the installer.

E.g., the installer doesn't know what to do with an unsupported storage
device.  There is _no_ reference information on what type of storage
device it is.  It can't tell you, "this is brand X model Y" other than
what the PCI ID or other strings tell it -- hence the problem.
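As a rough illustration of that limit, the sketch below shows an installer-style lookup keyed on PCI vendor/device IDs.  The table and the `describe` helper are hypothetical, and the two entries are network-card IDs chosen only because they are well known; a real installer ships a far larger table.  For anything not in the table, the raw ID pair is all there is to report.

```python
# Illustrative table only: (vendor ID, device ID) -> driver name.
KNOWN_DEVICES = {
    (0x8086, 0x100E): "e1000",    # Intel 82540EM gigabit Ethernet
    (0x10EC, 0x8139): "8139too",  # Realtek RTL8139 Ethernet
}


def describe(vendor_id, device_id, table=KNOWN_DEVICES):
    """Return what an installer could say about a device from its PCI ID alone."""
    driver = table.get((vendor_id, device_id))
    if driver is None:
        # No driver means no knowledge of what the device is or does.
        return f"unknown device {vendor_id:04x}:{device_id:04x} (no driver)"
    return f"{vendor_id:04x}:{device_id:04x} handled by {driver}"
```

With no matching entry, `describe(0x1234, 0x5678)` can only echo the numbers back, which is the chicken-and-egg problem described above.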

> I am now on CentOS 4.1 so I will leave bleeding edge Fedora behind. I did not 
> appreciate the Fedora 4 answers I got when I brought up the video card problem 
> on their forums so I will go for greater stability instead.

It's probably because of your poor assumptions like above.  No offense,
but that's probably it.

> I never wanted to be bleeding edge but Fedora 3 worked so easily I thought
> that Fedora 4 would be easy to set up and use. I was wrong about that.

Fedora Core 2 was "bleeding edge."
Fedora Core 3 was an evolutionary revision after Fedora Core 2.

Fedora Core 4 is "bleeding edge."
Fedora Core 5 will be an evolutionary revision after Fedora Core 4.

The same was true prior, especially with Red Hat Linux.  Every time Red
Hat changes things, it takes 1-2 revisions to get all the bugs worked
out.  Red Hat tends to "push the envelope."

In fact, Windows XP (NT5.1) is just an evolutionary revision from
Windows 2000 (NT5.0), which was an evolutionary revision from Windows NT
4.0.

And Windows 98 (MSDOS7.1) was an evolutionary revision from Windows 95
(MSDOS7.0/7.1), which was an evolutionary revision from MS-DOS 6.x /
Windows 3.x -- something Caldera proved in court against Microsoft, down
to a complete technical demonstration (removing MSDOS7.0 from Windows 95),
i.e., that Windows 95 was illegal product bundling of DOS/Windows into one.

Your assumptions are based on other assumptions that just aren't true.

Microsoft doesn't write drivers, and doesn't even develop much of the
installer/upgrader.  And Microsoft works with superstore vendors so
anytime you upgrade just 1 -- applications, OS, PC or peripherals -- you
are forced to upgrade _all_ of the other 3 if they are more than 2-3
years "out-of-date."

In Linux, once a "core/base" hardware driver is developed, it is
perpetual for many, many products of the same "core/base".  Although the
superstore vendors regularly introduce new variants that differ
slightly, so it's always an issue, and not all are supported (just the
most popular/known).

-- 
Bryan J. Smith     b.j.smith at ieee.org     http://thebs413.blogspot.com
----------------------------------------------------------------------
The best things in life are NOT free - which is why life is easiest if
you save all the bills until you can share them with the perfect woman