[CentOS] Re: Demonizing generic Linux issues as Fedora Core-only issues -- WAS: Hi, Bryan

Thu May 26 00:18:50 UTC 2005

From: Les Mikesell <lesmikesell at gmail.com>
> Agreed.  But note that the standards are set long before that...

???  What do you mean by "standard"?  ???

Linux's history has been notorious for variance from ANSI, NIST, POSIX
and GNU standards.  Yes, far less than Microsoft and even some UNIX
vendors, but there are still major issues today with this.

When developers try to change things for standards compliance, they
get reamed by all the people who want backward compatibility instead.
That's why companies like Red Hat and SuSE try to not do it for several
versions until it comes to a head (typically future adoption required).

> If you are doing anything complicated, I'm not sure you have any
> other choice. But then you run into ugliness where you want to
> run perl programs that need perl>=5.8.3 to get character set handling
> right, and you can't, or you have to do a local install trying not
> to break the system version.

Internationalization is a reality and we English speaking Americans,
as well as Westerners in general, can't go around complaining when a
program really _does_ need to get off its American/Western-only duff.
That means getting away from ASCII as well as ISO8859.

In reality, there is a massive set of issues in the UNIX space, where there
are no fewer than 15 common interpretations of byte order, endianness
and organization just for 4-byte ISO character sets.  Pretty much all
standards-based platforms are moving to UTF-8 because it solves the
multi-byte organizational issues.  ASCII is still 7-bit (1 byte), while any
4-byte ISO character can be accommodated in 2-6 bytes total.  And because
UTF-8 is a stream of 1-6 single bytes, always most-significant bits first
(network order, big endian), there are no multi-byte endianness/order
issues.
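
A minimal Python sketch of the byte-order point above.  The helper name
byte_views is made up for illustration.  It encodes one character three
ways: the UTF-8 bytes are the same on every machine, while the fixed-width
4-byte form exists in two byte orders.  Note that the current UTF-8
definition (RFC 3629) caps a sequence at 4 bytes; the older definition
allowed up to 6.

```python
def byte_views(ch):
    """Return the encoded bytes of one character under several encodings."""
    return {
        "utf-8": ch.encode("utf-8"),          # byte stream, no byte-order variants
        "utf-32-be": ch.encode("utf-32-be"),  # 4-byte unit, big endian
        "utf-32-le": ch.encode("utf-32-le"),  # 4-byte unit, little endian
    }


if __name__ == "__main__":
    # One ASCII letter, one Latin-1 letter, the euro sign, and a character
    # outside the Basic Multilingual Plane.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
        views = byte_views(ch)
        print("U+%04X" % ord(ch),
              {name: data.hex() for name, data in views.items()})
```

The two 4-byte forms carry the same bytes in opposite order, which is the
kind of ambiguity a reader has to resolve out of band.  The UTF-8 form has
only one possible layout.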

Ironically for an American company, this is where Red Hat has done
the most phenomenal job of any "western" distro company, IMHO, in pushing
_hard_ for UTF-8.  Red Hat made this major shift with CL3.0 (Red Hat
Linux 8.0), which then went into EL3, which was based on CL3.1 (Red
Hat Linux 9).  The typical bitching and moaning was present: condemnations
of both the Perl 5.8 maintainers and Red Hat Linux 8.0, etc...

More on the GUI front, I think Java and .NET/Mono have the right idea:
ISO10646 Unicode everything.

> Or you need mysql 4.x for transactions.

Again, there is a licensing issue with MySQL 4 that MySQL AB introduced
and that most people, short of Red Hat, are ignoring.  MySQL 4 is GPL,
_not_ LGPL or BSD, so you have to be very careful what you link to it.

I see that Fedora Core 4 Test does adopt MySQL 4 though.  I need to
read up on what Red Hat did, or is possibly excluding to make MySQL AB
happy (unless they changed their stance on some static linking) to find
out more.

> I guess my real complaint is that the bugfixes and improvements to
> the old releases that you are forced by the situation to run are
> minimal as soon as it becomes obvious that the future belongs to
> a different version - that may be a long time in stabilizing to
> a usable point. 

So what's your solution?

I'm still waiting for someone to show me one better than the:
  2-2-2 -> 6-6-6 model

In fact, the full model is really:  
  2-2-2 -> 6-6-6 -> 18-18-18 model

Whereby Red Hat and SuSE maintain up to 3 simultaneous versions of
their 18 month distros.  [ Actually, SuSE is dropping the first "enterprise"
release, SLES 7 just shy of 5 years. ]

So if you are talking about stability/maturity, then run RHEL2.1 or now.
Heck, Fedora Legacy is _still_ supporting CL2.3 (RHL7.3) too!  ;->

> It will be interesting to see how this works out for Ubuntu.  I think it
> would be possible to be more aggressive with the application versions
> but hold back more than RH on the kernel changes.

And that's the real question.  What is the "best balance"?

> Of course they'll have it easy for the first few since they don't have to
> be backwards compatible to a huge installed base.

Exactomundo!  ;->

In reality, Red Hat has probably the longest "backward compatible" run
of any vendor.  Why?  Because they adopt things early -- like GLibC 2,
GCC 3, NPTL, etc...

Heck, pretty much everything released for CL/EL2 (GLibC 2.2,
GCC 2.96/3.0 -- RHL7.x/RHEL2.1) still runs on my latest Fedora Core 4
Test systems.

> How was the CIPE author supposed to know that what would be
> released as FC2 would have a changed kernel interface?

My God, I actually can't believe you could even make such a statement.
At this point, it's _futile_ to even really debate you anymore; you keep
arguing from a position of total "assumption" and "unfamiliarity."
A package maintainer would not be in that position unless he honestly
didn't care.

Fedora Core 2 (2004May) was released 6 months _after_ Linux 2.6 (2003Dec).

Fedora Core 1 -- fka Red Hat Linux 10 -- had just been released (2003Nov)
before kernel 2.6.

Within 2 months, Rawhide/Development had Fedora Core 2's intended kernel
-- 2.6.  There was absolutely _no_question_ what it was going to be.
I know, I was checking out the packages on Fedora Development -- saw
2.6.1/2.6.2.  I believe 2.6.3 was the first one used in Fedora Core 2 Test 1 in
2004Mar. 

Red Hat's Rawhide, now Fedora Development (although many Red Hat
employees still call it "Rawhide" ;-), has been around since the Red Hat
Linux 5.x timeframe.  It's the packages that are being built for the next
revision -- be it same version or a major version change.  Rawhide/Dev
is the "package testing," so you can see what packages they are looking
at.  Beta/Test is the "regression testing" of the entire release as a whole
(packages against packages testing in all).

Again, the 2-2-2 model.  There are at least 4 months before release when
you can find out pretty much _anything_ that's going to be in the distro
with good certainty.

Pretty much every major distro -- Red Hat, SuSE, etc... -- adopted Linux 2.6
within 6 months.  SuSE even released SuSE Linux 9.0 before 2.6 was
released, but with a lot of the backports (like Red Hat) in preparation for
2.6 in SL9.1.

> He did respond with 1.6 as quickly as could be expected after a released
> distribution didn't work with it and a user reported problems on the mailing
> list.

How can you blame this on distributions?  Honestly, I don't see it at all!
There were months and months of pre-release 2.6 releases in 2003.
I saw the comments that many things weren't working (including CIPE).
There were 6 months between Linux 2.6's release and FC2.  Heck, I believe
SL9.1 came out with it before that.

> About the kernel, or all of the drivers' authors that assumed that
> kernel interfaces shouldn't change?

Many drivers were actually _deprecated_ in kernel 2.6, and not built by
default because people didn't come forward and take ownership.  I know,
the lack of the "advansys.c" SCSI driver got me good.  ;->

But who did I blame?  Myself for not volunteering!  At some point,
when there's lack of interest, it typically means there are not enough
people interested to hold up everything.

> And if you want the things that work to not change???

Then you do _not_ adopt the latest distro that just came out -- especially
not an "early adopter" release.  It was well known that Fedora Core 2
was changing a lot of things, just like SuSE Linux 9.0/9.1.

[ SIDE NOTE:  Again, I have stated that I am _disappointed_ that Red Hat
does not use revisioning to designate something like Fedora Core 2
as a ".0" revision.  But it was well known it was going to be kernel 2.6
4 months before its release. ]

> Where do you find the real info?

It takes me 5 minutes to figure out what Red Hat's up to.  Red Hat is
pretty explicit on what their plans are, and the packages are out there
for all to see during Development, even _before_ the Test hits (Development
is about 4 months before release).

Heck, when their Test 1 hits (a good 2+ months before release), they go
over _all_ the major changes in the "Release Notes."  SuSE and even
Mandrake are similar too.

> Specific example: I'm trying to make a software RAID1 partition with
> one internal IDE and a matching external drive in a firewire case work.
> FC1 worked flawlessly once the raid was established but would not
> autodetect the drive, either when hotplugged or at bootup.  Everything
> later that I've tried is worse. Some get the autodetect right on a
> hotplug but crash after some amount of runtime.
> Is any distro likely to autodetect at bootup in time to cleanly connect
> the RAID and then keep working?

When you disconnect a drive from the RAID array, there is no way
for the software to know whether or not the drive is any good anymore
when it reconnects unless you _manually_ tell it (or configure it to always
assume it is good).

This is not a Red Hat issue either.

And LVM2 is still going through some maturity with RAID-1 and snapshots.

> Maybe it matters that the filesystem is reiserfs - I see some bug
> reports about that, but rarely have problems when the internal IDE
> is running as a broken mirror.

Filesystem shouldn't matter, it's a Disk Label (aka Partition Table)
consideration.

> Like letting it be a surprise to the CIPE author that it didn't work
> at release time.  I don't remember seeing a hint about this on the
> CIPE mail list when someone must have known well ahead that it was
> coming.

More assumptions.  CIPE has been breaking since the very early
2.5.2+ developments.  I'm sure the author was waiting until late in 2.5 (say
near 2.5.70+) as it came closer to 2.6.  I saw reports on all sorts of
network code breakage the second it hit in 2003Dec -- 6 months before.

In looking through Fedora Development, I note lots of references to all sorts
of issues with the 2.6-test releases as of summer of 2003 (over 9 months
before FC2 release, months before kernel 2.6's official release).  There were
efforts to get cipe to work -- both in FC2 and then in Fedora Extras for FC2,
but they were eventually dropped because of lack of interest by the
CIPE developers themselves in getting "up-to-speed" on 2.5/2.6.

The first hints of recommendations to go IPSec instead were that fall.
By March of 2004, I see that several people had made the last comments
to not even bother with CIPE for Fedora Core 2 (as well as SuSE Linux 9.1).

> Buggy. I don't mean RH bugs, I mean bugs from the upstream packages
> that got their first huge real world exposure because RH bundled them
> on a CD that you could drop into just about any PC and have a working
> system.  I mean things like the bind, sendmail, and ssh versions that
> went out with RH 4.0 and were installed on more machines than ever
> before - all with remote exploits that weren't known yet.

Yes, Linux is not free from vulnerabilities.  This is not new.

Either you pay for a subscription (and optional SLA) to an "enterprise"
distro that maintains 5+ years of support, or you rely on community
projects (like Fedora Legacy) that attempt to do the same.  Fedora
Legacy is still supporting CL2.3 (Red Hat Linux 7.3), even though they
dropped CL3.0 (Red Hat Linux 8.0) a long time ago because CL3.1/3.2
(Red Hat Linux 9 / Fedora Core 1) exist.

> The real world is a complicated place.  If you want to substitute
> a different program for an old but non-standard one you need to
> make sure it handles all the same things.

But what if those either conflict with standards or are broken?

> Until just recently, star didn't even come close to getting incrementals
> right and still, unlike gnutar, requires the runs to be on filesystem
> boundaries for incrementals to work.  And, it probably doesn't handle
> the options that amanda needs.  Theory, meet practice.

You assume "GNU Tar" is the "standard."  ;->
It's not, never has been, and is years late to the POSIX-2001/SUSv3
party.

As far as not crossing filesystem boundaries, that is easily accommodated.

> By 'first', I meant before RH 4.x, which in my opinion really drove
> the popularity of Linux simply because it had a decent installer
> that came up working on most hardware at the time.  Freebsd was
> around at the time but in a form that was much more difficult
> for a new user to get working.

> That depends on when you started.  Before SP6, NT was also unreliable.

Hell of a lot better than "Chicago," but yes, I agree.

> Remember that in the timeframe of RH4, you could kill an NT box
> by sending it an oversized ping packet - and have a fair chance of
> corrupting the filesystem beyond repair from the ungraceful shutdown.

Oh, you could hang Linux in various ways too, just not as many.

> It wasn't so much the cloning as it was setting the IP address
> with their GUI tool (and I did that because when I just made
> the change to the /etc/sysconfig/network-scripts/ifcfg-xxx file
> the way that worked in earlier RH versions it sometimes mysteriously
> didn't work).  Then I shipped the disks in their hot-swap carriers to
> the remote sites to be installed.

Again, I will repeat that "configuration management" is part of avoiding
"professional negligence."  You should have tested those carriers before
shipping them out by putting them in another box.

> That procedure can't be uncommon.

No, it's not uncommon, I didn't say that.  I'm just saying that vendors
can't test for a lot of different issues.

Don't get me started on the design of NTFS and the Registry SAM-SID
issues.  The current "hack" is to either put everything in a domain, or
use Dynamic Disks to store stuff outside of an NTFS filesystem because
you can _never_ safely access an NTFS filesystem _except_ from the NT
installation that created it (not even with another install of the same
NT version).

> And the same thing would happen if you swapped a NIC card -
> hmmm, I wonder about PCMCIA cards now.

Yes, maybe that has something to do with it, eh?  ;->

For example, maybe when you want to switch between an unclassified
and classified network, you use a different card (and hard drive), which
will connect to a different subnet.  ;->

I'm sure some changes have been for "Common Criteria" compliance.
I've seen similar on SuSE as well.  They might have been made mid-update.

> I'm not sure exactly when it happened because it was an update that
> did not include a new kernel so the machines weren't rebooted
> immediately.  I think the ones where I noticed it first were somewhere
> in the Centos 3.4 range with some subsequent updates (maybe a month
> or so ago).

I'll check it out then.

> I think this was just an update without a version number change.  As it
> turned out, I saw the screen of one that was rebooted at a nearby site
> and knew more or less what happened but before I got around to fixing
> them all a machine crashed at a distant site and I had to talk someone
> who didn't know vi through finding the file and removing the hardware
> address entry.

No offense, but when I ship out components to remote sites, I do my
due diligence and test for every condition I can think of.  But maybe
I'm anal because I ship things out to places like White Sands, Edwards,
etc... and techs can't be expected to debug such problems, so
I always test with at least a partially replicated environment (which
would be at least 1 change of physical system ;-).
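
The failure described above (a hardware address entry in an ifcfg file that
no longer matches the NIC in the box) is the kind of thing a pre-shipment
test can catch.  A minimal Python sketch, assuming the Red Hat-style ifcfg
file pins the interface with an HWADDR= line.  The function names are
hypothetical, and the MAC address of the target NIC is passed in by the
caller rather than read from the system.

```python
def hwaddr_from_ifcfg(text):
    """Return the HWADDR value from ifcfg-style text, or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a KEY=VALUE line.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HWADDR":
            # Drop optional quoting and normalize case for comparison.
            return value.strip().strip("\"'").lower()
    return None


def is_pinned_to_other_nic(ifcfg_text, actual_mac):
    """True if the config names a hardware address other than actual_mac."""
    pinned = hwaddr_from_ifcfg(ifcfg_text)
    return pinned is not None and pinned != actual_mac.lower()


if __name__ == "__main__":
    # Made-up sample config and MAC addresses, for illustration only.
    sample = "DEVICE=eth0\nBOOTPROTO=static\nHWADDR=00:11:22:33:44:55\n"
    print(is_pinned_to_other_nic(sample, "00:AA:BB:CC:DD:EE"))
```

Run against each ifcfg file on a disk before it goes in the carrier, a True
result flags a config that is tied to the build machine's card.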

> If I hadn't already known about it by then or if a lot of the remote
> servers had been rebooted and came up with no network
> access it might have ruined my whole day.

I think you're assigning blame in the wrong direction.  You might think
that's rude and arrogant, but in reality, if you keep blaming vendors for
those types of mistakes, you're going to have a lot more of them coming
your way until you change that attitude.  No offense.  ;->

--
Bryan J. Smith   mailto:b.j.smith at ieee.org