On Wed, 2005-05-25 at 22:43 -0500, Les Mikesell wrote:
I guess I could narrow down my complaint here to the specific Red Hat policy of not shipping version number upgrades of applications within their own distribution versions. In this instance, it is about building OS versions with options that required UTF-8 (etc.) character set support along with a perl version that didn't handle it correctly (which I can understand because that's the best they could do at the time), then *not* providing updates to those broken distributions to perl 5.8.3+ which would have fixed them in RH 8.0 -> RHEL3.x but instead expecting users to move to the next RH/Fedora release which introduces new broken things. Maybe the problems have been fixed in recent backported patches to RHEL3/centos3 but I don't think so.
The problem was a lot of CPAN programs were written for ASCII/ISO8859. The ones included with RHL/RHEL all worked fine and were tested, but I know people ran into issues with added programs.
The problem then becomes a Catch-22. Perl 5.8.3 fixed some issues in 2004, but also introduced other compatibility issues.
Fedora Core 1 did update to Perl 5.8.3 when it became available in 2004, but Red Hat has decided to stick with 5.8.0 for RHEL3. They must have had some reasons.
In a nutshell, disabling the UTF-8 default locale fixes the problem with ASCII/ISO8859 Perl programs.
But Centos 4 includes it, and I assume RHEL4.
Nope, only MySQL 3.23.
We've already covered why it isn't reasonable to run those. But why can't there be an application upgrade to 4.x on a distribution that is usable today,
Definitely not! The whole reason why RHEL is very well trusted is because Red Hat sticks with a version and then backports any necessary fixes. Trust me, it actually takes Red Hat _more_work_ to do this, but they do it to ensure _exact_ functionality over the life of the product.
and one that will continue to keep itself updated with a stock 'yum update' command? I think this is just a policy issue, not based on any practical problems.
It's a policy issue, yes. And upgrading from MySQL 3.23 to MySQL 4.x would throw a massive wrench into a lot of Red Hat's SLAs.
Once Red Hat ships a package version in RHEL, unless they are unable to backport a fix, they do _not_ typically move forward. Again, SLAs, exact operation to an anal power, and _never_ "feature upgrades."
If you want that, that's what Fedora Core is for.
Allow multiple versions of apps in the update repositories, I think.
Again, massive wrench into Red Hat SLAs.
Why can't we explicitly update to an app version beyond the stock release if we want it and then have yum (etc.) track that instead of the old one?
SLAs.
If I had the perl, mysql, and dovecot versions from centos 4 backported into centos 3, I'd be happy for a while.
Not people who pay for RHEL with SLAs, no sir. Trust me on this, Red Hat is listening to the people who pay, and those people pay for the attention to bug fixes and that's about it. SuSE was the first to really prove this was the market, Red Hat just followed them.
I know it wouldn't be horribly hard to do this myself
Hard is not the problem. It's actually much harder to backport fixes to old versions. But Red Hat does it for a reason.
Remember, updating a system is more than just taking the latest package and building it. It's building it, running it in regression tests across a suite of systems, and _then_ shipping it. At least when you're talking about an environment where you're guaranteeing SLAs.
Unless you're like Microsoft and you ship things that re-introduce old bugs, have unforeseen consequences, etc... Microsoft is notorious for "feature creep" and "historical loss" in their updates.
but I really hate to break automatic updates and introduce problems that may be unique to each system.
Exactomundo. ;->
I'd be extremely conservative about changes that increase the chances of crashing the whole system (i.e. kernel, device drivers, etc.) and stay fairly close to the developer's version of applications that just run in user mode. Even better, make it easy to pick which version of each you want, but make the update-tracking system automatically follow what you picked. Then if you need a 2.4 kernel, perl 5.8.5 and mysql 4.1 in the same bundle you can have it.
And you are now going to run a suite of regression tests with these various combinations -- remember, each added choice multiplies the number of combinations, so the test count grows _exponentially_ -- and guarantee an X hour Service Level Agreement (SLA) on it?
In reality, what you're looking for is Fedora Core, not RHEL.
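To put a rough number on the regression-test point above, here is a minimal sketch. The component names and version lists are hypothetical, chosen only to echo the examples in this thread; they are not an actual Red Hat support matrix. It assumes every version of every component can be mixed freely, so the bundle count is the product of the list lengths.

```python
from itertools import product

# Hypothetical version choices per component (illustrative only,
# not a real vendor support matrix).
choices = {
    "kernel": ["2.4", "2.6"],
    "perl": ["5.8.0", "5.8.3", "5.8.5"],
    "mysql": ["3.23", "4.0", "4.1"],
}


def count_combinations(options):
    """Number of distinct bundles if every choice can be mixed freely."""
    total = 1
    for versions in options.values():
        total *= len(versions)
    return total


def enumerate_bundles(options):
    """Yield every bundle as a dict mapping component -> version."""
    names = list(options)
    for combo in product(*(options[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print(count_combinations(choices))
    # Adding one more component with two versions doubles the matrix.
    extended = dict(choices, dovecot=["0.99", "1.0"])
    print(count_combinations(extended))
```

With these made-up lists the matrix is 18 bundles, and a fourth component with two versions makes it 36. Each bundle would need its own regression run before anyone could put an SLA on it.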
I'm talking about the CIPE author, who had to be involved to write the 1.6 version, not an RPM maintainer who probably couldn't have.
Not to burst your bubble, but most Red Hat developers go beyond just being "maintainers." Many actively participate in many project developments. Red Hat used to actively include CIPE in the kernel, and test it as their standard VPN solution.
That changed in 2.6, for a number of reasons, a big one being that the other developers weren't even looking at the kernel 2.6-tests in 2003, let alone 2.6.0 on-ward once it came out in December. In reading the fall 2003 and other comments, it became pretty clear that Red Hat was extremely skeptical about even getting it to work, and if it was really worth it.
So how does any of this relate to the CIPE author, who didn't write CIPE for fedora and almost certainly didn't have an experimental 2.6 kernel on some unreleased distribution, knowing that CIPE wasn't going to work?
Excuse me? The developer didn't have to wait for a "release" distro to look at what issues were happening with kernel 2.6 -- let alone late kernel 2.5 developments or the months upon months of 2.6-test releases. For some reason you seem to believe this "scenario" is something only CIPE runs into?
There are countless kernel features and packages _external_ to the core kernel developers, and those projects _do_ "keep up" with kernel developments as they happen. But let's even assume for a moment they do not.
Debian 3.0 "Woody" and Debian 3.1 "Sarge" had kernel 2.6.0 available for download almost immediately. Various kernel 2.6-test testing in early Fedora Development showed that CIPE was totally broken for 2.6. And there are similar threads in 2003, while 2.6 was in 2.6-test, where people were talking about the lack of any CIPE compatibility.
This was _known_. Your continued insistence on saying Red Hat released 2.6 "early" is just nonsense. It was known for 9 months before it was released, months before even development started on Fedora Core 2, SuSE Linux 9.1, Mandrake Linux 10.0, etc... This is nowhere near a Red Hat policy, decision or otherwise "unstable" issue.
On the other hand, someone involved in building FC2 must have known and I don't remember seeing any messages going to the CIPE list asking if anyone was working on it.
Okay, I'm going to hit the CIPE archives just to see what I don't know ...
Hans Steegers seemed to be very aware and knowledgeable about the fact that CIPE 1.5 did not run on kernel 2.6 back in September 2003, 3 months before the final kernel 2.6.0 release. Unless I'm mistaken, he is very involved with CIPE's development.
Who else knew about the change? Do you expect every author of something that has been rpm-packaged to keep checking with Linus to see if he feels like changing kernel interfaces this month so as not to disrupt the FC release schedule?
I don't think you even understand the issue here. CIPE wasn't just made incompatible because of some "minor interface change" made in an odd-ball, interim 2.6 developer release. Kernel 2.6 was changed _massively_ from 2.4, and things like CIPE required _extensive_ re-writes! Hans knew this, as did most other people, about the same time -- Fall 2003 when the kernel 2.6-test releases were coming out!
This has absolutely *0* to do with Red Hat or any distributor, _period_!
I can understand people backing away from a changing interface.
??? I don't understand what you meant by that at all ???
And, as much as you want this to not be about RH/Fedora policies, you are then stuck with something unnecessarily inconvenient because of their policy of not upgrading apps within a release.
Fedora Core does, probably a little more so than Red Hat Linux prior.
But RHEL -- when you ship SLAs, you ship SLAs -- and you aren't upgrading features mid-release that can impact compatibility and reliability.
Period.
That's not the issue - I don't expect a hot-plug to go into the raid automatically. I do want it to pair them up on a clean reboot as it would if they were both directly IDE connected. So far nothing has.
That is _exactly_ the issue! Once you remove a disk from the volume, you have to _manually_ re-add it, even if you powered off and re-connected the drive. Once the system has booted without the drive just once, it doesn't connect it automagically.
Isn't it? I see different behavior with knoppix and ubuntu. I think their startup order and device probing is somewhat different.
Then report it to Bugzilla and use Knoppix and Ubuntu as examples. Red Hat _likes_ people to find issues and report them, and they will get fixed.
_Unless_ they don't do what Knoppix and Ubuntu do for a reason. Many times I've seen reasons not to autodetect things, and software RAID is one, depending on the conditions.
Separate issues - I'm able to use mdadm to add the firewire drive to the raid and it will re-sync, but if I leave the drive mounted and busy, every 2.6 kernel based distro I've tried so far will crash after several hours.
I've seen this issue with many other OSes as well.
I can get a copy by unmounting the partition, letting the raid resync then removing the external drive (being able to take a snapshot offsite is the main point anyway).
Once you do this, you must manually tell the system to trust it again. Otherwise, it will assume the drive was taken off-line for other reasons.
If some distros are trumping that logic and just blindly trust it by default, then they deserve what they get from that logic -- even if it will only bite them in the ass 1 out of 20 times. I'll take the manual approach the other 19 times to avoid that 1. ;->
I've seen some bug reports about reiserfs on raid that may relate to the crash problem when running with the raid active.
Well, I'm ignorant on ReiserFS in general (I have limited experience dealing with it -- typically clean-ups and the off-line tools are never in-sync with the kernel, which seems good on its own, I'll admit), but maybe there is a race condition between ReiserFS and LVM2/MD.
This didn't happen under FC1 which never crashed between weekly disk swaps. There could also be some problems with my drive carriers.
It definitely could be a drive carrier issue. In reality, _only_ SATA (using the edge connections directly) and SCA SCSI can be trusted to stage transient power properly.
I typically like to use more reliable drive swapping. Again, either SCA SCSI or the newer SATA.
A firmware update on one type seems to have changed things but none of the problems are strictly reproducible so it is taking a long time to pin anything down.
Well, I wish you the best of luck.
There's really only one CIPE 'developer' and I don't think he has any particular interest in any specific distributions.
Could you _please_ explain the lack of 2.6 support until later in 2004 being a "distro-specific" issue? Red Hat, SuSE and many others just "moved on" and didn't bother to return, despite repeated attempts to get CIPE working in late 2003 through early 2004.
If anyone else was talking about it, and in any other place than the CIPE mailing list, I'm not surprised that it did not have useful results.
From what I've now read, people _were_ aware of it in fall of 2003 on-ward, and kernel 2.6-test was out, and basically no one worked on it.
You use the one that works and has a long history of working until the replacement handles all the needed operations.
I don't think you seem to understand what I just said. The standards compliant version can_not_ always handle the exact functionality of the variant from the standard.
Many times, what people think is so-called "proven" is actually quite broken. Anyone who exchanged tarballs between Linux and Solaris, Irix and other systems using GNU Tar typically ran into such issues.
POSIX compliance exists for a reason. GNU Tar, among many other Linux utilities, has deviated over the years. Things must break to bring back that deviation to standard.
I think the LibC4/5 forks and the return to GLibC 2 was a perfect example. And it doesn't take a rocket scientist to realize why GNU gave the reins to Cygnus (now Red Hat) on GCC 3 because GCC 2's C++ was quite the wasteland.
A committee decision isn't always the most reliable way to do something even if you follow the latest of their dozens of revisions.
I don't think you realize that many times it's not a 'committee decision' that causes the problem in the first place. Sometimes Linux utilities are just a bit too "eccentric" or introduce their own "extensions."
No, but I assume that Gnu tar will be available anywhere I need it.
On Linux, yes. The problem is that it doesn't interact well with other systems in many cases.
Given that I've compiled it under DOS, linked to both an aspi scsi driver and a tcp stack that could read/feed rsh on another machine that seems like a reasonable assumption. I can't think of anything less likely to work...
Unfortunately GNU Tar doesn't exactly handle its own extensions well on different platforms. ;->
So which is more important when I want to read something from my 1990's vintage tapes?
If GNU Tar even reads some of them! You should read up on GNU Tar. ;->
Maybe, maybe not. I always set up backups on filesystem boundaries anyway so I can prevent them from wandering into CD's or NFS mounts by accident, but I can imagine times when you'd want to include them and still do correct incrementals.
There are some defaults that are just dangerous. That's one of them.
You aren't following the scenario. The drives worked as shipped. They were running Centos 3.x which isn't supposed to have behavior-changing updates. I did a 'yum update' from the nicely-running remote boxes that didn't include a kernel and thus didn't do a reboot immediately afterwards.
You should have tested this in-house _first_.
I normally test on a local system, then one or a few of the remotes, make sure nothing breaks, then proceed with the rest of the remotes. So, after all that, I ended up with a flock of running remote boxes that were poised to become unreachable on the next reboot.
Again, you should have tested all this in-house _first_.
And even if I had rebooted a local box after the corresponding update, it wouldn't have had the problem because I would have either installed that one in place or assigned the IP from its own console after swapping the disk in.
But had you followed such a procedure, you would have discovered it.
But they could at least think about what a behavior change is likely to do in different situations, and this one is pretty obvious. If eth0 is your only network interface and you refuse to start it at bootup, remote servers that used to work become unreachable.
You might want that in the case where you want only a specific hardware address to access the network.
I will re-iterate, there are things in "Common Criteria" standardization that are affecting both RHEL and SLES.
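For anyone who wants to catch this kind of thing before the next reboot, here is a rough sketch of a pre-reboot check. It assumes the interface config is a plain KEY=VALUE file and that the pinned address sits under a HWADDR key, as in Red Hat style ifcfg files. The function names are made up for illustration, and the caller is assumed to supply the NIC's real address from wherever it is available on the box.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines from an ifcfg-style file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def hwaddr_mismatch(ifcfg_text, actual_mac):
    """Return True if the file pins a HWADDR that differs from actual_mac."""
    pinned = parse_ifcfg(ifcfg_text).get("HWADDR")
    if not pinned:
        # Nothing pinned, so there is nothing that can refuse to match.
        return False
    return pinned.lower() != actual_mac.lower()


if __name__ == "__main__":
    # Hypothetical config copied from a disk cloned on another machine.
    sample = "DEVICE=eth0\nONBOOT=yes\nHWADDR=00:11:22:33:44:55\n"
    print(hwaddr_mismatch(sample, "00:11:22:33:44:66"))
```

A disk cloned on one box and moved to another carries the first box's address in the file, so a check like this would flag it whether or not the startup scripts currently enforce the setting.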
I do understand the opposite problem that they were trying to fix where a change in kernel detection order changes the interface names and has the potential to make a DHCP server start on the wrong interface, handing out addresses that don't work. But, it's the kind of change that should have come at a version revision or along with the kernel with the detection change.
Maybe so. But I'm still waiting on you to detail when this change was, in fact, made. So far, I'm just going on your comments that you merely yum'd the updates from the proper repository.
Note that I did test everything I could, and everything I could have tested worked because the pre-shipping behavior was to include the hardware address in the /etc/sysconfig/networking/profiles/defaults/xxxx file, but to ignore it at startup.
Ahhh, now we're getting to it! After you did a "yum update", did you check for any ".rpmsave" files?
So even when I tested the cloned disks after moving to a 2nd box they worked. The 'partially replicated environment' to catch this would have had to be a local machine with its IP set while the drive was in a different box and then rebooted after installing an update that didn't require it. I suppose if lives were at stake I might have gone that far.
Maybe I've just been in too many environments where that's the deal, yes. And even when it's not lives, it's a "one shot deal" and I don't get a 2nd chance.
E.g., people complain about bugs in semiconductor designs, yet semiconductors aren't something like software where you build, run it, and know you've got bugs in 6-8 minutes. You have to go to layout, then fab it and then you get it back -- some 6-8 _weeks_ later if you're a major company (possibly 6-8 months if you're not).
So I tend to err on the side of making sure my formal testing is actually well thought out.
You are right, of course. I take responsibility for what happened along with credit for catching it before it caused any real downtime (which was mostly dumb luck from seeing the message on the screen because I happened to be at one of the remote locations when the first one was rebooted for another reason).
And that's good. If you're going to make a mistake, at least do it on a minimal number of systems. I've seen far too many people assume something will work and push it out to all.
Still, it gives me a queasy feeling about what to expect from vendors - and I've been burned the other direction too by not staying up to the minute with updates so you can't just skip them.
If you want to make a comparison of the Linux world to any other, at least the "worst" Linux vendors are still better at patching than any other OS.
Hmmm, now I wonder if the code was intended to use the hardware address all along but was broken as originally shipped. It would be a bit more comforting if it was included in an update because someone thought it was a bugfix instead of someone thinking it was a good idea to change currently working behavior.
It might be that it was disabled -- possibly by yourself during config. But then an update changed that. Again, doing a: find / -name '*.rpmsave'
is almost a mandatory step for me anytime I upgrade. RPM is very good at dumping out those files when it can't use an existing config file or script that has been modified.
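If you'd rather script that check across a flock of boxes, the same search can be sketched in Python. This is only an illustration: the function name is made up, and it also looks for ".rpmnew" files, which RPM writes when it keeps your modified config and drops the packaged one alongside it.

```python
import fnmatch
import os


def find_rpm_leftovers(root, patterns=("*.rpmsave", "*.rpmnew")):
    """Return a sorted list of files under root matching any pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # /etc is where most leftovers land; pass "/" to mirror the find above.
    for path in find_rpm_leftovers("/etc"):
        print(path)
```

Run it after every update and diff each hit against the live config before the next reboot.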