On Wed, 2005-05-25 at 19:18, Bryan J. Smith wrote:
From: Les Mikesell lesmikesell@gmail.com
Agreed. But note that the standards are set long before that...
??? By "standard," what do you mean? ???
Most of the committee decisions.
Ironically, for an American company, this is where Red Hat has done the most phenomenal job of any "western" distro company, IMHO, of pushing _hard_ for UTF-8. Red Hat made this major shift with CL3.0 (Red Hat Linux 8.0), which then went into EL3, which was based on CL3.1 (Red Hat Linux 9). The typical bitching and moaning followed, with condemnations of both the Perl 5.8 maintainers and Red Hat Linux 8.0, etc...
I guess I could narrow my complaint down to the specific Red Hat policy of not shipping version-number upgrades of applications within a distribution version. In this instance, they built OS versions with options that required UTF-8 (etc.) character set support alongside a perl version that didn't handle it correctly (which I can understand, because that's the best they could do at the time). Then they did *not* provide updates to perl 5.8.3+, which would have fixed those broken distributions from RH 8.0 through RHEL3.x, but instead expected users to move to the next RH/Fedora release, which introduces new broken things. Maybe the problems have been fixed in recent backported patches to RHEL3/centos3, but I don't think so.
Or you need mysql 4.x for transactions.
Again, there is a licensing issue with MySQL 4, introduced by MySQL AB, that most people other than Red Hat are ignoring.
But Centos 4 includes it, and I assume RHEL4. We've already covered why it isn't reasonable to run those. But why can't there be an application upgrade to 4.x on a distribution that is usable today, and one that will continue to keep itself updated with a stock 'yum update' command? I think this is just a policy issue, not based on any practical problems.
I guess my real complaint is that the bugfixes and improvements to the old releases that you are forced by the situation to run become minimal as soon as it becomes obvious that the future belongs to a different version, one that may take a long time to stabilize to a usable point.
So what's your solution?
Allow multiple versions of apps in the update repositories, I think. Why can't we explicitly update to an app version beyond the stock release if we want it, and then have yum (etc.) track that instead of the old one? If I had the perl, mysql, and dovecot versions from centos 4 backported into centos 3, I'd be happy for a while. I know it wouldn't be horribly hard to do this myself, but I really hate to break automatic updates and introduce problems that may be unique to each system.
It will be interesting to see how this works out for Ubuntu. I think it would be possible to be more aggressive with the application versions but hold back more than RH on the kernel changes.
And that's the real question. What is the "best balance"?
I'd be extremely conservative about changes that increase the chances of crashing the whole system (i.e. kernel, device drivers, etc.) and stay fairly close to the developer's version of applications that just run in user mode. Even better, make it easy to pick which version of each you want, but make the update-tracking system automatically follow what you picked. Then if you need a 2.4 kernel, perl 5.8.5 and mysql 4.1 in the same bundle you can have it.
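To make the "follow what you picked" idea concrete, here is a minimal sketch in Python. Nothing like this exists in yum as far as I know; the function names, the pin format, and the version lists in the example are all made up for illustration. It assumes plain numeric dotted versions; real RPM comparison with epochs and release tags is more involved.

```python
def parse_version(v):
    """Turn a purely numeric dotted version like '4.1.12' into (4, 1, 12)."""
    return tuple(int(part) for part in v.split("."))


def in_series(version, series):
    """True if version belongs to the pinned series, e.g. '4.1.12' is in '4.1'."""
    v, s = parse_version(version), parse_version(series)
    return v[: len(s)] == s


def plan_updates(pins, installed, repo):
    """Return {package: new_version} for each pinned package that has a newer
    release inside the series the admin picked. Versions outside the pinned
    series (e.g. a 2.6 kernel when 2.4 is pinned) are never proposed."""
    plan = {}
    for pkg, series in pins.items():
        current = installed[pkg]
        newer = [
            v
            for v in repo.get(pkg, [])
            if in_series(v, series) and parse_version(v) > parse_version(current)
        ]
        if newer:
            plan[pkg] = max(newer, key=parse_version)
    return plan
```

With pins of 2.4 for the kernel, 5.8 for perl and 4.1 for mysql, plan_updates only proposes newer releases inside those series and never jumps to a 2.6 kernel or mysql 5, which is the "2.4 kernel, perl 5.8.5 and mysql 4.1 in the same bundle" case.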
How was the CIPE author supposed to know that what would be released as FC2 would have a changed kernel interface?
My God, I actually can't believe you could even make such a statement. At this point, it's _futile_ to even really debate you anymore; you keep talking from a total standing of "assumption" and "unfamiliarity." That is something a package maintainer would not be, unless he honestly didn't care.
I'm talking about the CIPE author, who had to be involved to write the 1.6 version, not an RPM maintainer who probably couldn't have.
Fedora Core 2 (May 2004) was released 6 months _after_ Linux 2.6 (December 2003).
So how does any of this mean the CIPE author, who didn't write CIPE for Fedora and almost certainly didn't have an experimental 2.6 kernel on some unreleased distribution, should have known that CIPE wasn't going to work? On the other hand, someone involved in building FC2 must have known, and I don't remember seeing any messages going to the CIPE list asking if anyone was working on it.
He did respond with 1.6 as quickly as could be expected after a released distribution didn't work with it and a user reported problems on the mailing list.
How can you blame this on distributions? Honestly, I don't see it at all!
Who else knew about the change? Do you expect every author of something that has been rpm-packaged to keep checking with Linus to see if he feels like changing kernel interfaces this month so as not to disrupt the FC release schedule?
Many drivers were actually _deprecated_ in kernel 2.6, and not built by default because people didn't come forward and take ownership. I know, the lack of the "advansys.c" SCSI driver got me good. ;->
I can understand people backing away from a changing interface.
And if you want the things that work to not change???
Then you do _not_ adopt the latest distro that just came out -- especially not an "early adopter" release. It was well known that Fedora Core 2 was changing a lot of things, just like SuSE Linux 9.0/9.1.
And, as much as you want this to not be about RH/Fedora policies, you are then stuck with something unnecessarily inconvenient because of their policy of not upgrading apps within a release.
Where do you find the real info?
Is any distro likely to autodetect at bootup in time to cleanly connect the RAID and then keep working?
When you disconnect a drive from the RAID array, there is no way for the software to know whether the drive is still any good when it reconnects unless you _manually_ tell it (or configure it to always assume it is good).
That's not the issue - I don't expect a hot-plug to go into the raid automatically. I do want it to pair them up on a clean reboot as it would if they were both directly IDE connected. So far nothing has.
This is not a Red Hat issue either.
Isn't it? I see different behavior with knoppix and ubuntu. I think their startup order and device probing is somewhat different.
Maybe it matters that the filesystem is reiserfs - I see some bug reports about that, but I rarely have problems when the internal IDE is running as a broken mirror.
Filesystem shouldn't matter, it's a Disk Label (aka Partition Table) consideration.
Separate issues - I'm able to use mdadm to add the firewire drive to the raid and it will re-sync, but if I leave the drive mounted and busy, every 2.6 kernel based distro I've tried so far will crash after several hours. I can get a copy by unmounting the partition, letting the raid resync then removing the external drive (being able to take a snapshot offsite is the main point anyway). I've seen some bug reports about reiserfs on raid that may relate to the crash problem when running with the raid active. This didn't happen under FC1 which never crashed between weekly disk swaps. There could also be some problems with my drive carriers. A firmware update on one type seems to have changed things but none of the problems are strictly reproducible so it is taking a long time to pin anything down.
There were efforts to get cipe to work -- both in FC2 and then in Fedora Extras for FC2, but they were eventually dropped because of lack of interest by the CIPE developers themselves in getting "up-to-speed" on 2.5/2.6.
There's really only one CIPE 'developer' and I don't think he has any particular interest in any specific distributions. If anyone else was talking about it, and in any other place than the CIPE mailing list, I'm not surprised that it did not have useful results.
The real world is a complicated place. If you want to substitute a different program for an old but non-standard one you need to make sure it handles all the same things.
But what if those either conflict with standards or are broken?
You use the one that works and has a long history of working until the replacement handles all the needed operations. A committee decision isn't always the most reliable way to do something even if you follow the latest of their dozens of revisions.
Until just recently, star didn't even come close to getting incrementals right, and it still, unlike gnutar, requires the runs to be on filesystem boundaries for incrementals to work. And it probably doesn't handle the options that amanda needs. Theory, meet practice.
You assume "GNU Tar" is the "standard." ;->
No, but I assume that GNU tar will be available anywhere I need it. Given that I've compiled it under DOS, linked to both an ASPI SCSI driver and a TCP stack that could read/feed rsh on another machine, that seems like a reasonable assumption. I can't think of anything less likely to work...
It's not, never has been, and is years late to the POSIX-2001/SUSv3 party.
So which is more important when I want to read something from my 1990's vintage tapes?
As far as not crossing filesystem boundaries, that is easily accommodated.
Maybe, maybe not. I always set up backups on filesystem boundaries anyway, so I can prevent them from wandering into CDs or NFS mounts by accident, but I can imagine times when you'd want to include them and still do correct incrementals.
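For what "staying on filesystem boundaries" means mechanically, here is a minimal Python sketch of a backup walker that prunes any subdirectory whose device number differs from the starting point. That is the same idea as find -xdev or GNU tar's --one-file-system; it is only an illustration, not how tar or amanda actually implement it.

```python
import os


def walk_one_filesystem(root):
    """Yield file paths under root without descending into other filesystems
    (CD-ROMs, NFS mounts, etc. mounted somewhere below root)."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that live on a different device, i.e. mount
        # points of other filesystems. Editing dirnames in place stops os.walk
        # from descending into them.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The trade-off is what's described above: pruning by device keeps the run from wandering into mounts, but anything you actually wanted from another filesystem needs its own run.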
It wasn't so much the cloning as it was setting the IP address with their GUI tool (and I did that because when I just made the change to the /etc/sysconfig/network-scripts/ifcfg-xxx file, the way that worked in earlier RH versions, it sometimes mysteriously didn't work). Then I shipped the disks in their hot-swap carriers to the remote sites to be installed.
Again, I will repeat that "configuration management" is part of avoiding "professional negligence." You should have tested those carriers before shipping them out by putting them in another box.
You aren't following the scenario. The drives worked as shipped. They were running Centos 3.x, which isn't supposed to have behavior-changing updates. I did a 'yum update' on the nicely-running remote boxes that didn't include a kernel and thus didn't require a reboot immediately afterwards. I normally test on a local system, then one or a few of the remotes, make sure nothing breaks, then proceed with the rest of the remotes. So, after all that, I ended up with a flock of running remote boxes that were poised to become unreachable on the next reboot. And even if I had rebooted a local box after the corresponding update, it wouldn't have had the problem, because I would have either installed that one in place or assigned the IP from its own console after swapping the disk in.
That procedure can't be uncommon.
No, it's not uncommon, I didn't say that. I'm just saying that vendors can't test for a lot of different issues.
But they could at least think about what a behavior change is likely to do in different situations, and this one is pretty obvious. If eth0 is your only network interface and you refuse to start it at bootup, remote servers that used to work become unreachable. I do understand the opposite problem they were trying to fix, where a change in kernel detection order changes the interface names and has the potential to make a DHCP server start on the wrong interface, handing out addresses that don't work. But it's the kind of change that should have come with a version revision, or along with the kernel that changed the detection order.
No offense, but when I ship out components to remote sites, I do my due diligence and test for every condition I can think of. But maybe I'm anal because I ship things out to places like White Sands, Edwards, etc., where techs can't be expected to debug such problems, so I always test with at least a partially replicated environment (which would include at least one change of physical system ;-).
Note that I did test everything I could, and everything I could have tested worked, because the pre-shipping behavior was to include the hardware address in the /etc/sysconfig/networking/profiles/defaults/xxxx file but to ignore it at startup. So even when I tested the cloned disks after moving them to a 2nd box, they worked. The 'partially replicated environment' to catch this would have had to be a local machine with its IP set while the drive was in a different box, then rebooted after installing an update that didn't require it. I suppose if lives were at stake I might have gone that far.
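A pre-reboot check along these lines would have flagged the cloned disks. This minimal Python sketch assumes, as described above, that the updated initscripts skip an interface whose ifcfg HWADDR line doesn't match the card's actual MAC. The helper names are hypothetical, and the MAC has to come from the box the disk will actually boot in (ifconfig output on a 2.4 kernel, or /sys/class/net/eth0/address on 2.6).

```python
from pathlib import Path


def parse_ifcfg(text):
    """Parse simple KEY=value lines from an ifcfg-style file into a dict.
    Blank lines and # comments are skipped; surrounding quotes are stripped."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip("\"'")
    return cfg


def hwaddr_mismatch(ifcfg_text, actual_mac):
    """True if the file pins a HWADDR that differs from actual_mac.
    No HWADDR line means nothing is pinned, so there is no mismatch."""
    configured = parse_ifcfg(ifcfg_text).get("HWADDR", "")
    if not configured:
        return False
    # MAC addresses are case-insensitive hex; compare normalized forms.
    return configured.lower() != actual_mac.strip().lower()


def check_file(path, actual_mac):
    """Hypothetical helper: read an ifcfg file and compare it to a MAC
    obtained from the target machine."""
    return hwaddr_mismatch(Path(path).read_text(), actual_mac)
```

For example, check_file("/etc/sysconfig/network-scripts/ifcfg-eth0", mac) returning True would mean eth0 will probably stay down on the next boot, which is exactly the situation to catch before anyone reboots a remote box.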
If I hadn't already known about it by then or if a lot of the remote servers had been rebooted and came up with no network access it might have ruined my whole day.
I think you're assigning blame in the wrong direction. You might think that's rude and arrogant, but in reality, if you keep blaming vendors for those types of mistakes, you're going to have a lot more of them coming your way until you change that attitude. No offense. ;->
You are right, of course. I take responsibility for what happened, along with credit for catching it before it caused any real downtime (which was mostly dumb luck: I happened to be at one of the remote locations and saw the message on the screen when the first one was rebooted for another reason). Still, it gives me a queasy feeling about what to expect from vendors, and I've been burned in the other direction too by not staying up to the minute with updates, so you can't just skip them. Hmmm, now I wonder if the code was intended to use the hardware address all along but was broken as originally shipped. It would be a bit more comforting if it had been included in an update because someone thought it was a bugfix, rather than because someone thought it was a good idea to change currently working behavior.