On Thu, 2005-05-26 at 01:20, Bryan J. Smith wrote:
In this instance, it is about building OS versions with options that required UTF-8 (etc.) character set support along with a Perl version that didn't handle it correctly.
In a nutshell, disabling the UTF-8 default locale fixes the problem with ASCII/ISO-8859 Perl programs.
No, that only fixes the problem of not being able to handle the default character set. It does not make explicit conversions work correctly as needed by MIME-tools, etc. Maybe this doesn't fit Red Hat's definition of a bug, but it is still broken behavior, fixed in the upstream version that they only provide if you do a full distro change.
We've already covered why it isn't reasonable to run those. But why can't there be an application upgrade to 4.x on a distribution that is usable today?
Definitely not! The whole reason RHEL is so well trusted is that Red Hat sticks with a version and then backports any necessary fixes. Trust me, it actually takes Red Hat _more_ work to do this, but they do it to ensure _exact_ functionality over the life of the product.
If you believe that, you have to believe that Red Hat's programmers are always better than the original upstream program author. I'll agree that they are good and on the average do a good job, but that stops far short of saying that they know better than the perl (etc.) teams what version you should be running.
Once Red Hat ships a package version in RHEL, unless they are unable to backport a fix, they do _not_ typically move forward. Again, SLAs, exact operation to an anal power, and _never_ "feature upgrades."
If you want that, that's what Fedora Core is for.
So if you want a working application, you take an incomplete kernel. I understand that's the way things are. I don't understand why you like it.
Allow multiple versions of apps in the update repositories, I think.
Again, massive wrench into Red Hat SLAs.
Why can't we explicitly update to an app version beyond the stock release if we want it, and then have yum (etc.) track that instead of the old one?
SLAs.
OK, that limits what Red Hat might offer. We are sort of talking about Centos here as well as how other distributions might be better. Is there a reason that a Centos or third-party repository could not be arranged such that an explicit upgrade could be requested to a current version, which would then be tracked the way your kernel-xxx version is when you select smp/hugemem/unsupported?
Unless you're like Microsoft and you ship things that re-introduce old bugs, have unforeseen consequences, etc... Microsoft is notorious for "feature creep" and "historical loss" in their updates.
Realistically you are just substituting a different set of people making different compromises.
I'd be extremely conservative about changes that increase the chances of crashing the whole system (i.e. kernel, device drivers, etc.) and stay fairly close to the developer's version of applications that just run in user mode. Even better, make it easy to pick which version of each you want, but make the update-tracking system automatically follow what you picked. Then if you need a 2.4 kernel, Perl 5.8.5 and MySQL 4.1 in the same bundle you can have it.
And you are now going to run a suite of regression tests on these various combinations -- remember, each added option multiplies the number of combinations, so the number of tests grows _exponentially_ -- and guarantee an X-hour Service Level Agreement (SLA) on it?
There are times when you want predictable behavior, and times when you want correct behavior. When an upstream app makes changes that provide correct behavior but you are guaranteed the old buggy behavior as a matter of policy, something is wrong.
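The "pick a version, then have updates follow what you picked" idea (much like the kernel smp/hugemem variant tracking mentioned earlier) can be sketched in a few lines. This is a hypothetical illustration, not how yum or up2date actually work: the `Package` record, the `stream` label and `pick_updates` are names invented here, and versions are plain integer tuples rather than real RPM version strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str        # e.g. "mysql"
    stream: str      # hypothetical track label the admin can pin, e.g. "4.1"
    version: tuple   # simplified version tuple, e.g. (4, 1, 12); NOT real RPM ordering


def pick_updates(pinned, installed, available):
    """Return {name: Package} with the newest candidate in each pinned stream.

    pinned:    dict of package name -> stream the admin explicitly chose
    installed: dict of package name -> currently installed Package
    available: iterable of Package objects offered by the repository
    """
    updates = {}
    for pkg in available:
        # Ignore anything outside the stream the admin picked.
        if pinned.get(pkg.name) != pkg.stream:
            continue
        best = updates.get(pkg.name)
        if best is None:
            current = installed.get(pkg.name)
            # Skip if we already run this stream at the same or a newer version;
            # an install from a different stream never blocks switching.
            if (current is not None and current.stream == pkg.stream
                    and current.version >= pkg.version):
                continue
            updates[pkg.name] = pkg
        elif pkg.version > best.version:
            updates[pkg.name] = pkg
    return updates


# Example: keep perl on the 5.8 track, but move mysql from 3.23 to the 4.1 track.
installed = {
    "perl": Package("perl", "5.8", (5, 8, 0)),
    "mysql": Package("mysql", "3.23", (3, 23, 58)),
}
pinned = {"perl": "5.8", "mysql": "4.1"}
available = [
    Package("perl", "5.8", (5, 8, 5)),
    Package("mysql", "4.0", (4, 0, 24)),   # ignored: not the pinned stream
    Package("mysql", "4.1", (4, 1, 10)),
    Package("mysql", "4.1", (4, 1, 12)),
]
updates = pick_updates(pinned, installed, available)
print({name: p.version for name, p in updates.items()})
```

The point of the design is that the admin's choice is recorded once (`pinned`) and every later update run respects it, instead of the repository deciding the version for everyone.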
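The arithmetic behind that objection is easy to check: if each component can independently be held at one of several versions, the number of distinct bundles to regression-test is the product of the per-component counts, so it grows exponentially with the number of components allowed to vary. The component names and version lists below are illustrative only.

```python
from itertools import product
from math import prod

# Illustrative choices only; each extra component multiplies the test matrix.
choices = {
    "kernel": ["2.4", "2.6"],
    "perl": ["5.8.0", "5.8.5"],
    "mysql": ["3.23", "4.0", "4.1"],
}

# Every bundle a vendor would have to regression-test.
bundles = list(product(*choices.values()))

# Same count computed directly: the product of version counts per component.
expected = prod(len(v) for v in choices.values())

print(len(bundles), expected)  # 12 12


def matrix_size(n_components, versions_each):
    """With n components that each allow k versions, there are k**n bundles."""
    return versions_each ** n_components
```

For example, `matrix_size(10, 2)` is 1024: allowing just two choices for each of ten packages already means over a thousand supported configurations.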
In reality, what you're looking for is Fedora Core, not RHEL.
Well, FC1 seems like the only way to get the specific mix of working kernel and apps for certain things right now, but it is by definition a dead end - and not really up to date on the app side either.
I'm talking about the CIPE author, who had to be involved to write the 1.6 version, not an RPM maintainer who probably couldn't have.
Not to burst your bubble, but most Red Hat developers go beyond just being "maintainers." Many actively participate in the upstream development of many projects. Red Hat used to actively include CIPE in the kernel, and test it as their standard VPN solution.
Hence my surprise at their change of direction.
Excuse me? The developer didn't have to wait for a "release" distro to look at what issues were happening with kernel 2.6 -- let alone late kernel 2.5 developments or the months upon months of 2.6-test releases. For some reason you seem to believe this "scenario" is something only CIPE runs into?
It's something a decision to change kernels runs into. The CIPE author didn't make that decision.
Testing of various kernel 2.6-test releases in early Fedora Development showed that CIPE was totally broken for 2.6. And there are similar threads from 2003, while 2.6 was still in 2.6-test, where people were talking about the lack of any CIPE compatibility.
I just don't remember seeing any discussion of this on the CIPE mailing list which is the only place it might have been resolved.
I don't think you even understand the issue here. CIPE wasn't just made incompatible because of some "minor interface change" made in an odd-ball, interim 2.6 developer release. Kernel 2.6 was changed _massively_ from 2.4, and things like CIPE required _extensive_ re-writes! Hans knew this, as did most other people, about the same time -- Fall 2003 when the kernel 2.6-test releases were coming out!
I don't see how anyone but Olaf Titz could have made the necessary changes, and I don't see why he would have done so with appropriate timing for the FC2 release unless someone involved in the release made him aware of the planned changes.
This has absolutely *0* to do with Red Hat or any distributor, _period_!
The distribution decided to change the kernel version and you don't see how this affects the usability of included packages - or the need to coordinate such changes with the authors of said packages?
I can understand people backing away from a changing interface.
??? I don't understand what you meant by that at all ???
An interface is supposed to be a form of contract among programmers that is not changed. Linus has consistently refused to freeze his interfaces, hence the lack of binary driver support from device vendors, and frankly I'm surprised at the number of open source developers who have continued to track the moving target. How interesting can it be to write the same device driver for the third time for the same OS?
And, as much as you want this to not be about RH/Fedora policies, you are then stuck with something unnecessarily inconvenient because of their policy of not upgrading apps within a release.
Fedora Core does, probably a little more so than Red Hat Linux prior.
How much change is going to happen in the lifetime of an FC release?
[back to firewire/raid]
That's not the issue - I don't expect a hot-plug to go into the raid automatically. I do want it to pair them up on a clean reboot as it would if they were both directly IDE connected. So far nothing has.
That is _exactly_ the issue! Once you remove a disk from the volume, you have to _manually_ re-add it, even if you powered off and reconnected the drive. Once the system has booted without the drive just once, it doesn't connect it automagically.
No, that isn't the issue on a simple reboot. A drive that is connected when you go down cleanly and is still connected when you restart shouldn't be handled differently just because there is a different type of wire connecting it.
Separate issues - I'm able to use mdadm to add the firewire drive to the raid and it will re-sync, but if I leave the drive mounted and busy, every 2.6-kernel-based distro I've tried so far will crash after several hours.
I've seen this issue with many other OSes as well.
It didn't happen with FC1 on the same box/same drives.
I can get a copy by unmounting the partition, letting the raid resync then removing the external drive (being able to take a snapshot offsite is the main point anyway).
Once you do this, you must manually tell the system to trust it again. Otherwise, it will assume the drive was taken off-line for other reasons.
Agreed - I expect to have to mdadm --add and have a resync if I've done a --fail or --remove, or the hardware is disconnected.
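Since a member that dropped out has to be re-added by hand (e.g. with `mdadm --add`), it helps to notice degraded mirrors right after a reboot or a drive swap. A minimal sketch, assuming the usual Linux `/proc/mdstat` layout, where each array's status line ends in something like `[2/1] [U_]` (configured members / active members, with `_` marking a missing slot). `degraded_arrays` is a name invented for this example, and the sample text stands in for reading the real file.

```python
import re

# Matches e.g. "[2/1] [U_]": configured members, active members, per-slot status.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
ARRAY_RE = re.compile(r"^(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: (configured, active)} for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if s and current:
            total, active = int(s.group(1)), int(s.group(2))
            if active < total or "_" in s.group(3):
                degraded[current] = (total, active)
            current = None  # only the first status line belongs to this array
    return degraded


# Sample in /proc/mdstat style; on a real box use open("/proc/mdstat").read().
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      244195904 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      1959808 blocks [2/1] [U_]

unused devices: <none>
"""
print(degraded_arrays(SAMPLE))  # {'md1': (2, 1)}
```

Anything this reports still needs the manual `mdadm --add` step described above; the sketch only tells you which array to look at.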
I've seen some bug reports about reiserfs on raid that may relate to the crash problem when running with the raid active.
Well, I'm ignorant on ReiserFS in general (I have limited experience dealing with it -- typically clean-ups and the off-line tools are never in sync with the kernel, which seems good on its own, I'll admit), but maybe there is a race condition between ReiserFS and LVM2/MD.
Actually, I think there may be a really horrible race condition built into any journaled file system that counts on ordered writes and the software raid level that doesn't guarantee that across the mirrors which may be working at different speeds, handling error retries independently, etc. But nobody seems to be talking much about it...
This didn't happen under FC1 which never crashed between weekly disk swaps. There could also be some problems with my drive carriers.
It definitely could be a drive carrier issue. In reality, _only_ SATA (using the edge connectors directly) and SCA SCSI can be trusted to stage transient power properly.
I typically like to use more reliable drive swapping. Again, either SCA SCSI or the newer SATA.
Ummm, great. When I started doing this with FC1, SATA mostly didn't work and firewire did, except you had to modprobe it manually and tell it about new devices. These are 250 gig drives and I have 3 externals for offsite rotation, so I can't afford SCSI.
A firmware update on one type seems to have changed things but none of the problems are strictly reproducible so it is taking a long time to pin anything down.
Well, I wish you the best of luck.
Today it is running with the mirroring on under FC3, but I don't know if anything is really different yet. There has been a recent kernel update, I've updated firmware on this carrier, and run some diagnostics to fix drive errors that might have been caused by the earlier firmware or kernels. The funny thing is that I started doing this because I thought working with disks would be easier than tapes... But it is nice to be able to plug the drive carrier into my laptop's usb and be able to restore anything instantly (the drive case does both usb and firewire).
[...]
You use the one that works and has a long history of working until the replacement handles all the needed operations.
I don't think you seem to understand what I just said. The standards-compliant version can_not_ always handle the exact functionality of the variant from the standard.
Yes, when a standard is changed late in the game, that is to be expected. People will already have existing solutions and can only move away so fast - especially with formats of archived data.
Many times, what people think is so-called "proven" is actually quite broken. Anyone who exchanged tarballs between Linux and Solaris, Irix and other systems using GNU Tar typically ran into such issues.
An issue easily resolved by compiling GNU tar for the target system.
POSIX compliance exists for a reason. GNU Tar, among many other Linux utilities, has deviated over the years. Things must break to bring that deviation back in line with the standard.
POSIX is the thing that changed here. And GNU tar has nothing to do with Linux other than being included in some distributions that also include a Linux kernel. I'm too lazy to look up the availability dates but I used GNUtar myself long before Linux. I agree that forward-looking, the current POSIX spec is useful, but the 'a' in tar is about archives that exist from a time when it wasn't.
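One concrete place where the GNU and POSIX tar formats differ is long member names: classic ustar caps a name at 100 bytes plus a 155-byte prefix split at a "/", while GNU tar stores longer names in special `././@LongLink` entries and POSIX.1-2001 (pax) uses extended headers. Python's standard `tarfile` module can write all three and reads them back transparently, which makes for a quick sketch of the difference. This illustrates Python's implementation of the formats, not GNU tar or star themselves.

```python
import io
import tarfile

LONG_NAME = "d" * 150  # one path component longer than ustar's 100-byte name field


def write_archive(fmt):
    """Write one small member named LONG_NAME in the given tarfile format."""
    buf = io.BytesIO()
    data = b"hello\n"
    info = tarfile.TarInfo(name=LONG_NAME)
    info.size = len(data)
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_names(raw):
    """List member names; the reader auto-detects GNU, ustar and pax headers."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return tar.getnames()


results = {}
for label, fmt in [("gnu", tarfile.GNU_FORMAT),
                   ("pax", tarfile.PAX_FORMAT),
                   ("ustar", tarfile.USTAR_FORMAT)]:
    try:
        results[label] = read_names(write_archive(fmt)) == [LONG_NAME]
    except ValueError as exc:
        # ustar cannot split a 150-byte name with no "/" into prefix + name.
        results[label] = f"cannot write: {exc}"

print(results)
```

The GNU and pax archives round-trip the name; the plain ustar writer refuses it, which is the kind of gap GNU's extensions were filling before pax headers existed.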
So which is more important when I want to read something from my 1990's vintage tapes?
If GNU Tar even reads some of them! You should read up on GNU Tar. ;->
If you are reading the star author's comments, try to duplicate the situation yourself. The worst-case issue with GNU tar is that you have to repeat a restore of an incremental to get back a directory that was created between a full and incremental with the same name that an ordinary file had at the time of the full (or maybe that's backwards - at least your data is all there and you can restore it). For several years while the star author was posting this, star would have completely missed copying many changed files in an incremental. He's done some work in the last few months that probably fixes it but I doubt if that is in current distributions yet.
Here's the real test that you should try if you are even thinking about trusting incrementals: Make a full run of a machine with nearly full filesystems. Delete a bunch of files, add enough new ones that the old/new total would not fit. Rename some directories that contain old files. Make an incremental. Repeat if you plan multi-level incrementals. Restore the full and subsequent incremental(s) to bare metal. If you get a working machine with exactly the same files in the same places including your old files under the directories with new names, your plan will work. GNUtar gets all of this right with the --listed-incremental form at least from the mid-90s through recent distros that don't need magic file attributes to work (i.e. it might not do everything SELinux expects). And Amanda depends on this behavior.
You aren't following the scenario. The drives worked as shipped. They were running Centos 3.x which isn't supposed to have behavior-changing updates. I did a 'yum update' from the nicely-running remote boxes that didn't include a kernel and thus didn't do a reboot immediately afterwards.
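The last step of that test, "exactly the same files in the same places", is easy to automate. A minimal sketch, assuming the original and the restored trees can be mounted side by side: build a manifest of relative path to SHA-256 (or to a type marker for directories, symlinks and special files) and diff the two. `tree_manifest` and `compare_trees` are hypothetical helper names, and the sketch deliberately ignores ownership, permissions, timestamps and extended attributes, which a real bare-metal restore check should also compare.

```python
import hashlib
import os
import stat


def _entry(full):
    """Describe one filesystem object without following symlinks."""
    st = os.lstat(full)
    if stat.S_ISLNK(st.st_mode):
        return "link:" + os.readlink(full)
    if stat.S_ISDIR(st.st_mode):
        return "dir"
    if not stat.S_ISREG(st.st_mode):
        return "special"  # devices, fifos, sockets: compared by type only
    h = hashlib.sha256()
    with open(full, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_manifest(root):
    """Map every path under root (relative to root) to its _entry description."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = _entry(full)
    return manifest


def compare_trees(original, restored):
    """Return (missing, extra, changed) relative paths, each sorted."""
    a, b = tree_manifest(original), tree_manifest(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, changed
```

A clean restore is one where `compare_trees(original, restored)` returns three empty lists; an old file that ended up under the pre-rename directory name shows up as one "missing" and one "extra" path.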
You should have tested this in-house _first_.
I did. It worked.
I normally test on a local system, then one or a few of the remotes, make sure nothing breaks, then proceed with the rest of the remotes. So, after all that, I ended up with a flock of running remote boxes that were poised to become unreachable on the next reboot.
Again, you should have tested all this in-house _first_.
I did. It worked.
And even if I had rebooted a local box after the corresponding update, it wouldn't have had the problem because I would have either installed that one in place or assigned the IP from its own console after swapping the disk in.
But had you followed such a procedure, you would have discovered it.
Actually, in retrospect, the funny part is that one of the main reasons for cloning the disks in the first place was so that I'd be testing a bit-for-bit duplicate of what was in production.
But they could at least think about what a behavior change is likely to do in different situations, and this one is pretty obvious. If eth0 is your only network interface and you refuse to start it at bootup, remote servers that used to work become unreachable.
You might want that in the case where you want only a specific hardware address to access the network.
Perhaps, but do you really think I'd change my mind about that well after the machines were deployed?
Maybe so. But I'm still waiting on you to detail when this change was, in fact, made. So far, I'm just going on your comments that you merely yum'd the updates from the proper repository.
That's because I didn't do a reboot along with the update, so it could have been any of several runs. It pretty much had to be from initscripts-7.31.18.EL-1.centos.1.i386.rpm which I see is dated April 18 in my download cache. How should I associate this with RHEL3/Centos3 revisions to describe it?
Note that I did test everything I could, and everything I could have tested worked because the pre-shipping behavior was to include the hardware address in the /etc/sysconfig/networking/profiles/defaults/xxxx file, but to ignore it at startup.
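The behavior change described here boils down to one extra comparison at interface bring-up: if the profile records a hardware address and the NIC's actual MAC differs, the interface is left down. The sketch below only illustrates that logic. It is not the initscripts code (which is shell), the parsing is simplified, and `parse_ifcfg`, `should_bring_up` and the `enforce_hwaddr` switch are hypothetical names; `HWADDR=` is the key Red Hat-style ifcfg files use for the MAC.

```python
def parse_ifcfg(text):
    """Parse simple KEY=value lines from an ifcfg-style file (quotes stripped)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip().strip("\"'")
    return config


def normalize_mac(mac):
    """Compare MACs case-insensitively, accepting '-' or ':' separators."""
    return mac.strip().lower().replace("-", ":")


def should_bring_up(config, actual_mac, enforce_hwaddr=True):
    """Decide whether to start the interface.

    enforce_hwaddr=False models the older behavior described in the thread:
    HWADDR was recorded in the profile but ignored at startup.
    """
    wanted = config.get("HWADDR")
    if not enforce_hwaddr or not wanted:
        return True
    return normalize_mac(wanted) == normalize_mac(actual_mac)


# A profile written on the original machine, then the disk moved to a clone.
profile = parse_ifcfg('DEVICE=eth0\nBOOTPROTO=static\n'
                      'HWADDR="00:0C:29:AA:BB:CC"\nONBOOT=yes\n')
clone_mac = "00:0c:29:11:22:33"  # made-up MAC of the clone's NIC

print(should_bring_up(profile, clone_mac, enforce_hwaddr=False))  # True
print(should_bring_up(profile, clone_mac))                        # False
```

With the check enforced, a cloned disk whose only NIC has a different MAC boots without eth0, which is exactly how a remote box that tested fine becomes unreachable on its next reboot.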
Ahhh, now we're getting to it! After you did a "yum update", did you check for any ".rpmsave" files?
No, none of my configs changed, just the way they were handled after the initscript revision.
Maybe I've just been in too many environments where that's the deal, yes. And even when it's not lives, it's a "one-shot deal" and I don't get a 2nd chance.
I'll admit to being a little sloppy because the boxes are behind a load balancer and I know I can lose one in production with serious problems. But, an *exact* copy here didn't show any problem, the updated remote machine didn't show any problem while still running. Everything looked like a go... I suppose I should have known that a new initscripts package could break booting, but RHEL3/Centos had a decent track record about that sort of thing so far.
It might be that it was disabled -- possibly by yourself during config. But then an update changed that. Again, running: find / -name '*.rpmsave' (with the pattern quoted so the shell doesn't expand it first)
is almost a mandatory step for me anytime I upgrade. RPM is very good at dumping out those files when it can't use an existing config file or script that has been modified.
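For anyone scripting that post-update check, here is a rough Python equivalent of `find / -name '*.rpmsave'`. It is a sketch: `find_rpm_leftovers` is a hypothetical helper, it also looks for `.rpmnew` (the suffix RPM uses when it keeps your modified config and puts the packaged version beside it), it prunes `/proc` and `/sys`, and it silently skips directories it cannot read.

```python
import os


def find_rpm_leftovers(root="/", suffixes=(".rpmsave", ".rpmnew"),
                       skip=("/proc", "/sys")):
    """Yield paths under root whose file names end in one of the suffixes."""
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        # Prune pseudo-filesystems in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) not in skip]
        for name in filenames:
            if name.endswith(suffixes):
                yield os.path.join(dirpath, name)
```

Usage after a `yum update`: `for path in find_rpm_leftovers("/etc"): print(path)`, then diff each hit against the live file to see what changed.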
None of the above. When I get a chance I'll compare the old/new version of the ifup steps to see if ignoring on a MAC mismatch was a new addition or if they fixed a broken comparison in the original. I'll feel much better about the whole thing if the check was there all along and they thought this was just a bugfix. Still, I have to wonder how many RH/Centos machines are out there in the same situation (IP set with redhat-config-network, then the disk or NIC moved, then a post-April-18 update) just waiting to disappear from the network on the next reboot. It would also be interesting to see how RH support would respond when called about an unreachable box, but being a cheapskate running Centos, I wouldn't know.