OK ... a lot of problems are being attributed to the 4.4 upgrade that are not really upgrade problems.
As best I can tell, there is really only one major problem:
1. You need to install python-sqlite before sqlite (or at nearly the same time).
To accomplish this, run:
yum update python-sqlite sqlite
Then after that, a normal yum update works fine.
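Spelled out as a complete sequence, and assuming nothing beyond the two commands already shown above:
# update the yum/sqlite stack first, in its own transaction, so yum itself keeps working
yum update python-sqlite sqlite
# then apply the rest of the 4.4 updates as usual
yum update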
There are two other potential problems:
2. There are other (hardware-specific) issues concerning the 2.6.9-42 (or -42.0.2) kernel for some users. This is going to happen every kernel upgrade cycle.
3. There are sometimes futex issues ... that is a generic error that can be caused by many things; usually it is a somehow messed-up rpm database with duplicate RPMs installed, or another problem. Sometimes it just seems to happen; it has been that way since at least RH8.
Other than that, I don't see any problems. (Except that some people have more than one version of an RPM in their rpm database ... and when they try to update now, there is a version conflict.)
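A quick sanity check for that last condition before you start -- this is just a generic rpm query, nothing 4.4-specific; the --qf format prints bare package names so duplicates stand out:
# list any package names that appear more than once in the rpm database
rpm -qa --qf '%{NAME}\n' | sort | uniq -d
Any name it prints has a stale duplicate entry that is best cleaned up (by its full name-version-release) before running the update.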
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
Some people are creating their own issues by doing piecemeal upgrades ... that is to be expected.
CentOS does not control how the upstream provider releases its product (there are 195 packages updated, for example), nor do we control when they release things or what they build them on. I can tell you, however, that some things do not work with older releases ... though they are not necessarily listed as requirements. An example is audit and the audit libs; they don't work properly with all released kernels.
An updated CentOS-4.x is CentOS-4 ... CentOS-4.4 is a point-in-time snapshot ... nothing more. Installing CentOS-4.2 (as an example) and only doing security updates after that is not tested as working or recommended (by CentOS or by the upstream provider).
As Lance said, in the future the upstream provider is going to create different channels for different snapshots, and when they do, we will as well (4.5.0, 4.5.1, 4.5.2, etc.). Until that time (just like upstream), there is only one authorized CentOS-4 version ... and that is CentOS-4.x installed with all the updates applied.
----------------------------------------------
There was also a comment that these issues are not on the upstream mailing list, so they must be happening only on CentOS. This is also not true. The upstream provider tells everyone to "Handle it via Support Calls" and not the mailing list ... and sometimes not even via the public bugzilla. As an example, see this:
https://www.redhat.com/archives/anaconda-devel-list/2006-August/msg00073.htm...
Then they create proprietary (non-public) bugzilla entries to track and fix issues. That is their prerogative; however, because of that, not all fixes are done in public view.
We also try to move these things into public view (again, for the above example):
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202446
It is incorrect to assume that, just because something is not on the upstream mailing list or in the upstream public bugzilla, it is not a known upstream issue being worked on upstream.
The python-sqlite issue is not an upstream issue ... as they don't use yum or sqlite ... however all other issues I have seen are.
Thanks, Johnny Hughes
At 7:46 AM -0500 9/5/06, Johnny Hughes wrote:
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
What is your typical hardware setup, so we can have a good comparison guide here? Intel? AMD?
Very good post BTW.
On Tue, 2006-09-05 at 08:57 -0400, Alex Pilson wrote:
At 7:46 AM -0500 9/5/06, Johnny Hughes wrote:
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
What is your typical hardware setup, so we can have a good comparison guide here? Intel? AMD?
Most of the servers I have upgraded are Dell (1450 / 1850 / 2650) with Intel P4 or P4 Xeon.
Most of the workstations are also Dell Dimension machines with Intel.
I have also upgraded a couple of Opteron servers (Sun Fire V40z and V20z) and a couple of generic AMD64 3000-3400+ machines.
Alex Pilson wrote:
At 7:46 AM -0500 9/5/06, Johnny Hughes wrote:
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
What is your typical hardware setup, so we can have a good comparison guide here? Intel? AMD?
Very good post BTW.
I'll echo that critique of Johnny's post and add that I have been remiss in failing to follow up on a thread I opened several days ago regarding kernel 2.6.9-42.0.2.EL, an Asus A7N8X m/b and ntpd. Booting with "acpi=off" made ntpd work just fine, which made me think a bit. Shame on me! There was a BIOS update available that I "just never got around to installing". With the BIOS updated, I removed the kernel boot argument and ntpd still works properly.
Robert wrote:
... Shame on me! There was a BIOS update available that I "just never got around to installing". With the BIOS updated, I removed the kernel boot argument and ntpd still works properly.
Maybe they are tightening up ACPI parsing to be more standards-compliant ... a good thing in the long run, but too bad for older hardware. On a server I have, there are newer BIOSes available that fix ACPI parsing problems ... but alas, when I upgrade the BIOS, agpgart hangs. I have not found a way to disable agpgart with a kernel parameter; it is a server with no AGP anyway.
On Tue, Sep 05, 2006 at 08:57:13AM -0400, Alex Pilson wrote:
At 7:46 AM -0500 9/5/06, Johnny Hughes wrote:
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
What is your typical hardware setup, so we can have a good comparison guide here? Intel? AMD?
Very good post BTW.
Intel Celeron, P3 and P4, AMD Athlon and Sempron here.
Not a single problem.
[]s
-- Rodrigo Barbosa "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
At 7:46 AM -0500 9/5/06, Johnny Hughes wrote:
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
What is your typical hardware setup, so we can have a good comparison guide here? Intel? AMD?
5 Intel machines here, no problem. A fresh install of 4.4 with RAID1 killed me, though.
Johnny Hughes wrote:
- You need to install python-sqlite before sqlite (or at nearly the
same time).
To accomplish this, run:
yum update python-sqlite sqlite
I'd add: please do not update sqlite alone. Updating sqlite prior to python-sqlite will leave yum not working at all. Yum will start and then eat all available memory while reading the repodata; the OS may hang.
Maybe the dev team could release updated versions of yum, sqlite, and python-sqlite with changed dependencies in the specs? Then 'yum update yum' would lead to updating all three packages. David
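Something along these lines in the spec files would express that idea -- purely illustrative, and the exact versioned Requires would be up to the packagers:
In yum.spec (hypothetical):
    Requires: python-sqlite >= 1.1.7-1.2
In python-sqlite.spec (hypothetical):
    Requires: sqlite >= <whatever version it was built against>
With something like that in place, 'yum update yum' would drag the other two packages along in the same transaction.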
David Hrbáč wrote:
I'd add: please do not update sqlite alone. Updating sqlite prior to python-sqlite will leave yum not working at all. Yum will start and then eat all available memory while reading the repodata; the OS may hang.
I forgot to write: the solution is to install python-sqlite manually, rpm -Uhv python-sqlite-1.1.7-1.2...... David
On Tue, Sep 05, 2006 at 07:46:45AM -0500, Johnny Hughes wrote:
- There are sometimes futex issues ... that is a generic error that can be caused by many things; usually it is a somehow messed-up rpm database with duplicate RPMs installed, or another problem. Sometimes it just seems to happen; it has been that way since at least RH8.
Tip: The best way to reduce the likelihood of this yum futex problem happening is to remove the __* files in the /var/lib/rpm directory before executing yum.
It will not completely eliminate the futex problem, but it will drastically reduce it.
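Concretely (the __db* files are the Berkeley DB region/lock files that rpm leaves in /var/lib/rpm; rebuilding the indexes afterwards is optional but a common extra precaution):
# clear rpm's stale lock/region files before running yum
rm -f /var/lib/rpm/__db*
# optionally rebuild the rpm database indexes as well
rpm --rebuilddb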
I have now upgraded 30 production servers and several dozen workstations and have had no problems at all.
About 12 servers here, and no problems.
[]s
-- Rodrigo Barbosa "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
On Tuesday 05 September 2006 08:46, Johnny Hughes wrote:
OK ... there has been an awful lot of attributing problems to the 4.4 upgrade that are not really upgrade problems.
The best I can tell there is really only one major problem:
- You need to install python-sqlite before sqlite (or at nearly the
same time).
I would consider http://bugs.centos.org/view.php?id=1483 to not be minor. I looked over Red Hat's Bugzilla and didn't, in the few minutes I skimmed, see the same issue in upstream. It could be related to yum's means of doing the package update versus up2date's method; on a production DNS box I had the problem mentioned in this bug, but on a machine that wasn't the production name server I didn't.
I reproduced the issue using the proper yum sequence, updating python-sqlite, then sqlite, then updating yum, then doing a clean all, and had the problem.
On Tue, 2006-09-05 at 11:32 -0400, Lamar Owen wrote:
On Tuesday 05 September 2006 08:46, Johnny Hughes wrote:
OK ... there has been an awful lot of attributing problems to the 4.4 upgrade that are not really upgrade problems.
The best I can tell there is really only one major problem:
- You need to install python-sqlite before sqlite (or at nearly the
same time).
I would consider http://bugs.centos.org/view.php?id=1483 to not be minor. I looked over Red Hat's Bugzilla and didn't, in the few minutes I skimmed, see the same issue in upstream. It could be related to yum's means of doing the package update versus up2date's method; on a production DNS box I had the problem mentioned in this bug, but on a machine that wasn't the production name server I didn't.
I reproduced the issue using the proper yum sequence, updating python-sqlite, then sqlite, then updating yum, then doing a clean all, and had the problem.
In the original case, we have tracked the problem to the sqlite update issue.
In your case, it seems to be something else.
In the original case, the issue was caused by a mismatch between the named libs and the named binaries ... (as the update died in the middle).
In your case, I'm not sure what caused the issue.
If you send me an "rpm -qa" of the server in question, I will make a test machine that matches it as closely as I can and update it several times and see if I can duplicate the error.
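For anyone else wanting to capture the same information, a plain query format keeps the arch in the output and makes the list easy to diff against another box (the output file name is arbitrary):
# dump the full package set in a sortable, arch-qualified form
rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort > rpm-qa.txt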
Thanks, Johnny Hughes
On Tue, Sep 05, 2006 at 11:32:22AM -0400, Lamar Owen wrote:
On Tuesday 05 September 2006 08:46, Johnny Hughes wrote:
OK ... there has been an awful lot of attributing problems to the 4.4 upgrade that are not really upgrade problems.
The best I can tell there is really only one major problem:
- You need to install python-sqlite before sqlite (or at nearly the
same time).
I would consider http://bugs.centos.org/view.php?id=1483 to not be minor. I looked over Red Hat's Bugzilla and didn't, in the few minutes I skimmed, see the same issue in upstream.
Since upstream doesn't use yum, I would not expect to see it there.
It could be related to yum's means of doing the package update versus up2date's method;
No. It is just that up2date doesn't use python-sqlite, or even sqlite as far as I know.
[]s
-- Rodrigo Barbosa "Quid quid Latine dictum sit, altum viditur" "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
On Tuesday 05 September 2006 13:20, Rodrigo Barbosa wrote:
No. It is just that up2date doesn't use python-sqlite, or even sqlite as far as I know.
I've sent Johnny the rpm -qa off-list.
Once again, let me note that I did the update with the recently recommended sequence of operations: updating python-sqlite first, then updating sqlite, yum, and yum's dependencies, doing a yum clean all, and then running the yum update (it's documented in bug 1483). I did not experience a yum 'lockup'; the yum update ran to completion, but I ended up with duplicates and a hosed bind. The yum log lists the updates, but the cleanup left dupes. Of course, yum is now hopelessly confused; but it runs, doesn't hog the CPU, and shows none of the other symptoms of the sqlite thing.
On Tue, 5 Sep 2006, Lamar Owen wrote:
I would consider http://bugs.centos.org/view.php?id=1483 to not be minor.
Hi, Lamar -- as it is my bug, a bit of context for those who have not read it is in order. -- smile -- In the seven hours it took me to run down the cause, I assure you, I did not consider it minor either. Thus my early and formal report and commentary for others to find.
I would note that Johnny and I, and Seth at the end, worked on this both in the main #centos IRC channel, and out of channel, to run down hypotheses for me to test. Thanks guys.
I looked over Red Hat's Bugzilla and didn't, in the few minutes I skimmed, see the same issue in upstream. It could be related to yum's means of doing the package update versus up2date's method; on a production DNS box I had the problem mentioned in this bug, but on a machine that wasn't the production name server I didn't.
No surprise that yum/sqlite issues do not affect the upstream, as their updater takes a different approach.
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
It is 'luck of the draw', as there are no relevant Requires in play in the transaction sort order, whether the bind-libs and bind updates fall on the same side of the update failure -- so long as they are NOT on differing sides, there is no problem; when they differ, not surprisingly, bind gets confused. ;)
I reproduced the issue using the proper yum sequence, updating python-sqlite, then sqlite, then updating yum, then doing a clean all, and had the problem.
And, in my post-analysis, it looks like there is a pretty strong likelihood that this approach is fine for over 90% of the boxes out there. Boxes with 'tight' partitioning, or with packages held back (exclude=) from updates, are a bit more likely to need two or more passes, and so expose themselves more often to the sequencing risk, where any failure needs manual intervention to recover from.
-- Russ Herrold
On Tuesday 05 September 2006 22:35, R P Herrold wrote:
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
Sounds like a database type issue; that is, it could be demonstrated that a multiversion concurrency control (MVCC) mechanism, similar to the PostgreSQL backend, for the whole filesystem, would help this sort of thing. That is, a filesystem on ACID. (Atomicity, Consistency, Isolation, Durability: the magic mantra of database management).
A real transactional filesystem would allow truly atomic system updates. But, of course, there are definite downsides to that.
On Sat, 2006-09-09 at 12:43, Lamar Owen wrote:
On Tuesday 05 September 2006 22:35, R P Herrold wrote:
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
Sounds like a database type issue; that is, it could be demonstrated that a multiversion concurrency control (MVCC) mechanism, similar to the PostgreSQL backend, for the whole filesystem, would help this sort of thing. That is, a filesystem on ACID. (Atomicity, Consistency, Isolation, Durability: the magic mantra of database management).
A real transactional filesystem would allow truly atomic system updates. But, of course, there are definite downsides to that.
I think you are missing the big picture here. Yum is managing the whole system. Given an ACID database, would you expect to be able to upgrade it to a new, potentially incompatible version of an ACID database with transactions in progress? What yum needs is just some special consideration when modifying its own components.
On Saturday 09 September 2006 14:42, Les Mikesell wrote:
On Sat, 2006-09-09 at 12:43, Lamar Owen wrote:
A real transactional filesystem would allow truly atomic system updates. But, of course, there are definite downsides to that.
I think you are missing the big picture here.
No, I don't think I am. I think you are. The big picture is that yum is not atomic in its update (not yum's fault, either); lack of atomicity (in my case) produced a problem (I DID UPDATE python-sqlite IN THE CORRECT ORDER). This is, thanks to yum's role in managing the complete installed set, a systemic issue; the whole system needs to atomically go from one consistent state to another, and portions of the system in one state need to be isolated from those portions of the system in the other state. Otherwise there will be problems; no, they are not terribly widespread; but the general case solution would work wonders for yum updating its own 'stuff' too.
In the general case, I'd like to issue something like:
# acidfs-begin-transaction
# yum -y update
[bunch of output]
# if yum-no-error-condition ; then
#     acidfs-commit
# else
#     acidfs-rollback
# fi
RDBMSes have been doing this for decades. I do this daily, using SQL. Note that, for all processes except the shell process inside the transaction, no changes have occurred to the filesystem; after acidfs-commit all the changes 'suddenly' appear (and hopefully in-core text is reloaded if possible); if you get to acidfs-rollback, everything reverts and no process is any the wiser.
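To make the analogy concrete, here is the same pattern in plain SQL through psql; the database and table names are made up purely for illustration, and only the BEGIN/COMMIT behaviour is the point:
# other sessions see none of these changes until the COMMIT; if psql stops on
# an error first, the still-open transaction simply rolls back
psql -v ON_ERROR_STOP=1 mydb <<'SQL'
BEGIN;
UPDATE packages SET version = '4.4' WHERE name = 'bind';      -- illustrative
UPDATE packages SET version = '4.4' WHERE name = 'bind-libs'; -- illustrative
COMMIT;
SQL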
Yum is managing the whole system.
It is currently managing the whole filesystem. But what about in-memory program text? The currently loaded programs need to continue to have the older libs available if needed; a single 'commit' operation at the end needs to atomically reload program text and provide consistent library dependencies post-run (the only way you can do this now is shut down the system, boot rescue media, and yum update to a chroot (which is the system), then reboot the system (aka 'booting an update CD'; anaconda does this quite well)). The in core text needs to be isolated from what's going on on the filesystem, and if an error occurs (out of space, for instance, or a locked file) an atomic rollback would be very nice (anaconda does not do this, though).
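For reference, the 'general anesthesia' route in that parenthesis looks roughly like this; /mnt/sysimage is the usual anaconda rescue-mode mount point, and whether the rescue environment can actually reach your repositories is a separate question:
# boot the install media with 'linux rescue', let it mount the installed system,
# then update it offline and reboot
chroot /mnt/sysimage
yum -y update
exit
reboot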
Given an ACID database, would you expect to be able to upgrade it to a new, potentially incompatible version of an ACID database with transactions in progress?
Of course I would; a filesystem/database (hmm, let's see, similar to, but farther beyond what NILFS claims) would have to be guaranteed backwards compatible. That's a given for a filesystem; it is unreasonable for the authors of a filesystem to introduce such changes and expect seamless updates of the filesystem code itself.
What yum needs is just some special consideration when modifying its own components.
Ok, let me repeat: I updated the yum components in the right order. This is NOT the sqlite/python-sqlite issue. In my case, after following the advice of updating python-sqlite, then sqlite, then yum and its dependencies, I got a system with a lot of dupes and out of sync libraries. Lots of out of sync libraries; I've not finished the recovery yet, although bind at least is working. If there were a 'yum rollback' (or an 'acidfs-rollback snapshot-prior-to-yum') then I could at least try it again. But there isn't, and I know there isn't, and I know all the deal about no support, etc. etc.
But, yes, yum does need some special care with its own components; but that's the narrow view; I'm looking at the big picture of a possible system-wide rollback/commit facility for true systemic atomicity; consistency of the in-core text versus on-disk text (the times I've tried to open something with, say, firefox, when it has been open for a while, but after it has been yum updated and the open fails in mysterious ways are rather annoying; this is not unique to CentOS); isolation of the changes that are happening on-disk and the view from the in-core text (my firefox example; if you run a yum update of firefox while firefox is running, strange things are guaranteed to happen to your running firefox as its in-core text becomes inconsistent with the on-disk libs and such!); and durability of the change (once committed, fully committed).
Yes, I know a reboot to special update media would fix all that; that's an anaconda-mediated update, and from the system's point of view it is atomic (you're just doing the update under general anesthesia, so to speak). But online updating, and updating without rebooting, are things I am not really willing to give up (especially when the quarterly update is as large as it is usually; makes Windows XP Service Pack 2 look like you're downloading a small text file!)
Just throwing an idea out, that's all, for discussion. This systemic non-atomicity and inconsistency is endemic to all linuxen at the moment.
On Sat, 2006-09-09 at 16:57, Lamar Owen wrote:
In the general case, I'd like to issue something like:
# acidfs-begin-transaction
# yum -y update
[bunch of output]
# if yum-no-error-condition ; then
#     acidfs-commit
# else
#     acidfs-rollback
# fi
Isn't that what LVM snapshots are supposed to provide?
Yum is managing the whole system.
It is currently managing the whole filesystem. But what about in-memory program text?
That's up to the rpm layer and the scripts contained in the packages. Yum doesn't know much about that other than dependencies.
The currently loaded programs need to continue to have the older libs available if needed;
This would not be possible without massive changes in the way RPM works. The new installs won't happen if they can't see their dependent libs.
a single 'commit' operation at the end needs to atomically reload program text and provide consistent library dependencies post-run (the only way you can do this now is shut down the system, boot rescue media, and yum update to a chroot (which is the system), then reboot the system (aka 'booting an update CD'; anaconda does this quite well)). The in core text needs to be isolated from what's going on on the filesystem, and if an error occurs (out of space, for instance, or a locked file) an atomic rollback would be very nice (anaconda does not do this, though).
The only hope would be to make the equivalent of a virtual machine where the old system keeps running until the new one is completely constructed.
Given an ACID database, would you expect to be able to upgrade it to a new, potentially incompatible version of an ACID database with transactions in progress?
Of course I would; a filesystem/database (hmm, let's see, similar to, but farther beyond what NILFS claims) would have to be guaranteed backwards compatible.
Backwards compatibility? You seem to have confused Linux distributions with something else. Try, for example, to copy a Centos 3.x distro onto filesystems created by 4.x and make it boot.
That's a given for a filesystem; it is unreasonable for the authors of a filesystem to introduce such changes and expect seamless updates of the filesystem code itself.
Are filesystem authors required to be reasonable?
What yum needs is just some special consideration when modifying its own components.
Ok, let me repeat: I updated the yum components in the right order. This is NOT the sqlite/python-sqlite issue. In my case, after following the advice of updating python-sqlite, then sqlite, then yum and its dependencies, I got a system with a lot of dupes and out of sync libraries. Lots of out of sync libraries; I've not finished the recovery yet, although bind at least is working.
I think I'm missing something here. If yum itself or the rpm database wasn't broken, what went wrong?
If there were a 'yum rollback' (or a 'acidfsrollback snapshot-prior-to-yum') then I could at least try it again. But there isn't, and I know there isn't, and I know all the deal about no support, etc etc.
But you've made a big assumption here that yum itself would work properly while doing this. If yum was working right you wouldn't have dups now.
Just throwing an idea out, that's all, for discussion. This systemic non-atomicity and inconsistency is endemic to all linuxen at the moment.
And necessarily so, since each rpm package installs independently and may have to complete, with both its process and filesystem changes, before some of the others will work.
On Sat, Sep 09, 2006 at 08:08:15PM -0500, Les Mikesell wrote:
Isn't that what LVM snapshots are supposed to provide?
A yum plugin to do an lvm snapshot before any actions would be cool.
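Hand-rolled, the idea would look something like this -- the volume names are illustrative, it assumes free extents in the volume group, and snapshotting the root filesystem itself had real limitations at the time:
# snapshot the volume before letting yum loose
lvcreate --snapshot --size 2G --name pre-update /dev/VolGroup00/LogVol00
yum -y update
# if the update went badly, old files can be copied back off the snapshot:
#   mount /dev/VolGroup00/pre-update /mnt/pre-update
# otherwise just discard it
lvremove -f /dev/VolGroup00/pre-update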
On Sun, 2006-09-10 at 11:16, Matthew Miller wrote:
On Sat, Sep 09, 2006 at 08:08:15PM -0500, Les Mikesell wrote:
Isn't that what LVM snapshots are supposed to provide?
A yum plugin to do an lvm snapshot before any actions would be cool.
I think somewhere in the fine print it says that LVM snapshots don't work on the root filesystem, or that the tools don't support it completely - at least that's what they are saying on the LVM mailing list. It is probably the same sort of issue - changes in the system affecting the currently running programs - that is causing the problem you want to avoid, and this just moves it somewhere else.
[I realize this is growing OT, and I don't plan to discuss this ad nauseam.] [Note also that a LUFS-based ACID filesystem (using PostgreSQL as the storage manager) exists; see http://www.edlsystems.com/psqlfs/ for this. Since it's LUFS-based, it probably couldn't be used for a root filesystem anyway ... and it's fairly old at this point; a similar filesystem for FUSE is at http://relfs.sourceforge.net/; also, to see how old an idea this is, read http://www.linuxjournal.com/article/1383 for a 1997-era take on it.]
On Saturday 09 September 2006 21:08, Les Mikesell wrote:
On Sat, 2006-09-09 at 16:57, Lamar Owen wrote:
In the general case, I'd like to issue something like:
# acidfs-begin-transaction
# yum -y update
[bunch of output]
# if yum-no-error-condition ; then
#     acidfs-commit
# else
#     acidfs-rollback
# fi
Isn't that what LVM snapshots are supposed to provide?
No. LVM snapshots could allow you to roll back all changes to a filesystem to a previous state (a VMware snapshot likewise); what I'm talking about only rolls back (if needed) the changes made by the yum process, allowing other filesystem changes to stay (like changes to a running PostgreSQL or MySQL database, or web page changes, or /var/log/messages changes, etc). Basic database stuff. And, critical to what I'm talking about, a truly ACID filesystem isolates the changes-in-process-but-not-committed to the process doing the changes; the rest of the system is clueless as to the changes until the filesystem changes are committed.
Of course, my procedure above is oversimplified; one would obviously want to lock for writing the files impacted by an update; for that, you'd want to get a list of the rpms that are going to be updated, and then lock for writing the files in the packages that are going to be updated. With MVCC, the write lock does not block any readers (they're going to get the previous version anyway).
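As a sketch of just the 'get the list first' step -- the plumbing here is illustrative, and it only lists the currently installed files that an update would replace, which is what a write lock would need to cover:
# find the packages an update would touch, then list the files they currently own
yum -q check-update | awk 'NF>=3 {sub(/\.[^.]*$/, "", $1); print $1}' | xargs -r rpm -ql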
The currently loaded programs need to continue to have the older libs available if needed;
This would not be possible without massive changes in the way RPM works. The new installs won't happen if they can't see their dependent libs.
If the filesystem itself is using MVCC, then it is done below the RPM level by the filesystem; until the commit occurs all processes except the one doing the update still see the old filesystem state. This is the 'I' in ACID, and is a basic trait of all but the most crippled databases.
The only hope would be to make the equivalent of a virtual machine where the old system keeps running until the new one is completely constructed.
If the filesystem itself implements process-granular MVCC (multiversion concurrency control), then a VM environment isn't necessary.
Backwards compatibility? You seem to have confused Linux distributions with something else. Try, for example, to copy a Centos 3.x distro onto filesystems created by 4.x and make it boot.
That's forwards compatibility. Take a CentOS 4 system and install to a filesystem created by CentOS 3 to test backwards compatibility.
Yes, I know the 'culture' of real backwards compatibility is not strong, and that's regrettable, but at the same time even today almost all Linux distributions are capable of reading very early ext2 filesystems. Now, I'm not sure if the modern Linuxen would open and run a libc4 a.out binary, though....
Are filesystem authors required to be reasonable?
Why not? (Not that it's relevant.) <rant>And that's one reason I won't touch ReiserFS with a ten-foot pole, even though it has some very nice database-like features.</rant> Unfortunately there are a few high-profile OSS developers who apparently aren't reasonable (Schilling, for instance); most OSS developers, thankfully, are fairly reasonable, within reason (the PostgreSQL team, for instance, to use one with which I am closely familiar).
I think I'm missing something here. If yum itself or the rpm database wasn't broken, what went wrong?
Good question. I don't have a full answer to this; I do know that a 'yum rollback' would be a very nice feature for me right now for that one server. The rpm database is fine; yum did not crash out, but I still got dupes, and bind fell over and died a horrible screaming death because of it. I do not know exactly what happened behind the scenes to create the mess; I am just left with a mess after following the 'recommended' procedure (that worked fine on several other boxen; none of which were active nameservers, though).
But you've made a big assumption here that yum itself would work properly while doing this. If yum was working right you wouldn't have dups now.
Maybe. I don't know that for sure; I think the problem lies deeper, myself. But I have no evidence to back up my gut feeling.
Just throwing an idea out, that's all, for discussion. This systemic non-atomicity and inconsistency is endemic to all linuxen at the moment.
And necessarily so, since each rpm package installs independently and may have to complete, with both its process and filesystem changes, before some of the others will work.
Yeah, I know that far too well; and I know some of RPM's more arcane bugs that have, in the past, been resolved with WONTFIX. Each RPM is standalone; in the process of maintaining a fairly interdependent set of RPMs (PostgreSQL) for five years, I learned this very well. But thanks to my experience at this low level, I am of the conviction that this is the wrong thing from a system point of view; unfortunately I am not convinced that the 'right way' is out there, but I'd know it if I saw it. An ACID-compliant filesystem, among other advantages, could alleviate and work around some of the issues of the RPM package system (Debian's isn't any better in this regard).
Les Mikesell wrote:
On Sat, 2006-09-09 at 12:43, Lamar Owen wrote:
On Tuesday 05 September 2006 22:35, R P Herrold wrote:
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
Sounds like a database type issue; that is, it could be demonstrated that a multiversion concurrency control (MVCC) mechanism, similar to the PostgreSQL backend, for the whole filesystem, would help this sort of thing. That is, a filesystem on ACID. (Atomicity, Consistency, Isolation, Durability: the magic mantra of database management).
A real transactional filesystem would allow truly atomic system updates. But, of course, there are definite downsides to that.
I think you are missing the big picture here. Yum is managing the whole system. Given an ACID database, would you expect to be able to upgrade it to a new, potentially incompatible version of an ACID database with transactions in progress? What yum needs is just some special consideration when modifying its own components.
A simple solution...install 3-phase power at your home, dedicate an air conditioned room to the project, buy a VAX and run VMS. :-)
On Sat, 9 Sep 2006, Lamar Owen wrote:
On Tuesday 05 September 2006 22:35, R P Herrold wrote:
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
Sounds like a database type issue; ... That is, a filesystem on ACID. (Atomicity, Consistency, Isolation, Durability: the magic mantra of database management).
*flashback* Acid - Berkeley - BSD -- coincidence? I think _not_. ;)
*cough* Nothing so complex needed here; a simple early flock held upon the sqlite-python version with which the cache was created would have sufficed, in thinking about the bug report.
As to ACID, an interesting sidelight (as to MySQL, not the PostgreSQL which Lamar mentioned). It turns out that even with use of the InnoDB backend engine in MySQL as we ship it in CentOS, and as inherited from the upstream PNAELV, full ACID compliance is turned off for performance reasons. We found a reproducer in a project I am involved with, which forced us to dig down, at the database level, to ensure that full compliance is enabled from the initial connection on [http://www.trading-shim.org/capitals/?NEWS, July 30 release], to solve a need for transaction Isolation [http://dev.mysql.com/books/mysqlpress/mysql-tutorial/ch10.html].
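For the curious, the knobs involved are along these lines; this is a generic illustration of turning the guarantees back on, not necessarily the exact change that project made:
# durability: flush the InnoDB log on every commit (set in /etc/my.cnf, [mysqld] section):
#   innodb_flush_log_at_trx_commit = 1
# isolation: raise the isolation level for the connection
mysql -e "SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"
mysql -e "SELECT @@tx_isolation"    # confirm what the server hands out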
As to using a database-backend filesystem with full ACID consistency enabled, I think a certain OS vendor in Redmond had to back away from that feature-set bullet point in a somewhat delayed upcoming release. One had better be willing to suffer the collateral damage of a performance hit with the present state of the implementation art. Think of inviting a gorilla over to use an anvil to swat a fly in the kitchen. Ouch.
-- Russ Herrold
As to using a database-backend filesystem with full ACID consistency enabled, I think a certain OS vendor in Redmond had to back away from that feature-set bullet point in a somewhat delayed upcoming release. One had better be willing to suffer the collateral damage of a performance hit with the present state of the implementation art. Think of inviting a gorilla over to use an anvil to swat a fly in the kitchen. Ouch.
At the risk of saying that which shall not be named, doesn't ReiserFS v4 use some form of db implementation for storage?
On Sunday 10 September 2006 20:45, R P Herrold wrote:
On Sat, 9 Sep 2006, Lamar Owen wrote:
On Tuesday 05 September 2006 22:35, R P Herrold wrote:
This bug hinges, very much, on the non-atomic nature of 'hot' system updates, and, to reproduce it, on the fact that the yum-needed, sqlite-maintained cache of packages got munged halfway through.
Sounds like a database type issue; ... That is, a filesystem on ACID. (Atomicity, Consistency, Isolation, Durability: the magic mantra of database management).
*flashback* Acid - Berkeley - BSD -- coincidence? I think _not_. ;)
As long as the filesystem doesn't hallucinate, we're ok.
*cough* Nothing so complex needed here; a simple early flock held upon the sqlite-python version with which the cache was created would have sufficed, in thinking about the bug report.
For your original report, probably so. But I did the update of those bits first, and got no errors or 'hung' yum.
As to ACID, an interesting sidelight (as to MySQL, not the PostgreSQL which Lamar mentioned). It turns out that even with use of the InnoDB backend engine in MySQL as we ship it in CentOS, and as inherited from the upstream PNAELV, full ACID compliance is turned off for performance reasons.
Why does this not surprise me? PostgreSQL doesn't even have an option to turn off ACID-compliance; and performance is competitive, especially under large concurrencies.
As to using a database-backend filesystem with full ACID consistency enabled, I think a certain OS vendor in Redmond had to back away from that feature-set bullet point in a somewhat delayed upcoming release.
:-) I figured someone would bring up WinFS. Wouldn't it be a serious coup for the open source community to do what MS could not?
One had better be willing to suffer the collateral damage of a performance hit with the present state of the implementation art. Think of inviting a gorilla over to use an anvil to swat a fly in the kitchen. Ouch.
Nah. Think of a flyswatter that could, if needed, reconstitute that swatted fly if, perchance, it turned out to be a useful insect. Hmm, makes undelete real easy, too. Makes secure wiping harder, though. NILFS is part of the way there. Shoot first, ask questions later, roll back if needed.
There probably would be a performance hit to a degree; but, again, PostgreSQL's performance under MVCC is quite good, and competitive even with MySQL's MyISAM tables under large concurrencies with an even mix of readers and writers.