Les Mikesell lesmikesell@gmail.com wrote:
What I want is to be able to update more than one machine and expect them to have the same versions installed. If that isn't a very common requirement I'd be very surprised.
So what you want is to check out the repository from a specific tag and/or date. So you want:
1. The repository to have every single package -- be it packages stored whole, or some binary delta'ing between RPMs (if possible)
2. The repository meta-data to have all history so it can backtrack to any tag/date.
In other words, you want a repository to maintain storage and use CPU-I/O power to resolve tens of GBs of inter-related data and corresponding versioning meta-data.
BTW, Your comparison to CVS is extremely poor, so _stop_. ;-> I'm going to show you how in a moment.
APT, YUM and countless other package repositories store packages whole, with a "current state" meta-data list. The packages and that meta-data are served via HTTP, and the _client_ resolves what it wants to do.
What you want is a more "real-time" resolution logic "like CVS." That either requires:
A) A massive amount of data transfer if done at the client, or
B) A massive amount of CPU-I/O overhead if done at the server
Getting to your piss-poor and inapplicable analogy to CVS, "A" is typically done either on _local_ disk or over an NFS mount, possibly a streamed RSH/SSH. In any case, "A" is almost always done locally -- at least when it comes to multiple-GBs of files. ;->
"B" is what happens when you run in pserver/kserver mode, and you now limit your transaction size. I.e., try checking in a 500MB file to a CVS pserver, and see how _slow_ it is.
In other words, what you want is rather impractical for a remote server _regardless_ if the server or client does it. Remember, we're talking GBs of files!
I see 2 evolutionary approaches to the problem.
1. Maintain multiple YUM repositories, even if all but the original are links to the original. The problem is: who defines what the "original" is? That's why you should maintain your _own_, so it's what _you_ expect it to be.
2. Modify the YUM repository meta-data files so they store revisions, whereby each time createrepo is run, it appends to the meta-data as a continuing list instead of replacing it.
#1 is direct and practical. #2 adds a _lot_ to the initial query YUM does, and could push it from seconds to minutes or even _hours_ at the client (not to mention the increase in traffic). That's the problem.
This isn't Centos-specific - I just rambled on from some other mention of it and apologize for dwelling on it here. There are 2 separate issues: One is that yum doesn't know if a repository or mirror is consistent or in the middle of an update with only part of a set of RPM's that really need to be installed together.
Not true. The checks that createrepo does can prevent an update if there are missing dependencies. The problem is that most "automated" repos bypass those checks.
So, again, we're talking "repository management" issues and _not_ the tool itself.
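To make the "repository management" point concrete, here is a rough Python sketch of the kind of gate an automated repo could apply before publishing: refuse to publish while any package requires something the set does not contain. This is only an illustration under my own assumptions. The package names and the `missing_dependencies` helper are hypothetical, and it is not how createrepo itself is implemented.

```python
def missing_dependencies(packages):
    """packages maps a package name to the set of names it requires.

    Returns {package: sorted list of requirements not present in the set}.
    An empty result means the set is self-consistent and safe to publish.
    """
    available = set(packages)
    problems = {}
    for name, requires in packages.items():
        missing = requires - available
        if missing:
            problems[name] = sorted(missing)
    return problems


# Hypothetical half-synced repo: "app" arrived, "libbar" has not yet.
pending = {
    "app": {"libfoo", "libbar"},
    "libfoo": set(),
}

problems = missing_dependencies(pending)
if problems:
    print("not publishing, missing:", problems)
```

A repo updater that runs a check like this before swapping in new meta-data never exposes the half-updated state to clients.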
The other is that if you update one machine and everything works, you have no reason to expect the same results on the next machine a few minutes later.
Because there is no tagging/date facility. But to add that, you'd have to add either (again):
1. A _lot_ of traffic (client-based)
2. A _lot_ of CPU-I/O overhead (server-based)
Again, using your poor analogy to CVS, have you ever done a checkout of a 500MB file over the Internet -- using ssh or, God help you, pserver/kserver?
Both issues would be solved if there were some kind of tag mechanism that could be applied by the repository updater after all files are present and updates could be tied to earlier tags even if the repository is continuously updated.
So, in other words, you want the client to get repository info in 15-30 minutes, instead of 15-30 seconds. ;->
Either that, or you want the server of the repository to deal with all that overhead, taking "intelligent requests" from clients, instead of merely serving via HTTP.
I realize that yum doesn't do what I want - but lots of people must be having the same issues and either going to a lot of trouble to deal with them or just taking their chances.
Or we do what we've _always_ done. We maintain _internal_ configuration management.
We maintain the "complete" repository, and then individual "tag/date" repositories of links.
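A minimal sketch of that link-repository idea in Python, assuming a flat directory of RPMs. The `make_snapshot` name and the directory layout are my own illustration, not anything YUM ships:

```python
import os
from pathlib import Path


def make_snapshot(complete_dir, snapshot_dir):
    """Freeze the current package set as a directory of symlinks.

    Every *.rpm in complete_dir gets a symlink in snapshot_dir, so the
    snapshot costs almost no disk space. Returns the linked file names.
    """
    complete = Path(complete_dir).resolve()
    snapshot = Path(snapshot_dir)
    # A tag is meant to be frozen, so refuse to reuse an existing directory.
    snapshot.mkdir(parents=True, exist_ok=False)

    linked = []
    for rpm in sorted(complete.glob("*.rpm")):
        os.symlink(rpm, snapshot / rpm.name)
        linked.append(rpm.name)
    return linked
```

After that you would run createrepo on the new tag directory and point each machine's repo definition at that tag instead of at the moving "complete" tree. Every machine updated against the same tag then sees the same package set, no matter what has landed in the complete repository since.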
Understand we are _not_ talking a few MB of source code that you resolve via CVS. We're talking GBs of binary packages.
You _could_ come up with a server repository solution using XDelta and a running journal for the meta-data. And after a few hits, the repository server would tank.
The alternative is for the server repository to just keep complete copies of all packages (which some do), while keeping a running journal for the meta-data. But that would still require the client to either download/resolve a lot (taking 15-30 minutes, instead of 15-30 seconds), _or_ put that resolution back on the server.
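To show where that cost comes from, here is a toy Python sketch of resolving a "running journal" as of a given date. The journal format, the package names, and `packages_as_of` are all hypothetical; no YUM meta-data looks like this today.

```python
from datetime import date


def packages_as_of(journal, cutoff):
    """Replay journal entries up to and including cutoff.

    journal is a list of (date, package name, version) tuples.
    Returns {name: version} as the repository looked on that date.
    """
    state = {}
    for when, name, version in sorted(journal):
        if when > cutoff:
            break
        state[name] = version  # a later entry supersedes an earlier one
    return state


# Hypothetical journal: every createrepo run appends, nothing is dropped.
journal = [
    (date(2005, 1, 10), "foo", "1.0-1"),
    (date(2005, 2, 1), "bar", "2.3-1"),
    (date(2005, 3, 15), "foo", "1.0-2"),
]

print(packages_as_of(journal, date(2005, 2, 15)))
```

The logic is trivial. The problem is that whoever runs it, client or server, has to have the whole journal in hand and walk it for every query, and the journal only ever grows.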
_This_ is the point you keep missing. It's the load that is required to do what you want. Not just a few hundred developers moving around a few MBs of files, but _tens_ of _thousands_ of users accessing _GBs_ of binaries.
That's why you rsync the repository down, and you do that locally. There is no way to avoid that. Even Red Hat Network (RHN) and other solutions do that -- they have you mirror things locally, with resolution going on locally.
In other words, local configuration management. It's very easy to do with RPM and YUM. You can't "pass the buck" to the Internet repository. Red Hat doesn't even let its Enterprise customers do it, and they wouldn't want to either. They have a _local_ repository.