"Bryan J. Smith" b.j.smith@ieee.org wrote:
*UNLESS* you aren't talking about deltas ... but *PATCHES*.

Patches are _not_ Deltas ...

So what's the catch? Space! So while you drastically reduce the ripple load on the server, you increase the storage. Catch-22.

...

You can_not_ do deltas without the _original_ delta files, so you would have to transfer the _entire_ delta file to the client, which is _larger_ than just the RPM. ;-> That's the impossibility I'm talking about! ;->

The only way is by maintaining patches on the server. That removes the overhead of run-time generation of differences via a "ripple delta", because the patches are only generated once.
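To make the patch/delta distinction concrete: a patch is computed once on the server from two full copies, and the client applies it to the copy it already has installed, with no delta history needed on the client. A minimal sketch, assuming a byte-level patch built with Python's difflib; the "copy"/"data" op format and the names make_patch/apply_patch are hypothetical, not the format of any real RPM delta tool.

```python
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Server side, done once: describe `new` in terms of `old`."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # bytes the client already has
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))   # only the new bytes are shipped
        # "delete": emit nothing, the client just skips those old bytes
    return ops


def apply_patch(old: bytes, patch: list) -> bytes:
    """Client side: rebuild `new` from the installed copy plus the patch."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


installed = b"foo-1.6 " + b"x" * 1000      # hypothetical installed package
upgrade = b"foo-1.7 " + b"x" * 1000        # hypothetical new package
patch = make_patch(installed, upgrade)
assert apply_patch(installed, patch) == upgrade
```

Here the patch is two copy ranges plus a single new byte, which is why shipping it beats shipping the whole RPM. The price is that the server has to store it.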
Here is how an "ideal" 3-tier Delta-Patch-Client approach could work.
- Master Repos: Maintain full deltas of all packages
- Server Repos: Also maintain full deltas of all packages; generate patches of all package permutations
- Clients: Download patches
The Master Repos only push delta changes to Server Repos. The Clients only pull patches from Server Repos.
The Master Repos _only_ need to ripple through deltas when Server Repos request updates. If the Server Repos do this frequently enough, it should only be one delta. In fact, the Master Repo should "cache" the "last patch" (HEAD - 1 rev) for the Server Repos.
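The Master Repo's "last patch" cache could look something like this. A sketch only: MasterRepo, updates_for and commit are hypothetical names, each delta is represented as a plain (from, to) revision pair rather than real file contents, and the ripples counter just shows how often the delta chain actually gets walked.

```python
class MasterRepo:
    """Hypothetical Master Repo: revisions oldest first, one delta per step."""

    def __init__(self, revisions):
        self.revisions = list(revisions)   # e.g. ["1.1", ..., "1.7"]
        self.ripples = 0                   # how many times the chain was walked
        self._last_patch = None            # cached HEAD - 1 -> HEAD delta list

    def _ripple(self, from_rev):
        # Walk the delta chain from `from_rev` up to HEAD.
        self.ripples += 1
        start = self.revisions.index(from_rev)
        chain = self.revisions[start:]
        return list(zip(chain, chain[1:]))

    def updates_for(self, server_rev):
        """Deltas a Server Repo at `server_rev` needs to reach HEAD."""
        if server_rev == self.revisions[-1]:
            return []                      # already at HEAD
        if len(self.revisions) > 1 and server_rev == self.revisions[-2]:
            # Common case for a Server Repo that syncs often: use the cache.
            if self._last_patch is None:
                self._last_patch = self._ripple(server_rev)
            return self._last_patch
        return self._ripple(server_rev)    # rare: an out-of-date Server Repo

    def commit(self, new_rev):
        self.revisions.append(new_rev)
        self._last_patch = None            # HEAD moved, so the cache is stale


master = MasterRepo(["1.%d" % n for n in range(1, 8)])
for _ in range(100):                       # 100 up-to-date Server Repos sync
    master.updates_for("1.6")
print(master.ripples)                      # 1
```

A hundred Server Repos sitting at HEAD - 1 cost one ripple between them; only a stale Server Repo forces a fresh walk down the chain.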
The Server Repos actually serve the clients. They generate all necessary revision permutations. E.g., if 1.7 has just been downloaded at the Server Repo, it needs to generate 6 patches (1.1 -> 1.7, 1.2 -> 1.7 ... 1.6 -> 1.7) -- but it _only_ does that once. It then keeps the patches for the clients to use.
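Here's one way a Server Repo might build its patch set when a new HEAD arrives. Assumptions: package contents are kept as lists of text lines, patches are plain unified diffs from Python's difflib.unified_diff (standing in for a real binary delta format), and ServerRepo, receive and patch_for are hypothetical names.

```python
import difflib


class ServerRepo:
    """Hypothetical Server Repo: full copies of every revision plus a patch store."""

    def __init__(self):
        self.order = []        # revisions, oldest first
        self.contents = {}     # revision -> package content as a list of lines
        self.patches = {}      # (from_rev, to_rev) -> patch text
        self.generated = 0     # how many patches were actually computed

    def receive(self, rev, lines):
        """A new HEAD arrives from the Master Repo: build every old -> new patch once."""
        for old in self.order:
            diff = difflib.unified_diff(self.contents[old], lines,
                                        fromfile=old, tofile=rev, lineterm="")
            self.patches[(old, rev)] = "\n".join(diff)
            self.generated += 1
        self.order.append(rev)
        self.contents[rev] = lines

    def patch_for(self, client_rev):
        """A client request: look up a stored patch, no diffing at request time."""
        return self.patches[(client_rev, self.order[-1])]


server = ServerRepo()
for n in range(1, 8):
    server.receive("1.%d" % n, ["Name: foo", "Version: 1.%d" % n])
to_head = [key for key in server.patches if key[1] == "1.7"]
print(len(to_head))    # 6: 1.1 -> 1.7 ... 1.6 -> 1.7
```

Receiving 1.7 builds exactly the six patches listed above; after that, every client request is a dictionary lookup. The older patches (e.g. 1.1 -> 1.6) are still sitting in `patches`, which is the storage side of the Catch-22.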
This is the _most_efficient_ way to handle both Master Repo to Server Repo transfers and Server Repo to Client transfers. And because the Master Repos are not serving clients, the "ripple delta" overhead is virtually eliminated for the Master Repos (and they cache the last delta, as most Server Repos will typically be near HEAD). It also _exponentially_ reduces the number of "ripple deltas" a Server Repo has to do -- as it only generates one "patch set", one time, for the clients.
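A rough way to count the saving, in "ripple" steps. This is a hypothetical cost model, not a measurement: revisions are numbered by their minor version (1.7 -> 7), and each step from one revision to the next counts as one delta application.

```python
def on_demand_ripples(client_revs, head):
    """Steps if each client request is rippled from its revision up to HEAD."""
    return sum(head - rev for rev in client_revs)


def precomputed_ripples(head, oldest=1):
    """Steps to build every older -> HEAD patch once, however many clients ask."""
    return sum(head - rev for rev in range(oldest, head))


clients = [6] * 1000 + [3] * 200       # hypothetical mix of installed revisions
print(on_demand_ripples(clients, 7))   # 1800
print(precomputed_ripples(7))          # 21
```

With HEAD at 1.7, the whole patch set costs 21 steps once, while the made-up mix of 1,000 clients at 1.6 and 200 at 1.3 would cost 1,800 steps if every request were rippled on demand. The precomputed cost depends only on the revision count, not on how many clients show up.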
BUT WHAT THIS DOES _NOT_ DO IS ADDRESS THE "DEPENDENCY" META-DATA ISSUE.
That's because the only way to address the "dependency" meta-data issue is to maintain _all_ delta changes. That means -- yet again -- a version control repository at the client itself. So you're back to mirroring repositories (although the delta approach _does_ reduce the amount necessary to mirror).
So we _still_ need a "bloated" meta-data format at the Server Repo so the clients can figure out dependencies without first having to download the patch. It's no different from full RPMs, except that the patches are smaller. But you still don't want clients downloading one patch, checking it, only to discover they need another patch for another package, etc.
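What the "bloated" meta-data buys you: the client can resolve everything up front and fetch all the patches it needs in one pass. A sketch, assuming a hypothetical meta-data layout of package -> {"version", "requires"}; real RPM repository meta-data is richer (capabilities, version comparisons), which this ignores.

```python
def patches_needed(metadata, installed, wanted):
    """
    Resolve dependencies from Server Repo meta-data alone, before downloading
    anything, and return the (package, have, latest) transfers to fetch.

    metadata:  package -> {"version": str, "requires": [package, ...]}
    installed: package -> version currently on the client
    """
    needed, stack = set(), list(wanted)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue                              # already visited (handles cycles)
        needed.add(pkg)
        stack.extend(metadata[pkg]["requires"])   # follow dependencies transitively
    plan = []
    for pkg in sorted(needed):
        have, latest = installed.get(pkg), metadata[pkg]["version"]
        if have != latest:
            # have is None for a package not installed: that one needs a full RPM.
            plan.append((pkg, have, latest))
    return plan


meta = {
    "foo":    {"version": "1.7", "requires": ["libbar"]},
    "libbar": {"version": "2.1", "requires": ["libc"]},
    "libc":   {"version": "2.3", "requires": []},
}
print(patches_needed(meta, {"foo": "1.6", "libbar": "2.0", "libc": "2.3"}, ["foo"]))
# [('foo', '1.6', '1.7'), ('libbar', '2.0', '2.1')]
```

The client learns it needs the libbar patch from the meta-data alone, instead of discovering it only after downloading and unpacking the foo patch.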
So, again, while it solves the traffic issue, it does _not_ solve the larger issue of "give me all changes through date X" when the current date is Y. That's why you can_not_ avoid having to maintain your own, _internal_ repository. Only with the _full_ repository can you do this _internally_.
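This is where the full internal repository earns its keep: with every change recorded, "give me all changes through date X" is just a filter over the history, which patches-to-HEAD alone can't answer. A sketch under the assumption that the internal repository can be reduced to hypothetical (date, package, version) records; a real repository would also keep the deltas themselves so it can rebuild the packages, not just name the versions.

```python
from datetime import date


def state_as_of(history, cutoff):
    """
    Rebuild "everything through date X" from the full change history.
    history: iterable of (date, package, version) records, any order.
    Returns package -> version as it stood at the end of `cutoff`.
    """
    state = {}
    for when, pkg, version in sorted(history):   # oldest change first
        if when > cutoff:
            break                                # everything after this is later
        state[pkg] = version                     # later changes win
    return state


history = [                                      # made-up example records
    (date(2004, 1, 5), "foo", "1.6"),
    (date(2004, 2, 1), "foo", "1.7"),
    (date(2003, 12, 1), "libbar", "2.0"),
    (date(2004, 1, 20), "libbar", "2.1"),
]
print(state_as_of(history, date(2004, 1, 31)))   # {'libbar': '2.1', 'foo': '1.6'}
```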