[CentOS] Why is yum not liked by some? -- 3-tier Delta-Patch-Client (doesn't solve client problem)

Tue Sep 13 23:23:41 UTC 2005
Bryan J. Smith <b.j.smith at ieee.org>

"Bryan J. Smith" <b.j.smith at ieee.org> wrote:
> *UNLESS* you aren't talking about deltas ... but *PATCHES*
> Patches are _not_ Deltas ... So what's the catch?  Space!
> So while you drastically reduce the ripple load on the
> server, you increase the storage.  Catch-22.
>  ...  
> You can_not_ do deltas without the _original_ delta files.
> So you would have to transfer the _entire_ delta file to
> the client, which is _larger_ than just the RPM.  ;->
> That's the impossibility I'm talking about!  ;->
> The only way is by maintaining patches on the server.
> That removes the overhead of run-time generation of
> differences via a "ripple delta" because the patches are
> only generated once.

Here is how an "ideal" 3-tier Delta-Patch-Client approach
could work.

- Master Repos:  Maintains full deltas of all packages
- Server Repos:  Also maintains full deltas of all packages,
            generates patches of all package permutations
- Clients:  Download patches

The Master Repos only push delta changes to Server Repos.
The Clients only pull patches from Server Repos.

The Master Repos _only_ need to ripple through deltas when
Server Repos request updates.  If Server Repos do this
frequently enough, each update should only be 1 delta.  In fact, the
Master Repo should "cache" the "last patch" (HEAD - 1 rev)
for the Server Repos.
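
As a rough sketch of the Master Repo side described above,
assuming revisions are kept in order with one delta between each
adjacent pair.  The names (MasterRepo, deltas_for) are made up for
illustration, and the strings only stand in for real delta data:

```python
class MasterRepo:
    """Hypothetical master repo: ordered revisions, one delta per adjacent pair."""

    def __init__(self, revisions):
        self.revisions = list(revisions)   # oldest first, HEAD last
        self.ripple_count = 0              # times we had to walk the history
        self._last_delta = None            # cached (HEAD-1 -> HEAD) delta
        if len(self.revisions) >= 2:
            self._last_delta = self._make_delta(len(self.revisions) - 2)

    def _make_delta(self, i):
        # Placeholder for a real delta between revision i and revision i+1.
        return f"{self.revisions[i]}->{self.revisions[i + 1]}"

    def deltas_for(self, server_rev):
        """Return the deltas a Server Repo at server_rev needs to reach HEAD."""
        start = self.revisions.index(server_rev)
        head = len(self.revisions) - 1
        if start == head:
            return []                      # already at HEAD
        if start == head - 1:
            return [self._last_delta]      # common case: served from the cache
        self.ripple_count += 1             # stale server: ripple through history
        return [self._make_delta(i) for i in range(start, head)]


master = MasterRepo(["1.5", "1.6", "1.7"])
near_head = master.deltas_for("1.6")   # one cached delta, no ripple
stale = master.deltas_for("1.5")       # two deltas, one ripple
```

A Server Repo that syncs often stays on the cached path; only one
that has fallen behind forces the Master Repo to ripple.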

The Server Repos actually serve the clients.  They generate
all necessary revision permutations.  I.e., if 1.7 has just
been downloaded at the server repo, it needs to generate 6
patches (1.1 -> 1.7, 1.2 -> 1.7 ... 1.6 -> 1.7) -- but it
_only_ does that once.  It then keeps the patches for the
clients to use.
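
The one-time patch generation above can be sketched like this.
It is only an illustration: ServerRepo and patch_set are
hypothetical names, the patch payloads are placeholder strings,
and the sketch assumes older patches are kept rather than pruned:

```python
def patch_set(existing_versions, new_version):
    """List the (old, new) patch pairs to build when new_version arrives."""
    return [(old, new_version) for old in existing_versions if old != new_version]


class ServerRepo:
    """Hypothetical server repo that builds each patch set exactly once."""

    def __init__(self):
        self.versions = []
        self.patches = {}   # (old, new) -> patch payload (placeholder string)

    def receive(self, new_version):
        # Build one patch from every older version to the new one, then store it.
        for old, new in patch_set(self.versions, new_version):
            self.patches[(old, new)] = f"patch {old} -> {new}"
        self.versions.append(new_version)


repo = ServerRepo()
for v in ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6"]:
    repo.receive(v)
before = len(repo.patches)
repo.receive("1.7")
new_patches = len(repo.patches) - before   # patches built for the 1.7 arrival
```

Clients then just look up the (installed, latest) pair.  The
stored patch count keeps growing with every release if nothing is
pruned, which is the space trade-off from the quoted text.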

This is the _most_efficient_ way to do both Master Repo to
Server Repo transfers as well as Server Repo to Client
transfers.  But because the Master Repos are not serving
clients, the "ripple delta" overhead is virtually eliminated
for the Master Repos (and it caches the last delta, as most
Server Repos will typically be near HEAD).  It also
_drastically_ reduces the number of "ripple deltas" a
Server Repo has to do -- as it generates each "patch set"
only once for the clients.

BUT WHAT THIS DOES _NOT_ DO IS ADDRESS THE "DEPENDENCY"
META-DATA ISSUE.

The only way to address the "dependency" meta-data
issue is to maintain _all_ delta changes.  That means -- yet
again -- a version control repository at the client itself.
So you're back to mirroring repositories (although the delta
approach _does_ reduce the amount necessary to mirror).

So we _still_ need a "bloated" meta-data format at the Server
Repo so the clients can figure out dependencies without first
having to download the patch.  It's no different than full
RPMs, except the patches are smaller than full RPMs.  But you
still don't want clients downloading one patch and checking it,
only to discover they need another patch for another package,
etc...
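
To illustrate why the meta-data has to be available separately
from the patches, here is a minimal sketch of a client working
out everything it needs _before_ downloading anything.  The
METADATA layout and the patches_needed helper are assumptions for
the example, not any real yum or RPM format:

```python
# Hypothetical server-side metadata: package -> (latest version, required packages).
METADATA = {
    "app":    ("2.0", ["libfoo", "libbar"]),
    "libfoo": ("1.7", ["libbaz"]),
    "libbar": ("3.1", []),
    "libbaz": ("0.9", []),
}


def patches_needed(package, installed, metadata):
    """Walk dependencies using metadata only; return (name, old, new) items to fetch."""
    needed, seen, stack = [], set(), [package]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        latest, requires = metadata[name]
        old = installed.get(name)      # None means not installed (needs a full package)
        if old != latest:
            needed.append((name, old, latest))
        stack.extend(requires)
    return sorted(needed)


installed = {"app": "1.9", "libfoo": "1.6", "libbar": "3.1"}
plan = patches_needed("app", installed, METADATA)
```

The whole plan comes from the meta-data alone, so the client
never fetches one patch just to find out it needs another.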

So, again, while it solves the traffic issue, it does _not_
solve the larger issue of "give me all changes through date
X" when the current date is Y.  That's why you can_not_
avoid having to maintain your own, _internal_ repository.
Only with the _full_ repository can you do this _internally_.
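
A small sketch of the "through date X" problem, assuming an
internal repository that records every release with its date.
The HISTORY entries and the snapshot_through helper are invented
for illustration:

```python
from datetime import date

# Hypothetical internal repository history: (package, version, release date).
HISTORY = [
    ("libfoo", "1.5", date(2005, 6, 1)),
    ("libfoo", "1.6", date(2005, 8, 15)),
    ("libfoo", "1.7", date(2005, 9, 10)),
    ("libbar", "3.0", date(2005, 7, 4)),
    ("libbar", "3.1", date(2005, 9, 12)),
]


def snapshot_through(history, cutoff):
    """Return package -> newest version released on or before cutoff."""
    snapshot = {}
    # Process releases oldest first so later ones overwrite earlier ones.
    for name, version, released in sorted(history, key=lambda e: e[2]):
        if released <= cutoff:
            snapshot[name] = version
    return snapshot


as_of_x = snapshot_through(HISTORY, date(2005, 9, 1))    # date X
as_of_y = snapshot_through(HISTORY, date(2005, 9, 13))   # current date Y
```

A repository that only offers patches up to HEAD can answer the
second query but not the first, since the first needs the full
release history.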


-- 
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith at ieee.org     |  (please excuse any
http://thebs413.blogspot.com/ |   missing headers)