[CentOS] Why is yum not liked by some?

Wed Sep 14 02:21:06 UTC 2005
Les Mikesell <lesmikesell at gmail.com>

On Tue, 2005-09-13 at 18:03, Bryan J. Smith wrote:

> > I want that functionality, but I was arguing that all it
> > would take to get it is sequentially increasing timestamps
> > on files being added to the repository and knowledge of the
> > final timestamp of each consistent update set - and letting
> > the yum client have that information to figure out the
> > rest.
> 
> But the meta-data and its dependency tree _change_ for each
> point in time.  The dependency tree from one createrepo run
> changes on the next run.  That's the problem.

Yes, there is no argument that yum would have to change.
However it could be a small change.

> The only way to fix it currently is to have the YUM client
> access RPMs directly, instead of relying on the YUM
> repository's meta-data.

Yum does its dependency computations on the client side
based on the contents of the .hdr files (otherwise
it wouldn't work when combining the contents of different
repositories).  It needs the .hdr files, not the RPMS.
There is some magic in the repo metadata that makes
the client only download the latest .hdr files but if
you update often you end up with them all anyway and
use only the latest.  The needed change is that if you
specify a point-in-time the client should toss/ignore
.hdr files past that and get downrevs if available. Note
that you could do this yourself with nothing but an
ftp view of the repository and you'll see the client could
do it directly, although I agree that repository support
could make it easier.
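
As a rough sketch of that point-in-time selection (Python,
since that is what yum is written in): the Header record, its
field names and select_headers() are made up for illustration
and are not yum's API.  It assumes each header carries the time
it was added to the repository and that "latest" means latest
added, not an RPM version comparison.

```python
from collections import namedtuple

# Hypothetical record for one .hdr file in a repository listing.
# 'added' is the time the file was added to the repository.
Header = namedtuple("Header", "name version added")


def select_headers(headers, cutoff):
    """Return, per package name, the newest header added at or before cutoff."""
    chosen = {}
    for hdr in headers:
        if hdr.added > cutoff:
            continue  # toss/ignore anything past the point-in-time
        best = chosen.get(hdr.name)
        if best is None or hdr.added > best.added:
            chosen[hdr.name] = hdr
    return chosen


# Made-up listing; the timestamps are plain integers for readability.
listing = [
    Header("foo", "1.0-1", 100),
    Header("foo", "1.0-2", 200),
    Header("foo", "1.1-1", 300),
    Header("bar", "2.0-1", 150),
    Header("bar", "2.0-2", 350),
]

snapshot = select_headers(listing, cutoff=250)
for name, hdr in sorted(snapshot.items()):
    print(name, hdr.version)
```

With a cutoff of 250 this picks the backrev headers for both
packages, which is all the client needs to redo its dependency
computation as of that time.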

> Otherwise, there have to be some major
> changes at the repository-level. 

I'd call it a minor change to expose an option to get
backrev .hdr files when wanted.


> > One thing that no one mentioned about CVS is that it always
> > stores the full ready-to-go copy of the latest version and
> > builds the diffs backwards to earlier versions on the
> > assumption that you are most likely to want the most recent
> > version.
> 
> Reverse deltas.  Instead of taking the original revision and
> rippling deltas forward, you take the latest, and do ripple
> deltas backward.

Yes, but what you really want to do is give the client
the least it needs to turn what it has into what it
wants.  You are always going to be going forward, and
clients that update regularly will always need only
the diff between the current and last prior RPM.

> Reverse deltas don't solve the "ripple differences" problem,
> but they do minimize it.  They typically cut the number of
> deltas required if people are pulling the last few
> revisions.  That is typically the case in software.
> 
> If you're at revision 1.4 and you want version 1.7, the
> version control service of a forward delta must build all the
> way from 1.1 to 1.7 -- and ripple through 6 differences.  In
> the reverse delta, it would only need to ripple 3 times --
> from 1.7 back to 1.4.
> 
> *UNLESS* you aren't talking about deltas ... but *PATCHES*

If you work only 2 revs at a time there is no difference.
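
To put numbers on that, here is a small sketch.  It assumes
revision 1.N can be modelled as the integer N and that each
stored delta connects two adjacent revisions; the function
names are invented for the example.

```python
def server_rebuild_steps(target, latest, scheme):
    """Deltas a server applies to rebuild 'target' from its one full copy."""
    if scheme == "forward":  # the full copy is revision 1
        return target - 1
    if scheme == "reverse":  # the full copy is 'latest'
        return latest - target
    raise ValueError("unknown scheme: %r" % (scheme,))


def client_update_steps(have, want):
    """Deltas a client applies when it already holds a full copy of 'have'."""
    return abs(want - have)


# The quoted example: forward deltas rebuild 1.7 from 1.1,
# reverse deltas walk from 1.7 back to 1.4.
forward = server_rebuild_steps(7, latest=7, scheme="forward")
reverse = server_rebuild_steps(4, latest=7, scheme="reverse")

# A client already holding 1.6 that wants 1.7 needs one delta
# no matter how the server stores its history.
adjacent = client_update_steps(6, 7)
```

The first two reproduce the 6 and 3 from the quoted text; the
last is 1 either way, which is the case I care about.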

> Patches are _not_ Deltas.  Patches are like doing a full
> backup and an incremental since the last full backup.  So if
> you need to restore, you only need the latest incremental and
> last full.  There is no "ripple."  So you only need *1* file
> for an update.

Yes, one file for the difference between any two revs which
is almost always what you want - or you should be updating
more often.  If you need to repeat the process with multiple
steps, the client can easily calculate whether it is better
to collect multiple deltas and apply them or just grab the
complete version it wants.
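
A sketch of that client-side calculation, assuming the server
publishes the size of each adjacent-revision delta it keeps
and the size of the complete package.  plan_download() and the
shape of delta_sizes are hypothetical, and revisions are plain
integers here.

```python
def plan_download(have, want, delta_sizes, full_size):
    """Pick the cheaper of a chain of deltas or the complete version.

    delta_sizes maps (from_rev, to_rev) -> size in bytes for the
    adjacent-revision deltas the server still keeps.
    """
    if have > want:
        return ("full", full_size)  # no downgrade deltas in this sketch
    total = 0
    rev = have
    while rev < want:
        step = (rev, rev + 1)
        if step not in delta_sizes:
            return ("full", full_size)  # gap in the chain: existing procedure
        total += delta_sizes[step]
        rev += 1
    if total < full_size:
        return ("deltas", total)
    return ("full", full_size)


# Made-up sizes: the server keeps deltas for the last two steps only.
kept = {(5, 6): 40, (6, 7): 60}

one_step = plan_download(6, 7, kept, full_size=500)
two_steps = plan_download(5, 7, kept, full_size=500)
too_big = plan_download(5, 7, kept, full_size=90)
too_old = plan_download(3, 7, kept, full_size=500)
```

The regular updater gets the single small delta, and anyone
too far behind or facing an oversized chain just gets the
complete version as it does today.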

> So what's the catch?  Space!

So be sensible about what you keep around and make the
client fall back to the existing procedure if the delta
it might use isn't there.

> The only way is by maintaining patches on the server.
> That removes the overhead of run-time generation of
> differences via a "ripple delta" because the patches are only
> generated once.  But that then _bloats_ the server storage.

Keep only 1 or 2 delta/patch files for the latest revs where
the traffic will actually be happening and thus reduced.  In
the unlikely event you want something else, use the existing
procedure.
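
The retention side could be as simple as the following sketch,
where deltas_to_keep() is an invented name and revisions are
again plain integers; everything older than the kept pairs
would be served by the existing full download.

```python
def deltas_to_keep(revisions, keep=2):
    """Return (from_rev, to_rev) pairs for the newest 'keep' adjacent deltas."""
    if keep <= 0:
        return []
    ordered = sorted(revisions)
    pairs = list(zip(ordered, ordered[1:]))
    return pairs[-keep:]


# Five revisions on the server, but only the two newest deltas are stored.
kept_pairs = deltas_to_keep([1, 2, 3, 4, 5], keep=2)
print(kept_pairs)
```

Storage then grows by a fixed number of delta files per
package rather than by one per revision.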

> Again, I don't think you understand how deltas work.  ;->

I didn't realize that you wouldn't call them deltas unless
you cram more than one in the same file.  Do you call the
first one a patch, then change the name when you append the
next run?  The piece everyone will want is currrent-1->current
so the most benefit would come from keeping that in it's
own file.

-- 
  Les Mikesell
    lesmikesell at gmail.com