[CentOS] Why is yum not liked by some?

Fri Sep 9 16:19:20 UTC 2005
Lamar Owen <lowen at pari.edu>

On Friday 09 September 2005 11:43, Bryan J. Smith wrote:
> Lamar Owen <lowen at pari.edu> wrote:
> > This seems a good application for differential or 'patch'
> > RPM's and a CVS-like RPM repository.  The mirroring

> But now you're talking GBs of binary data, and "real-time"
> resolution by 1,000s of clients!

> Yes, it's technically possible to leverage XDelta and other
> support to do this.  And you're going to burden your server
> with a massive amount of overhead that you don't have when
> you simply share stuff out via HTTP.

It depends on the implementation.  In your other message about deltas, you
spell out essentially the same idea.

> Think about it.  ;->

I have, repeatedly.  If the RPMs in question are stored with the payload
unpacked, and binary deltas against each file are stored alongside (similar
to a CVS repository ,v file), then what is happening is not quite as
CPU-intensive as you make it out to be.  Most patches change a few bytes here
and there in an executable of up to a few megabytes, and most package patches
touch one or a few files rather than every binary in the package.  You store
the patch (applied with xdelta or similar) and build the payload on the fly
(simple CPIO here).  You send out an RPM that was packed by the server, which
is I/O bound, not CPU bound.  With forethought about what can be prebuilt
versus what has to be generated in real time, the amount of real-time
generation can be minimized, I think.
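
A minimal sketch of the per-file delta idea, to make the cost concrete.  It is
not xdelta: Python's difflib stands in for a real binary differ, and the names
make_delta, apply_delta and delta_size are hypothetical.  The point is only
that rebuilding a file from a stored base plus a small patch is a sequence of
copies, not heavy computation.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Return copy/data instructions that rebuild `new` from `old`.

    Stand-in for a real binary differ such as xdelta (assumption).
    """
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse bytes already in old
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))  # literal bytes only in new
        # "delete" needs no instruction: those bytes are just not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the stored base and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def delta_size(ops: list) -> int:
    """Count the literal bytes the delta has to carry."""
    return sum(len(op[1]) for op in ops if op[0] == "data")
```

For a file where only a handful of bytes changed, the delta carries roughly
those bytes plus a few copy instructions, and apply_delta is mostly memory
copies.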

> CPU, memory, disk, etc... will be _exponentially_ increased.

Prove the exponential increase in CPU usage.  If designed intelligently, it
might be no more intensive than rsync, which is doing much of what is required
already.  I would need data on the load rsync puts on a server to be sure.

> As I said, check in GBs of different binary revisions to CVS
> and share the same out in multiple trees via HTTP.  Now
> monitor the difference in load when you have 1, 2, 4, 8 ...
> 1024 clients connect!

That's because CVS as it stands is inefficient with binaries.

> It is _not_ feasible, period.

Think outside the CVS box, Bryan.  I did not say 'Use CVS for this'; I said 
'Use a CVS-like system for this' meaning simply the guts of the mechanism.  
CVS per se would be horribly inefficient for this purpose.

Store the unpacked RPMs and binary deltas for each file.  Store prebuilt 
headers if needed.  Trust the server to sign on the fly rather than at build 
time (I/O bound).  Pack the payload on the fly with CPIO (I/O bound).  Send 
the RPM out (I/O bound) when needed.  Mirrors rsync the whole unpacked 
repository (I/O bound).
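
The packing step above can be sketched as follows.  This assumes the "newc"
ASCII cpio layout (a 110-byte header of hex fields, with the name and the data
each padded to 4 bytes), which as far as I know is what an RPM payload holds
before compression.  The names cpio_entry and pack_payload are hypothetical,
and the RPM header, the signature and the compression are left out entirely.

```python
import io


def _pad4(n: int) -> bytes:
    """NUL padding that brings a length of n up to a multiple of 4."""
    return b"\0" * (-n % 4)


def cpio_entry(name: str, data: bytes, mode: int = 0o100644) -> bytes:
    """Build one newc-style cpio entry for an in-memory file."""
    name_bytes = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    head = header + name_bytes
    return head + _pad4(len(head)) + data + _pad4(len(data))


def pack_payload(files: dict) -> bytes:
    """Concatenate entries for {path: bytes} and close with the trailer."""
    out = io.BytesIO()
    for name in sorted(files):
        out.write(cpio_entry(name, files[name]))
    out.write(cpio_entry("TRAILER!!!", b"", mode=0))
    return out.getvalue()
```

Nothing here does more than format a header and copy bytes, which is why I
expect this step to be I/O bound rather than CPU bound.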

Are there issues with this?  Of course there are.  But the tradeoff is
mirroring many GB of RPMs (rsync has to take some CPU to mirror a collection
this large) versus mirroring fewer GB of unpacked RPMs plus binary deltas, and
signing the RPM on the fly.  Yes, it will take more CPU, but I think linearly
more CPU and not exponentially.  Of course, it would have to be tried.  The
many GB in a mirror must contain many GB of redundancy.
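
A back-of-the-envelope model of that tradeoff, as a sketch only.  Every number
fed to it is a made-up assumption, not a measurement: the expansion factor for
unpacking, the fraction of a package a typical delta touches, and the function
names themselves are all hypothetical.

```python
def full_mirror_bytes(package_bytes: int, revisions: int) -> int:
    """Mirror keeps every revision as a complete packed RPM."""
    return package_bytes * revisions


def delta_mirror_bytes(package_bytes: int, revisions: int,
                       expansion: float, delta_fraction: float) -> int:
    """Mirror keeps one unpacked base plus one delta per later revision.

    expansion:      unpacked size / packed size (assumed)
    delta_fraction: delta size / unpacked size (assumed)
    """
    base = package_bytes * expansion
    deltas = (revisions - 1) * base * delta_fraction
    return round(base + deltas)


def server_bytes_packed(clients: int, bytes_per_request: int) -> int:
    """Work for on-the-fly packing grows with requests times payload size."""
    return clients * bytes_per_request


if __name__ == "__main__":
    # Hypothetical 10 MB package with 8 revisions on the mirror.
    print(full_mirror_bytes(10_000_000, 8))
    print(delta_mirror_bytes(10_000_000, 8, expansion=2.5, delta_fraction=0.02))
    for clients in (1, 2, 4, 8, 1024):
        print(clients, server_bytes_packed(clients, 10_000_000))
```

Under this model, doubling the clients doubles the bytes the server has to
pack, which is linear growth.  Whether the delta mirror comes out smaller
depends entirely on the two assumed ratios, so real packages would have to be
measured.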

The size of the updates is getting out of control; for those with limited 
bandwidth it becomes very difficult to stay up to date.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu