[CentOS] Why is yum not liked by some?
lowen at pari.edu
Fri Sep 9 16:19:20 UTC 2005
On Friday 09 September 2005 11:43, Bryan J. Smith wrote:
> Lamar Owen <lowen at pari.edu> wrote:
> > This seems a good application for differential or 'patch'
> > RPM's and a CVS-like RPM repository. The mirroring
> But now you're talking GBs of binary data, and "real-time"
> resolution by 1,000s of clients!
> Yes, it's technically possible to leverage XDelta and other
> support to do this. And you're going to burden your server
> with a massive amount of overhead that you don't have when
> you simply share stuff out via HTTP.
It depends on the implementation. In your other message about deltas you spell
out essentially the same idea.
> Think about it. ;->
I have, repeatedly. If the RPMs in question are stored with the payload
unpacked, and binary deltas against each file (similar to the CVS repository
v file) stored, then what is happening is not quite as CPU-intensive as you
make it out to be. Most patches change a few bytes here and there in
executables of up to a few megabytes, and most package updates touch only one
or a few files, not every binary in the package. You store the
patch (applied with xdelta or similar) and build the payload on the fly
(simple CPIO here). You send an RPM out that was packed by the server, which
is I/O bound, not CPU bound. With forethought to those things that can be
prebuilt versus those things that have to be generated realtime, the amount
of realtime generation can be minimized, I think.
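To make the argument concrete, here is a minimal sketch of the per-file patching step, assuming a trivial delta format of (offset, replacement-bytes) pairs; a real system would use xdelta or similar, and the names here are hypothetical:

```python
# Toy sketch of server-side on-the-fly patching. The delta format
# (offset, replacement-bytes pairs) is a stand-in for xdelta.

def apply_delta(base: bytes, delta: list[tuple[int, bytes]]) -> bytes:
    """Apply sparse byte-range replacements to a base binary.

    The work is dominated by copying `base` once, so it scales with
    the size of the file being sent (I/O bound), not with any
    expensive per-byte computation (CPU bound).
    """
    out = bytearray(base)
    for offset, replacement in delta:
        out[offset:offset + len(replacement)] = replacement
    return bytes(out)

# "A few bytes here and there" in a larger binary:
base = b"\x00" * 1024                      # stand-in for a stored executable
delta = [(16, b"\xde\xad"), (512, b"\xbe\xef")]
patched = apply_delta(base, delta)
```

The point of the sketch is only that reconstructing a patched file is a copy plus a handful of small writes, which is the I/O-bound behavior claimed above.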
> CPU, memory, disk, etc... will be _exponentially_ increased.
Prove exponential CPU usage increase. If designed intelligently, it might be
no more intensive than rsync, which already does much of what is required.
We would need data on rsync's server load to compare.
> As I said, check in GBs of different binary revisions to CVS
> and share the same out in multiple trees via HTTP. Now
> monitor the difference in load when you have 1, 2, 4, 8 ...
> 1024 clients connect!
That's because CVS as it stands is inefficient with binaries.
> It is _not_ feasible, period.
Think outside the CVS box, Bryan. I did not say 'Use CVS for this'; I said
'Use a CVS-like system for this' meaning simply the guts of the mechanism.
CVS per se would be horribly inefficient for this purpose.
Store the unpacked RPMs and binary deltas for each file. Store prebuilt
headers if needed. Trust the server to sign on the fly rather than at build
time (I/O bound). Pack the payload on the fly with CPIO (I/O bound). Send
the RPM out (I/O bound) when needed. Mirrors rsync the whole unpacked
repository (I/O bound).
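The pack-on-the-fly step above can be sketched as follows. Python's standard library has no cpio writer, so tarfile stands in for the cpio packing, and the file names and contents are hypothetical:

```python
import io
import tarfile

def pack_payload(files: dict[str, bytes]) -> bytes:
    """Pack name->content pairs into an archive held in memory.

    In the scheme described above, this stream would be generated per
    request, after the per-file deltas have been applied, instead of
    staging a complete prebuilt RPM on disk. The work is copying
    bytes into the archive: I/O bound, not CPU bound.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buf.getvalue()

payload = pack_payload({
    "usr/bin/foo": b"patched binary contents",
    "usr/share/doc/foo/README": b"docs",
})
```

A real implementation would emit a cpio stream (the format RPM payloads actually use) and sign the result as it goes, but the shape of the work is the same: sequential writes of already-reconstructed files.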
Are there issues with this? Of course there are. But the tradeoff is
mirroring many GB of RPM's (rsync has to take some CPU for mirroring this
large of a collection) versus mirroring fewer GB of unpacked RPM's plus
binary deltas, and signing the on-the-fly RPM. Yes, it will take more CPU,
but I think linearly more CPU and not exponentially. Of course, it would
have to be tried. A mirror of that many GB is bound to contain many GB of
redundancy.
The size of the updates is getting out of control; for those with limited
bandwidth it becomes very difficult to stay up to date.
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772