On Friday 09 September 2005 11:43, Bryan J. Smith wrote:
Lamar Owen <lowen@pari.edu> wrote:
This seems a good application for differential or 'patch' RPM's and a CVS-like RPM repository. The mirroring
But now you're talking GBs of binary data, and "real-time" resolution by 1,000s of clients!
Yes, it's technically possible to leverage XDelta and other support to do this. And you're going to burden your server with a massive amount of overhead that you don't have when you simply share stuff out via HTTP.
It depends on the implementation. In your other message about deltas, you spell out essentially the same idea.
Think about it. ;->
I have, repeatedly. If the RPMs in question are stored with the payload unpacked, and binary deltas against each file are stored alongside (similar to the ,v files in a CVS repository), then what is happening is not quite as CPU-intensive as you make it out to be. Most patches change a few bytes here and there in an executable of up to a few megabytes, and most package patches touch one or a few files rather than every binary in the package. You store the patch (applied with xdelta or similar) and build the payload on the fly (simple CPIO here). You send out an RPM that was packed by the server, which is I/O bound, not CPU bound. With forethought about which things can be prebuilt and which have to be generated in real time, the amount of real-time generation can be minimized, I think.
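To make the per-file delta idea concrete, here is a minimal sketch in Python. It is an illustration only: it uses difflib.SequenceMatcher from the standard library as a stand-in for xdelta, and the names make_delta, apply_delta and literal_bytes are hypothetical. A real implementation would use a proper binary delta tool, since SequenceMatcher is far too slow for multi-megabyte files.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal bytes.

    Stand-in for xdelta: ("copy", start, end) refers to old[start:end],
    ("data", payload) carries bytes that are not found in `old`.
    """
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))
        # "delete" needs no op: those bytes are simply never copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the stored base file and its delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Bytes that must actually be stored in the delta (copy ops are just offsets)."""
    return sum(len(op[1]) for op in ops if op[0] == "data")


if __name__ == "__main__":
    base = bytes(range(256)) * 4 + b"v1.0"      # pretend binary, revision 1
    patched = bytes(range(256)) * 4 + b"v1.1"   # revision 2, a few bytes differ
    delta = make_delta(base, patched)
    assert apply_delta(base, delta) == patched
    print(len(patched), "bytes rebuilt from", literal_bytes(delta), "literal bytes")
```

The server would keep the base file and the (small) delta per revision, and only run apply_delta for files a patch actually touched.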
CPU, memory, disk, etc... will be _exponentially_ increased.
Prove exponential CPU usage increase. If designed intelligently, it might be no more intensive than rsync, which is doing much of what is required already. We would need data on the load rsync puts on a server.
As I said, check in GBs of different binary revisions to CVS and share the same out in multiple trees via HTTP. Now monitor the difference in load when you have 1, 2, 4, 8 ... 1024 clients connect!
That's because CVS as it stands is inefficient with binaries.
It is _not_ feasible, period.
Think outside the CVS box, Bryan. I did not say 'Use CVS for this'; I said 'Use a CVS-like system for this' meaning simply the guts of the mechanism. CVS per se would be horribly inefficient for this purpose.
Store the unpacked RPMs and binary deltas for each file. Store prebuilt headers if needed. Trust the server to sign on the fly rather than at build time (I/O bound). Pack the payload on the fly with CPIO (I/O bound). Send the RPM out (I/O bound) when needed. Mirrors rsync the whole unpacked repository (I/O bound).
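As a rough illustration of the "pack the payload on the fly" step, the sketch below writes an uncompressed cpio archive in the SVR4 "newc" layout from files held in memory. It is a sketch under assumptions, not a working RPM builder: the function name cpio_newc is hypothetical, ownership, timestamps and modes are fixed placeholder values, and the compression, header generation and signing that a real RPM needs are not shown.

```python
import io

TRAILER = "TRAILER!!!"  # name of the entry that terminates a cpio archive


def _pad4(buf: io.BytesIO) -> None:
    """Pad with NUL bytes so the next item starts on a 4-byte boundary."""
    buf.write(b"\0" * (-buf.tell() % 4))


def cpio_newc(files: dict) -> bytes:
    """Pack {path: content_bytes} into an uncompressed newc cpio archive."""
    out = io.BytesIO()
    entries = list(files.items()) + [(TRAILER, b"")]
    for ino, (name, data) in enumerate(entries, start=1):
        name_b = name.encode() + b"\0"
        is_trailer = name == TRAILER
        fields = [
            0 if is_trailer else ino,        # inode number (placeholder)
            0 if is_trailer else 0o100644,   # mode: regular file, rw-r--r--
            0, 0,                            # uid, gid (placeholder: root)
            1,                               # nlink
            0,                               # mtime (placeholder)
            len(data),                       # file size
            0, 0, 0, 0,                      # dev major/minor, rdev major/minor
            len(name_b),                     # name size, including the NUL
            0,                               # checksum, unused in "070701"
        ]
        # 6-byte magic followed by 13 eight-digit hex fields = 110-byte header
        out.write(b"070701" + b"".join(b"%08X" % f for f in fields))
        out.write(name_b)
        _pad4(out)
        out.write(data)
        _pad4(out)
    return out.getvalue()


if __name__ == "__main__":
    payload = cpio_newc({"usr/bin/hello": b"hi"})
    print(len(payload), "byte payload")
```

The work here is copying bytes into a stream, which is the sense in which this step is I/O bound rather than CPU bound; compressing the result would add CPU cost that this sketch does not measure.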
Are there issues with this? Of course there are. But the tradeoff is mirroring many GB of RPM's (rsync has to take some CPU for mirroring this large of a collection) versus mirroring fewer GB of unpacked RPM's plus binary deltas, and signing the on-the-fly RPM. Yes, it will take more CPU, but I think linearly more CPU and not exponentially. Of course, it would have to be tried. The many GB of mirror has got to have many GB of redundancy in it.
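The storage side of that tradeoff can be put as back-of-the-envelope arithmetic. The sketch below covers mirror size only, not CPU, and every number in it is a made-up illustration rather than a measurement; the function names and the delta_fraction parameter are hypothetical.

```python
def full_mirror_mb(package_mb: float, revisions: int) -> float:
    """Mirror size when every revision is kept as a complete RPM."""
    return package_mb * revisions


def delta_mirror_mb(package_mb: float, revisions: int, delta_fraction: float) -> float:
    """Mirror size with one unpacked base plus a delta per later revision.

    delta_fraction is the assumed size of one delta relative to the package.
    """
    return package_mb + package_mb * delta_fraction * (revisions - 1)


if __name__ == "__main__":
    # Hypothetical inputs: a 10 MB package, deltas at 10% of package size.
    for revs in (1, 2, 5, 10):
        print(revs, full_mirror_mb(10, revs), delta_mirror_mb(10, revs, 0.1))
```

Both totals grow linearly with the number of revisions; the difference is the slope, which depends entirely on how small the deltas turn out to be in practice.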
The size of the updates is getting out of control; for those with limited bandwidth it becomes very difficult to stay up to date.