On Friday 09 September 2005 20:19, Mike McCarty wrote:
Lamar Owen wrote:
solve this problem. Are you going to tell the community that this is an unsolvable problem?
No, I am not. Already, several people, myself included, have given a way for you to accomplish what you seem to want.
Well, first, Les and I are not the same person, and what Les wants and what I'd like to see are two different but related things. I believe that incremental updates (rpm-deltas) are desirable from a bandwidth and storage point of view, and highly desirable from a user point of view. It is true that they present issues for repository operators and packagers.
But then Johnny mentions that the mirroring load is 50GB for the tree. This is a lot of data to move around, really.
Now, bandwidth doesn't scare me; we have one research project here that will be collecting 12TB of data per day (if it captures a full day at a time; currently not possible, but desirable). (The project involves a phased array, with the raw data being stored and rephased after collection; this is like being able to repoint a dish to an observation in the past for conventional radio telescopes.) This would require 2/5ths of an OC-48 to mirror; doable, yes, but not desirable or affordable. Drive space doesn't scare me (except for cost; I got a quote on a petabyte-class storage array: it was 1.4PB and cost upwards of $3 million). CPU horsepower doesn't scare me, either, as I'm getting a MAPstation as part of a different research project (now this box has an interesting interconnect called SNAP that scales to 64 MAP DEL processors and 32 host P4s on a crossbar-type switch; you can do the research on Google too). The MAPstation runs on Linux, FWIW. For the application (cross-correlation of interferometry data, 2 frequencies, 2 polarizations, and 2 antennas) a MAP processor will have the equivalent power of an 800GHz P4, but be clocked at only 200MHz due to the massively parallel pipelining available with this kind of direct-execution-logic (non-Von Neumann) processor. But all of that is irrelevant.
What is relevant is that I have seen the end user's response to having to download multiple megabytes for a change of a hundred bytes or less. While it doesn't bother me, it did bother my users (speaking of the PostgreSQL users I built and released packages for).
So the end user potentially could reap the greatest benefit from an rpmdelta system. SuSE has been doing rpmdeltas for a year now, and I seem to recall that the results were pretty good.
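To make the "multiple megabytes for a hundred bytes" point concrete, here is a minimal sketch of a block-based binary delta. It is only an illustration under stated assumptions: fixed-size blocks, a delta expressed as copy/data operations, and hypothetical names (BLOCK, make_delta, apply_delta, literal_bytes). It is not how SuSE's rpmdeltas or any real tool is implemented.

```python
BLOCK = 64  # hypothetical block size, chosen only for illustration


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Describe `new` as copies of blocks from `old` plus literal data."""
    # Index every block of `old` that starts at a multiple of `block`.
    index = {}
    for off in range(0, len(old), block):
        index.setdefault(old[off:off + block], off)

    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        off = index.get(chunk)
        if off is not None and len(chunk) == block:
            # Flush pending literal bytes, then reference the old block.
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", off, block))
            pos += block
        else:
            # No matching old block here: ship this byte literally.
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Count the bytes that actually have to be downloaded as new data."""
    return sum(len(op[1]) for op in ops if op[0] == "data")
```

With something like this, the client that already holds the old package only needs the "data" operations plus a short list of copy references; a small change in the middle of a large file costs a few blocks' worth of literal bytes rather than the whole file. The price is the one already mentioned: someone on the repository side has to generate and store the deltas.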
Les wanted CVS-like functionality, where you can tag a repository as consistent at a certain branch (not necessarily by date, as you mentioned), and be able to consistently grab a set of packages.
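As a rough sketch of what such tagging could look like, assume the repository is just a mapping from package name to version, and a tag is a frozen copy of that mapping. The class name TaggedRepo, its methods, and the package names and versions below are all hypothetical; no existing repository tool is being described.

```python
class TaggedRepo:
    """Toy repository: a current package set plus named, frozen snapshots."""

    def __init__(self):
        self.current = {}  # package name -> latest published version
        self.tags = {}     # tag label -> snapshot of `current` at tag time

    def publish(self, name: str, version: str) -> None:
        # A new version replaces the previous one in the current set.
        self.current[name] = version

    def tag(self, label: str) -> None:
        # Copy the mapping so later publishes cannot alter the tagged set.
        self.tags[label] = dict(self.current)

    def resolve(self, label: str) -> dict:
        # Return the exact package set recorded when the tag was made.
        return dict(self.tags[label])


repo = TaggedRepo()
repo.publish("pkg-a", "1.0-1")
repo.publish("pkg-b", "2.3-4")
repo.tag("known-good")
repo.publish("pkg-a", "1.0-2")  # an update arrives after the tag
consistent_set = repo.resolve("known-good")  # still lists pkg-a 1.0-1
```

Two machines that resolve the same tag get the same package set no matter when they ask, which is the property Les was after. A real repository would also have to keep the old package files around for as long as a tag refers to them.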
I mentioned that CVS works on a diff principle, and that that might be an interesting way of doing it (all the while thinking about my PostgreSQL users). Maybe I confused the two issues; possible.
The dumb-client, glorified-webserver type of system will be very difficult to make work this way, this is true. But who says we have to stick to a glorified wget? The key question is, cost-benefit analysis-wise, is it worth the effort (both development and execution)? Maybe it is, maybe it isn't. But I do believe it is worth a try, if only to help the end user (which could be a small server at a parish library, for instance.... :-)).