[CentOS] Why is yum not liked by some?
lowen at pari.edu
Fri Sep 9 20:20:05 UTC 2005
On Friday 09 September 2005 14:18, Bryan J. Smith wrote:
> I don't think you're realizing what you're suggesting.
Yes, I do. I've suggested something like this before, and there has been some
work on it (see the Fedora list archives from a year or more ago).
> Who is going to handle the load of the delta assembly?
The update generation process. Instead of building just an RPM, the
buildsystem builds the delta package to push to the package server.
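To make that concrete, here is a minimal sketch of delta generation done once, at build time. It is only an illustration under stated assumptions: the names make_delta, apply_delta and build_update are hypothetical, the payloads are made-up byte strings rather than real RPM contents, and Python's difflib stands in for a real binary-delta tool such as xdelta (difflib would be far too slow for real packages).

```python
from difflib import SequenceMatcher

def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies out of `old` plus literal inserts (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes already in the old file
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # ship only the new bytes
        # "delete" needs no op: those old bytes are simply never copied
    return ops

def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new payload from the old payload and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

def build_update(old_payload: bytes, new_payload: bytes) -> list:
    """Build-time step: generate the delta and verify it before pushing it."""
    delta = make_delta(old_payload, new_payload)
    if apply_delta(old_payload, delta) != new_payload:
        raise ValueError("delta does not reproduce the new payload")
    return delta

# Made-up payloads standing in for two builds of the same file.
old = b"libfoo 1.0: " + b"A" * 200 + b" end"
new = b"libfoo 1.1: " + b"A" * 200 + b" end (patched)"
delta = build_update(old, new)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(literal_bytes, "literal bytes in the delta vs", len(new), "bytes in the full file")
```

The expensive comparison happens once per update in the buildsystem; what gets pushed to the package server is the op list, not a second full copy.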
> But the on-line, real-time, end-user assembly during
> "check-out" is going to turn even a high-end server into a
> big-@$$ door-stop (because it's not able to do much else)
> with just a few users checking things out!
Do benchmarks on a working system of this type, then come back to me about the
unbearable server load.
> Do you understand
Do you understand how annoyingly arrogant you sound? I am not a child, Bryan.
> Not true! Not true at all! You're talking GBs of
> transactions _per_user_.
I fail to see how a small update of a few files (none of which approach 1GB in
size!) can produce multiple GB's of transactions per user. You seem to not
understand how simple this system could be, nor do you seem willing to even
try to understand it past your own preconceived notions.
> You're going to introduce:
> - Massive overhead
In your opinion.
> - Greatly increased "resolution time" (even before
> considering the server responsiveness)
> - Many other issues that will make it "unusable" from the
> standpoint of end-users
All in your opinion.
> You can_not_ do this on an Internet server. At most, you can
> do it locally with NFS with GbE connections so the clients
> themselves off-load a lot of the overhead. That's not
> feasible over the Internet, so that falls back on the
> Internet server.
How in the world would sending an RPM built from a delta down the 'net use
more bandwidth than sending that same file as it is sent now? HTTP is
probably the transport in EITHER case.
> As I mentioned before, not my Internet server! ;->
That is your choice, and your opinion.
> In any case, it's a crapload more overhead than merely
> serving out files via HTTP. You're going to reduce your
> ability to service users by an order of magnitude, if not 2!
Have you even bothered to analyze this in an orderly fashion, instead of
flying off the handle like Chicken Little? Calm down, Bryan.
> > With forethought to those things that can be
> > prebuilt versus those things that have to be generated
> > realtime, the amount of realtime generation can be
> > minimized, I think.
> That's the key right there -- you think.
Against your opinion, because neither of us has empirical data on this.
> Again, keep in mind that repositories merely serve out files
> via HTTP today. Now you're adding in 10-100x the overhead.
> You're sending data back and forth, back and forth, back and
> forth, between the I/O, memory, CPU, etc... Just 1 single
> operation is going to choke most servers that can service
> 10-100 HTTP users.
And this has to be balanced against the existing rsync-driven mirroring that is doing
multiple gigabytes worth of traffic. If the size of the files being rsync'd
is reduced by a sufficient percentage, wouldn't that lighten that portion of
the load? Have you worked the numbers for a balance? I know that if I were
contracting with you on any of my upcoming multi-terabyte-per-day radio
astronomy research projects, and you started talking to me this way, you'd be
looking for another client.
> No, you're talking about facilities that go beyond what rsync
> does. You're not just doing simple file differences between
> one system and another. You're talking about _multiple_
> steps through _multiple_ deltas and lineage.
Say you have ten updates. You apply the ten update deltas in sequence
and send the result down the pike. Is applying a delta to a binary file that is a
few kilobytes in length that stressful? What single binary in a typical
CentOS installation is over a few megs?
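As a rough sketch of what "apply the ten update deltas in sequence" amounts to, assume a made-up patch format of (offset, bytes to remove, replacement bytes). The names apply_patch and apply_chain and the 4 KB sample file are hypothetical, and this says nothing about how a real delta tool stores its data; it only shows that the chain is a simple fold over the base file.

```python
from functools import reduce

def apply_patch(data: bytes, patch: tuple) -> bytes:
    """Apply one hypothetical patch: (offset, bytes_to_remove, replacement)."""
    offset, remove, insert = patch
    return data[:offset] + insert + data[offset + remove:]

def apply_chain(base: bytes, patches: list) -> bytes:
    """Apply every patch in order, oldest first, starting from the base file."""
    return reduce(apply_patch, patches, base)

# A made-up binary of a few kilobytes and ten successive small updates,
# each one rewriting the single version digit at offset 8.
base = b"version=0;" + b"\x00" * 4096
patches = [(8, 1, str(n).encode()) for n in range(1, 11)]

result = apply_chain(base, patches)
print(result[:11])
```

Each step touches only a file of a few kilobytes in memory, so the cost grows with the number of updates times the file size. Whether that is acceptable at a busy mirror is exactly what would have to be measured.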
> There's a huge difference between traversing extensive delta
> files and just an rsync delta between existing copies. ;->
Yes, there is. The rsync delta is bidirectional traffic.
> I only referenced CVS because someone else made the analogy.
You're not even paying enough attention to know who said what; why should I
listen to a rant about something you have no empirical data to back?
I made an analogy to CVS, and I really think things could be made more
bandwidth and storage efficient for mirrors, master repositories, and
endusers without imposing an undue CPU load at the mirror. Feel free to
disagree with me, but at least keep it civil, and without insulting my
> So yes, I know CVS stores binaries whole.
> That aside, the XDelta is _still_ going to cause a sizeable
> amount of overhead.
How much? Why not try it (read the Fedora lists archives for some folks who
have indeed tried it).
> Far more than Rsync.
> I know. I was already thinking ahead, but since the original
> poster doesn't even understand how delta'ing works, I didn't
> want to burden him with further understanding.
Oh, just get off the arrogance here, please. You are not the only genius out
here, and you don't have anything to prove with me. I am not impressed with
resumes, or even with an IEEE e-mail address. Good attitude beats brilliance
any day of the week.
> > CVS per se would be horribly inefficient for this purpose.
> Delta'ing _period_ is horribly inefficient for this purpose.
> In fact, storing the revisions whole would actually be
> _faster_ than reverse deltas of _huge_ binary files.
But then there's still the many GB for the mirror. There are only two reasons
to do deltas, in my opinion:
1.) Reduce mirror storage space.
2.) Reduce bandwidth required to mirror, and/or reduce bandwidth to the
enduser (which I didn't address in this, but could be addressed, even though
it is far more complicated to send deltas straight to the user).
> You're talking about cpio operations _en_masse_ on a server!
> Have you ever done just a few smbtar operations from a server
> before? Do you _know_ what happens to your I/O?
> _That's_ what I'm talking about.
It once again depends on the process used. A streaming process could be used
that would not impact I/O as badly as you state (although you first said it
would kill my CPU, not my I/O). But tests and development would have to be
Again, the tradeoff is between the storage and bandwidth required at the
mirrors and the processing required. Of course, if the mirror server is only
going to serve HTTP, doing it in the client isn't good.
> > Store prebuilt headers if needed.
> As far as I'm concerned, that's the _only_ thing you should
> _ever_ delta. I don't relish the idea of a repository of
> delta'd cpio archives. It's just ludicrous to me -- and
> even more so over the Internet.
So you think I'm stupid for suggesting it. (That's how it comes across). Ok,
I can deal with that.
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC 28772