[CentOS] yum vs up2date

Thu Sep 7 21:39:19 UTC 2006
Les Mikesell <lesmikesell at gmail.com>

On Thu, 2006-09-07 at 13:49 -0500, Johnny Hughes wrote:

> Yes ... but you didn't understand.  let me try again.
> 
> mirror.centos.org is something that we own.  There are 10 total servers.
> We pass out 1 address (because rrdns does not work correctly with
> python) ...

OK - so that's the real problem. I have always wondered why people
liked python...  Is this something that could be fixed?  Can we
shame them by pointing out that even IE does something fairly
intelligent when handed multiple IP addresses some of which don't
work?
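
For illustration only, here is a minimal Python sketch of the client
behavior being asked for: resolve the name once, then walk every address
returned until one accepts a connection. This is not yum's actual code.
The connect_any name and the injectable resolve/connect parameters are
assumptions made so the logic can be exercised without a network.

```python
import socket

def connect_any(host, port, timeout=5.0,
                resolve=socket.getaddrinfo,
                connect=socket.create_connection):
    """Try each address the resolver returns; return the first socket that connects."""
    last_error = None
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        try:
            # sockaddr[:2] is (ip, port) for both IPv4 and IPv6 results
            return connect(sockaddr[:2], timeout)
        except OSError as exc:
            # Remember the failure and move on to the next address
            last_error = exc
    raise last_error or OSError("no addresses found for %s" % host)
```

Note that the standard library's socket.create_connection already loops
over all resolved addresses when handed a hostname, so the fallback can
come for free if the HTTP layer above it does not pin a single address.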

>  that address is based on your IP and geoip relevant.
> So ... you get 1 (and only 1 ) address to connect to.

And that's the application's fault, not a technical requirement.

> If in the middle of your 100 packages, you have a glitch and lose your
> ability to connect to the server (it is overloaded, a router 3 hops down
> dies, etc. ... whatever) then that download fails and you drop out of
> yum.

Application problem again, but still not particularly fatal as long
as none of the packages from the run have been installed.  Just run
it again later.

> If you are using a mirrorlist, there are 9 other urls to try if that
> happens.

I mostly get 'metadata out of sync' errors when that happens.  It
hasn't been as pretty as you suggest.  And it often takes a 'yum
clean' to fix, which is even worse than just picking up where you
left off when your internet is working again.

> NEXT REASON:
> 
> 10 internal servers can not serve all the updates required ... 127
> mirrors can.  Those 127 mirrors are not named the same thing, nor are
> the paths the same.  We can't pass them out as mirror.centos.org ... the
> mirror operators have chosen different paths that work for them, etc.

You could pass out any IPs you want as mirror.centos.org.  It
might be inconvenient for the sites to map a standard URL into their
server layout so there might be some issues.  Anyone running apache
with named vhosts could do it if they wanted.

> What we have done is build a system that will test them, pass to you 10
> active, close and geoip relevant IPs in real time.  If a mirror is out
> of date, it goes away.

That would also be fine if it always gave the same list in the same
order when asked through the same proxy - if the client walks the list
in order.  Or if the URL to get this list was always the same and
it was marked cacheable for some time.  There would be the possibility
of a bad, stale copy but somewhat offset by having several alternatives.
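
One way a list server could hand out the same list in the same order to
everyone behind one proxy is to rank mirrors by a hash of the requesting
address combined with each mirror URL. The sketch below is only an
assumption about how that might look, not how the CentOS mirrorlist
works. The function name, the client_key parameter, and the example
URLs in the usage note are all hypothetical.

```python
import hashlib

def stable_mirror_order(client_key, mirrors, count=10):
    """Return up to `count` mirrors in an order fixed by client_key.

    client_key would be the address the request arrives from, which is
    the proxy's address for clients behind a proxy.
    """
    def score(mirror):
        # Hash the (client, mirror) pair; the digest decides the rank
        data = (client_key + "|" + mirror).encode("utf-8")
        return hashlib.sha256(data).hexdigest()
    return sorted(mirrors, key=score)[:count]
```

Because each mirror's rank depends only on the client key and that
mirror's own URL, dropping an out-of-date mirror leaves the relative
order of the others unchanged, so a client walking the list in order
keeps hitting the same cached URLs.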

> For your situation, you can pick one mirror, use it in all your config
> files, then it will work perfectly for you in your cache.

People behind the proxies don't coordinate their choices of
distributions or repositories, let alone which mirror to use
of each repository.  Besides, didn't you point out the problem
of just using one IP already?  How can you suggest that as a
solution now?

> Unless 127 people want to donate machines to CentOS.org to put under our
> control, then we can name them all mirror.centos.org and control their
> paths.

You don't have to control a machine to give out its IP in a DNS
response. You only have to arrange for it to accept the name as a vhost
and map the document root to the top of your mirror tree. That still
may be a lot to ask but it is a very different question.
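
The vhost mapping amounts to a lookup from the Host header to a local
directory. A real mirror would do this in its web server configuration;
the Python below is only a sketch of the logic, and the VHOST_ROOTS
table, the local path in it, and the resolve_path name are hypothetical.

```python
import posixpath

# Hypothetical: where this particular mirror keeps its CentOS tree
VHOST_ROOTS = {"mirror.centos.org": "/srv/ftp/pub/linux/centos"}

def resolve_path(host_header, url_path, vhost_roots=VHOST_ROOTS):
    """Map a standard URL path onto this site's own directory layout."""
    # Drop any ":port" suffix and compare case-insensitively
    host = host_header.split(":", 1)[0].lower()
    root = vhost_roots.get(host)
    if root is None:
        return None  # name not served here
    # Normalize so ".." segments cannot climb above the document root
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return root + clean
```

With that in place every mirror answers to the same URL regardless of
where its operator chose to store the files.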

> Have you ever tried to develop a mirror system that can provide updates
> to 1.5 million clients? 

No, I deal with a few hundred machines spewing perhaps a hundred Mb/s
to the internet and my servers are all on the same continent but even
at that scale the last thing I'd want to do is defeat anyone's local
caching scheme and I go out of my way to give multiple IPs for the
same URL for site redundancy instead of letting clients see different
URLs.

The nature of your product is such that the odds are good that vast
numbers of those boxes are located in some small number of places that
would only pull one copy of an update if you didn't go out of your way
to force each machine to get its own.

-- 
  Les Mikesell
   lesmikesell at gmail.com