[CentOS-mirror] Thoughts on DVD images

Mon Nov 8 09:00:26 UTC 2010
Peter Pöml <peter at poeml.de>


[resending, after realizing that I was subscribed with an old address]

On Fri, May 21, 2010 at 08:22:41AM -0700, Jim Kusznir wrote:
> I actually really like the idea of a push mirror.  I've always thought
> the polling rather inefficient, and can be problematic.  It also takes
> more time to stabilize the mirror tree, and results in a complete loss
> of control of bandwidth distribution at the msync mirrors, resulting
> in everyone getting slower speeds, and sync'ing against a different
> mirror each time (and as we saw, sometimes the mirrors' content isn't
> stable).  A push model would allow the msync servers to exercise more
> control over their bandwidth utilization, getting more "complete"
> mirrors out there more quickly.  It would also let them establish
> which public mirrors are getting pushed from which msync mirrors, in
> case some msync mirrors have a better connection to some sites (I2
> mirrors, for example).
> Of course, all this is focusing on major releases.  The push system is
> REALLY nice when 'minor updates' are released.  A push cycle kicks off
> to push out the few packages that need updating.  No more wasting
> bandwidth, etc. probing the mirror regularly.  And the smaller updates
> (which would generally not be bandwidth constrained) can get out there
> much faster, resulting in a more stable mirror tree.
> So, in short, I would like to see push mirroring for my mirror.

(Real) push syncing is a very powerful tool, which I have worked with
extensively at openSUSE in the past. For extreme cases, it scales
better than anything else.

Recently, I set up a completely new mirror infrastructure for the
Document Foundation (http://www.documentfoundation.org/) which grew to
about 50 mirrors in just a few days. I can push content directly to
about 10 of the mirrors. All I can say is that it is a blessing for me
as a content provider.

It is not an option for every mirror, because of various site-specific
restrictions that are in place here or there. But that is no problem,
because being able to change content on even some of the mirrors is
already extremely helpful for the content provider. It allows me to do
things quickly that would otherwise take hours of waiting, e.g. when
moving files around (which doesn't occur frequently, but it can happen).

All in all, I would summarize the advantages as follows:

- allows for timely syncs without unnecessary delays
- control over the order in which things are synced (e.g. rpms before
  metadata, or deletion of old metadata in a second step)
- instant publication of staged content when I release it
- instant redirections to a mirror once a file arrived there
- instant stopping of redirections when I delete files from mirrors
- the possibility to correct some things almost instantly when
  something has gone wrong.
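
To illustrate the point about sync order, here is a minimal sketch (not
my actual tooling) of a two-stage push using standard rsync options. The
host name, the "pub" module and the "repodata/" directory name are only
assumptions for the example.

```python
import subprocess
from typing import List


def staged_push_commands(src: str, dest: str,
                         metadata_dir: str = "repodata/") -> List[List[str]]:
    """Build rsync command lines for a two-stage push.

    Stage 1 copies everything except the metadata directories and
    deletes nothing, so existing metadata keeps pointing at files
    that are still present.
    Stage 2 copies the metadata as well and removes stale files,
    deleting only after the transfer has finished.
    """
    stage1 = ["rsync", "-a", f"--exclude={metadata_dir}", src, dest]
    stage2 = ["rsync", "-a", "--delete-after", src, dest]
    return [stage1, stage2]


def push(src: str, dest: str) -> None:
    """Run both stages in order; stop if a stage fails."""
    for cmd in staged_push_commands(src, dest):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical staging path and mirror; just print the commands.
    for cmd in staged_push_commands(
            "/srv/stage/", "rsync://push@mirror.example.org/pub/"):
        print(" ".join(cmd))
```

Because check=True raises on a failed transfer, the metadata is only
pushed when the packages have arrived completely.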

It also means that mirrors don't have to take care of setting up a
periodic sync and locking, and that unnecessary syncs can be avoided.

For mirrors that are far away and take a long time to bring up to date,
it is good to start syncing promptly (and not 4-8 hours later).

Again, it doesn't matter if this method is not used with all mirrors --
for me as content provider it helps a lot if just some mirrors can be
synced this way. The background is that a content provider is really
"helpless" when certain files are on _no_ mirror yet, because its own
bandwidth is limited. Having about 10 mirrors that can be synced
instantly helps a lot, because it immediately allows me to redirect and
keep my own traffic for essential things. And other mirrors quickly
catch up.
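
The redirection logic described above can be sketched roughly like this.
It is only an illustration under simple assumptions: the function name,
the in-memory file table and the example.org URLs are all made up, and a
real redirector would keep this state in a database.

```python
from typing import Dict, List, Set


def redirect_targets(path: str,
                     mirror_files: Dict[str, Set[str]],
                     origin: str) -> List[str]:
    """Return the base URLs that can serve `path`.

    A mirror qualifies only once the file is known to have arrived
    there. If no mirror has the file yet, the origin has to serve it
    from its own (limited) bandwidth.
    """
    ready = sorted(base for base, files in mirror_files.items()
                   if path in files)
    return ready if ready else [origin]


if __name__ == "__main__":
    # Hypothetical state: one pushed mirror already has the file.
    mirrors = {
        "http://mirror-a.example.org": {"release/image.iso"},
        "http://mirror-b.example.org": set(),
    }
    print(redirect_targets("release/image.iso", mirrors,
                           "http://origin.example.org"))
```

Recording a file in the table when a push completes, and removing it
when the file is deleted from the mirror, gives the "instant" start and
stop of redirections mentioned in the list of advantages.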

Push syncing is obviously very useful to prime the tier 1 mirrors.

Technically, where mirror admins were interested in pushing, granting
rsync write access has been acceptable to them, with access restricted
by IP and/or password (rsync over ssh is an option, but seldom used).

(A while ago, I started working on a framework to automate push
syncing, but it is making extremely slow progress. If someone is
interested in working together, I would be very happy.)

(And I keep telling myself that I ought to learn more about the way that
Debian handles this - I believe they simply cascade triggered pull
syncs, and I'm sure that also works well.)