[CentOS-mirror] centos mirror @ seas.harvard.edu

Fri Apr 22 05:00:28 UTC 2016
Chuck Anderson <cra at WPI.EDU>

On Fri, Apr 22, 2016 at 12:31:31AM -0400, Chuck Anderson wrote:
> On Fri, Apr 22, 2016 at 04:04:17AM +0300, Anssi Johansson wrote:
> > 22.4.2016, 0.48, Chuck Anderson kirjoitti:
> > >>You could also try without --delay-updates, which also triggers this
> > >>requirement to know the full file list in advance.
> > >
> > >But not using --delay-updates means that the yum repo could be in an
> > >inconsistent/nonfunctional state until the sync finishes.  That isn't
> > >good for a public mirror.
> > 
> > If also using --delete-delay, the files to be deleted will be
> > deleted only at the end of sync, reducing the chances of anything
> > breaking. Although I see your point, I believe the base repositories
> > (os, updates, extras, fasttrack etc) would end up getting synced in
> > such a sequence that yum won't get confused. The repodata directory
> > tends to get synced last, and it is not harmful if there are .rpm
> > files on the mirror that are not yet referenced in the repodata.
> 
> "tends to get synced last" isn't something to rely on.
> 
> Instead, you could sync everything except the repodata, then do a 2nd
> sync of just the repodata.

I should correct my statement above.  The first sync should sync
everything except the repodata/ directories, and not do any --delete.
The second sync should then do --delay-updates on the full tree.
There was a long discussion about this on the Red Hat mirror list in
2009.  It is probably just easier to always use --delay-updates
--delete-delay so you don't have to get things "just right" with the
manual methods.  You might also need to increase --timeout if you are
having problems with timeouts.
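
A minimal Python sketch of the two-pass approach described above, assuming rsync 3.x on the mirror host. The source URL, destination path and 600-second timeout are hypothetical placeholders, not values from this thread. The first pass excludes every directory named repodata and deletes nothing. The second pass syncs the full tree with --delay-updates --delete-delay.

```python
import subprocess
from typing import List

# Hypothetical placeholders: substitute your own upstream and local path.
SOURCE = "rsync://mirror.example.org/centos/"
DEST = "/srv/mirror/centos/"


def two_pass_commands(src: str, dest: str, timeout: int = 600) -> List[List[str]]:
    """Build the argument lists for the two rsync passes.

    Pass 1: copy everything except repodata/ directories, with no deletions,
            so new packages land before any metadata that references them.
    Pass 2: sync the full tree. --delay-updates moves changed files into
            place at the end, and --delete-delay removes stale files last.
    """
    common = ["rsync", "-aH", f"--timeout={timeout}"]
    first = common + ["--exclude=repodata/", src, dest]
    second = common + ["--delay-updates", "--delete-delay", src, dest]
    return [first, second]


def run(commands: List[List[str]]) -> None:
    # check=True raises CalledProcessError, so a failed first pass
    # stops the script before the repodata-updating second pass runs.
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in two_pass_commands(SOURCE, DEST):
        print(" ".join(cmd))
    # run(two_pass_commands(SOURCE, DEST))  # uncomment to actually sync
```

The arguments are built as lists and passed without a shell, so patterns such as `repodata/` reach rsync unmodified.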

Some excerpts from the 2009 discussion:

> rsync pulls files in sort order, so repodata comes before many
> packages. If you pull fast, the interval between fetching repodata and
> fetching everything after it is short, and the probability of a
> mismatch is small. But if the pull takes longer, or a lot remains to be
> pulled after repodata, it may become a problem. Given the number of
> client updates, even a small fraction of misses becomes a big number
> over time, and users will complain.

> > A way around --delay-updates is a multi-pass rsync that first
> > transfers only the rpms without --delete*, then transfers everything
> > without --delete* (repodata together with the rpms, to avoid any race
> > between data and metadata), and finally does a full rsync with the
> > --delete* options (again the full tree, to avoid races). That's how I
> > did it on my mirror before rsync had the delay options.

> A 2-pass is enough; just use --delay-updates in the second one. It's
> much smaller, so it won't be a big hit, and it will be short enough to
> minimize inconsistencies.
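
The three-pass scheme in the 2009 excerpt above can be sketched the same way. This is an illustrative Python sketch, not the poster's actual script. The filter rules `--include=*/ --include=*.rpm --exclude=*` are the usual rsync idiom for copying only .rpm files while still descending into every directory. `--delete-after` is an assumption standing in for the unspecified `--delete*` option of the final pass.

```python
import subprocess
from typing import List


def three_pass_commands(src: str, dest: str) -> List[List[str]]:
    """Argument lists for a multi-pass sync that avoids --delay-updates."""
    base = ["rsync", "-aH"]
    # Pass 1: only .rpm files (recurse into every directory, skip the rest),
    # no deletions, so packages exist before any metadata mentions them.
    rpms_only = base + ["--include=*/", "--include=*.rpm", "--exclude=*", src, dest]
    # Pass 2: everything, including repodata, still without deletions.
    everything = base + [src, dest]
    # Pass 3: full sync that finally removes files gone from upstream.
    # --delete-after is an assumption; the original only says --delete*.
    with_delete = base + ["--delete-after", src, dest]
    return [rpms_only, everything, with_delete]


def run(commands: List[List[str]]) -> None:
    # check=True aborts on the first failing pass.
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical source and destination, for illustration only.
    for cmd in three_pass_commands("rsync://mirror.example.org/centos/",
                                   "/srv/mirror/centos/"):
        print(" ".join(cmd))
```

Passing the arguments as a list (no shell) keeps the `*` patterns from being glob-expanded locally. Only the final pass deletes anything.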