[CentOS-mirror] [CentOS-devel] Getting content to mirrors faster

Wed Mar 1 16:36:30 UTC 2017
Jason L Tibbitts III <tibbs at math.uh.edu>

I just wanted to make a note that I have worked out a system which
enables me to mirror all of the Fedora repository, which consists of
about 12TB in (I believe) eleven million files, with a polling interval of
ten minutes.  A typical update (when there are changes to mirror)
including the mirrormanager checkin takes about four minutes (most of it
waiting for mirrormanager).  A poll when there are no changes takes six
seconds.  The load on the server during a poll is rsync startup time and
a handful of stat calls.  A full tree traversal on the server is not
required.  (It may still be required on the client, but that's no worse
than plain rsync.)
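
As a rough illustration of why an idle poll can be this cheap, here is a
minimal Python sketch. It is not the actual quick-fedora-mirror code. It
assumes the client has already fetched the (small) file list, and it
decides whether any further work is needed by comparing a digest of that
list with one saved after the last successful run. The function names and
the state file are hypothetical.

```python
import hashlib


def list_digest(file_list_bytes):
    """Return a hex digest identifying one version of the file list."""
    return hashlib.sha256(file_list_bytes).hexdigest()


def needs_sync(file_list_bytes, state_path):
    """True when the fetched list differs from the one recorded in state_path.

    state_path is a hypothetical local file holding the digest of the
    list that was current at the last successful sync.
    """
    try:
        with open(state_path) as fh:
            previous = fh.read().strip()
    except FileNotFoundError:
        return True  # first run: nothing recorded yet
    return previous != list_digest(file_list_bytes)


def record_sync(file_list_bytes, state_path):
    """Remember which list version was mirrored, after a successful transfer."""
    with open(state_path, "w") as fh:
        fh.write(list_digest(file_list_bytes))
```

When needs_sync() returns False the client can stop right there, so
neither side has to walk the tree.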

The software which handles this is at
https://pagure.io/quick-fedora-mirror

It involves a server-side component (written in Python 2 with limited
dependencies) to generate file lists in a useful format, and a
client-side component (currently written in zsh) which fetches the file
lists, processes them, calls rsync with a list of changed files, and
does a mirrormanager checkin.  (The mirrormanager client is not
required.)
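
The client-side flow described above can be sketched in Python roughly as
follows. This is an illustration under stated assumptions, not the real
client (which is zsh). The list format used here, one
"mtime<TAB>size<TAB>path" entry per line, is made up for the example and
is not necessarily what the generator emits. The rsync option
--files-from is real; the rest of the command line is only a plausible
choice.

```python
def parse_file_list(text):
    """Parse 'mtime<TAB>size<TAB>path' lines into {path: (mtime, size)}.

    The line format is an assumption made for this sketch.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        mtime, size, path = line.split("\t", 2)
        entries[path] = (int(mtime), int(size))
    return entries


def changed_paths(old, new):
    """Paths that are new, or whose mtime or size differ from the old list."""
    return sorted(p for p, meta in new.items() if old.get(p) != meta)


def deleted_paths(old, new):
    """Paths present in the old list but gone from the new one."""
    return sorted(p for p in old if p not in new)


def rsync_command(source, dest, files_from):
    """Build an rsync argv that transfers only the files named in files_from."""
    return ["rsync", "-a", "--hard-links",
            "--files-from=" + files_from, source, dest]
```

The changed paths would be written one per line to the file handed to
--files-from, so rsync only has to look at those files instead of
scanning the whole module.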

A tiered setup (mirrors pulling from other mirrors) works fine; only the
master mirror ever needs to generate the file lists.  None of this
limits the ability of clients to mirror in any other way.

Hardlinks are copied as hardlinks, provided that the file lists for all
cross-linked rsync modules are regenerated at the same time.  When that
doesn't happen, there's an included client-side hardlinker which uses
the file list data to hardlink a repository more quickly.
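
To give an idea of how file list data can speed up hardlinking, here is a
hedged Python sketch. It is not the included hardlinker. It assumes the
list has been parsed into a hypothetical mapping of
{relative path: (mtime, size)}, uses that metadata to pick candidate
duplicates without reading any file contents, and only then verifies the
candidates byte for byte before linking.

```python
import filecmp
import os
from collections import defaultdict


def link_candidates(entries):
    """Group paths sharing basename, size and mtime; these may be one file."""
    groups = defaultdict(list)
    for path, (mtime, size) in entries.items():
        groups[(os.path.basename(path), size, mtime)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def hardlink_group(root, paths):
    """Replace verified duplicates of paths[0] with hardlinks to it.

    Returns the number of files that were newly linked.
    """
    first = os.path.join(root, paths[0])
    linked = 0
    for rel in paths[1:]:
        target = os.path.join(root, rel)
        if not (os.path.exists(first) and os.path.exists(target)):
            continue  # not mirrored locally (for example, excluded)
        if os.path.samefile(first, target):
            continue  # already the same inode
        if not filecmp.cmp(first, target, shallow=False):
            continue  # same metadata but different content
        tmp = target + ".linktmp"
        os.link(first, tmp)      # create the new link under a temporary name
        os.replace(tmp, target)  # then swap it into place atomically
        linked += 1
    return linked
```

Only files that already match on name, size and mtime are ever compared,
which is where the saving over a blind content scan would come from.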

A form of exclude lists is supported in the client.

We're also working on a no-polling setup using the Fedora message bus,
with mirrors automatically waking up and fetching when new content is
pushed out.

While the default configurations, the above message bus stuff, and maybe
the mirrormanager checkin are Fedora-specific, I do believe the software
will work for any rsync server willing to run the file list generator.
If any of this interests your project, please let me know.
-- 
 Jason L Tibbitts III - tibbs at math.uh.edu - 713/743-3486 - 660PGH
 System Manager:  University of Houston Department of Mathematics