[CentOS-devel] mirrorlist code updated / SIGs and AltArch support rolled-in

Mon Sep 24 08:45:36 UTC 2018
Fabian Arrotin <arrfab at centos.org>


Recently I had to update the existing code running behind
mirrorlist.centos.org (the service that returns a list of validated
mirrors for yum, see the /etc/yum.repos.d/CentOS*.repo files) as it was
still using the Maxmind GeoIP Legacy country database. As you probably
know, Maxmind announced that they're discontinuing the Legacy DB, so
that was one reason to update the code. Switching to GeoLite2, with the
python2-geoip2 package, was really easy, so that was already done and
pushed last month.
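
For readers who have not used the GeoLite2 bindings, a country lookup can be sketched roughly as below. This is only an illustration, not the code running on mirrorlist.centos.org: the database path, the function names and the fallback behaviour are assumptions. `geoip2.database.Reader` and its `country()` method come from the geoip2 Python package.

```python
# Illustrative sketch only; not the actual mirrorlist.centos.org code.
# Assumption: a GeoLite2 Country database lives at DB_PATH (hypothetical path).
DB_PATH = "/usr/share/GeoIP/GeoLite2-Country.mmdb"


def open_reader(path=DB_PATH):
    """Open a GeoLite2 database (requires the geoip2 package)."""
    # Imported lazily so that country_for() below has no hard dependency.
    import geoip2.database
    return geoip2.database.Reader(path)


def country_for(reader, ip, default=None):
    """Return the ISO country code for ip, or default if the lookup fails."""
    try:
        return reader.country(ip).country.iso_code or default
    except Exception:
        # geoip2 raises AddressNotFoundError for unknown addresses; the broad
        # catch here is a simplification to keep the sketch short.
        return default
```

With a database in place, `country_for(open_reader(), some_client_ip)` would return a two-letter code such as "BE", or the default when the address is not in the database.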

But then I discussed it with Anssi (if you don't know him, he keeps
the CentOS external mirrors DB up to date, including through the
centos-mirror list), and we thought about making that change not only
there but in the whole chain (so on our "mirror crawler" node, and also
for the isoredirect.centos.org service). Random chats like these are
good, because suddenly we don't only want to "fix" one thing, but also
take the time to enhance it and add new features.

The previous code already supported both IPv4 and IPv6, but it was
consuming different data sources (as external mirrors were validated
differently for IPv4 vs IPv6 connectivity). So the first thing was to
rewrite/combine the code of the "mirror crawler" process for dual-stack
tests, and also reflect that change on the frontend (aka
mirrorlist.centos.org) nodes.
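
To make the "single data source" idea concrete, here is a minimal sketch of how a frontend could filter one merged mirror list by the client's address family. The record layout, the example.org URLs and the function name are all hypothetical; the real crawler output format is not described here.

```python
import ipaddress

# Hypothetical merged records: one entry per mirror with one flag per
# address family, instead of separate IPv4 and IPv6 lists.
MIRRORS = [
    {"url": "http://mirror-a.example.org/centos/", "ipv4": True, "ipv6": True},
    {"url": "http://mirror-b.example.org/centos/", "ipv4": True, "ipv6": False},
    {"url": "http://mirror-c.example.org/centos/", "ipv4": False, "ipv6": True},
]


def mirrors_for_client(client_ip, mirrors=MIRRORS):
    """Return URLs of mirrors validated for the client's address family."""
    version = ipaddress.ip_address(client_ip).version  # 4 or 6
    key = "ipv4" if version == 4 else "ipv6"
    return [m["url"] for m in mirrors if m[key]]
```

A client connecting over IPv6 would then only be offered mirrors whose IPv6 check passed, and the same for IPv4, from the same list.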

While we were working on this, Anssi proposed not just adapting the
isoredirect.centos.org code but converting it to the same python format
as the mirrorlist.centos.org code, which he did.

The last big change that was added is the following : in the past,
only some repositories/architectures were checked/validated, but not
the other ones (so nothing from the SIGs and nothing from AltArch, and
thus no mirrorlist support for i386/armhfp/aarch64/ppc64/ppc64le).

While that wasn't a real problem when we launched the SIGs concept and
later added the other architectures (AltArch), we suddenly started
suffering from some side-effects :

 * More and more users are "using" RPM content from mirror.centos.org
(mainly through SIGs, which is a good indicator that those are
successful, so it's a good "problem to solve")
 * We are currently losing some nodes in that mirror.centos.org network
(it's still entirely based on free dedicated servers donated to the project)

To address the first point, offloading more content to the 600+ external
mirrors we have right now would be really good, as those nodes have
better connectivity than we do, and more presence around the globe
too, so slowly pointing SIGs and AltArch to those external mirrors will

The other good point is that, as we switched to the GeoLite2 City DB,
we get more granularity : for example, instead of "just" returning a
list of 10 validated mirrors for the USA (if your request was identified
as coming from that country, of course), you now get a list of validated
mirrors in your state/region. That means that for big countries with a
lot of mirrors, we also distribute the load better amongst all of them,
which is a big win for everybody (users and mirror admins).
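
As a rough sketch of the state/region preference described above: the function below picks mirrors from the client's region first. Padding the list with other mirrors from the same country, the shuffling, and the record fields are assumptions made for illustration, not a description of the deployed code.

```python
import random

MAX_MIRRORS = 10  # the list size mentioned above


def pick_mirrors(mirrors, country, region=None, limit=MAX_MIRRORS, rng=random):
    """Prefer mirrors in the client's region, then pad with the same country."""
    in_region = [m for m in mirrors
                 if m["country"] == country and region and m["region"] == region]
    in_country = [m for m in mirrors
                  if m["country"] == country and m not in in_region]
    # Shuffle each tier so load is spread across equally close mirrors.
    rng.shuffle(in_region)
    rng.shuffle(in_country)
    return [m["url"] for m in (in_region + in_country)[:limit]]
```

Each mirror record is assumed to be a dict with "url", "country" and "region" keys; when no region is known for the client, the function simply falls back to the country-wide list.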

For people interested in the code, you'll see that we just run several
instances of the python code behind Apache with mod_proxy_balancer. That
means that if we need to increase the number of "instances", it's easy
to do, but so far it's running great with 5 instances per node (and we
have 4 nodes behind mirrorlist.centos.org). Worth noting that on
average, each of those nodes gets 36+ million requests per week for the
mirrorlist service (so 144+ million in total per week).
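
To put those numbers in perspective, here is the back-of-the-envelope arithmetic. It assumes traffic is spread evenly over the week and over all instances, which real traffic will not be, and it treats "36+ million" as a lower bound.

```python
NODES = 4
INSTANCES_PER_NODE = 5
REQUESTS_PER_NODE_PER_WEEK = 36_000_000  # "36+ million", used as a lower bound
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

total_per_week = NODES * REQUESTS_PER_NODE_PER_WEEK
total_per_second = total_per_week / SECONDS_PER_WEEK
per_instance_per_second = total_per_second / (NODES * INSTANCES_PER_NODE)

print(f"total per week:      {total_per_week:,}")
print(f"total per second:    {total_per_second:.1f}")
print(f"per instance/second: {per_instance_per_second:.1f}")
```

Under those assumptions that is roughly 238 requests per second overall, or about 12 per second for each of the 20 instances.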

So in (very) short summary :

 * mirrorlist.centos.org code now supports SIGs/AltArch repositories (we'll
sync with SIGs to update their .repo file to use mirrorlist= instead of
baseurl= soon)
 * we have better accuracy for large countries, so we redirect you to a
'closer' validated mirror


So that means that the following combinations are now possible :

testing base os for aarch64 :
curl 'http://mirrorlist.centos.org/?release=7&arch=aarch64&repo=os'
testing RDO Rocky release for ppc64le :

And so on .... :)
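
For scripting the same kind of check, the query URL can be built from the three parameters used in the curl example (release, arch, repo). This is a small sketch with hypothetical function names; it assumes the service answers with one mirror URL per line.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

MIRRORLIST = "http://mirrorlist.centos.org/"


def mirrorlist_url(release, arch, repo):
    """Build a mirrorlist query URL from release, arch and repo."""
    query = urlencode({"release": release, "arch": arch, "repo": repo})
    return MIRRORLIST + "?" + query


def fetch_mirrors(release, arch, repo, timeout=10):
    """Return the mirror URLs from the response (assumed: one URL per line)."""
    with urlopen(mirrorlist_url(release, arch, repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return [line.strip() for line in body.splitlines() if line.strip()]
```

`mirrorlist_url(7, "aarch64", "os")` gives the same URL as the curl example for aarch64, and `fetch_mirrors()` does the actual request.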

There is no "pressure" to update your -release pkg to switch from
baseurl=mirror.centos.org to mirrorlist=mirrorlist.centos.org, but I just
wanted to let you know that it's now "live" and that you can start
thinking about that change. Why not start with an updated pkg pushed to
-testing (aka buildlogs.centos.org) and then move to that later ?


Fabian Arrotin
The CentOS Project | https://www.centos.org
gpg key: 56BEC54E | twitter: @arrfab
