[CentOS-devel] setting up an emergency update route

On 02/03/2015 02:03 PM, Fabian Arrotin wrote:
> On 03/02/15 14:38, Karanbir Singh wrote:
>> Hi,
> 
>> At the end of the Dojo in Brussels, I had the chance to field the 
>> question to our contributor audience : how can we get security
>> updates out to the user machines faster.
> 
>> At the moment, things are set up like any other distro or large
>> open source content network: we rsync in stages, and external
>> mirrors pick up every 4 to 6 hours; some external mirrors pick up
>> from other external mirrors. The net result is that for a given
>> update, it can be up to 16 to 18 hours before we get a majority
>> content sync in front of most users.
> 
> So, we have to split the answer into two categories :
>  *  people using default yum repositories provided by
> mirrorlist.centos.org : I just verified on the node producing all
> those mirrorlists the needed time to crawl all external mirrors and
> validate their contents : it takes at most 3 hours to crawl all
> external mirrors for
> {5,6,7}/{os,updates,extras,centosplus}/{i386,x86_64}. So in the worst
> case scenario, supposing that we dropped a new rpm/metadata just at
> the start of the crawler process, we'd have to wait 6 hours (so
> waiting for second run to finish validating and pushing new mirrorlists)

This is already included in my 16 to 18 hour estimate of how long it
takes to get a majority of the mirror network into a sane state.
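
To make the arithmetic in that worst case explicit, here is a minimal
sketch. It assumes crawler runs are back to back and that content
dropped just after a run starts is only validated by the following run;
the function name is made up for illustration.

```python
def worst_case_mirrorlist_delay(crawl_hours: float) -> float:
    """Hours until new mirrorlists are pushed, in the worst case.

    Assumption: crawler runs are back to back, and content dropped just
    after a run starts is only validated by the next full run.
    """
    remainder_of_current_run = crawl_hours  # the run that just started misses the update
    next_full_run = crawl_hours             # the run that actually validates it
    return remainder_of_current_run + next_full_run


print(worst_case_mirrorlist_delay(3))  # 3 hour crawl -> 6
```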

> 
>  * people not using default yum repositories : nothing we can directly
> do, as we don't control the repo they are using

People who know what they are doing will do whatever they want, and are
not really impacted by changes we make to the default setup.
Communication and promotion of those updates will likely be the only
real value we can add for them.

> 
> 
>> In cases like the recent Glibc issue, 18 hrs can be a long time
>> since the release ( remember, we already lag RHEL releases since
>> our process starts once theirs ends ).
> 
>> There were a couple of ideas that came up in the conversation at
>> the Dojo, and then in the following conversations over the entire
>> Fosdem weekend. The two that seemed most likely, easiest to
>> implement and perhaps most robust, involved a chunk of the load
>> moving to mirror.centos.org for some period of time. These are :
> 
>> A) we set up a rapid update repo, that would be hosted on and run
>> from mirror.centos.org exclusively. The yum repo definitions would
>> still point at mirrorlist, however they would only expect
>> centos.org urls in the baseurl stack from mirrorlist.centos.org;
>> This would allow us to reduce the overall time to user visibility
>> in default centos linux installs to under an hour for content up to
>> 250MB in size.
> 
>> B) integrate the mirrorlist backend with the release mechanism in
>> centos linux, so when new updates are pushed, all updates
>> are then delivered via mirror.centos.org for the next 24 hrs. After
>> this period, traffic reshapes to be delivered from the external
>> mirrors by default.
> 
>> The key issue to note with (A) is that while we might push
>> something to this rapid update repo, the same content will also be
>> available in the regular updates/ repo. So once it starts showing
>> up externally, traffic will naturally switch to using the updates
>> repo from local mirrors ( using repo names and cost etc, we can
>> influence repo priority where there is common content ).
> 
> So, let me add something between (A) and (B) : as we also control
> the node producing the mirrorlists, why not have a parallel job on
> the same host, just crawling "on demand" the updates repo for a
> specific release when we know that we have to release "critical"
> updates. We'd then be able to check directly, in a loop, which
> mirrors already carry that specific package/repodata, and so not
> have to wait multiple hours. At the same time, we can add
> mirror.centos.org in the mix, but already validated by default.

We can try that, it would reduce the time-to-check, but will have no
impact on the sync rates for getting content out. So at best we'd be
shaving off a few hours from the overall run, for a large chunk of
complexity in the mirrorlist layers.
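
As a rough sketch of what such an on-demand check could look like:
the code below keeps only the mirrors whose repodata/repomd.xml matches
the master copy byte for byte. Comparing a SHA-256 of repomd.xml is my
assumption about what "validated" would mean here, not a description of
the existing crawler, and fetch_repomd / mirrors_in_sync are
hypothetical names.

```python
import hashlib
import urllib.request
from typing import Callable, Iterable, List


def fetch_repomd(base_url: str) -> bytes:
    """Download repodata/repomd.xml from a repo base URL."""
    url = base_url.rstrip("/") + "/repodata/repomd.xml"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def mirrors_in_sync(
    master_url: str,
    mirror_urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_repomd,
) -> List[str]:
    """Return the mirrors whose repomd.xml is identical to the master's."""
    wanted = hashlib.sha256(fetch(master_url)).hexdigest()
    good = []
    for url in mirror_urls:
        try:
            if hashlib.sha256(fetch(url)).hexdigest() == wanted:
                good.append(url)
        except OSError:
            # Unreachable or timed-out mirrors are simply left out.
            continue
    return good
```

Note that this only tells us sooner which mirrors are usable; it does
nothing to make the mirrors themselves sync any faster.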

Unless we can find a huge hole in (A), it seems the simplest option, and
one that can be executed without any code changes.

> The reason why I'd not specifically like to see only mirror.centos.org
> nodes being used is that from time to time, we also lose some of those
> nodes, just because the monthly quota was used up, and/or the NOC team
> thought that a DoS was happening :-)

These are largely automation and monitoring problems, both of which
can be improved. If a machine is down, our DNS should not be handing out
that machine's IP in response to mirror.centos.org queries.
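
A minimal sketch of that kind of check, assuming a plain TCP connect to
port 80 is a good enough liveness probe and that something else then
publishes the surviving IPs in DNS. The function names are hypothetical
and this is not how our DNS is currently driven.

```python
import socket
from typing import Callable, Iterable, List


def is_serving(ip: str, port: int = 80, timeout: float = 3.0) -> bool:
    """True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_records(
    ips: Iterable[str],
    check: Callable[[str], bool] = is_serving,
) -> List[str]:
    """Keep only the node IPs that pass the health check."""
    return [ip for ip in ips if check(ip)]
```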

-- 
Karanbir Singh
+44-207-0999389 | http://www.karan.org/ | twitter.com/kbsingh
GnuPG Key : http://www.karan.org/publickey.asc