On Sat, Feb 7, 2015 at 3:44 PM, Stephen John Smoogen <smooge at gmail.com> wrote:
>
>
> On 7 February 2015 at 08:12, Tim Verhoeven <tim.verhoeven.be at gmail.com>
> wrote:
>>
>> Hi,
>>
>> I've been thinking a bit about this. The best solution IMHO, besides
>> building our own CDN, which is indeed a bit over the top for this, is
>> to push these updates instead of working with a pull method. So would
>> it be possible to find some mirrors that would allow us to push
>> packages into our repos on their servers? In case of releases that
>> need to go out quickly we could use a separate mirrorlist that only
>> includes our servers and the mirrors that allow us to push to. That way
>> we can move the needed packages out quickly and let users get them fast.
>> Later, as the other mirrors sync up, we just go back to the normal
>> mirrorlist.
>>
>> Stupid idea or not?
>>
>
> I don't think it is "stupid", but it is overly simplified. Just going off
> of the EPEL checkins to mirrorlist, there are at least 400k-600k active
> systems which are going to be checking hourly for updates for an emergency
> update. The number of mirrors who are going to allow a push system will
> have to be large enough to deal with the thundering herd problem when an
> update occurs and 500k systems check in at 10 after the hour (seems like a
> common time for boxes which check in hourly), all see there is a new
> update, and start pulling from it.

There are approaches that could make it more effective. One of them is an
inventory-based update mechanism: a server-side flag, available to clients,
that reports changes in the repository. Clients could update efficiently by
scanning that flag for new files and repodata information, which would be
far cheaper for many sites. (A rough sketch of the idea is at the end of
this mail.)

One of the subtler difficulties, and it is being ignored by the Fedora
migration to dnf, is the cost of the metadata updates. The repodata alone is
over 500 MBytes for CentOS 7. This is *insane* to keep transmitting for
every micro-update or critical update. Scaled out across a bulky local
cluster, simply running "yum check-update" can saturate your bandwidth, and
it has done so for me. That's why I use local mirrors when possible. But
then, hey, my local mirror has to pull these updates *all the time*, which
puts it in a constant state of churn for the repository information. It gets
out of hand very quickly.

The underlying solution to the bulky repodata is to *stop using monolithic
repodata*. Switch to a much, much lighter weight repodata, stop trying to
invent new, bulky, confusing features such as "Recommends", and concentrate
on splitting it up much like "apt" splits up its repositories. One package,
one small header file; if the package updates, update *that* header file
instead of a monolithic database. (A second sketch of that layout follows
the first one below.)

I realize that's not going to happen right now: too much work has been
invested in yum and dnf as they exist today. But it's worth keeping in mind
that the current design puts a half-Gig transmission cost on *any*
repository update of the main OS repositories.
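
To make the inventory-flag idea concrete, here is a minimal client-side
sketch. It assumes the mirror publishes a tiny serial/timestamp file
alongside its repodata; the URL, file names, and cache path below are made
up for illustration, nothing like this exists in yum or dnf today:

    #!/usr/bin/env python
    # Hypothetical sketch: fetch a tiny "repository serial" flag from the
    # mirror and only pull the full repodata when that flag has changed.
    # FLAG_URL and CACHE are assumptions, not an existing yum/dnf feature.
    import os
    import urllib2  # urllib.request on Python 3

    FLAG_URL = 'http://mirror.example.org/centos/7/os/x86_64/repodata/serial'
    CACHE = '/var/cache/repo-serial'

    def remote_serial():
        # The flag is a few bytes, so this check costs almost nothing.
        return urllib2.urlopen(FLAG_URL, timeout=10).read().strip()

    def local_serial():
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                return f.read().strip()
        return None

    def main():
        new = remote_serial()
        if new == local_serial():
            print('repository unchanged, skipping metadata download')
            return
        print('repository changed, refreshing full repodata ...')
        # ... download repomd.xml and friends here ...
        with open(CACHE, 'w') as f:
            f.write(new)

    if __name__ == '__main__':
        main()

Run hourly from cron, a check like this only touches the big repodata when
the flag actually moves, instead of re-validating half a gig of metadata on
every check-in.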
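
And a minimal sketch of the "one package, one small header file" idea,
assuming per-package metadata written as small JSON files next to the repo.
The headers/ layout and the field names are assumptions for illustration,
not an existing repodata format:

    #!/usr/bin/env python
    # Hypothetical sketch: write one small metadata file per package instead
    # of rebuilding a monolithic repodata database. When a single package
    # updates, only its own header file changes.
    import hashlib
    import json
    import os
    import sys

    def write_header(repo_root, rpm_path):
        """Write <repo_root>/headers/<package>.json for a single RPM."""
        digest = hashlib.sha256()
        with open(rpm_path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)
        name = os.path.basename(rpm_path)
        header = {
            'file': name,
            'size': os.path.getsize(rpm_path),
            'sha256': digest.hexdigest(),
        }
        out_dir = os.path.join(repo_root, 'headers')
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        with open(os.path.join(out_dir, name + '.json'), 'w') as out:
            json.dump(header, out)

    if __name__ == '__main__':
        repo = sys.argv[1]
        for fname in os.listdir(repo):
            if fname.endswith('.rpm'):
                write_header(repo, os.path.join(repo, fname))

A client that already knows which packages it has installed would then only
fetch the handful of small header files that changed, rather than the whole
repository database.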