On Sat, Feb 7, 2015 at 3:44 PM, Stephen John Smoogen <smooge at gmail.com> wrote:
>
>
> On 7 February 2015 at 08:12, Tim Verhoeven <tim.verhoeven.be at gmail.com>
> wrote:
>>
>> Hi,
>>
>> I've been thinking a bit about this. The best solution IMHO, besides
>> building our own CDN, which is indeed a bit over the top for this, is
>> to push these updates instead of working with a pull method. So would
>> it be possible to find some mirrors that would allow us to push
>> packages into our repos on their servers? In case of releases that
>> need to go out quickly we could use a separate mirrorlist that only
>> includes our servers and the mirrors that allow us to push to. That way
>> we can move the needed packages out quickly and let users get them fast.
>> Later, as the other mirrors sync up, we just go back to the normal
>> mirrorlist.
>>
>> Stupid idea or not?
>>
>
> I don't think it is "stupid", but it is overly simplified. Just going off
> of the EPEL checkins to mirrorlist, there are at least 400k-600k active
> systems which are going to be checking hourly for updates for an emergency
> update. The number of mirrors who are going to allow a push system will
> have to be large enough to deal with the thundering herd problem when an
> update occurs and 500k systems check in at 10 after the hour (seems like a
> common time for boxes which check in hourly), all see there is a new
> update, and start pulling from it.

There are approaches that could make it more effective. One of them is an
inventory-based update mechanism: a server-side flag, available to clients,
that reports changes in the repository. Clients could update efficiently by
scanning that flag for new files and repodata information, which would be
far cheaper for many sites. (A rough sketch of the idea is at the end of
this mail.)

One of the subtler difficulties, and it is being ignored by the Fedora
migration to dnf, is the cost of the metadata updates. The repodata alone is
over 500 MBytes for CentOS 7. This is *insane* to keep transmitting for
every micro-update or critical update. Scaled out across a bulky local
cluster, simply running "yum check-update" can saturate your bandwidth, and
it has done so for me. That's why I use local mirrors when possible. But
then, hey, my local mirror has to pull these updates *all the time*, which
puts it in a constant state of churn for the repository information. It gets
out of hand very quickly.

The underlying solution to the bulky repodata is to *stop using monolithic
repodata*. Switch to a much, much lighter weight repodata, stop trying to
invent new, bulky, confusing features such as "Recommends", and concentrate
on splitting it up much like "apt" splits up its repositories. One package,
one small header file; if the package updates, update *that* header file
instead of a monolithic database. (A second sketch of that layout follows
the first one below.)

I realize that's not going to happen right now: too much work has been
invested in yum and dnf as they exist today. But it's worth keeping in mind
that the current design puts a half-Gig transmission cost on *any*
repository update of the main OS repositories.
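
To make the inventory-flag idea concrete, here is a minimal client-side
sketch. It assumes the mirror publishes a tiny serial/timestamp file
alongside its repodata; the URL, file names, and cache path below are made
up for illustration, nothing like this exists in yum or dnf today:

    #!/usr/bin/env python
    # Hypothetical sketch: fetch a tiny "repository serial" flag from the
    # mirror and only pull the full repodata when that flag has changed.
    # FLAG_URL and CACHE are assumptions, not an existing yum/dnf feature.
    import os
    import urllib2  # urllib.request on Python 3

    FLAG_URL = 'http://mirror.example.org/centos/7/os/x86_64/repodata/serial'
    CACHE = '/var/cache/repo-serial'

    def remote_serial():
        # The flag is a few bytes, so this check costs almost nothing.
        return urllib2.urlopen(FLAG_URL, timeout=10).read().strip()

    def local_serial():
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                return f.read().strip()
        return None

    def main():
        new = remote_serial()
        if new == local_serial():
            print('repository unchanged, skipping metadata download')
            return
        print('repository changed, refreshing full repodata ...')
        # ... download repomd.xml and friends here ...
        with open(CACHE, 'w') as f:
            f.write(new)

    if __name__ == '__main__':
        main()

Run hourly from cron, a check like this only touches the big repodata when
the flag actually moves, instead of re-validating half a gig of metadata on
every check-in.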
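
And a minimal sketch of the "one package, one small header file" idea,
assuming per-package metadata written as small JSON files next to the repo.
The headers/ layout and the field names are assumptions for illustration,
not an existing repodata format:

    #!/usr/bin/env python
    # Hypothetical sketch: write one small metadata file per package instead
    # of rebuilding a monolithic repodata database. When a single package
    # updates, only its own header file changes.
    import hashlib
    import json
    import os
    import sys

    def write_header(repo_root, rpm_path):
        """Write <repo_root>/headers/<package>.json for a single RPM."""
        digest = hashlib.sha256()
        with open(rpm_path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)
        name = os.path.basename(rpm_path)
        header = {
            'file': name,
            'size': os.path.getsize(rpm_path),
            'sha256': digest.hexdigest(),
        }
        out_dir = os.path.join(repo_root, 'headers')
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        with open(os.path.join(out_dir, name + '.json'), 'w') as out:
            json.dump(header, out)

    if __name__ == '__main__':
        repo = sys.argv[1]
        for fname in os.listdir(repo):
            if fname.endswith('.rpm'):
                write_header(repo, os.path.join(repo, fname))

A client that already knows which packages it has installed would then only
fetch the handful of small header files that changed, rather than the whole
repository database.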