On 01/03/17 14:44, Anssi Johansson wrote:
Hi Anssi,
Thanks a lot for all those ideas. While I understand the reasoning behind a TXT record at the DNS level, I'm not a fan of it. Even with a lower TTL, you'd be surprised how many hits we still got on servers that had been migrated to new ones: it seems some ISPs don't obey the TTL and kept serving wrong (and expired) A/AAAA records from their cache. The centos.org DNS zone currently isn't DNSSEC-enabled either, so we'd need some other protection, etc.
I'd prefer your alternative, "let's host this behind an https web server", both because it's easier to have TLS on a web server and because, from the automation point of view, it's easier for the people allowed to build/sign/push (two people) to just update/drop a file somewhere than to modify DNS. For DNS, as the zone is actually under git/puppet control, that would mean *not* using that, but rather having a delegated zone that allows nsupdate with a key that those people would share, etc. So the simple file served over https seems easier from my side.
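To make the "simple file served over https" idea a bit more concrete, here is a minimal sketch of what the mirror side could look like. Everything named here is an assumption for illustration only: the marker URL, the state file path and the idea that the file holds a single serial/timestamp string are not something we have decided.

```python
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical values, not real CentOS infrastructure.
MARKER_URL = "https://example.org/centos/last-push"
STATE_FILE = Path("/var/lib/mirror/last-push")


def fetch_marker(url, timeout=10):
    """Download the marker file over https and return its stripped content."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def needs_sync(remote_marker, state_file):
    """True when the remote marker differs from the one saved after the last sync."""
    try:
        return state_file.read_text().strip() != remote_marker
    except FileNotFoundError:
        return True  # never synced before


def sync_if_changed(url, state_file, rsync_cmd,
                    fetch=fetch_marker, run=subprocess.run):
    """Run rsync only when the marker changed; remember the marker on success."""
    remote = fetch(url)
    if not needs_sync(remote, state_file):
        return False
    run(rsync_cmd, check=True)  # raises if rsync exits non-zero
    state_file.write_text(remote + "\n")
    return True
```

A mirror admin could call `sync_if_changed()` from a frequent cron job: the https request is cheap, and the expensive rsync only starts when the two people pushing content have updated the marker file.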
I'd like to get opinions from Johnny/KB (people able to sign/push) as they'd be directly concerned by that decision.
From an "external mirror admin" PoV, should we also discuss this on the centos-mirror list to get their opinions?
Also, we can divide your proposal into two parts:
- external mirrors can check a file to compare against, so they sync "faster" than through their cron jobs (discussed above)
- completely changing the msync.centos.org network so that external mirrors sync not from us but between themselves (not sure how people feel about this)
PS: Anssi is now part of the mirror managers team for CentOS, for those not yet aware of it.
I just wanted to make a note that I have worked out a system which enables me to mirror all of the Fedora repository, which consists of about 12 TB in, I believe, eleven million files, with a polling interval of ten minutes. A typical update (when there are changes to mirror), including the mirrormanager checkin, takes about four minutes (most of it waiting for mirrormanager). A poll when there are no changes takes six seconds. The load on the server during a poll is rsync startup time and a handful of stat calls. A full tree traversal on the server is not required. (It may still be required on the client, but that's no worse than plain rsync.)
The software which handles this is at https://pagure.io/quick-fedora-mirror
It involves a server-side component (written in Python 2 with limited dependencies) to generate file lists in a useful format, and a client-side component (currently written in zsh) which fetches the file lists, processes them, calls rsync with a list of changed files, and does a mirrormanager checkin. (The mirrormanager client is not required.)
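For anyone who wants a feel for the client-side flow, here is a rough sketch in Python of the "compare file lists, then rsync only what changed" step. It is not the actual quick-fedora-mirror code (the real client is zsh), and the list format used here, one `mtime<TAB>size<TAB>path` line per file, is an assumption made up for this example. The only rsync feature relied on is its `--files-from` option.

```python
import subprocess
import tempfile


def parse_file_list(text):
    """Parse 'mtime<TAB>size<TAB>path' lines (assumed format) into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        mtime, size, path = line.split("\t", 2)
        entries[path] = (int(mtime), int(size))
    return entries


def diff_file_lists(old, new):
    """Return (changed_or_new_paths, removed_paths), both sorted."""
    changed = sorted(p for p, meta in new.items() if old.get(p) != meta)
    removed = sorted(p for p in old if p not in new)
    return changed, removed


def rsync_changed(changed, source, dest, run=subprocess.run):
    """Transfer only the listed paths by handing them to rsync --files-from."""
    if not changed:
        return False  # nothing to do, so rsync is never started
    with tempfile.NamedTemporaryFile("w", suffix=".list") as fh:
        fh.write("\n".join(changed) + "\n")
        fh.flush()
        run(["rsync", "-a", "--files-from=" + fh.name, source, dest],
            check=True)
    return True
```

With this shape, a poll with no changes costs one small download plus a dictionary comparison on the client, and the server never has to walk its tree for that client.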
A tiered setup (mirrors pulling from other mirrors) works fine; only the master mirror ever needs to generate the file lists. None of this limits the ability of clients to mirror in any other way.
Hardlinks are copied as hardlinks, assuming that the file lists for all cross-linked rsync modules are regenerated at the same time. When that doesn't happen, there's an included client-side hardlinker which uses the file list data to hardlink a repository more quickly.
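As an illustration of how file list data can drive hardlinking without re-reading every file, here is a small sketch. It is not the included hardlinker: it assumes, hypothetically, that each list entry carries a size and a checksum, and it trusts those values instead of verifying local content, which a real tool would probably want to do.

```python
import os
from collections import defaultdict


def group_duplicates(entries):
    """Group (path, size, checksum) entries that describe identical content."""
    groups = defaultdict(list)
    for path, size, checksum in entries:
        groups[(size, checksum)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def hardlink_group(root, paths):
    """Make every path a hardlink to the first one; return how many were relinked."""
    keeper = os.path.join(root, paths[0])
    keeper_stat = os.stat(keeper)
    relinked = 0
    for rel in paths[1:]:
        target = os.path.join(root, rel)
        st = os.stat(target)
        if (st.st_dev, st.st_ino) == (keeper_stat.st_dev, keeper_stat.st_ino):
            continue  # already the same inode
        tmp = target + ".hardlink-tmp"
        os.link(keeper, tmp)     # create the new link under a temporary name
        os.replace(tmp, target)  # then swap it in atomically
        relinked += 1
    return relinked
```

Because the grouping works purely on the list, the only per-file filesystem work is a stat plus a link for each duplicate.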
A form of exclude lists is supported in the client.
We're also working on a no-polling setup using the Fedora message bus, with mirrors automatically waking up and fetching when new content is pushed out.
While the default configurations, the above message bus stuff, and maybe the mirrormanager checkin are Fedora-specific, I do believe the software will work for any rsync server willing to run the file list generator. If any of this interests your project, please let me know.