[CentOS-mirror] Mirrormanager (now for real)

Mon Oct 10 04:48:56 UTC 2011
Matt Domsch <Matt_Domsch at dell.com>

On Fri, Oct 07, 2011 at 03:55:30PM -0500, Ralph Angenendt wrote:
> Hey,
> long time no infrastructure changes (well, on the CentOS side, quite a
> few on my side).
> As several people have offered to help move our self-botched system to
> mirrormanager (sorry Peter, but there were several offers - I am still not
> sure which one is technically better, but the more helping hands,
> the better), I'd like to start this now.

Excellent.  I'm happy to help in any way I can.
> First question: I guess as a first move we need a machine to host that on.
> Can anyone running a mirrormanager instance tell me what kind of specs
> that machine needs to have?

There are three primary applications: 1) the web pages where mirror admins
log in and change their mirror data, which on Fedora use ~160MB of RAM;
2) various cronjobs (~50MB); 3) the mirrorlist request handler
(125MB).  Applications 1 and 3 can run on one machine or on many; 2, by
the nature of its jobs, runs on only a single system.
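
To make the sizing concrete, here is a rough sketch that adds up per-host RAM for a proposed layout, using only Fedora's figures above. The role names and host names are made up for illustration, and the totals cover MM itself only, not the OS, httpd or the database.

```python
from typing import Dict, List

# Approximate RAM per MirrorManager component, as observed on Fedora's deployment.
COMPONENT_RAM_MB: Dict[str, int] = {
    "admin_web": 160,   # 1) web pages where mirror admins edit their data
    "cronjobs": 50,     # 2) cronjobs (crawler, directory-list updates, ...)
    "mirrorlist": 125,  # 3) mirrorlist request handler
}


def plan_ram(hosts: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum per-host RAM for a proposed layout (hypothetical role names).

    Roles 1 and 3 may be spread over any number of hosts, but the cronjobs
    must run on exactly one host.
    """
    cron_hosts = [name for name, roles in hosts.items() if "cronjobs" in roles]
    if len(cron_hosts) != 1:
        raise ValueError(f"cronjobs must run on exactly one host, got {cron_hosts}")
    return {
        name: sum(COMPONENT_RAM_MB[role] for role in roles)
        for name, roles in hosts.items()
    }


# Example layout: one admin/cron box plus two mirrorlist servers.
print(plan_ram({
    "mm-admin": ["admin_web", "cronjobs"],
    "mirrorlist1": ["mirrorlist"],
    "mirrorlist2": ["mirrorlist"],
}))
# {'mm-admin': 210, 'mirrorlist1': 125, 'mirrorlist2': 125}
```

Putting all three on a single box comes to roughly 335MB for MM's processes.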

MM maintains a local cache in /var/lib/mirrormanager.  Fedora's copy
thereof is 28MB.

> Does it need to hold a copy of the mirror?

The system running 2) above needs a copy of the mirror, yes, either local
or NFS-mounted.  Some components, like update-master-directory-list and
the crawler, can simply fetch an rsync listing from another system rather
than needing the full mirror mounted; however, the metalink generator
does need to read the files themselves, so having a full mirror nearby
is recommended.  Fedora keeps the directory tree on an NFS-mountable
volume, which the MM cronjob server mounts read-only.
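
For the "rsync listing instead of a mounted tree" approach, a sketch like the one below shows what such a listing gives you: path, type, size and mtime for every entry, without touching file contents. This is not MM's own parser; it assumes output from something like `rsync -r --list-only rsync://mirror.example.org/centos/` (hypothetical host), and the sample listing is made up.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    path: str
    is_dir: bool
    size: int
    mtime: str  # "YYYY/MM/DD HH:MM:SS" as rsync prints it


def parse_rsync_listing(text: str) -> List[Entry]:
    """Parse `rsync --list-only` output into entries (files and directories only)."""
    entries: List[Entry] = []
    for line in text.splitlines():
        parts = line.split(None, 4)  # perms, size, date, time, path
        if len(parts) != 5:
            continue  # blank lines, short MOTD lines
        perms, size, date, clock, path = parts
        digits = size.replace(",", "")  # newer rsync prints sizes like 4,096
        if len(perms) != 10 or perms[0] not in ("-", "d") or not digits.isdigit():
            continue  # MOTD text, symlinks ("l..."), devices, etc.
        entries.append(Entry(path, perms[0] == "d", int(digits), f"{date} {clock}"))
    return entries


listing = """\
drwxr-xr-x          4,096 2011/10/07 15:55:30 .
drwxr-xr-x          4,096 2011/10/07 15:55:30 6.0
-rw-r--r--          1,234 2011/10/07 15:55:30 6.0/os/x86_64/repodata/repomd.xml
"""
for e in parse_rsync_listing(listing):
    print(e.is_dir, e.size, e.path)
```

This is enough to notice new or changed directories, but not to compute checksums, which is why the metalink generator still wants real file access.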

> On the software side, I guess httpd and mysql-server (and mirrormanager)
> - anything else? Or is the sqlite variant fast enough for the amount of
> mirrors we have?

Because several actors update the database simultaneously
(particularly the update-master-directory-list cronjob and the
crawlers), PostgreSQL or MySQL is preferred over SQLite.  I haven't
tried running with SQLite in production.  MM uses TurboGears, which
currently uses SQLObject (and will soon use SQLAlchemy), so you are
free to choose whichever database backend you want.
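
The concern with SQLite is that it allows only one writer at a time for the whole database file. A small sketch using Python's standard sqlite3 module shows a second writer (think update-master-directory-list) being refused while the first (think a crawler) holds the write lock. The table name is hypothetical, and `timeout=0` is used to make the refusal immediate; with a busy timeout the second writer would wait instead, but writes are still serialized.

```python
import os
import sqlite3
import tempfile


def demo_writer_contention() -> bool:
    """Return True if a second writer is refused while the first holds the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mm-demo.sqlite")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        # timeout=0: fail immediately instead of waiting for the lock.
        crawler = sqlite3.connect(path, timeout=0, isolation_level=None)
        dir_updater = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            crawler.execute("CREATE TABLE host_status (host TEXT, up_to_date INTEGER)")
            crawler.execute("BEGIN IMMEDIATE")  # take the database-wide write lock
            crawler.execute("INSERT INTO host_status VALUES ('mirror.example.org', 1)")
            try:
                dir_updater.execute("BEGIN IMMEDIATE")
            except sqlite3.OperationalError as exc:
                blocked = "locked" in str(exc)  # "database is locked"
            else:
                blocked = False
                dir_updater.execute("ROLLBACK")
            crawler.execute("COMMIT")
            return blocked
        finally:
            crawler.close()
            dir_updater.close()


print(demo_writer_contention())
```

PostgreSQL and MySQL (InnoDB) lock at row level, so the crawler and the directory-list job can write concurrently.
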
> At the moment we run quite a few instances of mirrorlist.centos.org -
> the machines which hand out the urllist to machines - is that possible
> with mirrormanager, too? Or will one machine be able to handle the load?

As Adrian noted, this is not only possible but recommended.  Fedora
has ~8 servers answering mirrorlist requests, and they don't even
blink at the load.  In general, each mirrorlist request is answered
within 0.3s, of which <0.1s is spent in the mirrorlist code; the rest
goes to setting up TCP connections and getting through the load
balancers.

I would also recommend writing a script to convert your existing
database information into MM, so that people don't have to re-enter it.
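
As a starting point for such a conversion script, here is a sketch of the first half: reading the existing data and grouping it into one record per mirror. It assumes, hypothetically, that the current CentOS data can be exported as CSV with `name`, `country`, `http_url`, `ftp_url` and `rsync_url` columns; `MirrorRecord` is an intermediate structure of my own, not MM's schema, so loading it into MM would be a separate step.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List

URL_COLUMNS = ("http_url", "ftp_url", "rsync_url")  # hypothetical export columns


@dataclass
class MirrorRecord:
    """Hypothetical intermediate record; map it onto MM's real schema separately."""
    name: str
    country: str
    urls: List[str] = field(default_factory=list)


def convert(legacy_csv: str) -> List[MirrorRecord]:
    """Group rows of a legacy mirror export into one record per mirror."""
    records: Dict[str, MirrorRecord] = {}
    for row in csv.DictReader(io.StringIO(legacy_csv)):
        name = row["name"].strip()
        rec = records.setdefault(
            name, MirrorRecord(name=name, country=row["country"].strip().upper())
        )
        for col in URL_COLUMNS:
            url = (row.get(col) or "").strip()
            if url and url not in rec.urls:  # skip blanks and duplicates
                rec.urls.append(url)
    return list(records.values())


legacy = """name,country,http_url,ftp_url,rsync_url
mirror.example.org,de,http://mirror.example.org/centos/,,rsync://mirror.example.org/centos/
mirror.example.org,de,http://mirror.example.org/centos/,ftp://mirror.example.org/centos/,
"""
for rec in convert(legacy):
    print(rec)
```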

You'll want to decide how to handle updates to mirror info: whether
to keep that centralized as you do today, or to let each mirror edit
its own information (the Fedora model).  MM can work in either mode.
If mirror admins make changes themselves, you'll need some form of
account system, either the one built into MM or something the
TurboGears framework can hook into (such as the Fedora Account
System).
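
The difference between the two modes boils down to one permission rule, sketched below. The names (`EditMode`, `Site`, `can_edit`, the usernames) are hypothetical and not MM's API; the usernames would come from whichever account system you pick.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class EditMode(Enum):
    CENTRALIZED = "centralized"    # only project staff edit mirror records
    SELF_SERVICE = "self_service"  # each mirror's admins edit their own (Fedora model)


@dataclass
class Site:
    name: str
    admins: Set[str] = field(default_factory=set)  # usernames from the account system


def can_edit(user: str, site: Site, mode: EditMode, staff: Set[str]) -> bool:
    """Decide whether `user` may change `site`'s mirror data."""
    if user in staff:
        return True  # project staff can always edit
    if mode is EditMode.SELF_SERVICE:
        return user in site.admins
    return False


site = Site("mirror.example.org", admins={"alice"})
staff = {"ralph"}
print(can_edit("alice", site, EditMode.CENTRALIZED, staff))   # False
print(can_edit("alice", site, EditMode.SELF_SERVICE, staff))  # True
```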


Matt Domsch
Technology Strategist
Dell | Office of the CTO