[CentOS-mirror] mirrormanager: Database conversion

Mon Nov 7 05:40:49 UTC 2011
Matt Domsch <Matt_Domsch at dell.com>

On Sat, Nov 05, 2011 at 04:51:43PM -0500, Ralph Angenendt wrote:
> Hi,
> 
> while I am still struggling with that host (well, small technicialities,
> nothing major), I'd like to start a discussion on how to convert our
> mirror table to the layout which is needed for mirrormanager.
> 
> You can find a description of the table at
> http://oerks.de/~ralph/mirrordb.txt - that probably is easier to look at
> than trying to get out a version via mail which doesn't break for
> everyone :)
> 
> Let me explain that table (and let me explain the fields we probably
> still need and which we don't need anymore, afaics).
> 
> Name: Is our primary key - so every mirror has to have a unique name in
> our DB

MM has a "Site" that matches your URL to the Sponsor's website.  Each
Site has a list of mirror Hosts.  Each Site's name must be unique, and
within each Site, the Host names must be unique.
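
A rough sketch of that uniqueness rule, in Python.  The class and
method names here (Registry, add_site, add_host) are made up for
illustration; this is not MM's actual model code.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    org_url: str
    # Host name -> host record; names only need to be unique per Site.
    hosts: dict = field(default_factory=dict)

    def add_host(self, host_name: str) -> None:
        if host_name in self.hosts:
            raise ValueError(f"duplicate host {host_name!r} in site {self.name!r}")
        self.hosts[host_name] = {"name": host_name}


class Registry:
    """Hypothetical container that enforces unique Site names."""

    def __init__(self) -> None:
        self.sites = {}

    def add_site(self, name: str, org_url: str) -> Site:
        if name in self.sites:
            raise ValueError(f"duplicate site {name!r}")
        site = Site(name, org_url)
        self.sites[name] = site
        return site
```

Two different Sites may each have a Host with the same name; only a
repeat inside one Site (or a repeated Site name) is rejected.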

> location-major: That's the continent
> locmajidx: numerical representation of the continent
> location-minor: Country (or in the case of US and Canada: State)

MM lets GeoIP handle this for us, to the country level. I haven't yet
added State-level - oftentimes it really doesn't match up with network
topology enough to care.

> http,ftp,rsync: The URLs the mirror is at

These map to HostCategoryURL, which has two types: public (the
default), and private, for other downstream mirrors to use.  I'm not
sure the private ones are actually used.


> speed: Used for the representation on the mirrorlist at www.centos.org.
> Mostly T1 anyway, not needed anymore, I guess.

Nope.

> bandwidth: Actual bandwidth. Not needed.

MM does need this: an integer value in Mbps (100 = 100Mbps uplink),
stored in Host.bandwidth_int.

> status: set by mirror-status (at least Dead, Disabled is for manual
> intervention)

Each Site and each Host have two flag bits: admin_active, and
user_active.  admin_active lets the MM database admin kill a mirror
off quickly; user_active lets the user do this for themselves,
particularly in preparation for a long outage.
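
As a minimal sketch of how those flags combine, assuming a Host is
only handed out when all four bits (Site and Host, admin and user) are
set.  The function name and the dict layout are hypothetical, not
MM's code.

```python
def is_listed(site: dict, host: dict) -> bool:
    """Return True only if neither the admin nor the user disabled anything."""
    return all(
        obj["admin_active"] and obj["user_active"]
        for obj in (site, host)
    )
```

So a mirror admin clearing user_active on the Host before an outage is
enough to take that one Host out, without touching the Site.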

> state: more detailed state

?

> contact-name: Name of the person running the mirror. Internal use for us.
> contact-tel: I cannot remember calling a mirroradmin :)
> contact-email: Our second unique field. I guess that will be used for login

MM only knows about a user account name we list as the mirror admin.
In the Fedora world, this is the FAS account name.  In RPMFusion, I
expect it's a local database built into TurboGears.  Pretty thin on
info though, could add these other fields if we need to.

> comments: Free form, normally the request mail sent to the list. Nice to
> have, but not needed.
> access*: Not used
> Type: We only have direct mirrors.
> restructured: That must have happened before 2006 :)
> centostext: What to add to the mirror URLs (so mostly unused)

MM has content Categories (Fedora Linux, Fedora EPEL, and historical
categories).  Each Host has one or more Categories = HostCategory.
Each HostCategory has one or more HostCategoryURLs.  The Categories
can be rooted at any arbitrary URL, but from the top of the Category
on down, a mirror has to maintain the upstream master directory
structure.

You'll want to think about how you structure your content into
Categories.  It works best if a Category is a distinct subtree, not
overlapping with other Categories.

> url: The URL to the sponsor's website

Site.orgUrl

> info_note, notes_private,infoblock,graphic_url: Not used.
> centos*: Which versions does the mirror carry?
> arch_all: Yes, if not, then:
> arches: Free form - only used for the mirror list on www

This is detected dynamically by MM, and exposed in the publiclist
chooser.


> dvd-iso: Does it carry them (always yes since 6)
> dvd*: The versions (6 is set to yes always)
> dvd-iso-host,rsync-dvd-host: No idea. Not used

Again, dynamically detected.

> cc: The TLD the mirror is in. Actually used for generating
> mirrorlists.txt for that country

Host.country

> continent: Used for the mirrorlist on www

Not used.  MM uses GeoIP, and augments its mapping of countries to
continents with a CountryContinentRedirect.  Little used, but it maps
say Israel to Europe instead of Asia, because it has better network
connectivity to Europe.
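
The idea can be sketched like this.  The two tables below are tiny
hypothetical stand-ins (MM gets the real country data from GeoIP), and
continent_for is a name made up for this example.

```python
# Geographic default: country code -> continent code (illustrative subset).
DEFAULT_CONTINENT = {"IL": "AS", "DE": "EU", "US": "NA"}

# Overrides in the spirit of CountryContinentRedirect: better network
# connectivity wins over geography.
COUNTRY_CONTINENT_REDIRECT = {"IL": "EU"}


def continent_for(country_code: str):
    """Return the continent to use for a country, honoring redirects."""
    cc = country_code.upper()
    if cc in COUNTRY_CONTINENT_REDIRECT:
        return COUNTRY_CONTINENT_REDIRECT[cc]
    return DEFAULT_CONTINENT.get(cc)  # None if the country is unknown
```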

> centos_code,priority: Not used
> use-in-mirror-list: Used: We don't really put 10Mbit-machines in EU or
> US or CA into the mirrorlist.txt which is handed out via yum

Ah.  We do, but they get listed at the top of mirrorlist.txt 1/100 as
often as a 1Gb mirror would.  That's the weighted random sample based
on bandwidth.
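
For illustration, here is one standard way to get a bandwidth-weighted
random ordering (the Efraimidis-Spirakis key trick).  I'm not claiming
this is line-for-line what MM does; weighted_order and the mirror
names are made up for the example.

```python
import random


def weighted_order(mirrors, rng=random):
    """Return mirror names in a bandwidth-weighted random order.

    mirrors maps a mirror name to its bandwidth in Mbps (must be > 0).
    Each name gets the key u ** (1 / bandwidth) with u uniform in [0, 1);
    sorting by descending key puts a mirror first with probability
    proportional to its bandwidth.
    """
    keyed = [(rng.random() ** (1.0 / bw), name) for name, bw in mirrors.items()]
    keyed.sort(reverse=True)
    return [name for _, name in keyed]
```

With {"slow": 10, "fast": 1000}, "slow" comes out on top with
probability 10/1010, i.e. roughly 1/100 as often as "fast".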

 
> I guess we can drop many of those when going over to mirrormanager. But:
> What I don't see on the Fedora pages is a list of all the mirrors (by
> country/continent/whatever) - I know that this is one thing we actually
> do need and want.

I don't have the breakdown by continent in /publiclist.  Wouldn't be
hard, but I hate mucking with that page - it took some major CSS
hacking to get it as readable as it is. :-)

> Anything I actually overlooked?

Do you have private mirrors in your database now?  That maps to
Site.private and Host.private.

Host.internet2  if a host is on Internet2 or related high-speed
educational/research network. We can look that up in MM's private copy
of the Internet2 route tables if needed - that's how I populated the
field the first time for Fedora too.

Host.internet2_clients  if a host on I2, even if private, should be
listed for other I2 clients in the same country.  By default set it
false, let mirror admins update it themselves.

Host.asn = AS Number
Host.asn_clients if a host should serve the whole ASN regardless of
netblocks set.  Lets mirrors in places with many netblocks, but a
single ASN, get away with a single value here.  Again, we can look
this up in our private copy of the worldwide routing table.

Host.countries_allowed = list of countries allowed.  e.g. a mirror in
.il may want to only serve users in .il.

Host.netblocks is a list of netblocks that Host should be primary
mirror for.  This is required for private mirrors.
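
A small sketch of the netblock match, using Python's standard
ipaddress module.  The function name, the host names and the
documentation-range netblocks are all just for illustration.

```python
import ipaddress


def primary_hosts(client_ip: str, host_netblocks: dict) -> list:
    """Return the hosts whose netblocks contain the client address.

    host_netblocks maps a host name to a list of CIDR strings.
    """
    addr = ipaddress.ip_address(client_ip)
    return [
        name
        for name, blocks in host_netblocks.items()
        # An IPv4 address is never "in" an IPv6 network and vice versa.
        if any(addr in ipaddress.ip_network(block) for block in blocks)
    ]
```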

Host.acl_ips = list of IP addresses or hostnames that will get put
into the /rsync_acl list.  Other mirrors may wget /rsync_acl to get
that list, and use it in their own rsyncd.conf files.  Only real
problem with it is anyone could sign up to be a private mirror, fill
this in, and then get early access to a pre-bitflip mirror via the
acl.  Oh well...

I think that'll be enough to get going though.

Be thinking about categories.  At a glance, I think a single Category
"CentOS" would be fine.  You could in theory do two Categories
"CentOS" and "CentOS ISOs" and rig up update-master-directory-list to
ignore /isos in your "CentOS" Category, and ignore everything but
/isos in the "CentOS ISOs" Category.  But I don't think that will
buy you much, and it buys exactly nothing with C6 and newer.
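
If you did want the split, the rule is simple enough to sketch.  This
is only the path test, under the assumption that ISO trees sit in a
directory literally named "isos" somewhere under the master tree; it
is not how update-master-directory-list is actually configured, and
category_for is a made-up name.

```python
def category_for(path: str) -> str:
    """Assign a master-tree directory to exactly one Category."""
    parts = [p for p in path.split("/") if p]
    # Anything at or below an "isos" directory goes to the ISO Category,
    # everything else stays in the main one, so the two never overlap.
    return "CentOS ISOs" if "isos" in parts else "CentOS"
```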

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO