Hi,
while I am still struggling with that host (well, small technicalities, nothing major), I'd like to start a discussion on how to convert our mirror table to the layout which is needed for mirrormanager.
You can find a description of the table at http://oerks.de/~ralph/mirrordb.txt - that probably is easier to look at than trying to get out a version via mail which doesn't break for everyone :)
Let me explain that table (and let me explain the fields we probably still need and which we don't need anymore, afaics).
Name: Is our primary key - so every mirror has to have a unique name in our DB
location-major: That's the continent
locmajidx: numerical representation of the continent
location-minor: Country (or in the case of US and Canada: State)
http,ftp,rsync: The URLs the mirror is at
speed: Used for the representation on the mirrorlist at www.centos.org. Mostly T1 anyway, not needed anymore, I guess.
bandwidth: Actual bandwidth. Not needed.
status: set by mirror-status (at least Dead, Disabled is for manual intervention)
state: more detailed state
contact-name: Name of the person running the mirror. Internal use for us.
contact-tel: I cannot remember calling a mirroradmin :)
contact-email: Our second unique field. I guess that will be used for login
comments: Free form, normally the request mail sent to the list. Nice to have, but not needed.
access*: Not used
Type: We only have direct mirrors.
restructured: That must have happened before 2006 :)
centostext: What to add to the mirror URLs (so mostly unused)
url: The URL to the sponsor's website
info_note, notes_private,infoblock,graphic_url: Not used.
centos*: Which versions does the mirror carry?
arch_all: Yes, if not, then:
arches: Free form - only used for the mirror list on www
dvd-iso: Does it carry them (always yes since 6)
dvd*: The versions (6 is set to yes always)
dvd-iso-host,rsync-dvd-host: No idea. Not used
cc: The TLD the mirror is in. Actually used for generating mirrorlists.txt for that country
continent: Used for the mirrorlist on www
centos_code,priority: Not used
use-in-mirror-list: Used: We don't really put 10Mbit-machines in EU or US or CA into the mirrorlist.txt which is handed out via yum
I guess we can drop many of those when going over to mirrormanager. But: What I don't see on the Fedora pages is a list of all the mirrors (by country/continent/whatever) - I know that this is one thing we actually do need and want.
All the other location data probably aren't needed anymore, afaics.
Anything I actually overlooked?
Ralph
On Sat, Nov 05, 2011 at 10:51:43PM +0100, Ralph Angenendt wrote:
What I don't see on the Fedora pages is a list of all the mirrors (by country/continent/whatever) - I know that this is one thing we actually do need and want.
Have you seen this:
http://mirrors.fedoraproject.org/publiclist
Adrian
On 05.11.2011 23:11, Adrian Reber wrote:
On Sat, Nov 05, 2011 at 10:51:43PM +0100, Ralph Angenendt wrote:
What I don't see on the Fedora pages is a list of all the mirrors (by country/continent/whatever) - I know that this is one thing we actually do need and want.
Have you seen this:
As I had to ask, I guess not :)
Thanks for the info, that actually is great if we don't have to integrate it into our "CMS" anymore.
Cheers,
Ralph
On Sat, Nov 05, 2011 at 04:51:43PM -0500, Ralph Angenendt wrote:
Hi,
while I am still struggling with that host (well, small technicalities, nothing major), I'd like to start a discussion on how to convert our mirror table to the layout which is needed for mirrormanager.
You can find a description of the table at http://oerks.de/~ralph/mirrordb.txt - that probably is easier to look at than trying to get out a version via mail which doesn't break for everyone :)
Let me explain that table (and let me explain the fields we probably still need and which we don't need anymore, afaics).
Name: Is our primary key - so every mirror has to have a unique name in our DB
MM has a "Site" that matches your URL to the Sponsor's website. Each Site has a list of mirror Hosts. Each Site's name must be unique, and within each Site, the Host names must be unique.
location-major: That's the continent
locmajidx: numerical representation of the continent
location-minor: Country (or in the case of US and Canada: State)
MM lets GeoIP handle this for us, to the country level. I haven't yet added State-level - oftentimes it really doesn't match up with network topology enough to care.
http,ftp,rsync: The URLs the mirror is at
HostCategoryURL, two types: public (default), and private for other downstream mirrors to use. Not sure these are actually used.
speed: Used for the representation on the mirrorlist at www.centos.org. Mostly T1 anyway, not needed anymore, I guess.
Nope.
bandwidth: Actual bandwidth. Not needed.
MM does need this, an integer value in Mbps (100 = 100Mbps uplink). Host.bandwidth_int.
status: set by mirror-status (at least Dead, Disabled is for manual intervention)
Each Site and each Host has two flag bits: admin_active and user_active. admin_active lets the MM database admin kill a mirror off quickly; user_active lets the user do this for themselves, particularly in preparation for a long outage.
state: more detailed state
?
contact-name: Name of the person running the mirror. Internal use for us.
contact-tel: I cannot remember calling a mirroradmin :)
contact-email: Our second unique field. I guess that will be used for login
MM only knows about a user account name we list as the mirror admin. In the Fedora world, this is the FAS account name. In RPMFusion, I expect it's a local database built into TurboGears. Pretty thin on info though, could add these other fields if we need to.
comments: Free form, normally the request mail sent to the list. Nice to have, but not needed.
access*: Not used
Type: We only have direct mirrors.
restructured: That must have happened before 2006 :)
centostext: What to add to the mirror URLs (so mostly unused)
MM has content Categories (Fedora Linux, Fedora EPEL, and historical categories). Each Host has one or more Categories = HostCategory. Each HostCategory has one or more HostCategoryURLs. The Categories can be rooted at any arbitrary URL, but from the top of the Category on down, a mirror has to maintain the upstream master directory structure.
You'll want to think about how you structure your content into Categories. It works best if a Category is a distinct subtree, not overlapping with other Categories.
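The Site / Host / HostCategory / HostCategoryURL hierarchy described above can be pictured roughly as follows. This is only a sketch using plain Python dataclasses, not MirrorManager's actual schema. The attribute names admin_active, user_active, bandwidth_int and country come from this thread; org_url stands in for Site.orgUrl, and is_listed() is an assumption about how the two flag bits combine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostCategoryURL:
    url: str
    private: bool = False  # private URLs are only for downstream mirrors


@dataclass
class HostCategory:
    category: str  # e.g. "CentOS"
    urls: List[HostCategoryURL] = field(default_factory=list)


@dataclass
class Host:
    name: str
    country: str
    bandwidth_int: int = 100  # uplink in Mbps
    admin_active: bool = True
    user_active: bool = True
    categories: List[HostCategory] = field(default_factory=list)


@dataclass
class Site:
    name: str
    org_url: str  # the sponsor's website
    admin_active: bool = True
    user_active: bool = True
    hosts: List[Host] = field(default_factory=list)

    def add_host(self, host: Host) -> None:
        # Host names only have to be unique within their own Site.
        if any(h.name == host.name for h in self.hosts):
            raise ValueError(f"duplicate host {host.name!r} in site {self.name!r}")
        self.hosts.append(host)


def is_listed(site: Site, host: Host) -> bool:
    # Assumption: a host is handed out only if neither the admin nor the
    # mirror owner has switched off the site or the host.
    return all((site.admin_active, site.user_active,
                host.admin_active, host.user_active))
```

With this shape, one sponsor (Site) can own several mirror machines (Hosts), and each Host can expose the same Category under http, ftp and rsync URLs.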
url: The URL to the sponsor's website
Site.orgUrl
info_note, notes_private,infoblock,graphic_url: Not used.
centos*: Which versions does the mirror carry?
arch_all: Yes, if not, then:
arches: Free form - only used for the mirror list on www
This is detected dynamically by MM, and exposed in the publiclist chooser.
dvd-iso: Does it carry them (always yes since 6)
dvd*: The versions (6 is set to yes always)
dvd-iso-host,rsync-dvd-host: No idea. Not used
Again, dynamically detected.
cc: The TLD the mirror is in. Actually used for generating mirrorlists.txt for that country
Host.country
continent: Used for the mirrorlist on www
Not used. MM uses GeoIP, and augments its mapping of countries to continents with a CountryContinentRedirect. Little used, but it maps say Israel to Europe instead of Asia, because it has better network connectivity to Europe.
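The redirect idea amounts to an override table consulted before the normal country-to-continent lookup. A minimal sketch, with made-up table contents apart from the Israel example given above; the dictionary and function names are hypothetical and not MirrorManager's API:

```python
# Illustrative data only: a tiny GeoIP-style country -> continent table.
GEOIP_CONTINENT = {"de": "EU", "us": "NA", "il": "AS", "jp": "AS"}

# Overrides for countries whose connectivity is better towards another continent.
COUNTRY_CONTINENT_REDIRECT = {"il": "EU"}


def continent_for(country_code: str) -> str:
    """Return the continent to use for mirror selection for a country code."""
    cc = country_code.lower()
    if cc in COUNTRY_CONTINENT_REDIRECT:
        return COUNTRY_CONTINENT_REDIRECT[cc]
    return GEOIP_CONTINENT[cc]
```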
centos_code,priority: Not used
use-in-mirror-list: Used: We don't really put 10Mbit-machines in EU or US or CA into the mirrorlist.txt which is handed out via yum
Ah. We do, but they get listed at the top of mirrorlist.txt 1/100 as often as a 1Gb mirror would. That's the weighted random sample based on bandwidth.
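To make the weighting concrete: one way to get this behaviour is a weighted shuffle, where each position in the list is drawn with probability proportional to bandwidth. The sketch below is an illustration of that technique only and is not taken from the MirrorManager source; weighted_order and the mirror names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def weighted_order(mirrors: List[Tuple[str, int]],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Order mirrors randomly, favouring high bandwidth near the top.

    mirrors is a list of (name, bandwidth in Mbps) pairs; bandwidth must be > 0.
    """
    if rng is None:
        rng = random.Random()
    pool = list(mirrors)
    ordered = []
    while pool:
        weights = [bandwidth for _, bandwidth in pool]
        # Draw the next entry with probability proportional to its bandwidth.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        ordered.append(pool.pop(idx)[0])
    return ordered
```

With a 10 Mbps and a 1000 Mbps mirror, the slow one has a 10/1010 chance of coming first, i.e. it tops the list about 1/100 as often as the fast one.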
I guess we can drop many of those when going over to mirrormanager. But: What I don't see on the Fedora pages is a list of all the mirrors (by country/continent/whatever) - I know that this is one thing we actually do need and want.
I don't have the breakdown by continent in /publiclist. Wouldn't be hard, but I hate mucking with that page - it took some major CSS hacking to get it as readable as it is. :-)
Anything I actually overlooked?
Do you have private mirrors in your database now? That maps to Site.private and Host.private.
Host.internet2 if a host is on Internet2 or related high-speed educational/research network. We can look that up in MM's private copy of the Internet2 route tables if needed - that's how I populated the field the first time for Fedora too.
Host.internet2_clients if a host on I2, even if private, should be listed for other I2 clients in the same country. By default set it false, let mirror admins update it themselves.
Host.asn = AS Number
Host.asn_clients if a host should serve the whole ASN regardless of netblocks set. Lets mirrors in places with many netblocks, but a single ASN, get away with a single value here. Again, we can look this up in our private copy of the worldwide routing table.
Host.countries_allowed = list of countries allowed. e.g. a mirror in .il may want to only serve users in .il.
Host.netblocks is a list of netblocks that Host should be primary mirror for. This is required for private mirrors.
Host.acl_ips = list of IP addresses or hostnames that will get put into the /rsync_acl list. Other mirrors may wget /rsync_acl to get that list, and use it in their own rsyncd.conf files. Only real problem with it is anyone could sign up to be a private mirror, fill this in, and then get early access to a pre-bitflip mirror via the acl. Oh well...
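How the netblocks, asn/asn_clients and countries_allowed fields could interact when picking mirrors for a client can be sketched like this. It is a simplified illustration built on Python's standard ipaddress module, not MirrorManager's real selection code. The field names come from the list above; the function names are hypothetical, and the addresses and AS number in the usage note are documentation values.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    name: str
    netblocks: List[str] = field(default_factory=list)
    asn: Optional[int] = None
    asn_clients: bool = False
    countries_allowed: List[str] = field(default_factory=list)


def may_serve(host: Host, client_country: str) -> bool:
    # An empty countries_allowed list means "no restriction".
    allowed = [c.lower() for c in host.countries_allowed]
    return not allowed or client_country.lower() in allowed


def is_preferred(host: Host, client_ip: str, client_asn: Optional[int]) -> bool:
    # A client inside one of the host's netblocks should get this host first.
    addr = ipaddress.ip_address(client_ip)
    for block in host.netblocks:
        net = ipaddress.ip_network(block)
        if addr.version == net.version and addr in net:
            return True
    # Otherwise fall back to the ASN match, if the host opted in.
    return bool(host.asn_clients and host.asn is not None
                and host.asn == client_asn)
```

For example, a host with netblocks=["192.0.2.0/24"] is preferred for client 192.0.2.10, and a host with asn=64496 and asn_clients=True is preferred for any client announced from AS64496.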
I think that'll be enough to get going though.
Be thinking about categories. At a glance, I think a single Category "CentOS" would be fine. You could in theory do two Categories "CentOS" and "CentOS ISOs" and rig up update-master-directory-list to ignore /isos in your "CentOS" Category, and ignore everything but /isos in the "CentOS ISOs" Category, but I don't think that will buy you much, and it buys exactly nothing with C6 and newer.
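For what it's worth, the two-Category split boils down to sorting each path in the master tree by whether it sits under an isos directory. A tiny sketch of that rule follows; the function name and the example paths are assumptions, and this is not how update-master-directory-list is actually configured.

```python
from pathlib import PurePosixPath


def category_for(path: str) -> str:
    """Assign a path in the master tree to one of the two Categories."""
    parts = PurePosixPath(path).parts
    # Anything below an "isos" directory goes to the ISO category.
    return "CentOS ISOs" if "isos" in parts else "CentOS"
```

Because every path lands in exactly one Category, the two subtrees do not overlap, which is the property recommended earlier in this mail.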
On 07.11.2011 06:40, Matt Domsch wrote:
On Sat, Nov 05, 2011 at 04:51:43PM -0500, Ralph Angenendt wrote:
Name: Is our primary key - so every mirror has to have a unique name in our DB
MM has a "Site" that matches your URL to the Sponsor's website. Each Site has a list of mirror Hosts. Each Site's name must be unique, and within each Site, the Host names must be unique.
Okay, the site URL isn't unique per se, but unique per sponsor (if he has many hosts).
location-minor: Country (or in the case of US and Canada: State)
MM lets GeoIP handle this for us, to the country level. I haven't yet added State-level - oftentimes it really doesn't match up with network topology enough to care.
That is only for representation on the website anyway. What's used for generating the lists is in cc: (Host.country).
http,ftp,rsync: The URLs the mirror is at
HostCategoryURL, two types: public (default), and private for other downstream mirrors to use. Not sure these are actually used.
Hmm? Does not compute: Those are the actual URLs of the mirror content.
bandwidth: Actual bandwidth. Not needed.
MM does need this, an integer value in Mbps (100 = 100Mbps uplink). Host.bandwidth_int.
Okay. As this is free form for us, this needs normalizing, then.
state: more detailed state
?
The reason why it was disabled (lagging, non-responsive and so on. Nobody really uses that).
contact-name: Name of the person running the mirror. Internal use for us.
contact-tel: I cannot remember calling a mirroradmin :)
contact-email: Our second unique field. I guess that will be used for login
MM only knows about a user account name we list as the mirror admin.
Hmmm. contact-email in that case.
In the Fedora world, this is the FAS account name. In RPMFusion, I expect it's a local database built into TurboGears. Pretty thin on info though, could add these other fields if we need to.
I don't know if they are needed. But we need a user db, we have no general account system in our infrastructure.
MM has content Categories (Fedora Linux, Fedora EPEL, and historical categories). Each Host has one or more Categories = HostCategory. Each HostCategory has one or more HostCategoryURLs. The Categories can be rooted at any arbitrary URL, but from the top of the Category on down, a mirror has to maintain the upstream master directory structure.
Ummm. I need to digest that :) (but we require the mirrors to have the same structure, otherwise they won't show up in the mirrorlist yum uses).
You'll want to think about how you structure your content into Categories. It works best if a Category is a distinct subtree, not overlapping with other Categories.
I guess something like Releases would be best here? 4, 5, 6, 7 ... We don't have anything else.
This is detected dynamically by MM, and exposed in the publiclist chooser.
Great, because entering and checking that can be a PITA :)
dvd-iso: Does it carry them (always yes since 6)
dvd*: The versions (6 is set to yes always)
dvd-iso-host,rsync-dvd-host: No idea. Not used
Again, dynamically detected.
When switching to mirrormanager everyone will get the dvds anyway. We'll drop the double tree then.
Not used. MM uses GeoIP, and augments its mapping of countries to continents with a CountryContinentRedirect. Little used, but it maps say Israel to Europe instead of Asia, because it has better network connectivity to Europe.
Okay. We had rather longish discussions on this list about those mappings :)
centos_code,priority: Not used
use-in-mirror-list: Used: We don't really put 10Mbit-machines in EU or US or CA into the mirrorlist.txt which is handed out via yum
Ah. We do, but they get listed at the top of mirrorlist.txt 1/100 as often as a 1Gb mirror would. That's the weighted random sample based on bandwidth.
Yeah, that is fine.
I don't have the breakdown by continent in /publiclist. Wouldn't be hard, but I hate mucking with that page - it took some major CSS hacking to get it as readable as it is. :-)
What Adrian pointed me to looks okay. Unless one of the mirror sponsors objects (anyone still with us here?).
Anything I actually overlooked?
Do you have private mirrors in your database now? That maps to Site.private and Host.private.
No. And I am not sure if we want to allow them - but that is open to discussion.
Host.internet2 if a host is on Internet2 or related high-speed educational/research network. We can look that up in MM's private copy of the Internet2 route tables if needed - that's how I populated the field the first time for Fedora too.
Okay. I think I like that idea.
Host.internet2_clients if a host on I2, even if private, should be listed for other I2 clients in the same country. By default set it false, let mirror admins update it themselves.
Yeah, that's fine, too.
Host.asn = AS Number
Host.asn_clients if a host should serve the whole ASN regardless of netblocks set. Lets mirrors in places with many netblocks, but a single ASN, get away with a single value here. Again, we can look this up in our private copy of the worldwide routing table.
Wonderful.
Host.countries_allowed = list of countries allowed. e.g. a mirror in .il may want to only serve users in .il.
Hmmm. Okay, I can understand that in countries with few mirrors.
Host.acl_ips = list of IP addresses or hostnames that will get put into the /rsync_acl list. Other mirrors may wget /rsync_acl to get that list, and use it in their own rsyncd.conf files. Only real problem with it is anyone could sign up to be a private mirror, fill this in, and then get early access to a pre-bitflip mirror via the acl. Oh well...
:)
I think that'll be enough to get going though.
Be thinking about categories. At a glance, I think a single Category "CentOS" would be fine. You could in theory do two Categories "CentOS" and "CentOS ISOs" and rig up update-master-directory-list to ignore /isos in your "CentOS" Category, and ignore everything but /isos in the "CentOS ISOs" Category, but I don't think that will buy you much, and it buys exactly nothing with C6 and newer.
No, it probably won't. I was thinking in Releases, maybe; they don't fluctuate as fast as in Fedoraland.
I am actually populating the machine now with a tree and will do some toying around with mirrormanager during this week, to see what I am up against.
I might drop some questions on IRC :)
Cheers and thanks,
Ralph
On Mon, Nov 07, 2011 at 04:05:59PM -0600, Ralph Angenendt wrote:
On 07.11.2011 06:40, Matt Domsch wrote:
http,ftp,rsync: The URLs the mirror is at
HostCategoryURL, two types: public (default), and private for other downstream mirrors to use. Not sure these are actually used.
Hmm? Does not compute: Those are the actual URLs of the mirror content.
easy to misparse my comment. :-( HostCategoryURLs are used extensively, yes. The private flag on them (so they're only visible to other mirrors downstream of you via the SiteToSite tree), not so much.
bandwidth: Actual bandwidth. Not needed.
MM does need this, an integer value in Mbps (100 = 100Mbps uplink). Host.bandwidth_int.
Okay. As this is free form for us, this needs normalizing, then.
When I first implemented this for Fedora, I cheated by setting the value to 100 for everyone, then going back and adjusting to the correct value if and when one was known.
state: more detailed state
?
The reason why it was disabled (lagging, non-responsive and so on. Nobody really uses that).
gotcha. MM doesn't have a reason field, the crawler just marks dirs as not up-to-date.
I guess something like Releases would be best here? 4, 5, 6, 7 ... We don't have anything else.
Seems sane.
Host.countries_allowed = list of countries allowed. e.g. a mirror in .il may want to only serve users in .il.
Hmmm. Okay, I can understand that in countries with few mirrors.
or limited connectivity to other countries, such that it would be expensive to serve users outside.
Thanks, Matt
On 09.11.2011 06:04, Matt Domsch wrote:
Okay. As this is free form for us, this needs normalizing, then.
When I first implemented this for Fedora, I cheated by setting the value to 100 for everyone, then going back and adjusting to the correct value if and when one was known.
I actually have the value, but it is 100M, 100Mbps, 1Gps, 1GE, 1000MBps and so on :) Well, a bit of scripting.
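The "bit of scripting" could look roughly like the sketch below, which turns the free-form values into the integer Mbps that Host.bandwidth_int expects. It assumes that a leading number followed by G means gigabit and that anything else means megabit (so "1000MBps" is read as 1000 Mbps despite the capital B), and it falls back to 100 for empty or unparseable entries, following the cheat Matt describes. The function name is hypothetical.

```python
import re

# A leading number, optionally followed by a G or M unit letter.
_BANDWIDTH = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([gm])?", re.IGNORECASE)


def normalize_bandwidth(raw: str, default: int = 100) -> int:
    """Convert a free-form bandwidth string to an integer number of Mbps."""
    match = _BANDWIDTH.match(raw or "")
    if not match:
        return default  # empty or unparseable: use the placeholder value
    value = float(match.group(1))
    unit = (match.group(2) or "m").lower()
    if unit == "g":
        value *= 1000  # gigabit -> megabit
    return int(round(value))
```

Entries that fall back to the default should probably be reviewed by hand later, since the default says nothing about the mirror's real uplink.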
I guess something like Releases would be best here? 4, 5, 6, 7 ... We don't have anything else.
Seems sane.
Do you have some time for an IRC chat this week? I am mostly available from 19:00 UTC until 0:00 UTC, except Wednesday. I am having trouble even getting the web interface to run :/ cherrypy complains about not finding a welcome.html, also see https://bugzilla.redhat.com/show_bug.cgi?id=722736.
Or if you can point me in the right direction via mail or a simple IRC query I would be happy, too. This is my first time with TurboGears.
Cheers,
Ralph
On 09.11.2011 06:04, Matt Domsch wrote:
On Mon, Nov 07, 2011 at 04:05:59PM -0600, Ralph Angenendt wrote:
MM does need this, an integer value in Mbps (100 = 100Mbps uplink). Host.bandwidth_int.
Okay. As this is free form for us, this needs normalizing, then.
Hrm, first I need to offer an apology for being quiet for such a long time, but real life really got in my way for the last > 2 weeks.
I guess I've identified the needed fields from our database and now have the following in a csv table: (all one line)
"LMU Muenchen, Dpt. Biologie 2, IT-Gruppe","http://centos.bio.lmu.de/","ftp://centos.bio.lmu.de/centos/","rsync:... Angenendt","centos-mirror-lmu@strg-alt-entf.org","http://zi.bio.lmu.de/","de"
These are the fields: Name of the mirror's sponsor, then the mirror URLs (http,ftp and rsync). Next field is bandwidth (where we have *tons* of mirrors without any entry - mostly older entries). Then "Contact Name", "Contact Mail", "URL to sponsor" and the countrycode (here de for Germany).
Does that look like a workable subset?
Regards,
Ralph
On Sun, Dec 11, 2011 at 03:58:43PM -0600, Ralph Angenendt wrote:
On 09.11.2011 06:04, Matt Domsch wrote:
On Mon, Nov 07, 2011 at 04:05:59PM -0600, Ralph Angenendt wrote:
MM does need this, an integer value in Mbps (100 = 100Mbps uplink). Host.bandwidth_int.
Okay. As this is free form for us, this needs normalizing, then.
Hrm, first I need to offer an apology for being quiet for such a long time, but real life really got in my way for the last > 2 weeks.
I guess I've identified the needed fields from our database and now have the following in a csv table: (all one line)
"LMU Muenchen, Dpt. Biologie 2, IT-Gruppe","http://centos.bio.lmu.de/","ftp://centos.bio.lmu.de/centos/","rsync:... Angenendt","centos-mirror-lmu@strg-alt-entf.org","http://zi.bio.lmu.de/","de"
These are the fields: Name of the mirror's sponsor, then the mirror URLs (http,ftp and rsync). Next field is bandwidth (where we have *tons* of mirrors without any entry - mostly older entries). Then "Contact Name", "Contact Mail", "URL to sponsor" and the countrycode (here de for Germany).
Does that look like a workable subset?
Yes.
I've created an import script, in ~mdomsch/import_centos which I haven't tested at all, but should be fairly close.
Need to add a header to the CSV as I note in the script. Need to convert those "100Mbit" values to '100', either in the script or manually in the CSV before import.
Users will be created in the local database, with random 16-character passwords, and username == their email address. It will send an email.
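The import script itself is not shown in this thread, so the following is only a rough sketch of the steps just described: read the CSV with a header row, use the contact email as the username, and generate a random 16-character password per account. The header column names, the function names and the example data in the usage note are all assumptions; sending the notification email and writing to the database are left out.

```python
import csv
import io
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 16) -> str:
    # secrets is meant for security-sensitive random values.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def load_mirrors(csv_text: str):
    """Parse the CSV export; return (mirrors, accounts).

    Assumed header: name,http,ftp,rsync,bandwidth,
                    contact_name,contact_email,org_url,country
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    accounts = {}  # username (= email) -> generated password
    mirrors = []
    for row in reader:
        email = row["contact_email"].strip()
        if email not in accounts:
            accounts[email] = random_password()
        mirrors.append({
            "site": row["name"],
            "org_url": row["org_url"],
            "country": row["country"].lower(),
            "urls": [u for u in (row["http"], row["ftp"], row["rsync"]) if u],
            "username": email,
        })
    return mirrors, accounts
```

A contact who runs several mirrors gets a single account, because accounts are keyed on the email address. The quoted sponsor name may contain commas; the csv module handles that as long as the fields are quoted as in the sample line.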
Need to create a wiki page in the CentOS wiki on how to configure report_mirror for CentOS use. The email points at this wiki page.
I'll be mostly offline from 12/17 - 1/3, but will occasionally be able to check email. I will have exceedingly limited internet access during this time, but will respond if/when I can.
On 12.12.2011 06:29, Matt Domsch wrote:
I'll be mostly offline from 12/17 - 1/3, but will occasionally be able to check email. I will have exceedingly limited internet access during this time, but will respond if/when I can.
Then let's move the rest of the work to 2012 - I have been swamped with work and won't have that much time "between" the years either.
I'm sorry for being on and off so sporadically, but I am still trying to get used to a new job and town :)
By that time I might have a plan on how to remove our non-dvd tree, too.
Cheers,
Ralph