Hey,
long time no infrastructure changes (well, on the CentOS side, quite a few on my side).
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
First question: I guess as a first move we need a machine to host that on.
Can anyone running a mirrormanager instance tell me, what kind of specs that machine needs to have? Does it need to hold a copy of the mirror?
On the software side, I guess httpd and mysql-server (and mirrormanager) - anything else? Or is the sqlite variant fast enough for the amount of mirrors we have?
At the moment we run quite a few instances of mirrorlist.centos.org - the machines which hand out the urllist to machines - is that possible with mirrormanager, too? Or will one machine be able to handle the load?
Depending on your answers I am going to snatch a machine from our pool for setting that up, next week. And with that machine we can start building, testing and then deploying mirrormanager.
I am away over the weekend, so please don't expect answers - or more questions - from me before Monday evening :)
Cheers,
Ralph
On Fri, Oct 07, 2011 at 10:55:30PM +0200, Ralph Angenendt wrote:
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
First question: I guess as a first move we need a machine to host that on.
Yes.
Can anyone running a mirrormanager instance tell me, what kind of specs that machine needs to have? Does it need to hold a copy of the mirror?
Nothing really fancy. The RPM Fusion mirrormanager instance has a local mirror and I think it is necessary.
On the software side, I guess httpd and mysql-server (and mirrormanager)
- anything else? Or is the sqlite variant fast enough for the amount of
mirrors we have?
The RPM Fusion mirrormanager uses postgresql, and I am pretty sure the Fedora instance uses postgresql as well.
At the moment we run quite a few instances of mirrorlist.centos.org - the machines which hand out the urllist to machines - is that possible with mirrormanager, too? Or will one machine be able to handle the load?
That is possible. If you look at mirrors.rpmfusion.org or mirrors.fedoraproject.org you will see that both resolve to at least two machines. You can have as many as you want. The mirrorlist for RPM Fusion is running on two VMs with CentOS 5 (32bit), one with 512MB and the other with 1024MB. 512MB is sometimes not enough, so I would say at least 768MB. The mirrorlists for Fedora and RPM Fusion are not running on a live copy of the MirrorManager database; their copy is refreshed hourly.
As far as I know most of the Fedora setup is running on RHEL 6.
Adrian
On Fri, Oct 07, 2011 at 03:55:30PM -0500, Ralph Angenendt wrote:
Hey,
long time no infrastructure changes (well, on the CentOS side, quite a few on my side).
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
Excellent. I'm happy to help in any way I can.
First question: I guess as a first move we need a machine to host that on.
Can anyone running a mirrormanager instance tell me, what kind of specs that machine needs to have?
There are three primary applications: 1) the web pages for mirror admins to login and change their mirror data, which on Fedora is using ~160MB RAM; 2) various cronjobs (~50MB); 3) the mirrorlist request handler (125MB). 1 and 3 can run on one, or many, machines. 2) by the nature of the jobs only runs on a single system.
MM maintains a local cache in /var/lib/mirrormanager. Fedora's copy thereof is 28MB.
Does it need to hold a copy of the mirror?
Machine 2 above needs a copy of the mirror, yes, either local or NFS-mounted. Some aspects, like update-master-directory-list and the crawler, can simply grab an rsync listing from another system, rather than have the full mirror be mounted; however, the metalink generator does need to be able to read files, so having a full mirror nearby would be good. Fedora has the directory tree on an NFS-mountable volume, which the MM cronjob server mounts read-only.
On the software side, I guess httpd and mysql-server (and mirrormanager)
- anything else? Or is the sqlite variant fast enough for the amount of
mirrors we have?
Because there are several actors updating the database simultaneously (particularly the update-master-directory-list cronjob and the crawlers), it's preferred to use postgres or mysql rather than sqlite. I haven't tried running with sqlite in production. MM uses TurboGears, which uses SQLObject currently (soonish SQLAlchemy) so you are free to choose whichever database backend you want.
At the moment we run quite a few instances of mirrorlist.centos.org - the machines which hand out the urllist to machines - is that possible with mirrormanager, too? Or will one machine be able to handle the load?
As Adrian noted, this is not only possible, but recommended. Fedora has ~8 servers handing out the mirrorlist by request, and they don't even blink at the load. In general, each mirrorlist request is answered within 0.3s, of which <0.1s is spent in the mirrorlist code, the rest is just setting up TCP connections and getting through the load balancers.
I would also recommend writing a script to convert your existing database information into MM, so as to not make people re-enter it.
You'll want to figure out how you want to handle updating mirror info - if you're going to centralize that as you do today, or if you want each mirror to be able to edit their own information (the Fedora model). MM can work in either mode. If mirror admins can do it themselves, you'll need some form of account system, either local to MM (which is present), or something that the TurboGears framework can hook into (such as the Fedora Account System).
Thanks, Matt
On 10/10/2011 06:48 AM, Matt Domsch wrote:
On Fri, Oct 07, 2011 at 03:55:30PM -0500, Ralph Angenendt wrote:
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
Excellent. I'm happy to help in any way I can.
Thank you (and thank you Adrian), I'm just going to reply to this post.
Can anyone running a mirrormanager instance tell me, what kind of specs that machine needs to have?
There are three primary applications: 1) the web pages for mirror admins to login and change their mirror data, which on Fedora is using ~160MB RAM; 2) various cronjobs (~50MB); 3) the mirrorlist request handler (125MB). 1 and 3 can run on one, or many, machines. 2) by the nature of the jobs only runs on a single system.
MM maintains a local cache in /var/lib/mirrormanager. Fedora's copy thereof is 28MB.
Okay. And as we need to have the database running on the same host (our machines are spread around the world rather than sitting together in one nice and cozy data center), the machine I just "found", which has 512MB RAM, seems to be too small to do that.
So first thing: Look for another machine :)
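To make the sizing argument explicit, here is a minimal back-of-the-envelope sketch. The three component figures are the Fedora numbers quoted above; the database and operating system footprints are not given anywhere in this thread, so they are left as parameters the caller has to supply (the function and variable names are made up for illustration).

```python
# Resident memory per MirrorManager component, in MB, as quoted for Fedora.
FEDORA_FOOTPRINT_MB = {
    "web admin interface": 160,
    "cronjobs": 50,
    "mirrorlist request handler": 125,
}


def required_ram_mb(components, database_mb, os_overhead_mb=0):
    """Sum the component footprints plus database and OS overhead.

    database_mb and os_overhead_mb are assumptions supplied by the caller;
    no figures for them were given in the thread.
    """
    return sum(components.values()) + database_mb + os_overhead_mb


# The three MM components alone, before any database or OS overhead.
mm_only = required_ram_mb(FEDORA_FOOTPRINT_MB, database_mb=0)
headroom_on_512 = 512 - mm_only
```

With all three components on one host, MM alone accounts for 335MB, which leaves 177MB on a 512MB machine for the database and the operating system.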
Does it need to hold a copy of the mirror?
Machine 2 above needs a copy of the mirror, yes, either local or NFS-mounted. Some aspects, like update-master-directory-list and the crawler, can simply grab an rsync listing from another system, rather than have the full mirror be mounted; however, the metalink generator does need to be able to read files, so having a full mirror nearby would be good. Fedora has the directory tree on an NFS-mountable volume, which the MM cronjob server mounts read-only.
Okay. Finding a machine which can hold the complete tree is trivial, all of ours can do that.
On the software side, I guess httpd and mysql-server (and mirrormanager)
- anything else? Or is the sqlite variant fast enough for the amount of
mirrors we have?
Because there are several actors updating the database simultaneously (particularly the update-master-directory-list cronjob and the crawlers), it's preferred to use postgres or mysql rather than sqlite. I haven't tried running with sqlite in production. MM uses TurboGears, which uses SQLObject currently (soonish SQLAlchemy) so you are free to choose whichever database backend you want.
Hmmm, good. If MySQL is an option (we run other instances of that), then there's no need to run a postgres instance. MirrorManager doesn't take advantage of the inetnum (or whatever it is called) data type in postgres, which is able to store IP addresses and CIDR data?
At the moment we run quite a few instances of mirrorlist.centos.org - the machines which hand out the urllist to machines - is that possible with mirrormanager, too? Or will one machine be able to handle the load?
As Adrian noted, this is not only possible, but recommended. Fedora has ~8 servers handing out the mirrorlist by request, and they don't even blink at the load.
Fine. I think we can reuse the machines we have. How can that info be copied? At the moment I copy a tree which has the mirrorlist info for any given country/release/repo and a perl module which then grabs that info and gives it out. How does that work in mirror manager?
I would also recommend writing a script to convert your existing database information into MM, so as to not make people re-enter it.
Yes, that actually is the plan :) I need to put them side by side though, so I can transform them. Is there a "picture" of the DB schema?
You'll want to figure out how you want to handle updating mirror info
- if you're going to centralize that as you do today, or if you want
each mirror to be able to edit their own information (the Fedora model). MM can work in either mode. If mirror admins can do it themselves, you'll need some form of account system, either local to MM (which is present), or something that the TurboGears framework can hook into (such as the Fedora Account System).
Yeah, I'd like people to be able to edit their own info. Would be a good test to see if contact info is up to date, too :)
So let me try to find a different machine :/
Cheers, and thank you and Adrian again,
Ralph
On Tue, Oct 11, 2011 at 04:38:44PM -0500, Ralph Angenendt wrote:
Hmmm, good. If MySQL is an option (we run other instances of that), then there's no need to run a postgres instance. MirrorManager doesn't take advantage of the inetnum (or whatever it is called) data type in postgres, which is able to store IP addresses and CIDR data?
No, it doesn't. It stores IPs as text, and does all the conversions of it in python.
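For illustration, a minimal sketch of what "text in the database, conversion in Python" can look like. This is not MirrorManager's own code: it uses the standard library ipaddress module, and the function name and the example addresses (taken from the documentation ranges) are made up.

```python
import ipaddress


def client_in_netblock(client_ip: str, netblock: str) -> bool:
    """Return True if client_ip lies inside netblock.

    Both arguments are plain strings, e.g. as read from a text column.
    Unparseable input is treated as "no match" rather than raising.
    """
    try:
        address = ipaddress.ip_address(client_ip)
        # strict=False accepts netblocks written with host bits set,
        # e.g. "192.0.2.17/24".
        network = ipaddress.ip_network(netblock, strict=False)
    except ValueError:
        return False
    # An IPv4 address can never match an IPv6 netblock and vice versa.
    if address.version != network.version:
        return False
    return address in network
```

Since the database only ever sees strings, the same check works unchanged on MySQL, postgres or sqlite.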
At the moment we run quite a few instances of mirrorlist.centos.org - the machines which hand out the urllist to machines - is that possible with mirrormanager, too? Or will one machine be able to handle the load?
As Adrian noted, this is not only possible, but recommended. Fedora has ~8 servers handing out the mirrorlist by request, and they don't even blink at the load.
Fine. I think we can reuse the machines we have. How can that info be copied? At the moment I copy a tree which has the mirrorlist info for any given country/release/repo and a perl module which then grabs that info and gives it out. How does that work in mirror manager?
Fedora uses a MM script to dump the database once an hour into the mirrorlist cache, then rsyncs it to the target mirrorlist servers. Send mirrorlist_server.py a SIGHUP (kill -HUP) and it reloads the cache.
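The dump / copy / reload cycle can be sketched roughly as below. This is a generic illustration of the pattern, not the code in mirrorlist_server.py: the pickle format, the class and function names, and the cache layout are all assumptions made for the example.

```python
import os
import pickle
import signal


def dump_cache(path, data):
    """Write the cache atomically so a reader never sees a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(data, handle)
    os.replace(tmp_path, path)  # atomic rename within one filesystem


class MirrorlistCache:
    """In-memory copy of the dumped mirror data (hypothetical layout)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self.reload()

    def reload(self):
        with open(self.path, "rb") as handle:
            fresh = pickle.load(handle)
        # Swap only after the load succeeded, so a broken file
        # leaves the previous data in place.
        self.data = fresh


def install_sighup_reload(cache):
    """Reload the cache whenever the process receives SIGHUP."""

    def _on_sighup(signum, frame):
        cache.reload()

    signal.signal(signal.SIGHUP, _on_sighup)
    return _on_sighup
```

The serving process keeps answering from memory between reloads, so the mirrorlist machines never need a connection to the central database.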
I would also recommend writing a script to convert your existing database information into MM, so as to not make people re-enter it.
Yes, that actually is the plan :) I need to put them side by side though, so I can transform them. Is there a "picture" of the DB schema?
Uh, no... I can dump the schema as SQL, if you know of a tool to pull it in. I can help with the script too.
http://git.fedorahosted.org/git/?p=mirrormanager;a=blob;f=server/mirrormanag... has the schema in SQLObject. Each class (SQLObject) is a table.
On 10/10/2011 06:48 AM, Matt Domsch wrote:
On Fri, Oct 07, 2011 at 03:55:30PM -0500, Ralph Angenendt wrote:
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
Excellent. I'm happy to help in any way I can.
Oh, Bonus question: How do you check if a mirror is stale and how do you raise a flag about that in the mirror database?
Ralph
On Tue, Oct 11, 2011 at 04:54:30PM -0500, Ralph Angenendt wrote:
On 10/10/2011 06:48 AM, Matt Domsch wrote:
On Fri, Oct 07, 2011 at 03:55:30PM -0500, Ralph Angenendt wrote:
As several people have offered to help moving our selfbotched system to mirrormanager (sorry Peter, but there were some offers - I am still not sure which one really is better technically, but the more helping hands, the better), I'd like to start this now.
Excellent. I'm happy to help in any way I can.
Oh, Bonus question: How do you check if a mirror is stale and how do you raise a flag about that in the mirror database?
There is a crawler that runs on a cronjob. It compares what the database thinks a mirror should have, with what it actually has (HTTP HEAD or FTP DIR commands). If a given directory does not match, that one directory is marked not up-to-date.
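The comparison step can be pictured with a small sketch. It is only an illustration of the idea, not the crawler's code: it assumes the expected and observed listings have already been collected (for example from the database and from HTTP HEAD or FTP DIR responses) into dictionaries mapping directory to {filename: size}, and that a size or name mismatch is what counts as "does not match".

```python
def stale_directories(expected, observed):
    """Return the directories whose observed contents differ from expected.

    expected, observed: dict mapping directory path -> {filename: size}.
    A directory missing on the mirror counts as stale. Only directories
    listed in expected are judged; extras on the mirror are ignored.
    """
    stale = []
    for directory, files in expected.items():
        if observed.get(directory) != files:
            stale.append(directory)
    return sorted(stale)


expected = {
    "6/os/x86_64": {"repomd.xml": 3700, "primary.xml.gz": 120000},
    "6/updates/x86_64": {"repomd.xml": 3650},
}
observed = {
    "6/os/x86_64": {"repomd.xml": 3700, "primary.xml.gz": 120000},
    "6/updates/x86_64": {"repomd.xml": 3100},  # older metadata on the mirror
}
out_of_date = stale_directories(expected, observed)
```

Only the mismatching directory is flagged, which matches the per-directory granularity described above: a mirror that is behind on updates can still be handed out for the directories that do match.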
On 10/12/2011 01:27 AM, Matt Domsch wrote:
On Tue, Oct 11, 2011 at 04:54:30PM -0500, Ralph Angenendt wrote:
Oh, Bonus question: How do you check if a mirror is stale and how do you raise a flag about that in the mirror database?
There is a crawler that runs on a cronjob. It compares what the database thinks a mirror should have, with what it actually has (HTTP HEAD or FTP DIR commands). If a given directory does not match, that one directory is marked not up-to-date.
Thanks for the answers up to now.
Cheers,
Ralph
On 10.10.2011 06:48, Matt Domsch wrote:
Can anyone running a mirrormanager instance tell me, what kind of specs that machine needs to have?
There are three primary applications: 1) the web pages for mirror admins to login and change their mirror data, which on Fedora is using ~160MB RAM; 2) various cronjobs (~50MB); 3) the mirrorlist request handler (125MB). 1 and 3 can run on one, or many, machines. 2) by the nature of the jobs only runs on a single system.
Okay, I "scored" a machine. Let me set up and install a current CentOS, mirrormanager and a database. Then I am going to toy around with it for a bit - and if I have questions, I'm going to ask here.
Hope for rain over the weekend :)
Cheers,
Ralph