IRC meeting regarding new mirroring system for CentOS


Ralph Angenendt

20 Oct 2010

8:51 p.m.

Hey,

I'm sorry if someone waited for the (proposed by me) irc chat on Monday 18th - I had a somewhat surprise visit by family on that date.

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

Any opinions on that date?

Regards,

Ralph


Karanbir Singh

20 Oct

11:26 p.m.


On 10/20/2010 09:51 PM, Ralph Angenendt wrote:

...

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

Any opinions on that date?

Works for me.

- KB

Bangladeshi CentOS Mirror Maintainer [BD-SERVERS.NET]

21 Oct

12:20 a.m.

Okay

I am adding it to my schedule.

With Thanks

-Bauani

On Thu, Oct 21, 2010 at 2:51 AM, Ralph Angenendt ralph.angenendt@gmail.com wrote:

...

Hey,

I'm sorry if someone waited for the (proposed by me) irc chat on Monday 18th - I had a somewhat surprise visit by family on that date.

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

Any opinions on that date?

Regards,

Ralph
_______________________________________________
CentOS-mirror mailing list
CentOS-mirror@centos.org
http://lists.centos.org/mailman/listinfo/centos-mirror

--
Regards & Best Wishes
Noor Ahamed Bauani
Chief Technology Advisor
Dhaka Wireless
http://www.dhaka-wireless.net/
An IPv6 Ready ISP in Bangladesh. Need IPv6 connectivity? Just knock us!
HP: +880-1818-BAUANI (SMS Only, No Direct Call Please)
Give Me Sunshine, Give Me Some Rain, Give Me a Chance to Grow up Again

J.H.

7:39 a.m.


I'll endeavor to attend though I'll be in a physical conference most of the day. Personally I would mostly be interested in a transcript to be published, at least to the mirrors, of the discussion either way.

- John 'Warthog9' Hawley

On 10/20/2010 01:51 PM, Ralph Angenendt wrote:

...

Hey,

I'm sorry if someone waited for the (proposed by me) irc chat on Monday 18th - I had a somewhat surprise visit by family on that date.

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

Any opinions on that date?

Regards,

Ralph

Jeff Sheltren

2:21 p.m.

On Thu, Oct 21, 2010 at 12:39 AM, J.H. warthog9@kernel.org wrote:

...

I'll endeavor to attend though I'll be in a physical conference most of the day. Personally I would mostly be interested in a transcript to be published, at least to the mirrors, of the discussion either way.

+1 to this!

Thanks, Jeff

R P Herrold

2:32 p.m.

On Thu, 21 Oct 2010, Jeff Sheltren wrote:

...

On Thu, Oct 21, 2010 at 12:39 AM, J.H. warthog9@kernel.org wrote:

...
I'll endeavor to attend though I'll be in a physical conference most of the day. Personally I would mostly be interested in a transcript to be published, at least to the mirrors, of the discussion either way.

+1 to this!

I guess in my mind, the question becomes -- What is wrong with people wanting such, installing screen on a reliable box in fine bandwidth, and using an irssi, and having a local transcript?

I wrote scripts that did the automatic log capture, rotation, and coloration early on, and it seemed to me that all it did was provide 'fodder' for Google to remember, with ephemeral content that some speakers might not want to see years later in a web search. When CentOS was spun out of cAos into a standalone project on new infrastructure, I certainly did not feel that the effort was worth migrating, as I rarely used it.

Part of why IRC works, in my opinion, is that it is casual and unguarded.

I understand that this is a meeting but a better reaction might be a person feeling the need to take a local log, and if one felt motivated, produce 'minutes' of the event in the wiki -- that way, the 'burden' of the production is borne by the person 'feeling the itch'

-- Russ herrold

J.H.

4:57 p.m.


...

I understand that this is a meeting but a better reaction might be a person feeling the need to take a local log, and if one felt motivated, produce 'minutes' of the event in the wiki -- that way, the 'burden' of the production is borne by the person 'feeling the itch'

This is a meeting that affects the entire mirror structure, and as such I felt that those who might not be able to attend for any number of reasons (time zone mismatches, connectivity issues, real life, work, or simply not knowing about the meeting) should still be able to walk back through the log of what was discussed, with all of the viewpoints on the discussion.

In a lot of cases the discussion is a lot more useful than the final decided outcome.

I would argue this is such a case where understanding the issues and being able to bring things up after the fact (on the mailing list perhaps) for further discussion would be a good thing. That's all.

- John 'Warthog9' Hawley

Bangladeshi CentOS Mirror Maintainer [BD-SERVERS.NET]

7:49 p.m.

Hello Fellow Members

I would like to draw attention to putting the transcript on the mirrors for the time being. If we use bandwidth to sync with the master mirror 4 to 6 times every day, I can't see any problem with putting the transcript on the mirrors, so that it is delivered automatically to the maintainers, and those who are not able to attend the meeting can walk through it later at a convenient time.

I think it will not be a big file (especially in gzip or bzip2 format), so it will not impact the bandwidth you have.

This is my personal view.

Cheers.

Ahamed Bauani http://mirrors.bd-servers.net/centos/

On Thu, Oct 21, 2010 at 8:32 PM, R P Herrold herrold@owlriver.com wrote:

...

On Thu, 21 Oct 2010, Jeff Sheltren wrote:

...
On Thu, Oct 21, 2010 at 12:39 AM, J.H. warthog9@kernel.org wrote:

...
I'll endeavor to attend though I'll be in a physical conference most of the day. Personally I would mostly be interested in a transcript to be published, at least to the mirrors, of the discussion either way.

+1 to this!

I guess in my mind, the question becomes -- What is wrong with people wanting such, installing screen on a reliable box in fine bandwidth, and using an irssi, and having a local transcript?

I wrote scripts that did the automatic log capture, rotation, and coloration early on, and it seemed to me that all it did was provide 'fodder' for Google to remember, with ephemeral content that some speakers might not want to see years later in a web search. When CentOS was spun out of cAos into a standalone project on new infrastructure, I certainly did not feel that the effort was worth migrating, as I rarely used it.

Part of why IRC works, in my opinion, is that it is casual and unguarded.

I understand that this is a meeting but a better reaction might be a person feeling the need to take a local log, and if one felt motivated, produce 'minutes' of the event in the wiki -- that way, the 'burden' of the production is borne by the person 'feeling the itch'

-- Russ herrold

Ralph Angenendt

22 Oct

8:57 a.m.

I seem to have sent that only to John (blame gmail's web interface ...)

On Thu, Oct 21, 2010 at 9:39 AM, J.H. warthog9@kernel.org wrote:

...

I'll endeavor to attend though I'll be in a physical conference most of the day. Personally I would mostly be interested in a transcript to be published, at least to the mirrors, of the discussion either way.

I think we could put it somewhere on the wiki. If things need to change, there needs to be a place to document that anyway :)

Ralph

Adrian Reber

21 Oct

10:49 a.m.


On Wed, Oct 20, 2010 at 10:51:13PM +0200, Ralph Angenendt wrote:

...

I'm sorry if someone waited for the (proposed by me) irc chat on Monday 18th - I had a somewhat surprise visit by family on that date.

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

Any opinions on that date?

I will try to be there.

Adrian

Ralph Angenendt

27 Oct

9:31 p.m.

Am 20.10.10 22:51, schrieb Ralph Angenendt:

...

As a new date I propose Monday, October 25th at 20:00 UTC in #centos-mirror on irc.freenode.net (and this time I won't have visitors over or at least know it early enough to say so).

There is a wiki page for that process now. I put down the notes I took at the meeting for now. There's also a log of the IRC meeting, which I want to redact a bit first, as there is some off topic chatting in there (and several joins/leaves during the meeting). I won't have time for that before friday, though.

Here's the page, which will fill up with more information:

http://wiki.centos.org/InfraWiki/Mirrors

I'd like to thank the people who were there and gave us input about other solutions (and questioned why we do things the way we do).

Regards,

Ralph

Ralph Angenendt

2 Nov

6:43 p.m.

Am 27.10.10 23:31, schrieb Ralph Angenendt:

...

Here's the page, which will fill up with more information:

http://wiki.centos.org/InfraWiki/Mirrors

That page now also has the log of the irc conversation.

Regards,

Ralph

Bangladeshi CentOS Mirror Maintainer [BD-SERVERS.NET]

8:31 p.m.

On Wed, Nov 3, 2010 at 12:43 AM, Ralph Angenendt ralph.angenendt@gmail.com wrote:

...

Am 27.10.10 23:31, schrieb Ralph Angenendt:

...
Here's the page, which will fill up with more information:

http://wiki.centos.org/InfraWiki/Mirrors

That page now also has the log of the irc conversation.

Thanks. I was sick and hospitalized for the last few days; that's why I couldn't join on IRC. Anyway, let me read it first.

Thank you guys.

...

Regards,

Ralph

Peter Pöml

8 Nov

9:02 a.m.


Hi everybody,

[resending, after realizing that I was subscribed with an old address]

On Wed, Oct 27, 2010 at 11:31:56PM +0200, Ralph Angenendt wrote:

...

There is a wiki page for that process now. I put down the notes I took at the meeting for now. There's also a log of the IRC meeting, which I want to redact a bit first, as there is some off topic chatting in there (and several joins/leaves during the meeting). I won't have time for that before friday, though.

Here's the page, which will fill up with more information:

http://wiki.centos.org/InfraWiki/Mirrors

I like to thank the people who were there and gave us input about other solutions (and questioned why we do things like we do).

Regards,

Ralph

I would also like to thank you for the good meeting, and also for considering MirrorBrain.

This mail is very long -too long-, which I would like to apologize for, but I thought it would be good to provide a comprehensive overview of the options that I see.

First off, I think you can't go wrong if you go with MirrorManager, because it works for Fedora, and it already has support for the somewhat more special requirement that you have, which is yum mirror lists. The similarity of Fedora and Centos might make many things easier. MirrorBrain doesn't have this yet, because none of its users needed it so far. As MirrorBrain tries to be a generic solution, it is generally agnostic of project or metadata structure, and does everything on file level. That doesn't mean that support for "special" features is unwanted, of course. Especially if it can be implemented in a way that it fits into the concept, and doesn't make deployment for other users more difficult. It is certainly a nice option - there are many Yum-based distros, after all.

(background: Being usable not only by Linux distros is a declared goal of the MirrorBrain project, in order to get as many users (and potential developers) into the boat and collaborate.

For a mirroring infrastructure, I believe that only collaboration across organization borders can yield a mature, flexible and long-lived solution. And there are not really many people working on this, only a handful. It would be cool to merge MirrorBrain and MirrorManager somehow. Might be a lot of work but useful in the long-term. )

Having said all that, I thought that Yum mirrorlist in MirrorBrain should not be hard to implement. I spent some time on it today and got quite far; configuring mapping of URL query arguments to directories/files is done, and actual mapping works. I chose Apache config as vehicle for that, and the following is a working config:

    MirrorBrainYumDir release=(5.5) \
        repo=(os|extras|addons|updates|centosplus|contrib) \
        arch=x86_64 \
        $1/$2/x86_64 repodata/repomd.xml

For instance, $1/$2/x86_64 is the base URL to a repository, and the match groups can optionally be replaced with what the client specified in the query arguments. ($1 is the first group from the configuration line, $2 the second, and so on. The names and number of query args are all arbitrary.) The last argument is a relative path: the file that must be present on eligible mirrors. The resulting path here would be e.g. 5.5/os/x86_64/repodata/repomd.xml, and the client would get a list of mirrors in the form of http://mirror.example.com/path/to/centos/5.5/os/x86_64/ (returning that list is what's still missing, but it's the easiest part :-). So I'm confident that I can promise Yum mirror list soon. Maybe I can finish it this week, maybe the week after, I don't know.
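To make the mapping of query arguments to a repository path concrete, here is a minimal Python sketch. It is not MirrorBrain code: the RULE structure and the resolve function are hypothetical names invented for illustration, and the patterns only imitate the configuration described above.

```python
import re
from urllib.parse import parse_qs

# Hypothetical rule imitating the configuration described above: every query
# argument must fully match its pattern; captured groups fill $1, $2, ...
RULE = {
    "args": [
        ("release", r"(5\.5)"),
        ("repo", r"(os|extras|addons|updates|centosplus|contrib)"),
        ("arch", r"x86_64"),
    ],
    "base": "$1/$2/x86_64",
    "marker": "repodata/repomd.xml",
}


def resolve(query, rule=RULE):
    """Return (base_dir, marker_path) for a query string, or None if no match."""
    params = parse_qs(query)
    groups = []
    for name, pattern in rule["args"]:
        values = params.get(name)
        if not values:
            return None  # required argument missing
        match = re.fullmatch(pattern, values[0])
        if match is None:
            return None  # value not allowed by the rule
        groups.extend(match.groups())
    base = rule["base"]
    for index, value in enumerate(groups, start=1):
        base = base.replace(f"${index}", value)
    return base, f"{base}/{rule['marker']}"
```

A request such as release=5.5&repo=os&arch=x86_64 would then resolve to the base directory 5.5/os/x86_64, and only mirrors known to carry the marker file under it would be put on the list.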

Meanwhile, I would appreciate input from you: is this reasonable? Would it serve your needs?

If it does, I think the only feature missing in MirrorBrain for you would be sorted out.

(Needless to say that the mirror list that yum gets will be sorted by suitability of the mirrors)

So, on to the other issues that were raised in the meeting.

Summarizing what I heard, the following are the problems that you would like to solve:

1) scalability
2) cleaning up the historic DVD/nonDVD setup
3) partial mirroring
4) finer mirror selection (by prefix, autonomous system, state/region, in addition to country/continent)
5) consistency problems
6) content verification
7) (presumably) backwards compatibility to existing installations
8) (maybe) satellite setups

1) scalability

The dimensions are:
- 70,000 files in 500 directories
- >400 mirrors
- 40 requests per second

Sounds fine from my point of view. MB has handled more files, and more requests. The number of mirrors I have run it with was smaller, 150 at most, but I wouldn't expect big problems. The little mirrorprobe that runs every minute might run into a system limit when starting 400 threads, to check all mirrors at the same time, so maybe it needs to be tweaked, or changed to a different model, using a pool of threads or starting some processes as well.
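The "pool of threads" model mentioned above could look roughly like the following Python sketch. It is an illustration only, not the actual mirrorprobe; probe_all and its check callback are hypothetical names, and the pool size of 32 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def probe_all(mirrors, check, max_workers=32):
    """Run check(mirror) for every mirror using a bounded pool of threads.

    Returns a dict mapping mirror -> True/False. A check that raises an
    exception counts as a failed probe instead of aborting the whole run.
    """
    mirrors = list(mirrors)

    def safe_check(mirror):
        try:
            return bool(check(mirror))
        except Exception:
            return False

    # At most max_workers probes run at once, however many mirrors there are.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_check, mirrors))
    return dict(zip(mirrors, results))
```

With 400 mirrors and a pool of 32, no more than 32 threads exist at any time, which sidesteps the per-process thread limit at the cost of a somewhat longer probe run.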

2) cleaning up the historic DVD/nonDVD setup

Sounds like a good idea :-)

3) partial mirroring

Supported well by MirrorBrain.

4) finer mirror selection (by prefix, autonomous system, state/region, in addition to country/continent)

MirrorBrain uses BGP/routing data to find out the network prefix and AS of clients and mirrors, and matches them. Other criteria are GeoIP country and continent. The closest match is used for mirror selection. If several mirrors are there to choose from, a weighted randomization is also applied, to be able to give some mirrors more requests and others less.

We talked in our meeting about the need for a smarter selection in e.g. the US, where one doesn't want to be sent from one coast to the other. GeoIP regions were discussed for this. I considered going that route, but decided to implement a different concept, which I believe is more widely useful, because it also works when no mirror within the same state/region is found: using geographical distance between the client and the mirrors. I just released this new feature into the wild: http://mirrorbrain.org/news/2140-takes-geographical-distances-account/

You can try it out at http://download.services.openoffice.org/files/stable/3.2.1/OOo-SDK_3.2.1_Lin... and feedback is appreciated.
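As a rough illustration of how these criteria might be combined, here is a Python sketch. How MirrorBrain really weighs them is not stated in this mail, so everything below is an assumption: the dictionary keys, the 100 km distance buckets, and the use of an exponential draw for the weighted tie-break are all invented for the example.

```python
import math
import random


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula) in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def match_level(client, mirror):
    """Lower is closer: same prefix, then AS, then country, then continent."""
    for level, key in enumerate(("prefix", "asn", "country", "continent")):
        if client.get(key) is not None and client.get(key) == mirror.get(key):
            return level
    return 4


def rank_mirrors(client, mirrors, rng=random):
    """Sort by match level, then by distance bucket, then by weighted chance."""
    def sort_key(mirror):
        dist = distance_km(client["lat"], client["lon"], mirror["lat"], mirror["lon"])
        bucket = round(dist / 100.0)  # assumption: treat mirrors within ~100 km as equal
        # Exponential draw with rate = weight: among tied mirrors, the chance
        # of sorting first is proportional to the weight.
        tie_break = rng.expovariate(mirror.get("weight", 1.0))
        return (match_level(client, mirror), bucket, tie_break)
    return sorted(mirrors, key=sort_key)
```

A client on the US west coast would then see a west coast mirror ranked ahead of an east coast one, even though both only match at country level.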

5) consistency problems

Regarding problems with consistency of trees on mirrors / clients accessing them, this is indeed a hard problem to solve. From discussions with Fedora people I know that they also have/had major fights with that. It took me a long time to finally get this sorted out when I still worked on the openSUSE infrastructure. The following have proved useful for me in the past:

- Always take care of setting appropriate cache headers. must-revalidate is the key, because it doesn't prevent caching, but causes clients (and intermediaries) to always validate that a resource is still fresh.

It is hopeless to get all mirrors to run the same configuration in this regard, and there are also some FTP mirrors (and FTP doesn't have a feature to control caching at all), so for certain content, there is no other option than delivering it from defined places _with_ proper headers. Luckily, this concerns mostly small metadata files.

This guards against the inconsistency that arises when things come from different places (and are of different age). If cache control is not exerted by the server (or client), intermediaries (web caches) commonly "guess" how long they should deliver stuff from their cache without revalidating its freshness. Typically, a squid assumes freshness for 4-18 hours by default, and the exact time is hard to predict, because cache pruning is complex and may take file size into account. Thus, it is inevitable that clients see an inconsistent picture.

- The second (and even more important) measure is to version metadata. Actually, any data. Always and Everywhere. With RPMs, one is in the lucky situation that this is usually done anyway (reliably increasing version/release numbers with each rebuild). Exception files like "MD5SUMS" definitely need to be treated separately and should never be redirected to a mirror, not only for security reasons. repo-md metadata, as used by Yum, exists in various incarnations. Unfortunately, the ones I dealt with in the past were not versioned, and files had names like "filelists.xml.gz", which leaves only non-redirection as the only 100% solution. (So I did that.) Nowadays, at least the repo-md metadata that the Fedora and openSUSE people build is versioned, as can be seen in this example: http://download.opensuse.org/repositories/Apache:/MirrorBrain/Apache_openSUS... I suppose that createrepo does that these days. Anyway, this is certainly a point where tight cooperation (and appropriate input) with the build system folks is very important.

- A third line of "defense" can be a client that double-checks itself that it doesn't get old metadata, by checking with cryptohashes if the download is the expected one, _and_ falls back to a different mirror if it isn't the case. That's what Yum does, since MirrorManager sends hashes/timestamps via Metalinks, and what Zypper does, since it uses a Metalink client for all downloads that allows it to fall back to other mirrors until it got the expected data. You won't be able to do something fancy like that with CentOS 5 I guess, but maybe with the next version. (Actually, it's not that difficult to teach Yum using a Metalink client -- I once tried it out, and it was a one-liner to replace its usage of python-urlgrabber with a call to aria2c (powerful Metalink client) for all downloads. Another great option would be to extend python-urlgrabber to be a Metalink client.)

That's what I learnt anyway... maybe some of it can be useful to you. Verifying 400 mirrors in realtime is no option, with our limited means, IMO -- simply not doable. Of course, if anyone knows how to do that, I am *very* interested :-)
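The third line of defense, a client that checks a hash and falls back to another mirror, can be sketched in a few lines of Python. This is not how Yum or Zypper implement it; fetch_verified and the download callback are hypothetical names, and SHA-256 is just an assumed choice of hash.

```python
import hashlib


def fetch_verified(path, mirrors, expected_sha256, download):
    """Try mirrors in order until one returns data matching the expected hash.

    download(mirror, path) is supplied by the caller and returns bytes;
    expected_sha256 should come from a trusted source, not from the mirror.
    """
    for mirror in mirrors:
        try:
            data = download(mirror, path)
        except OSError:
            continue  # unreachable mirror: try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return mirror, data
        # stale or corrupted copy: fall through to the next mirror
    raise LookupError(f"no mirror delivered the expected content for {path}")
```

The important property is that a stale mirror costs the client one wasted download rather than an inconsistent view of the repository.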

6) content verification

Regarding content verification: I don't know how you currently check exactly, but what can be done with MirrorBrain is:

- there currently is a tool for downloading a file from one or all mirrors and displaying a hash of it.
- this obviously doesn't work well for huge files (DVDs) (if it's not about a close, fast mirror).
- since recently, MB can keep all hashes of all files in a database. The hashes include block (piece-wise) hashes. It would be fairly easy to fetch the hash of a random block (or a defined one) and download just that piece from all mirrors. (Since the hashes in the database are retrievable from everywhere, such checkers could also run _very_ distributed in fact.)

If you look at http://download.documentfoundation.org/libreoffice/testing/3.3.0-beta2/rpm/x... there is various metadata, including the block hashes inside the linked IETF Metalink in the form of XML.
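The idea of checking a single block instead of a whole DVD image might look like the following Python sketch. It is a sketch under assumptions, not an existing MirrorBrain tool: the function names are invented, SHA-1 and the piece size are arbitrary choices, and fetch_range stands in for an HTTP request with a Range header.

```python
import hashlib


def piece_hashes(data, piece_size):
    """SHA-1 hex digest of each fixed-size piece of data."""
    return [hashlib.sha1(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]


def piece_range(index, piece_size, total_size):
    """Inclusive byte range of one piece, as it would appear in a Range header."""
    start = index * piece_size
    end = min(start + piece_size, total_size) - 1
    return start, end


def check_piece(index, piece_size, total_size, known_hashes, fetch_range):
    """Fetch one piece from a mirror and compare it with the stored hash.

    fetch_range(start, end) is supplied by the caller and returns the bytes
    of that inclusive range from the mirror being checked.
    """
    start, end = piece_range(index, piece_size, total_size)
    return hashlib.sha1(fetch_range(start, end)).hexdigest() == known_hashes[index]
```

A checker would pick an index at random, ask every mirror for just that range, and flag the mirrors whose piece does not match the hash stored in the database.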

I'm open (and happy) to implement more means of content verification. So far, I either didn't have more need for it, or time was lacking. But it would be very useful. I just would like to point out that I see a need for it mainly for debugging purposes, when something goes wrong, and not as a security measure. Content verification is too easy to spoof to be trusted significantly. It is much more important to give clients the top hash from a trusted source, maybe even over a TLS-encrypted web server, and rely on cryptographic signatures for the rest (which is easy with RPM, luckily).

In the context of file-tree consistency and content verification, I should note that verifying only certain critical files might not prevent that a mirror is "half synced", and thus inconsistent. I think that running something after syncing is a smart way to discover the moment when the mirror is "ready". That's where MirrorManager is very clever.

I wondered if there is a crucial file that can be used as a "marker" to determine whether a mirror is up to date or not. A timestamp file might work, but maybe there need to be several of them, in different parts of the tree, if some setups are complex and sync parts of the tree with different scripts. MirrorBrain can also download files from mirrors (to look at the timestamp content), but that said, one wouldn't necessarily want to disable a mirror when it hasn't synced for a day but is still up to date (because no new content has come in, except new timestamps). Or how do you handle this?

I was tossing around the idea whether the mirror scanner should integrate such a timestamp check, maybe comparing the timestamp of a certain "marker" file on the mirror with the known timestamp in its database. But I'm not clear yet where this would lead and how it could be made useful.

...Maybe the mirror scanner should simply check all repodata/repomd.xml files in the tree frequently, comparing with the current version. With the yum mirrorlist implementation described above, it would be easy to have only mirrors end up on the lists that are known to have the current file.
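The repomd.xml comparison suggested here reduces to a simple filter once the scanner has recorded what each mirror holds. The Python sketch below assumes such scan results exist as plain dictionaries; mirrors_with_current and both data structures are hypothetical.

```python
def mirrors_with_current(marker_path, master_state, mirror_states):
    """Return the mirrors whose copy of marker_path matches the master's.

    master_state maps path -> identifier (a hash or a timestamp);
    mirror_states maps mirror name -> the same kind of mapping, as
    recorded by the most recent scan of that mirror.
    """
    current = master_state[marker_path]
    return sorted(name for name, state in mirror_states.items()
                  if state.get(marker_path) == current)
```

Because the comparison is on the marker's content rather than on the sync time, a mirror that has not synced for a day still qualifies as long as nothing changed on the master.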

7) backwards compatibility to existing installations

I don't see an issue, once mirror lists work. However, I know far too little about CentOS. :-)

BTW, one idea for the future that I would like to at least mention is that you could change Yum to contact a/the redirector for each request, instead of only in the beginning. I cannot judge if that would be better or worse. I have used Yum for many years, but always in that mode, and not with the mirror lists that you guys use. Anyway, that would give you more control over what Yum downloads from where, not least because of the ability to exert proper cache control. It's also good for security if critical hash files (those containing the top hash) are downloaded from a trusted server only.

8) (maybe) satellite setups

Here I didn't get the details.

Curious what you think about all this.

Again, sorry sorry sorry for the long mail.

Thanks, Peter

Jim Kusznir

11:44 p.m.

For me as a mirror admin, the only feature I don't like about MirrorBrain is that I don't have the ability to log in and "check on" or admin my mirror.

I mirror for a few different distros, and Ubuntu's mirror manager is quite poor as well. I have an account, but can't get to it. When I fail a test of some sort, I get a not-very-useful e-mail, and no way to get more info on what happened. I usually end up just "waiting it out". It would be nice to get an e-mail alerting me that something is wrong, and then to be able to log in and see.

I also like being able to specify some IP ranges I'm authoritative for. As my mirror is on a university campus, I'd love to be able to enter my campus' IP ranges, and that way ensure that all my campus gets my mirror. So far, none of the OSes I mirror for (I don't mirror Fedora presently) allows me to do that.
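What I have in mind could be as simple as a longest-prefix lookup over the ranges each mirror admin has entered. A small Python sketch follows; preferred_mirror and the claims mapping are names I made up, and the addresses in the usage note are documentation examples only.

```python
import ipaddress


def preferred_mirror(client_ip, claims):
    """Return the mirror claiming the most specific network containing client_ip.

    claims maps mirror name -> list of CIDR strings entered by its admin.
    Returns None if no claimed network contains the address.
    """
    addr = ipaddress.ip_address(client_ip)
    best = None  # (mirror name, prefix length) of the best match so far
    for mirror, cidrs in claims.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            if addr.version == net.version and addr in net:
                if best is None or net.prefixlen > best[1]:
                    best = (mirror, net.prefixlen)
    return best[0] if best else None
```

If a campus mirror claims 192.0.2.0/24, a client at 192.0.2.7 would be sent there before any country or distance based selection is considered.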

MirrorBrain sounds like it has a lot of the functionality, but only available to the distro managers. They're busy people; I'd rather not bother them if I can handle stuff myself.

--Jim

On Mon, Nov 8, 2010 at 1:02 AM, Peter Pöml peter@poeml.de wrote:

...

Hi everybody,

[resending, after realizing that I was subscribed with an old address]

On Wed, Oct 27, 2010 at 11:31:56PM +0200, Ralph Angenendt wrote:

...
There is a wiki page for that process now. I put down the notes I took at the meeting for now. There's also a log of the IRC meeting, which I want to redact a bit first, as there is some off topic chatting in there (and several joins/leaves during the meeting). I won't have time for that before friday, though.

Here's the page, which will fill up with more information:

http://wiki.centos.org/InfraWiki/Mirrors

I like to thank the people who were there and gave us input about other solutions (and questioned why we do things like we do).

Regards,

Ralph

I would also like to thank you for the good meeting, and also for considering MirrorBrain.

This mail is very long -too long-, which I would like to apologize for, but I thought it would be good to provide a comprehensive overview of the options that I see.

First off, I think you can't go wrong if you go with MirrorManager, because it works for Fedora, and it already has support for the somewhat more special requirement that you have, which is yum mirror lists. The similarity of Fedora and Centos might make many things easier. MirrorBrain doesn't have this yet, because none of its users needed it so far. As MirrorBrain tries to be a generic solution, it is generally agnostic of project or metadata structure, and does everything on file level. That doesn't mean that support for "special" features is unwanted, of course. Especially if it can be implemented in a way that it fits into the concept, and doesn't make deployment for other users more difficult. It is certainly a nice option - there are many Yum-based distros, after all.

(background: Being usable not only by Linux distros is a declared goal of the MirrorBrain project, in order to get as many users (and potential developers) into the boat and collaborate.

For a mirroring infrastructure, I believe that only collaboration across organization borders can yield a mature, flexible and long-lived solution. And there are not really many people working on this, only a handful. It would be cool to merge MirrorBrain and MirrorManager somehow. Might be a lot of work but useful in the long-term. )

Having said all that, I thought that Yum mirrorlist in MirrorBrain should not be hard to implement. I spent some time on it today and got quite far; configuring mapping of URL query arguments to directories/files is done, and actual mapping works. I chose Apache config as vehicle for that, and the following is a working config:

    MirrorBrainYumDir release=(5.5) \
        repo=(os|extras|addons|updates|centosplus|contrib) \
        arch=x86_64 \
        $1/$2/x86_64 repodata/repomd.xml

For instance, $1/$2/x86_64 is the base URL to a repository, and the match groups can optionally be replaced with what the client specified in the query arguments. ($1 is the first group from the configuration line, $2 the second, and so on. The names and number of query args are all arbitrary.) The last argument is a relative path: the file that must be present on eligible mirrors. The resulting path here would be e.g. 5.5/os/x86_64/repodata/repomd.xml, and the client would get a list of mirrors in the form of http://mirror.example.com/path/to/centos/5.5/os/x86_64/ (returning that list is what's still missing, but it's the easiest part :-). So I'm confident that I can promise Yum mirror list soon. Maybe I can finish it this week, maybe the week after, I don't know.

Meanwhile, I would appreciate input from you: is this reasonable? Would it serve your needs?

If it does, I think the only feature missing in MirrorBrain for you would be sorted out.

(Needless to say that the mirror list that yum gets will be sorted by suitability of the mirrors)

So, on to the other issues that were raised in the meeting.

Summarizing what I heard, the following are the problems that you would like to solve:

1) scalability
2) cleaning up the historic DVD/nonDVD setup
3) partial mirroring
4) finer mirror selection (by prefix, autonomous system, state/region, in addition to country/continent)
5) consistency problems
6) content verification
7) (presumably) backwards compatibility to existing installations
8) (maybe) satellite setups

1) scalability

The dimensions are:
- 70,000 files in 500 directories
- >400 mirrors
- 40 requests per second

Sounds fine from my point of view. MB has handled more files, and more requests. The number of mirrors I have run it with was smaller, 150 at most, but I wouldn't expect big problems. The little mirrorprobe that runs every minute might run into a system limit when starting 400 threads, to check all mirrors at the same time, so maybe it needs to be tweaked, or changed to a different model, using a pool of threads or starting some processes as well.

2) cleaning up the historic DVD/nonDVD setup

Sounds like a good idea :-)

3) partial mirroring

Supported well by MirrorBrain.

4) finer mirror selection (by prefix, autonomous system, state/region, in addition to country/continent)

MirrorBrain uses BGP/routing data to find out the network prefix and AS of clients and mirrors, and matches them. Other criteria are GeoIP country and continent. The closest match is used for mirror selection. If several mirrors are there to choose from, a weighted randomization is also applied, to be able to give some mirrors more requests and others less. We talked in our meeting about the need for a smarter selection in e.g. the US, where one doesn't want to be sent from one coast to the other. GeoIP regions were discussed for this. I considered going that route, but decided to implement a different concept, which I believe is more widely useful, because it works also when no mirror within the same state/region is found: using geographical distance between the client and the mirrors. I just released this new feature into the wild: http://mirrorbrain.org/news/2140-takes-geographical-distances-account/ You can try it out http://download.services.openoffice.org/files/stable/3.2.1/OOo-SDK_3.2.1_Lin... and feedback is appreciated.

5) consistency problems

Regarding problems with consistency of trees on mirrors / clients accessing them, this is indeed a hard problem to solve. From discussions with Fedora people I know that they also have/had major fights with that. It took me a long time to finally get this sorted out when I still worked on the openSUSE infrastructure. The following have proved useful for me in the past:

Always take care of setting appropriate cache headers. must-revalidate is the key, because it doesn't prevent caching, but causes clients (and intermediaries) to always validate that a resource is still fresh.

It is hopeless to get all mirrors to run the same configuration in this regard, and there are also some FTP mirrors (and FTP doesn't have a feature to control caching at all), so for certain content, there is no other option than delivering it from defined places _with_ proper headers. Luckily, this concerns mostly small metadata files.

This guards against the inconsistency that arises when things come from different places (with different ages). If cache control is not exerted by the server (or client), intermediaries (web caches) commonly "guess" how long they should deliver stuff from their cache, without revalidating freshness. Typically, Squid assumes freshness for 4-18 hours by default, and the exact time is hard to predict, because cache pruning is complex and may take file size into account. Thus, it is inevitable that clients see an inconsistent picture.

The second (and even more important) measure is to version metadata. Actually, any data. Always and everywhere. With RPMs, one is in the lucky situation that this is usually done anyway (reliably increasing version/release numbers with each rebuild). Exception files like "MD5SUMS" definitely need to be treated separately and should never be redirected to a mirror, not only for security reasons. repo-md metadata, as used by Yum, exists in various incarnations. Unfortunately, the ones I dealt with in the past were not versioned, and files had names like "filelists.xml.gz", which leaves non-redirection as the only 100% solution. (So I did that.) Nowadays, at least the repo-md metadata that the Fedora and openSUSE people build is versioned, as can be seen in this example: http://download.opensuse.org/repositories/Apache:/MirrorBrain/Apache_openSUS... I suppose that createrepo does that these days. Anyway, this is certainly a point where tight cooperation (and appropriate input) with the build system folks is very important.
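To make the redirect-or-not decision concrete, here is a small sketch of such a policy. It is an assumption-laden illustration, not an existing MirrorBrain feature: the function name `may_redirect`, the `NEVER_REDIRECT` set, the example paths, and the rule that versioned repo-md files carry a hex checksum prefix are all chosen for the example.

```python
import re

# Files that should always come from the trusted origin (hypothetical list).
NEVER_REDIRECT = {"MD5SUMS"}

# Versioned repo-md files are assumed to look like "<hexdigest>-filelists.xml.gz".
VERSIONED = re.compile(r"^[0-9a-f]{32,128}-.+")


def may_redirect(path):
    """Return True if a request for path may be redirected to a mirror."""
    name = path.rsplit("/", 1)[-1]
    if name in NEVER_REDIRECT:
        return False
    if name.endswith(".rpm"):
        return True  # RPMs are versioned by their version/release numbers
    if "/repodata/" in "/" + path:
        # Unversioned metadata can go stale in caches, so keep it at the origin.
        return bool(VERSIONED.match(name))
    return True
```

Anything for which this returns False would be delivered from the defined place, with the cache headers discussed above.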

A third line of "defense" can be a client that double-checks itself that it doesn't get old metadata, by checking with cryptohashes if the download is the expected one, _and_ falls back to a different mirror if it isn't the case. That's what Yum does, since MirrorManager sends hashes/timestamps via Metalinks, and what Zypper does, since it uses a Metalink client for all downloads that allows it to fall back to other mirrors until it gets the expected data. You won't be able to do something fancy like that with CentOS 5 I guess, but maybe with the next version. (Actually, it's not that difficult to teach Yum to use a Metalink client -- I once tried it out, and it was a one-liner to replace its usage of python-urlgrabber with a call to aria2c (powerful Metalink client) for all downloads. Another great option would be to extend python-urlgrabber to be a Metalink client.)
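The check-and-fall-back behaviour described above boils down to a loop like the following. This is a sketch of the general technique, not Yum's or Zypper's code; `fetch_verified` and the `fetch` callable (which would wrap the real HTTP download) are hypothetical names, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def fetch_verified(mirrors, expected_sha256, fetch):
    """Try mirrors in order; return (mirror, data) for the first good download.

    fetch(mirror) must return the file's bytes or raise OSError on failure.
    """
    for mirror in mirrors:
        try:
            data = fetch(mirror)
        except OSError:
            continue  # unreachable mirror: try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return mirror, data
        # Hash mismatch: stale or corrupted copy, fall through to next mirror.
    raise LookupError("no mirror delivered the expected content")
```

The expected hash has to come from a trusted place (for example the Metalink served by the redirector), otherwise the check proves nothing.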

That's what I learnt anyway... maybe some of it can be useful to you. Verifying 400 mirrors in realtime is no option, with our limited means, IMO -- simply not doable. Of course, if anyone knows how to do that, I am *very* interested :-)

content verification

Regarding content verification: I don't know how you currently check exactly, but what can be done with MirrorBrain is:

there currently is a tool for downloading a file from one or all mirrors and displaying a hash of it.

this obviously doesn't work well for huge files (DVDs) (if it's not about a close, fast mirror).

since recently, MB can keep all hashes of all files in a database. The hashes include block (piece-wise) hashes. It would be fairly easy to fetch the hash of a random block (or a defined one) and download just that piece from all mirrors. (Since the hashes in the database are retrievable from everywhere, such checkers could also run _very_ distributed in fact.) If you look at http://download.documentfoundation.org/libreoffice/testing/3.3.0-beta2/rpm/x... there is various metadata, including the block hashes inside the linked IETF Metalink in the form of XML.
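The random-block idea could look roughly like this. It is a sketch only: no such checker exists in MirrorBrain according to the mail, and `block_range`, `check_random_block`, and the `fetch_range` callable (which would issue an HTTP request with a `Range: bytes=first-last` header) are hypothetical. SHA-1 is assumed as the piece hash type.

```python
import hashlib
import random


def block_range(index, piece_size, total_size):
    """Return the inclusive (first, last) byte offsets of piece number index."""
    first = index * piece_size
    last = min(first + piece_size, total_size) - 1
    if index < 0 or first > last:
        raise IndexError("piece index out of range")
    return first, last


def check_random_block(mirrors, piece_hashes, piece_size, total_size,
                       fetch_range, rng=random):
    """Pick one random piece and compare every mirror's copy with its hash.

    fetch_range(mirror, first, last) must return bytes first..last inclusive.
    Returns (piece_index, {mirror: matches}).
    """
    index = rng.randrange(len(piece_hashes))
    first, last = block_range(index, piece_size, total_size)
    results = {}
    for mirror in mirrors:
        data = fetch_range(mirror, first, last)
        results[mirror] = hashlib.sha1(data).hexdigest() == piece_hashes[index]
    return index, results
```

Only one piece per mirror is transferred, so even DVD images can be spot-checked cheaply. A mirror that ignores Range requests would need separate handling.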

I'm open (and happy) to implement more means of content verification. So far, I either didn't have more need for it, or time was lacking. But it would be very useful. I just would like to point out that I see a need for it mainly for debugging purposes, when something goes wrong, and not as a security measure. Content verification is too easy to spoof to be trusted significantly. It is much more important to give clients the top hash from a trusted source, maybe even over a TLS-encrypted web server, and rely on cryptographic signatures for the rest (which is easy with RPM, luckily).

In the context of file-tree consistency and content verification, I should note that verifying only certain critical files might not catch a mirror that is "half synced", and thus inconsistent. I think that running something after syncing is a smart way to discover the moment when the mirror is "ready". That's where MirrorManager is very clever.

I wondered if there is a crucial file that can be used as "marker" to determine whether a mirror is up to date or not. A timestamp file might work, but maybe there need to be several of them, in different parts of the tree, if some setups are complex and sync parts of the tree with different scripts. MirrorBrain can also download files from mirrors (to look at the timestamp content), but that said, one wouldn't necessarily want to disable a mirror that hasn't synced for a day when it is still up to date (when no new content has come, except new timestamps). Or how do you handle this?

I was tossing around the idea whether the mirror scanner should integrate such a timestamp check, maybe comparing the timestamp of a certain "marker" file on the mirror with the known timestamp in its database. But I'm not clear yet where this would lead and how it could be made useful.

...Maybe the mirror scanner should simply check all repodata/repomd.xml files in the tree frequently, comparing with the current version. With the yum mirrorlist implementation described above, it would be easy to have only mirrors end up on the lists that are known to have the current file.
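A scanner step of that kind might be sketched as below. It is purely illustrative: `current_mirrors` is a hypothetical name, and the mirror copies are passed in as already-downloaded bytes (or None when the download failed) so that the comparison logic stands on its own.

```python
import hashlib


def current_mirrors(reference, mirror_copies):
    """Return the names of mirrors whose repomd.xml equals the reference copy.

    reference is the master's repomd.xml as bytes; mirror_copies maps a
    mirror name to the bytes fetched from it, or None if the fetch failed.
    """
    want = hashlib.sha256(reference).digest()
    return sorted(
        name
        for name, body in mirror_copies.items()
        if body is not None and hashlib.sha256(body).digest() == want
    )
```

Only the mirrors returned here would be put on the generated mirror list for that repository.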

backwards compatibility to existing installations

I don't see an issue, once mirror lists work. However, I know much too little about CentOS. :-)

BTW, one idea for the future, that I would like to at least mention, is that you could change Yum to contact a/the redirector for each request, instead of only in the beginning. I cannot judge if that would be better or worse -- I have used Yum for many years, but always in that mode, and not with the mirror lists that you guys use. Anyway, that would give you more control over what Yum downloads where, not least because of the ability to exert proper cache control. It's also good for security if critical hash files (those containing the top hash) are downloaded from a trusted server only.

(maybe) satellite setups

Here I didn't get the details.

Curious what you think about all this.

Again, sorry sorry sorry for the long mail.

Thanks, Peter

Claire M. Connelly

9 Nov 9 Nov

6:49 p.m.

"JK" == Jim Kusznir jkusznir@gmail.com

JK> I mirror for a few different distros, and ubuntu's mirror
JK> manager is quite poor as well. I have an account, but
JK> can't get to it. When I fail a test of some sort, I get a
JK> not-very-useful e-mail, and no way to get more info on
JK> what happened. I usually end up just "waiting it out".
JK> It would be nice if I get an e-mail alerting me to
JK> something being wrong, and then allowing me to log in and
JK> see.

I agree that having all the information in the e-mail message would be nice, but I haven't had problems getting that information from the log files in my Ubuntu Launchpad account -- maybe you should try to get them to reset your password or something?

JK> I also like being able to specify some IP ranges I'm
JK> authoritative for. As my mirror is on a university
JK> campus, I'd love to be able to enter my campus' IP ranges,
JK> and that way ensure that all my campus gets my mirror. So
JK> far, none of the OSes I mirror for (I don't mirror Fedora
JK> presently) allows me to do that.

Very much agreed -- we do just that for Fedora (which makes it easy). For CentOS, I just deploy replacement *.repo files that point to our mirrors, as there doesn't seem to be any other way to ensure that our machines pull from our mirrors, but it would be nice for more casual users to just get updates from the local mirror without having to understand how YUM works.

Claire

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* Claire Connelly cmc@math.hmc.edu System Administrator (909) 621-8754 Department of Mathematics Harvey Mudd College *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* For System News: http://www.math.hmc.edu/computing/news/ or http://twitter.com/hmcmathcomp/. *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*

Peter Pöml

10:52 p.m.

Hi Jim,

thanks for your detailed feedback.

Am 09.11.2010 um 00:44 schrieb Jim Kusznir:

...

For me as a mirror admin, the only feature I don't like about MirrorBrain is that I don't have the ability to log in and "check on" or admin my mirror.

Does that mainly concern you when something stops working, and you wonder why?

From administering different MirrorBrain setups for a few years, I can say that this is not a question that pops up frequently. One of the most frequently asked questions here on this list, why someone's mirror does not get requests, has never come up for me. Maybe MirrorBrain does less checking than other frameworks? Anyway, MirrorBrain does not disable all redirection to a mirror just because some files are not yet there.

The primary need I saw so far for mirror admins to change something would be to modify the URL their mirror is reachable at, and (in some cases) adjust the amount of requests they are assigned. Also, it can be convenient to be able to temporarily switch off redirection completely. Looking at scan/monitoring logs would come to mind as well. Triggering a scan. (Other suggestions?)

In order to make these settings modifiable by the mirrors' admins, the main obstacle would be to set up user account handling. Some projects already have a system for that, so a MirrorBrain setup could be connected to it. Where such a framework doesn't exist yet, MirrorBrain would need its own system, and I was wondering how to best implement that. I think the options would be 1) a simple self-contained system, 2) using OpenID, so existing Google/Yahoo/AOL/whatever accounts could be used. I would prefer the latter, but it is technically challenging enough for me to implement that it is not a matter of a few hours.

...

I mirror for a few different distros, and ubuntu's mirror manager is quite poor as well. I have an account, but can't get to it. When I fail a test of some sort, I get a not-very-useful e-mail, and no way to get more info on what happened. I usually end up just "waiting it out". It would be nice if I get an e-mail alerting me to something being wrong, and then allowing me to log in and see.

Sending out an (informative) email when something goes wrong is indeed a good idea.

...

I also like being able to specify some IP ranges I'm authoritative for. As my mirror is on a university campus, I'd love to be able to enter my campus' IP ranges, and that way ensure that all my campus gets my mirror. So far, none of the OSes I mirror for (I don't mirror Fedora presently) allows me to do that.

Regarding this, I would like to question the need for such manual configuration. centos.eecs.wsu.edu is your mirror, right? Without any configuration, MirrorBrain would send you all requests from clients out of 134.121.0.0/16 (if there isn't any mirror in the same network of course). If a client is not in that particular network, but within AS10430, it would still get sent to your mirror -- if there is no other mirror in that autonomous system. Would there be a second mirror in your autonomous system? That's the question. If not, everything would happen automatically anyway. No need to juggle lists of network prefixes. (And no need to make such configuration accessible, which could result in a security issue after all, if not handled carefully.)

So far, I didn't encounter a case where clients are outside the network prefix of a mirror, but within the same AS, and there is a second mirror in that AS -- so there was no need to add a way to specify network prefixes at all.

However, if you see the need, it would be easy to implement. (In the same way, one could define other autonomous systems to be handled by a mirror.)

...

MirrorBrain sounds like it has a lot of the functionality, but only available to the distro managers. They're busy people; I'd rather not bother them if I can handle stuff myself.

--Jim

Thanks again for your feedback, Peter

Peter Pöml

11:06 p.m.

A short addition...

Am 09.11.2010 um 23:52 schrieb Peter Pöml:

...

...
I also like being able to specify some IP ranges I'm authoritative for. As my mirror is on a university campus, I'd love to be able to enter my campus' IP ranges, and that way ensure that all my campus gets my mirror. So far, none of the OSes I mirror for (I don't mirror Fedora presently) allows me to do that.

Regarding this, I would like to question the need for such manual configuration. centos.eecs.wsu.edu is your mirror, right? Without any configuration, MirrorBrain would send you all requests from clients out of 134.121.0.0/16 (if there isn't any mirror in the same network of course). If a client is not in that particular network, but within AS10430, it would still get sent to your mirror -- if there is no other mirror in that autonomous system. Would there be a second mirror in your autonomous system? That's the question. If not, everything would happen automatically anyway. No need to juggle lists of network prefixes. (And no need to make such configuration accessible, which could result in a security issue after all, if not handled carefully.)

So far, I didn't encounter a case where clients are outside the network prefix of a mirror, but within the same AS, and there is a second mirror in that AS -- so there was no need to add a way to specify network prefixes at all.

However, if you see the need, it would be easy to implement. (In the same way, one could define other autonomous systems to be handled by a mirror.)

I actually forgot about the latest feature, which helps even one step further: Provided that GeoIP works, two mirrors within the same AS would be prioritized by geographical distance to the client. This should take care of most other cases...

A more challenging case is when clients are connected through IPs that use address space allocated in another country. For example, clients in Europe using a VPN to their employer could be geolocated (by GeoIP) to the US, because their employer (the formal network operator) is based there. Many corporations own several network prefixes, but not all of them are physically in the same country. GeoIP typically misses these extra networks. With a mirror present in those networks/ASs, that's no problem. But otherwise it can be interesting. Does anyone have this problem here?

Peter

Randy McAnally

10 Nov 10 Nov

1:22 a.m.

...

Regarding this, I would like to question the need for such manual configuration. centos.eecs.wsu.edu is your mirror, right? Without any configuration, MirrorBrain would send you all requests from clients out of 134.121.0.0/16 (if there isn't any mirror in the same network of course). If a client is not in that particular network, but within AS10430, it would still get sent to your mirror -- if there is no other mirror in that autonomous system. Would there be a second mirror in your autonomous system? That's the question. If not, everything would happen automatically anyway. No need to juggle lists of network prefixes. (And no need to make such configuration accessible, which could result in a security issue after all, if not handled carefully.)

So far, I didn't encounter a case where clients are outside the network prefix of a mirror, but within the same AS, and there is a second mirror in that AS -- so there was no need to add a way to specify network prefixes at all.

We have two mirrors in our AS, which has several IP ranges split across two datacenters over 2000 miles apart. Being able to specify a CIDR for each mirror would be nice, since there's no guarantee an AS lookup alone is going to get it right. But I suppose as long as BOTH mirrors are returned on a mirror list, the fastestmirror plugin could easily choose the correct mirror.

-Randy

Peter Pöml

2:54 p.m.

Hi Randy,

On Tue, Nov 09, 2010 at 08:22:10 -0500, Randy McAnally wrote:

...

...
Regarding this, I would like to question the need for such manual configuration. centos.eecs.wsu.edu is your mirror, right? Without any configuration, MirrorBrain would send you all requests from clients out of 134.121.0.0/16 (if there isn't any mirror in the same network of course). If a client is not in that particular network, but within AS10430, it would still get sent to your mirror -- if there is no other mirror in that autonomous system. Would there be a second mirror in your autonomous system? That's the question. If not, everything would happen automatically anyway. No need to juggle lists of network prefixes. (And no need to make such configuration accessible, which could result in a security issue after all, if not handled carefully.)

So far, I didn't encounter a case where clients are outside the network prefix of a mirror, but within the same AS, and there is a second mirror in that AS -- so there was no need to add a way to specify network prefixes at all.

We have two mirrors in our AS, which has several IP ranges split across two datacenters over 2000 miles apart. Being able to specify a CIDR for each mirror would be nice, since there's no guarantee an AS lookup alone is going to get it right. But I suppose as long as BOTH mirrors are returned on a mirror list, the fastestmirror plugin could easily choose the correct mirror.

Might be that your case would be handled just automatically. Are 208.85.240.29 & 208.85.242.118 your two mirrors? (That's what mirror.fast-serv.com resolves to.)

Okay, let's see :-)

The two mirrors are in AS29889, for which the following prefixes are announced:

    select * from pfx2asn where asn = 29889;
           pfx       |  asn
    -----------------+-------
     74.115.208.0/22 | 29889
     74.115.212.0/22 | 29889
     208.85.240.0/23 | 29889
     208.85.242.0/23 | 29889
     209.9.238.0/24  | 29889

This is what GeoIP currently has to say about these networks:

    1) 74.115.208.0/22  AS  TW  Taipei
    2) 74.115.212.0/22  NA  US  Maryland    Crownsville  39.030102,-76.606903
    3) 208.85.240.0/23  NA  US  Maryland    Crownsville  39.030102,-76.606903
    4) 208.85.242.0/23  NA  US  California  Escondido    33.134899,-117.041603
    5) 209.9.238.0/24   NA  US  Virginia    Herndon      38.984100,-77.382698

So we have:

1) I'm not sure if that is a GeoIP glitch regarding 74.115.208.0/22, claiming that it is in Taipei? Might that be wrong? Anyway, clients from that range will be sent to one of your two mirrors, because they are in the same AS (unless another mirror in the same IP range exists).

2) 74.115.212.0/22 clients will be sent to your mirror in Maryland. Neither of your mirrors is in the same IP range, both are in the same AS, and the Maryland mirror is preferred because it is geographically closer. Which should be what you want, right?

3) 208.85.240.0/23 clients will be sent to your mirror in Maryland, because it is in the same IP range.

4) 208.85.242.0/23 clients will be sent to your mirror in California, because it is in the same IP range.

5) 209.9.238.0/24 clients are sent to your mirror in Maryland. Neither of the two mirrors is in that IP range, but both are in the same AS, and the Maryland one is geographically much closer, while the other one is over 2000 miles away. MirrorBrain therefore picks the one in Maryland.

Is that the situation, or did I guess it incorrectly? Are there further networks that need consideration? Are the results which I described above how you want it to be?
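The reasoning in the five cases above can be replayed with a short script. This is a sketch using the prefixes and GeoIP coordinates quoted in this mail, not MirrorBrain code; the function names, the labels "maryland" and "california", and the simplification that every client is already known to be inside AS29889 are assumptions.

```python
import ipaddress
import math

# Prefixes announced by AS29889, as listed in the query result above.
PREFIXES = [ipaddress.ip_network(p) for p in (
    "74.115.208.0/22", "74.115.212.0/22",
    "208.85.240.0/23", "208.85.242.0/23", "209.9.238.0/24",
)]

# Mirror address and GeoIP coordinates (latitude, longitude).
MIRRORS = {
    "maryland": ("208.85.240.29", (39.030102, -76.606903)),
    "california": ("208.85.242.118", (33.134899, -117.041603)),
}


def prefix_of(ip):
    """Return the longest announced prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIXES if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def choose(client_ip, client_coords):
    """Prefer a mirror in the client's prefix, else the nearest one in the AS."""
    client_prefix = prefix_of(client_ip)
    same = [name for name, (ip, _) in MIRRORS.items()
            if prefix_of(ip) == client_prefix]
    if same:
        return same[0]
    return min(MIRRORS, key=lambda n: distance_km(client_coords, MIRRORS[n][1]))
```

For example, `choose("209.9.238.5", (38.984100, -77.382698))` selects the Maryland mirror by distance, while any client address in 208.85.242.0/23 is matched to the California mirror by prefix before distance is considered.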

Thanks, Peter

Jonathan Thurman

5:07 p.m.

...

Might be that your case would be handled just automatically.

[snip]

...

Is that the situation, or did I guess it incorrectly? Are there further networks that need consideration? Are the results which I described above how you want it to be?

While these examples are very interesting and show a lot of thought in the design of MirrorBrain, I think you may be missing the point. I see it as being able to control your own traffic. Right now there is really no control for anyone without going through the primary contacts. While it appears that 95% of the time no manual modification would be required, there will always be exceptions people would like to make for some obscure reason. In the last example (trimmed out) the GeoIP data might show two sites as closer, but in reality the WAN connections between them may be really different.

For example: GeoIP/AS data doesn't let you see peering relationships. From a cost point of view, it may be more beneficial to have a netblock under your AS use a mirror over a peer connection than your own mirror over transit/T-1 backside connection.

This is one area where MirrorManager also falls short. You can specify your own mirrors, but not those of peers to use as a primary/secondary for a netblock.

-Jonathan

Randy McAnally

10:20 p.m.

...

Might be that your case would be handled just automatically. Are 208.85.240.29 & 208.85.242.118 your two mirrors? (That's what mirror.fast-serv.com resolves to.)

Okay, let's see :-)

The two mirrors are in AS29889, for which the following prefixes are announced:

    select * from pfx2asn where asn = 29889;
           pfx       |  asn
    -----------------+-------
     74.115.208.0/22 | 29889
     74.115.212.0/22 | 29889
     208.85.240.0/23 | 29889
     208.85.242.0/23 | 29889
     209.9.238.0/24  | 29889

This is what GeoIP currently has to say about these networks:

    1) 74.115.208.0/22  AS  TW  Taipei
    2) 74.115.212.0/22  NA  US  Maryland    Crownsville  39.030102,-76.606903
    3) 208.85.240.0/23  NA  US  Maryland    Crownsville  39.030102,-76.606903
    4) 208.85.242.0/23  NA  US  California  Escondido    33.134899,-117.041603
    5) 209.9.238.0/24   NA  US  Virginia    Herndon      38.984100,-77.382698

So we have:

1) I'm not sure if that is a GeoIP glitch regarding 74.115.208.0/22, claiming that it is in Taipei? Might that be wrong? Anyway, clients from that range will be sent to one of your two mirrors, because they are in the same AS (unless another mirror in the same IP range exists).

2) 74.115.212.0/22 clients will be sent to your mirror in Maryland. Neither of your mirrors is in the same IP range, both are in the same AS, and the Maryland mirror is preferred because it is geographically closer. Which should be what you want, right?

3) 208.85.240.0/23 clients will be sent to your mirror in Maryland, because it is in the same IP range.

4) 208.85.242.0/23 clients will be sent to your mirror in California, because it is in the same IP range.

5) 209.9.238.0/24 clients are sent to your mirror in Maryland. Neither of the two mirrors is in that IP range, but both are in the same AS, and the Maryland one is geographically much closer, while the other one is over 2000 miles away. MirrorBrain therefore picks the one in Maryland.

Is that the situation, or did I guess it incorrectly? Are there further networks that need consideration? Are the results which I described above how you want it to be?

Other than the Taiwan glitch (should be California), it looks good. I was thinking there may have been some aggregation on the CIDRs, which isn't the case (good!).

Like I mentioned previously, as long as both our mirrors are returned for any IP in AS29889, fastestmirror will easily be able to choose the correct mirror for our clients regardless of GeoIP issues.

-Randy

Jim Kusznir

6:46 p.m.

Your thoughts are helpful.

Yes, I would like to log in and view test results, logs, and any info MB has on my mirror (any deficiencies it sees or checks I'm failing). I would also like the ability to initiate an immediate test so if I've corrected something, I can get MB to know about it.

As to the IPs, I'm not familiar enough with ASes yet, but I know there is another subnet that we're responsible for: 69.166.0.0/16. I don't know if that is included in the AS you mention, or a separate one...but both are on campus IP ranges, and should be directed to my mirror. There are no other mirrors presently.

Secondarily, our campus has a dedicated fiber link to University of Idaho (uidaho.edu) which also runs a mirror. I'd like the second choice (or perhaps in rotation with) our IPs to point to them. In the mirror admin system, I'd have to request that op enter that as well, but that's doable. In the end, this should prevent any mirror requests from using up our outbound Internet bandwidth, which is the end goal. I'm suspicious as to how well the AS system knows what's plugged into what and what costs and doesn't cost, and therefore its ability to choose proper mirrors for IPs.

For example, at home (on Time Warner cable), I've found that the uidaho mirror is by far the best mirror for me...but I doubt it's the "closest" via AS/network topology. I don't expect MB to be able to fix this of course....

--Jim


Randy McAnally

10:24 p.m.

...

Your thoughts are helpful.

Yes, I would like to log in and view test results, logs, and any info MB has on my mirror (any deficiencies it sees or checks I'm failing). I would also like the ability to initiate an immediate test so if I've corrected something, I can get MB to know about it.

As to the IPs, I'm not familiar enough with ASes yet, but I know there is another subnet that we're responsible for: 69.166.0.0/16. I don't know if that is included in the AS you mention, or a separate one...but both are on campus IP ranges, and should be directed to my mirror. There are no other mirrors presently.

Secondarily, our campus has a dedicated fiber link to University of Idaho (uidaho.edu) which also runs a mirror. I'd like the second choice (or perhaps in rotation with) our IPs to point to them. In the mirror admin system, I'd have to request that op enter that as well, but that's doable. In the end, this should prevent any mirror requests from using up our outbound Internet bandwidth, which is the end goal. I'm suspicious as to how well the AS system knows what's plugged into what and what costs and doesn't cost, and therefore its ability to choose proper mirrors for IPs.

For example, at home (on Time Warner cable), I've found that the uidaho mirror is by far the best mirror for me...but I doubt it's the "closest" via AS/network topology. I don't expect MB to be able to fix this of course....

My only concern is that I wouldn't want just anyone to be able to choose my mirrors as a default. Let's say the new guy accidentally tells his 10,000 servers to hit our California mirror as their primary mirror; we'd be in trouble.

Summary: It should be up to the mirror maintainer to determine what traffic is artificially steered to it, and he should only be allowed to steer traffic from IPs under his direct control (AS).

-Randy

Jonathan Thurman

10:52 p.m.

...

My only concern is that I wouldn't want just anyone to be able to choose my mirrors as a default. Let's say the new guy accidentally tells his 10,000 servers to hit our california mirror as their primary mirror, we'd be in trouble.

What's to stop someone from doing this now? It is a risk that you take being a public mirror. Hopefully the only one in trouble is the 'new guy' when you rate/connection limit him and his updates take forever...

...

Summary: It should be up to the mirror maintainer to determine what traffic is artificially steered to it, and he should only be allowed to steer traffic from IPs under his direct control (AS).

I completely agree that any manipulation of a netblock needs to be somehow validated (whois contact email used for authorizing block addition, for example). However, I don't believe this should be limited to the mirror maintainer and only the AS/netblocks they control. For example, say a customer of ours wants to point their entire AS at our mirror as we have a dedicated link and plenty of bandwidth. They announce their own AS, so that's out. GeoIP says that the OSU mirror is closer, but it's really a horrible 19-hop slow transit connection that has a much larger financial impact.

So the options are to allow the mirror maintainer to add additional AS / netblocks (with validation), or allow anyone to create an account and change the traffic flows (for validated blocks). I would favor the second option, especially if an OpenID system was used for authentication. The first option doesn't give the actual netblock owner the ability to change their mind. Perhaps as an additional safe-guard against the 10,000 server hit there could be a request sent to the mirror maintainer to approve custom routes to their servers.
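The second option plus the safeguard could be sketched roughly as below: a custom route only takes effect once the netblock owner has been validated and the mirror maintainer has approved it. This is a hypothetical model for discussion, not an existing MirrorBrain or MirrorManager feature; the class, field names, hostname, and netblocks (documentation ranges) are all made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class SteeringRule:
    """A request to steer one netblock to one mirror (hypothetical model)."""

    netblock: str
    mirror: str
    owner_validated: bool = False      # e.g. confirmed via whois contact email
    maintainer_approved: bool = False  # mirror maintainer accepted the route

    def is_active(self):
        # Both sides must agree before any traffic is redirected.
        return self.owner_validated and self.maintainer_approved


def steered_mirror(client_ip, rules):
    """Return the mirror of the first active rule matching the client, else None."""
    addr = ipaddress.ip_address(client_ip)
    for rule in rules:
        if rule.is_active() and addr in ipaddress.ip_network(rule.netblock):
            return rule.mirror
    return None
```

With this shape, the "new guy" can file a rule for his 10,000 servers, but it stays inert until the mirror maintainer approves it, and the netblock owner can withdraw it at any time by clearing the validation.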

-Jonathan

Peter Pöml

12 Nov 12 Nov

11:42 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

Hi!

On Mon, Nov 08, 2010 at 10:02:08AM +0100, Peter Pöml wrote:

...

Having said all that, I thought that Yum mirrorlist in MirrorBrain should not be hard to implement. I spent some time on it today and got quite far; configuring mapping of URL query arguments to directories/files is done, and actual mapping works. I chose Apache config as vehicle for that, and the following is a working config:

MirrorBrainYumDir release=(5.5) \
                  repo=(os|extras|addons|updates|centosplus|contrib) \
                  arch=x86_64 \
                  $1/$2/x86_64 repodata/repomd.xml

For instance, $1/$2/x86_64 is the base URL to a repository, and the match groups can optionally be replaced with what the client specified in the query arguments. ($1 is the first group from the configuration line, $2 the second, and so on. The names and number of query args are all arbitrary.) The last argument is a relative path, and the file that must be present on eligible mirrors. The resulting path here would be e.g. 5.5/os/x86_64/repodata/repomd.xml, and the client would get a list of mirrors in the form of http://mirror.example.com/path/to/centos/5.5/os/x86_64/ (That's what's missing to be implemented, but it's the easiest part :-) So I'm confident that I can promise Yum mirror list soon. Maybe I can finish it this week, maybe the week after, I don't know.
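The mapping described above can be illustrated with a small Python sketch. This is not MirrorBrain's actual code, only a model of the behaviour as described: each query argument is checked against its pattern, captured groups fill $1, $2, ... in the base path, and the marker file is appended. Treating the patterns as regular expressions is an assumption, and the function and variable names are invented for the example.

```python
import re

# Mirrors the example config line: (query arg name, pattern) pairs,
# then the base path template and the file eligible mirrors must have.
YUM_ARGS = [
    ("release", r"(5.5)"),
    ("repo", r"(os|extras|addons|updates|centosplus|contrib)"),
    ("arch", r"x86_64"),
]
BASE_TEMPLATE = "$1/$2/x86_64"
MARKER_FILE = "repodata/repomd.xml"


def build_yum_path(query, args=YUM_ARGS, template=BASE_TEMPLATE, marker=MARKER_FILE):
    """Return (base_path, marker_path), or None if the query does not match."""
    groups = []
    for name, pattern in args:
        value = query.get(name)
        if value is None:
            return None
        match = re.fullmatch(pattern, value)
        if match is None:
            return None
        if match.groups():
            # Only parenthesised patterns contribute a $N substitution.
            groups.append(match.group(1))
    base = template
    # Substitute highest index first so "$1" never clobbers part of "$10".
    for index in range(len(groups), 0, -1):
        base = base.replace(f"${index}", groups[index - 1])
    return base, f"{base}/{marker}"
```

For the query release=5.5, repo=os, arch=x86_64 this yields the base path 5.5/os/x86_64 and the marker path 5.5/os/x86_64/repodata/repomd.xml, matching the example in the text; a mirror entry would then be that mirror's base URL followed by the base path.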

Meanwhile, I would appreciate input from you: is this reasonable? Would it serve your needs?

I finished implementing MirrorBrain's yum mirrorlist support. It's committed to trunk http://svn.mirrorbrain.org/viewvc/mirrorbrain?view=revision&revision=821... and I'm going to release MirrorBrain 2.15.0 soon I think.

(I'll be very busy in the next weeks though, and might not be able to reply for a few days.)

Peter

Peter Pöml

14 Nov 14 Nov

6:14 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

Hi,

MirrorBrain with Yum mirrorlist support is released now. http://mirrorbrain.org/news/2150-support-yum-mirror-lists/

Furthermore, I have set up a test instance on my little host. You find it here: http://centos.mirrorbrain.org/

You can play with it and see if it behaves as expected.

It takes the usual Yum queries, although I should note that I added only some mappings (those that I knew of:

I imported all the mirrors (which I found on the centos website), minus three or four whose entries in the html table were not quite correct. That's 334 mirrors, of which ~323 seem to be reachable. I didn't do a lot of checking obviously. So if your mirror isn't there, or with wrong location, don't panic :-)

Many mirrors don't have any URLs listed other than HTTP, so scanning them is rather inefficient, and may not work in some cases. (I implemented scanning of nginx indexes today because there were so many of them. However, it is much more efficient to scan over rsync or FTP.) With a mirror network that large, it is really very helpful when rsync or FTP are available (best is rsync), because resources become a limiting factor with such a large number of mirrors.

Some mirrors were listed with round-robin DNS names, which obviously means that there is no guarantee that the scanned mirror is the one that a client is redirected to. It is better to handle these mirrors as separate machines (unless it can be guaranteed that they are tightly synced), especially if they are in separate regions (as sometimes is the case). So, the special handling required here has not been considered in my setup yet.
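One way to spot such names before importing them is to check whether a hostname resolves to more than one address of the same family. The sketch below uses only the Python standard library and is not part of MirrorBrain; the function names are invented, and the resolver is injectable so the check can be tried without network access.

```python
import socket


def addresses_by_family(hostname, port=80, resolver=socket.getaddrinfo):
    """Group the distinct addresses a hostname resolves to by address family."""
    by_family = {}
    for family, _type, _proto, _canon, sockaddr in resolver(
        hostname, port, proto=socket.IPPROTO_TCP
    ):
        by_family.setdefault(family, set()).add(sockaddr[0])
    return by_family


def looks_round_robin(hostname, resolver=socket.getaddrinfo):
    """True if any single address family has more than one address.

    One A record plus one AAAA record is ordinary dual-stack and is
    not counted as round-robin here.
    """
    families = addresses_by_family(hostname, resolver=resolver)
    return any(len(addrs) > 1 for addrs in families.values())
```

A name flagged this way could then be entered as one mirror per address rather than as a single mirror, which is the separate-machines handling suggested above.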

I fixed a number of broken (mostly rsync) URLs.

My setup contains no file hashes at all (the hashes that you might have seen in other MirrorBrain instances). That's because I don't have enough space on my disk for the CentOS file tree -- I have just a pseudo file tree consisting of sparse files filled with zeroes. Thus, no useful hashes can be served of course. But the rest is fully functional.
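For anyone wanting to build a similar pseudo tree, a sparse placeholder file can be created by extending an empty file to the desired size. This is a generic sketch, not the script used for the test instance; the function name is made up, and whether the file actually occupies no disk blocks depends on the filesystem.

```python
import os


def make_sparse_file(path, size):
    """Create `path` with the given size without writing any data blocks.

    On most systems the extended region reads back as zeroes and takes
    (almost) no disk space; the exact behaviour is platform dependent.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as handle:
        handle.truncate(size)
```

Such a file has the right name and size for directory scans and redirects, but any hash computed over it is meaningless, which is why no hashes can be served.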

So, now your feedback would obviously be very interesting!

(Let me know if I should create further query->directory mappings to enable more real-world testing!)

Peter

Adrian Reber

15 Nov 15 Nov

2:46 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

On Sun, Nov 14, 2010 at 07:14:50PM +0100, Peter Pöml wrote:

...

MirrorBrain with Yum mirrorlist support is released now. http://mirrorbrain.org/news/2150-support-yum-mirror-lists/

Furthermore, I have set up a test instance on my little host. You find it here: http://centos.mirrorbrain.org/

That is funny. I just set up a MirrorManager test instance. Only 15 mirrors:

http://134.108.44.54/mm.centos/publiclist/

Examples for the different mirrorlists:

curl "http://134.108.irrorlist.centos?repo=centos-5.5&arch=i386&country=gl..."
curl "http://134.108.irrorlist.centos?repo=centos-5.5&arch=i386&country=us"
curl "http://134.108.irrorlist.centos?repo=centos-5.5&arch=i386"
curl "http://134.108.irrorlist.centos?repo=centos-5.5&arch=i386&ip=8.8.8.8"
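The same kind of query can be assembled programmatically. The sketch below only builds the URL from the parameter names seen in the curl examples (repo, arch, country, ip); the base URL is a hypothetical placeholder, since the test instance's address is truncated in the archive.

```python
from urllib.parse import urlencode


def mirrorlist_url(base, repo, arch, country=None, ip=None):
    """Build a mirrorlist query URL; country and ip are optional overrides."""
    params = {"repo": repo, "arch": arch}
    if country:
        params["country"] = country
    if ip:
        params["ip"] = ip
    return f"{base}?{urlencode(params)}"


# Hypothetical base URL, for illustration only.
EXAMPLE_BASE = "http://mirrors.example.org/mirrorlist"
```

Passing ip=8.8.8.8 corresponds to the last curl example, which asks for the list as it would be computed for that client address rather than for the caller's own.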

Adrian

Carsten Otto

2:50 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

On Mon, Nov 15, 2010 at 03:46:34PM +0100, Adrian Reber wrote:

...

That is funny. I just set up a MirrorManager test instance. Only 15 mirrors:

We (ftp.halifax.rwth-aachen.de) have 10 GBit/sec.

Bye,

-- Carsten Otto otto@informatik.rwth-aachen.de LuFG Informatik 2 http://verify.rwth-aachen.de/otto/ RWTH Aachen phone: +49 241 80-21211

Carsten Otto

2:54 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

On Mon, Nov 15, 2010 at 03:50:56PM +0100, Carsten Otto wrote:

...

We (ftp.halifax.rwth-aachen.de) have 10 GBit/sec.

Sorry - I just understood that this is a test with (more or less) fake data.

-- Carsten Otto otto@informatik.rwth-aachen.de LuFG Informatik 2 http://verify.rwth-aachen.de/otto/ RWTH Aachen phone: +49 241 80-21211

Jonathan Thurman

4:17 p.m.

New subject: IRC meeting regarding new mirroring system for CentOS

From: Adrian Reber [adrian@lisas.de] Sent: Monday, November 15, 2010 6:46 AM

On Sun, Nov 14, 2010 at 07:14:50PM +0100, Peter Pöml wrote:

...

...
MirrorBrain with Yum mirrorlist support is released now. http://mirrorbrain.org/news/2150-support-yum-mirror-lists/

Furthermore, I have set up a test instance on my little host. You find it here: http://centos.mirrorbrain.org/

...

That is funny. I just set up a MirrorManager test instance. Only 15 mirrors:

...

http://134.108.44.54/mm.centos/publiclist/

Any chance of showing people the Admin interface for MirrorManager? Unless people have used the instance for the Fedora Project, they might not know all of the options available.

-Jonathan

Nyamul Hassan

4:25 p.m.

On Mon, Nov 15, 2010 at 22:17, Jonathan Thurman JThurman@nwresd.k12.or.us wrote:

...

From: Adrian Reber [adrian@lisas.de] Sent: Monday, November 15, 2010 6:46 AM

On Sun, Nov 14, 2010 at 07:14:50PM +0100, Peter Pöml wrote:

...
...
MirrorBrain with Yum mirrorlist support is released now. http://mirrorbrain.org/news/2150-support-yum-mirror-lists/

Furthermore, I have set up a test instance on my little host. You find it here: http://centos.mirrorbrain.org/

...
That is funny. I just set up a MirrorManager test instance. Only 15 mirrors:

...
http://134.108.44.54/mm.centos/publiclist/

Any chance of showing people the Admin interface for MirrorManager? Unless people have used the instance for the Fedora Project, they might not know all of the options available.

-Jonathan

Yes, that would be nice I guess. We mirror Fedora and Ubuntu also, and I can say MirrorManager is an impressive platform for mirror maintainers. You can add multiple mirrors under your organization, and you can even set which IP ranges get automatically routed to these mirrors. You can even put your AS number into MirrorManager. The public list is generated at regular intervals, and if my mirror is not listed, a quick look at the last crawler log (the crawler runs every hour) tells me why. Very helpful for troubleshooting. Even the "report_mirror" script that is run at the end of every rsync is really convenient, as has been pointed out by someone else earlier in this discussion.

The LaunchPad that is used by Ubuntu is really nothing in comparison. We see our mirrors as being labeled as "xxx behind", and the logs are not very useful in finding out why they are so.

I, as a mirror maintainer, think MirrorManager is a platform that we can replicate for CentOS.

Regards HASSAN

Adrian Reber

16 Nov 16 Nov

9:08 a.m.

New subject: IRC meeting regarding new mirroring system for CentOS

On Mon, Nov 15, 2010 at 04:17:26PM +0000, Jonathan Thurman wrote:

...

On Sun, Nov 14, 2010 at 07:14:50PM +0100, Peter Pöml wrote:

...
...
MirrorBrain with Yum mirrorlist support is released now. http://mirrorbrain.org/news/2150-support-yum-mirror-lists/

Furthermore, I have set up a test instance on my little host. You find it here: http://centos.mirrorbrain.org/

...
That is funny. I just set up a MirrorManager test instance. Only 15 mirrors:

...
http://134.108.44.54/mm.centos/publiclist/

Any chance of showing people the Admin interface for MirrorManager? Unless people have used the instance for the Fedora Project, they might not know all of the options available.

Ralph got an email from me with the URL to the admin interface with an admin user. So he can have a look at it.

Adrian

mirror@lists.centos.org

32 comments

14 participants

participants (14)

Adrian Reber
Bangladeshi CentOS Mirror Maintainer [BD-SERVERS.NET]
Carsten Otto
Claire M. Connelly
J.H.
Jeff Sheltren
Jim Kusznir
Jonathan Thurman
Karanbir Singh
Nyamul Hassan
Peter Pöml
R P Herrold
Ralph Angenendt
Randy McAnally