Hi all:
I've been following the whole DVD iso access questions and such for a while, and while talking to some people on #centos-mirror during the 5.5 push, some thoughts occurred to me.
Presently, it is my understanding that there are two different repos maintained: ones with DVD images, and ones without. This seems needlessly painful and wasteful of the time and resources of our dedicated CentOS volunteer administrators. I understand at least some of the reasons it exists: some mirrors don't have the space for the DVDs, or for other reasons wish not to carry them. However, times are changing, and I think that many, if not most, CentOS mirror operators would prefer to carry the DVD isos. In fact, I'd almost expect future major releases of CentOS (6+) to distribute DVD isos instead of the CD disk 1 of xx isos.
So, I propose that the DVD-less mirror system be eliminated (all msync mirrors carry DVDs). No special ACLs either... It's easy to add a --exclude "*DVD*" to the rsync line for those that wish to not carry the DVDs. I also think that with sufficient announcement and time (say 1-3 months), this shouldn't be an issue. My understanding is that it's officially required for anyone who operates a CentOS mirror to be subscribed to this list, and thus they should receive the announcement. In addition, a note can be added to the msync MOTD conveying this. If others have problems, I think the CentOS team would have done their due diligence; the fault would rest with those not paying attention to lists. After all, all things change and evolve, and I think this is a natural evolution for the CentOS mirror team.
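A minimal sketch of the opt-out, assuming a mirror drives its pull from a small Python wrapper. The source URL, destination path, and ISO file names are placeholders and not real msync endpoints or release names; only the `--exclude "*DVD*"` pattern comes from the proposal above. The `fnmatch` check is only an approximation of how rsync applies a slash-free pattern to the last path component.

```python
import fnmatch
import shlex


def build_pull_command(source, dest, carry_dvds=True):
    """Return the rsync argument list for one mirror pull."""
    cmd = ["rsync", "-aH", "--delete"]
    if not carry_dvds:
        # Mirrors that do not want the DVD images opt out locally.
        cmd += ["--exclude", "*DVD*"]
    cmd += [source, dest]
    return cmd


def is_excluded(filename, pattern="*DVD*"):
    """Approximate rsync's match of a slash-free pattern on a file name."""
    return fnmatch.fnmatchcase(filename, pattern)


if __name__ == "__main__":
    # Placeholder source and destination, for illustration only.
    cmd = build_pull_command(
        "rsync://msync.example.org/CentOS/",
        "/srv/mirror/centos/",
        carry_dvds=False,
    )
    print(shlex.join(cmd))
```

With this approach the choice stays on the mirror's side: the upstream tree is identical for everyone, and only the local command line differs.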
--Jim Administrator, centos.eecs.wsu.edu.
On 5/19/2010 12:15 PM, Jim Kusznir wrote:
<snip>
_______________________________________________
CentOS-mirror mailing list
CentOS-mirror@centos.org
http://lists.centos.org/mailman/listinfo/centos-mirror
I would have to agree. I have always carried DVDs (private mirror). The size difference isn't that great. I also agree that the msync pool should be open, and that it should carry DVDs. If a mirror doesn't want DVDs, they can exclude them. Granted, many mirrors do not carry DVDs as we speak, but I think every mirror should have them. Just my 2 cents.
On 5/19/2010 12:15 PM, Jim Kusznir wrote:
<snip>
So, I propose that the DVD-less mirror system be eliminated (all msync mirrors carry DVDs). No special ACLs either... It's easy to add a --exclude "*DVD*" to the rsync line for those that wish to not carry the DVDs. I also think that with sufficient announcement and time (say 1-3 months), this shouldn't be an issue. My understanding is that it's officially required for anyone who operates a CentOS mirror to be subscribed to this list, and thus they should receive the announcement. In addition, a note can be added to the msync MOTD conveying this. If others have problems, I think the CentOS team would have done their due diligence; the fault would rest with those not paying attention to lists. After all, all things change and evolve, and I think this is a natural evolution for the CentOS mirror team.
I agree. It should be the responsibility of the mirror host to decide what content they will serve, and how much space they can allocate. Making things easier on the msync volunteers is also very important.
Being a newer mirror, I don't know the background of why the DVDs are more restricted. Is it a bandwidth issue for the msync mirrors, or a past practice that hasn't changed?
On 5/19/2010, Nick Olsen wrote:
I would have to agree. I have always carried DVDs (private mirror). The size difference isn't that great. I also agree that the msync pool should be open, and that it should carry DVDs. If a mirror doesn't want DVDs, they can exclude them. Granted, many mirrors do not carry DVDs as we speak, but I think every mirror should have them. Just my 2 cents.
I don't think that the msync pool should be wide open for anyone to access. Those that are hosting public mirrors of content should have a pool that they can sync to that is restricted, or at least have priority over unknown users. Otherwise it could be more difficult for the public mirror system to stay up to date.
-Jonathan Administrator, mirror.nwresd.org
On 05/19/2010 06:08 PM, Jonathan Thurman wrote:
I don't think that the msync pool should be wide open for anyone to access. Those that are hosting public mirrors of content should have a pool that they can sync to that is restricted, or at least have priority over unknown users. Otherwise it could be more difficult for the public mirror system to stay up to date.
Yeah, that's the main thing - being able to get the rsync trees out to the public mirrors asap, while still having enough resources within .centos.org.
So here is a question for you - as a mirror admin, would you host an rsync target that msync.c.o could push into? It could be either based on a user/pass ACL or a key. And we would give you a list of IPs that will push to your machine.
- KB
On 5/19/2010 1:30 PM, Karanbir Singh wrote:
So here is a question for you - as a mirror admin, would you host an rsync target that msync.c.o could push into? It could be either based on a user/pass ACL or a key. And we would give you a list of IPs that will push to your machine.
- KB
For the sake of this next paragraph, let's call msync tier 0, public mirrors tier 1, and private mirrors that pull tier 2. If we're going to talk ACLs, I think IP-based would be the best. And really, I think it's the most secure, without having to get too complicated (keys). Passwords wouldn't work, as every tier 1 mirror would need that password (shared passwords = bad). If it were IP-based, all a tier 1 would have to do is state the IP they would be pulling with, for CentOS to add to the ACL.
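As a rough sketch of what an IP-based ACL check amounts to, assuming the tier 0 side keeps a plain list of addresses or networks submitted by tier 1 admins. The addresses below are documentation ranges and the function names are made up for illustration; a real deployment would more likely express this in the rsync daemon or firewall configuration than in Python.

```python
import ipaddress


def parse_acl(entries):
    """Turn strings such as '192.0.2.10' or '198.51.100.0/24' into networks."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(client_ip, acl):
    """Return True if client_ip falls inside any network in the ACL."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in acl)


if __name__ == "__main__":
    # Hypothetical tier 1 entries (documentation address ranges).
    acl = parse_acl(["192.0.2.10", "198.51.100.0/24"])
    for ip in ("192.0.2.10", "198.51.100.77", "203.0.113.5"):
        print(ip, "allowed" if is_allowed(ip, acl) else "denied")
```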
On 05/19/2010, Karanbir Singh wrote:
So here is a question for you - as a mirror admin, would you host an rsync target that msync.c.o could push into? It could be either based on a user/pass ACL or a key. And we would give you a list of IPs that will push to your machine.
I personally would consider push, but there are some major concerns that would have to be addressed.
Our environment doesn't lend itself to this as our mirror is really a load balanced cluster with a node that is designated for syncing. Of course with a little work, the push traffic could be sent to that node.
The major issue with Push is control. When I am pulling updates, I set the times that the pull happens. I can schedule the updates during known low-bandwidth times of the day. I can also specifically exclude things that I don't want to host (I don't, but I could).
I also see this as more work for the msync maintainers.
I do like the idea of key based syncing. I use keys frequently for automation, and find it easier and more secure than maintaining lists of IPs. So msync.centos.org creates a single account for the public mirrors to sync with, and each public mirror provides a key. Just append all of the keys to the authorized_keys file and sync that between the msync servers. When a mirror is added/removed, update the file once and have it sync automatically. No more IP ACLs to worry about, because no one really cares what IP I sync from.
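A minimal sketch of the "one account, one key per mirror" idea, assuming the key list is kept as a simple mapping and the authorized_keys file is regenerated from it. The forced-command path is a hypothetical wrapper and the key strings are dummies; the option names (`command=`, `no-pty`, and the forwarding restrictions) are standard OpenSSH authorized_keys options.

```python
# Hypothetical wrapper path; restricts what the shared account may run.
DEFAULT_OPTIONS = (
    'command="/usr/local/bin/mirror-sync",'
    "no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding"
)


def render_authorized_keys(mirror_keys, options=DEFAULT_OPTIONS):
    """Build authorized_keys content from {mirror_name: public_key_line}."""
    lines = []
    for name in sorted(mirror_keys):
        parts = mirror_keys[name].split()
        key_type, key_blob = parts[0], parts[1]
        # The mirror name goes in the comment field so entries are easy to audit.
        lines.append(f"{options} {key_type} {key_blob} {name}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    keys = {
        "mirror.example.org": "ssh-rsa AAAAdummykey1 admin@mirror",
        "centos.example.edu": "ssh-rsa AAAAdummykey2",
    }
    print(render_authorized_keys(keys), end="")
```

Adding or removing a mirror is then a one-line change to the mapping, after which the regenerated file can be copied to each msync server.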
-Jonathan
"KS" == Karanbir Singh mail-lists@karan.org
KS> So here is a question for you - as a mirror admin, would
KS> you host an rsync target that msync.c.o could push into?
KS> It could be either based on a user/pass ACL or a key. And
KS> we would give you a list of IPs that will push to your
KS> machine.
We host a Debian mirror that's pushed; the upstream mirror connects to us using an SSH key that then triggers a script that does the sync.
I would certainly be happy with CentOS doing something similar (subject to some concerns about syncs interfering with one another). I would be less enthusiastic about setting up and maintaining an rsync server that an upstream mirror could sync to directly.
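One way to address the concern about syncs interfering with one another is to have the triggered script take a non-blocking lock before it starts. The sketch below assumes a Linux/Unix host and a lock file path of the mirror admin's choosing; `run_exclusive` is a made-up helper name and is not part of any existing Debian or CentOS tooling.

```python
import fcntl
import os


def run_exclusive(lock_path, action):
    """Run action() only if no other sync holds the lock.

    Returns True if action ran, False if another sync was already running.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fail at once instead of queueing.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        action()
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


if __name__ == "__main__":
    ran = run_exclusive("/tmp/centos-sync.lock", lambda: print("syncing..."))
    print("sync ran" if ran else "another sync is in progress, skipping")
```

A trigger that arrives while a sync is still running is simply skipped here; queueing one follow-up run instead would be a reasonable alternative.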
Claire
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Claire Connelly
cmc@math.hmc.edu
System Administrator
(909) 621-8754
Department of Mathematics
Harvey Mudd College
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
For System News: http://www.math.hmc.edu/computing/news/
or http://twitter.com/hmcmathcomp/.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
I concur with what Claire and Jonathan said. And, I also feel the trouble that the current msync admins go through during such massive updates. The "chasmd" project that Karan pointed to in another thread is really interesting, and would make life easy if done properly.
We also don't mind having a push update request. But, we already update every 2 hours, and can even go for every hour, if needed. Would a push also be required for such frequency?
rsync is a great tool, but it hogs the bandwidth of the server it is syncing from. Is it insane to think about other file-sharing protocols, like BitTorrent, for keeping mirrors up to date?
Regards HASSAN
I actually really like the idea of a push mirror. I've always thought polling rather inefficient, and it can be problematic. It also takes more time to stabilize the mirror tree, and results in a complete loss of control of bandwidth distribution at the msync mirrors, resulting in everyone getting slower speeds and sync'ing against a different mirror each time (and as we saw, sometimes the mirrors' content isn't stable). A push model would allow the msync servers to exercise more control over their bandwidth utilization, getting more "complete" mirrors out there more quickly. It would also let them establish which public mirrors are getting pushed from which msync mirrors, in case some msync mirrors have a better connection to some sites (I2 mirrors, for example).
Of course, all this is focusing on major releases. The push system is REALLY nice when 'minor updates' are released. A push cycle kicks off to push out the few packages that need updating. No more wasting bandwidth, etc. probing the mirror regularly. And the smaller updates (which would generally not be bandwidth constrained) can get out there much faster, resulting in a more stable mirror tree.
So, in short, I would like to see push mirroring for my mirror.
--Jim
Hi,
[resending, after realizing that I was subscribed with an old address]
On Fri, May 21, 2010 at 08:22:41AM -0700, Jim Kusznir wrote:
So, in short, I would like to see push mirroring for my mirror.
(Real) push syncing is a very powerful tool, which I have worked with extensively at openSUSE in the past. For extreme cases, it scales better than anything else.
Recently, I set up a completely new mirror infrastructure for the Document Foundation (http://www.documentfoundation.org/) which grew to about 50 mirrors in just a few days. To about 10 of the mirrors, I can push content directly. All I can say is that it is a blessing, for me as content provider.
It is not an option for every mirror, because of various site specific restrictions that are in place here or there. But that is no problem, because the fact that I can change content on even some mirrors is already extremely helpful for the content provider. It allows me to do things quickly that take hours of waiting otherwise, e.g. when moving files around (which doesn't occur frequently, but it can happen).
All in all, I would summarize the advantages as this:
- allows for timely syncs without unnecessary delays
- controlling the order in which things are synced (e.g. rpms before metadata, or deletion of old metadata in a second step)
- instant publication of staged content when I release it
- instant redirections to a mirror once a file has arrived there
- instant stopping of redirections when I delete files from mirrors
- the possibility to correct some things almost instantly, when something has gone wrong.
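The ordering point (rpms before metadata, deletions in a second step) could be sketched as two rsync passes. This is an illustration under assumptions, not the setup described above: the `repodata/` directory name and the host and paths are placeholders, while `--exclude` and `--delete-after` are standard rsync options.

```python
def two_phase_commands(source, dest, metadata_dir="repodata/"):
    """Return two rsync argument lists to run in order.

    Phase 1 copies packages only, so metadata never points at missing files.
    Phase 2 copies the metadata and removes stale files after the transfer.
    """
    phase1 = ["rsync", "-aH", "--exclude", metadata_dir, source, dest]
    phase2 = ["rsync", "-aH", "--delete-after", source, dest]
    return [phase1, phase2]


if __name__ == "__main__":
    # Placeholder push target, for illustration only.
    for cmd in two_phase_commands("/srv/stage/", "mirror.example.org::incoming/"):
        print(" ".join(cmd))
```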
It also means that mirrors don't have to take care of setting up a periodic sync or locking, and unnecessary syncs can be avoided.
For mirrors that are far away, which take a long time to bring up to date, it is good to start syncing promptly (and not 4-8 hours later).
Again, it doesn't matter if this method is not used with all mirrors -- for me as content provider it helps a lot if just some mirrors can be synced this way. The background is that a content provider is really "helpless" when certain files are on _no_ mirror yet, because its own bandwidth is limited. Having, say, 10 mirrors to sync instantly helps a lot, because it immediately allows me to redirect, keeping traffic for essential things. And other mirrors quickly catch up.
Push syncing is obviously very useful to prime the tier 1 mirrors.
Technically, where mirror admins were interested in push, granting rsync write access has been acceptable to them, with access restricted by IP and/or password (rsync over ssh is an option, but seldom used).
(A while ago, I started working on a framework to automate push syncing, but it is making extremely slow progress. If someone is interested in working together, I would be very happy.)
(And I keep telling myself that I ought to learn more about the way that Debian handles this - I believe they simply cascade triggered pull syncs, and I'm sure that also works well.)
Peter
On Wed, 19 May 2010, Karanbir Singh wrote:
So here is a question for you - as a mirror admin, would you host an rsync target that msync.c.o could push into? It could be either based on a user/pass ACL or a key. And we would give you a list of IPs that will push to your machine.
I think closing the msync machines (tier 0, in Fedora-speak) to the general public (at least for rsync) is probably a good idea. It would allow more bandwidth and connections to be used by public tier-1s. People wanting to create a new tier-1 can get their initial sync from another tier-1.
I have reservations about requiring push mirroring. The main advantage I see with push is that an rsync is only started when there is new content. It would reduce the load on the tier-0s when there is no new data.
I see two downsides, however. First, I can't coordinate when my server syncs from different projects. Currently, I know that (for example) CentOS and Fedora won't try to update at the same time, because I control when those syncs start. I lose that with push.
The second concern is the security aspect. To allow push, I have to open ssh to machines outside my network and outside my control. I don't know how happy my security folks will be with that.
I think it would be better to make push mirroring an option, rather than a requirement.
DR
Hi,
On 05/19/2010 05:15 PM, Jim Kusznir wrote:
Presently, it is my understanding that there are two different repos maintained: ones with dvd images, and ones without.
The reasons for this split are mostly due to issues that were a major factor in the msync setup in the past; many of those issues are no longer relevant. We have been talking about refactoring the entire setup, and will start that process over the next few weeks.
In fact, I'd almost expect future major releases of CentOS (6+) to distribute DVD isos instead of the CD disk 1 of xx isos.
At the present moment, this is speculative. I don't think we should predecide how centos-6 or even > 5.5 are going to shape up, but we should make sure that we keep the doors open for any major change that comes in - or has to be brought in.
So, I propose that the dvd-less mirror system is eliminated (all msync mirrors carry DVDs). No special ACLs either...
I don't think ACLs should go at all. I think we need to have a good system in place that makes it possible for large public mirrors to not need to contend with smaller localised private mirrors in order to get the tree out there as soon as we are able to, and to do that in a stable, sane manner.
One of the options that is on the cards is to reduce the number of machines we have in msync down to maybe 8 - 10, and have them serve up public rsync targets, while we move the bulk (20 to 25 odd) of the msync machines into a private push-only network, so in order to receive the tree from these machines, admins would need to host a key and allow rsync from specific IPs. The exact details of how that might work, or even IF we want to consider that, need to be worked out - but it's one of the options to consider.
One thing that we all need to keep in mind is that the .centos.org network of machines is hosted almost exclusively out of donor machines, running in DCs run by hosting companies, and we rarely ever get more than 60 - 70mbps out of a single machine. There are a few exceptions, but only a 'few'. So we ideally want to focus on pushing to public mirrors with as much b/w as we can - and have the user-end of the spectrum pull from these public mirrors. And I don't see how to achieve something like that without ACLs in place.
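To put the per-machine figure in perspective, here is a back-of-the-envelope calculation. It assumes "mbps" means megabits per second and uses a hypothetical 4.5 GB (decimal) payload, since the actual tree size is not stated in this thread.

```python
def transfer_seconds(size_gb, rate_mbps):
    """Seconds to send size_gb (decimal gigabytes) at rate_mbps (megabits/s)."""
    megabits = size_gb * 8 * 1000
    return megabits / rate_mbps


if __name__ == "__main__":
    one_copy = transfer_seconds(4.5, 60)  # hypothetical DVD-sized payload
    print(f"one copy at 60 Mbit/s: {one_copy / 60:.0f} minutes")
    # Ten mirrors served from the same link share the same bandwidth.
    print(f"ten copies at 60 Mbit/s: {10 * one_copy / 3600:.1f} hours")
```

Under those assumptions a single machine needs about ten minutes per copy, so every additional client pulling directly from it delays the public mirrors by roughly that much again.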
- KB
Sorry, I need to clarify one point of my original post.
When I talk about no ACLs, I didn't mean that there were no restrictions at all; I meant no special ACLs such that "this host gets the mirror content modified in this way; that host gets it modified that way...". Either fully open, or just an allow/deny based on public mirrors that have registered with the CentOS mirrors team.
In my opinion, I don't see a reason for non-public mirrors to ever talk to msync. There are enough of the "tier 2" mirrors (non-msync mirrors) that allow rsync, etc. that people should pull from them. Many of them even have considerably more bandwidth than the msync ones do (I regularly serve 40MB/s, and have gone as high as 60MB/s, and could probably do more...). Many of these public mirrors also run rsync, as I now do, so it's easy to pull from them. For a couple of years, I ran a private/local-only mirror, and I always just rsync'ed against a tier 2 mirror (such as osuosl.org).
--Jim