Hi,
There are now close to 200 public CentOS mirrors listed at http://mirror-status.centos.org/. Only a handful of them are late or down at any given time, so on behalf of the CentOS team and the silent users I would like to thank you all for your support and efforts.
Here is the current situation and some proposals to improve it further:
- centos.org servers are donated to the project. We control neither the disk size nor the bandwidth available on these servers.
- regular CentOS mirroring is fine.
- recent requests to extend the mirror with the DVD ISOs have been delayed due to limitations on the centos.org side (bandwidth/resources).
- only a limited number of rsync servers are allowed to reach the centos + DVD ISO content (ACL on the centos.org servers).
- msync.centos.org is round-robin (EU/US), and its servers need time to sync with each other from a "master" server.
- there is no ACL on the regular CentOS tree (~110 GB for all versions/all arches):
    2.1: <6 GB
    3.9: 31 GB
    4.6: 47 GB
    5.1: 28 GB
- the separation between Tier 1/2/3 has not been written down (except for bandwidth availability).
Requirements for a CentOS public mirror:
- we ask for no bandwidth limitation (if you need to throttle, we can simply remove your mirror from the mirrorlist generation and leave it on the main list).
- round-robin for CentOS clients (http/ftp) via GeoIP from http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&rep...
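The GeoIP round-robin idea can be illustrated with a small Python sketch. It assumes the client's country code has already been looked up (a real service would use a GeoIP database for that); the mirror URLs and the `GeoRoundRobin` class are hypothetical, and this is not how mirrorlist.centos.org is actually implemented, only the rotation idea.

```python
from itertools import cycle

# Hypothetical mirror table: ISO country code -> mirror base URLs.
# The client's country is assumed to come from a GeoIP lookup done elsewhere.
MIRRORS = {
    "DE": ["http://de1.example.org/centos/", "http://de2.example.org/centos/"],
    "US": ["http://us1.example.org/centos/"],
}
# Hypothetical mirror(s) for clients whose country has no listed mirror.
FALLBACK = ["http://fallback.example.org/centos/"]


class GeoRoundRobin:
    """Rotate through the mirrors of the client's country (hypothetical helper)."""

    def __init__(self, mirrors, fallback):
        # One independent rotation per country, plus one for the fallback list.
        self._cycles = {cc: cycle(urls) for cc, urls in mirrors.items()}
        self._fallback = cycle(fallback)

    def next_mirror(self, country):
        """Return the next mirror for a client located in `country`."""
        return next(self._cycles.get(country, self._fallback))
```

Two consecutive German clients get de1 and then de2, and a client from a country without a listed mirror (e.g. "FR") gets the fallback. Note that the selection only looks at geography, not at who pays for the traffic.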
RFC for possible future direction:
rationale: several public servers have more bandwidth than the centos.org machines; by serving more files from the big pipes, we reduce the bottleneck for everyone.
- DVD mirrors (Tier 1) are the only ones allowed to reach the master DVD repository
- other public DVD mirrors are redirected to the DVD Tier 1 mirrors
- leave it as it is (no additional public DVD servers: not satisfying...)
- jigdo: feedback from a Debian mirror maintainer
  - rationale: reduce the bandwidth/time needed for the initial sync of each point release
    * drawback: the burden of generating the ISOs goes to the public mirrors willing to use jigdo
    * drawback: another (hardlinked) target is needed for mirrors that don't want to or can't use jigdo
    * a jigdo recipe is needed for each ISO (how easy/hard is it to work with?)
- doing it the other way round? ISO -> os/$arch tree
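The jigdo idea is that an ISO image consists mostly of files that already sit in the os/$arch tree, plus a little image-specific data, so a mirror can rebuild the ISO locally instead of downloading it again. A minimal Python sketch of that reconstruction, assuming a simplified, hypothetical recipe format (a list of raw byte chunks and MD5 references to files in the tree); this is not jigdo's actual .jigdo/.template format, and `index_by_md5`/`assemble_iso` are illustrative names.

```python
import hashlib
from pathlib import Path


def index_by_md5(tree):
    """Map MD5 hex digest -> path for every file under an existing os/$arch tree."""
    index = {}
    for path in Path(tree).rglob("*"):
        if path.is_file():
            index[hashlib.md5(path.read_bytes()).hexdigest()] = path
    return index


def assemble_iso(recipe, index, out):
    """Rebuild an image from a simplified, hypothetical recipe.

    `recipe` is a list of ("raw", bytes) parts, for data that exists only in
    the image (e.g. ISO9660 metadata), and ("file", md5hex) parts, for payload
    already present in the mirror tree.
    """
    with Path(out).open("wb") as dst:
        for kind, value in recipe:
            if kind == "raw":
                dst.write(value)
            else:
                # Raises KeyError if the tree lacks a file the image needs.
                dst.write(index[value].read_bytes())
```

The bandwidth saving comes from the "file" parts: only the recipe and the raw chunks have to be transferred, while the bulk of the image is copied from the local tree.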
Best regards,
Tru
PS: 5.2 is in QA now
On Mon, 9 Jun 2008, Tru Huynh wrote:
RFC for possible future direction: rationale: several public servers have more bandwidth than the centos.org machines; by serving more files from the big pipes, we reduce the bottleneck for everyone.
One more thing to consider in restructuring the mirrors is the research networks (e.g. Internet2, National Lambda Rail, Geant). A lot of the mirrors are universities, with access to these research nets.
I know my organization doesn't care about bandwidth usage on the research nets, just on the commodity internet links. Perhaps have a few I2-connected mirrors sync from the centos.org masters, and then other I2-mirrors sync from them (instead of the masters)?
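The Tier 1 DVD idea and the research-network suggestion both come down to choosing, per mirror, which upstream it syncs from. A hedged Python sketch of one possible policy: Tier 1 mirrors sync from the master, and everyone else syncs from a Tier 1 mirror on the same kind of network (research net or commodity internet) when one exists. `MirrorHost`, `sync_source`, the host names and the CRC32-based spreading are all assumptions for illustration.

```python
import zlib
from dataclasses import dataclass

MASTER = "master.example.org"  # hypothetical master host


@dataclass(frozen=True)
class MirrorHost:
    name: str
    tier: int           # 1 = syncs from the master, 2 = syncs from a Tier 1 mirror
    research_net: bool  # reachable over a research network such as Internet2


def sync_source(mirror, tier1):
    """Return the host name `mirror` should rsync from."""
    if mirror.tier == 1:
        return MASTER  # only Tier 1 mirrors would be in the master's ACL
    # Prefer Tier 1 mirrors on the same kind of network, so research-net
    # mirrors keep their sync traffic off commodity links.
    pool = [t for t in tier1 if t.research_net == mirror.research_net] or tier1
    # Deterministic spread: the same mirror always maps to the same upstream.
    return pool[zlib.crc32(mirror.name.encode()) % len(pool)].name
```

With this policy the master's ACL only needs to list the Tier 1 hosts, which matches the "limited number of rsync servers" constraint described at the top of the thread.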
The Fedora Project has already run into many of these issues. Something kinda like their tiering structure might make sense. http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering
DR
Dear Tru
I want to give you some input on what I think is the key aspect of a content delivery network (CDN), which is what a mirror network in fact is:
Simply Cost!
I do not want to play down the efforts of the people who have dedicated themselves to GeoIP, which is useful for many things but not so much for managing the cost of a content delivery network, since an IP address is not really a geographical commodity. Every IP address belongs to a parent, its autonomous system (AS). An IP address can theoretically be located literally meters away and still be far more expensive to reach in terms of traffic cost than an IP address on the other side of the planet, if the network owner has established settlement-free peering with the latter's network.
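Mapping a requester to its autonomous system is a longest-prefix match against routing data. A minimal Python sketch using the standard `ipaddress` module; the prefix table below is hypothetical (documentation prefixes and documentation AS numbers), whereas in practice it would be built from a BGP table or routing-registry dump.

```python
import ipaddress

# Hypothetical prefix -> origin AS table (documentation prefixes and AS numbers).
PREFIX_TO_AS = {
    "192.0.2.0/24": 64500,
    "192.0.2.128/25": 64501,  # more specific route inside 192.0.2.0/24
    "203.0.113.0/24": 64502,
}


def origin_as(ip, table=PREFIX_TO_AS):
    """Return the origin AS of the most specific prefix containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best_len, best_as = -1, None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match: keep the matching network with the longest mask.
        if addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as
```

A linear scan is fine for a sketch; a full routing table with hundreds of thousands of prefixes would call for a radix (Patricia) trie.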
Of course someone can rent a "flatrate" server with "unlimited" bandwidth. However, this is only a marketing concept. Every single bit has to be carried by a pipe that someone pays for. The providers at both ends of a bidirectional data stream are likely interested in delivering the service to their paying end customer, be it the server owner (content) or the person who accesses the server (eyeball). I don't want to get into the discussion of which network type is more valuable, since this is a hot topic in the "net neutrality" debate. There is a small club of "Tier 1" carriers (http://en.wikipedia.org/wiki/Tier_1_network) sitting in the middle, not paying anyone for transit and only collecting money from the rest of the crowd for carrying traffic to networks that are peered with them but that non-Tier 1 networks can't reach themselves through direct interconnection. Everyone else is by definition not Tier 1 and pays someone cash for traffic, so the incentive is to avoid the Tier 1 networks in order to cut that cost. This is a little bit of theory for everyone who is not a carrier himself and just purchases upstream traffic from a provider who blends the price of transit and peering traffic. If you understand this, you will understand why a provider hosting a mirror has no problem committing 1 Gbps on peering routes but is afraid of having too much paid transit traffic, which could not only spike but permanently increase his 95th-percentile traffic level on paid routes.
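The 95% figure in the paragraph above refers to 95th-percentile ("burstable") billing, which is common for paid transit. A short Python sketch of how such a bill is usually computed; sampling intervals and rounding rules vary by provider, so treat those details as assumptions.

```python
def billable_95th(samples_mbps):
    """Return the 95th-percentile rate of a billing period's traffic samples.

    Common burstable-billing practice: sort the (typically 5-minute) samples,
    discard the highest 5%, and bill the highest remaining sample.
    """
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    # Integer form of ceil(0.95 * n) - 1, avoiding floating-point rounding.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]
```

With 100 samples the top 5 are ignored: `billable_95th([10] * 95 + [1000] * 5)` is 10, while `billable_95th([10] * 94 + [1000] * 6)` is 1000. Over a 30-day month of 5-minute samples (8,640), the 432 highest samples, i.e. 36 hours, are discarded, so a short spike costs nothing but a permanent increase in transit traffic moves the billed rate.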
Let's take my mirror as an example: an average transfer of only 20 Gb per day is no real problem (statistics on http://mirror.silyus.net), but with 200 servers participating in the CentOS CDN, this adds up globally to quite a lot of traffic, which could be engineered far better than with GeoIP round-robin. If you look at the AS numbers of the eyeballs downloading from the mirror, there are more transit than peering requesters, thanks to GeoIP's unawareness of AS network topology (http://mirror.silyus.net/webalizer/usage_200806.html#TOPASNS).
I have two solutions in mind:
1. the centralized one
The domain name server returns a round-robin mirror IP to the requester/eyeball of the CentOS mirror URL only if the mirror's host has authorized the requester's IP range, because that range can be reached "locally" without significant transit cost. The drawback is that this database needs to be kept up to date, since IP prefixes (ranges) are dynamically assigned to autonomous systems. If nobody wants to serve a requester's IP because the provider of that eyeball does not peer for free, the eyeball will end up in a black hole unless a failover "take all" transit mirror is provided.
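A minimal Python sketch of the centralized variant: the name server answers only with mirrors whose hosts have authorized the requester's prefix, and falls back to a "take all" transit mirror otherwise. The `AUTHORIZED` table, the addresses and `answer()` are hypothetical. One caveat: an authoritative name server normally sees the address of the client's recursive resolver rather than the end user's, so the prefix match assumes both sit in the same network.

```python
import ipaddress

# Hypothetical authorizations: mirror IP -> requester prefixes its host can
# reach "locally" (own customers, settlement-free peers).
AUTHORIZED = {
    "192.0.2.10": ["198.51.100.0/24"],
    "192.0.2.20": ["198.51.100.0/24", "203.0.113.0/24"],
}
# Hypothetical failover transit mirror that accepts everyone.
TRANSIT_FAILOVER = ["192.0.2.99"]


def answer(requester):
    """Return the A records to hand out for the CentOS mirror name."""
    addr = ipaddress.ip_address(requester)
    candidates = [
        mirror
        for mirror, prefixes in AUTHORIZED.items()
        if any(addr in ipaddress.ip_network(p) for p in prefixes)
    ]
    # Without the failover entry, unmatched requesters would get no answer
    # (the "black hole" case). Round-robin is assumed to come from the name
    # server rotating the order of the returned records.
    return candidates or TRANSIT_FAILOVER
```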
2. the decentralized one
The domain name server behaves as it does now, but the mirror itself bounces all file requests that are not "local" according to its ACL. The eyeball still contacts the server and causes some minor transit bandwidth overhead, but content delivery is denied by the access control mechanism on the mirror. This can be developed into an "inter-server peer-to-peer mirror" network if the mirror also suggests the next mirror to try, until a server accepts the request. This is a bit like the mechanism currently in use among telephony routers: if a prefix does not match the locally connected numbers, the call is routed on to the next switch (default route) until an authoritative switch terminates (accepts or releases) the call. We still need to make sure that every IP in the number space has an authoritative mirror server or a failover default route, and the download client must be able to understand the "hint/redirect" to the next mirror at the protocol level. The mirror giving the hint should not point to mirrors that have been retired, so mirrors should regularly ask their peer servers whether they are still alive, what number space they want to serve and what their current update status is. That way a mirror can decide whether it should still recommend a lagging peer or, if it is lagging itself, retrieve content from a peer that has more current content than the local copy. To summarize: in this design the intelligence is transferred to the mirror servers, and the master only needs to seed the content to a few well-connected peer servers, which then propagate it to their neighbors. The problem of ensuring the integrity of the files on each node (and locking out zombie mirrors) is still unsolved, and I am not really competent to suggest anything right now.
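The decentralized variant amounts to a per-request decision on each mirror: serve if the client is local, otherwise redirect to a live, up-to-date peer that serves the client's number space, otherwise to a default route. A hedged Python sketch; the `Peer`/`Mirror` classes, the integer freshness timestamp and the use of HTTP 302 redirects as the "hint" are assumptions, and peer discovery, liveness checks and integrity checking are left out.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


def covers(prefixes, ip):
    """True if `ip` falls inside any of the CIDR `prefixes`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


@dataclass
class Peer:
    url: str
    prefixes: list          # number space the peer says it serves
    alive: bool = True      # result of the last (hypothetical) liveness check
    timestamp: int = 0      # hypothetical freshness marker of the peer's tree


@dataclass
class Mirror:
    local_prefixes: list    # number space this mirror serves itself
    timestamp: int          # freshness marker of the local tree
    peers: list = field(default_factory=list)
    default_route: Optional[str] = None  # failover "take all" mirror

    def handle(self, client_ip, path):
        """Return (HTTP status, redirect location) for a file request."""
        if covers(self.local_prefixes, client_ip):
            return 200, None  # local client: serve the file
        for peer in self.peers:
            # Only hint at peers that are alive, serve this client and are not lagging.
            if peer.alive and peer.timestamp >= self.timestamp and covers(peer.prefixes, client_ip):
                return 302, peer.url + path
        if self.default_route:
            return 302, self.default_route + path
        return 403, None  # nobody is authoritative for this client
```

This only works if the download client follows the redirect, which is the protocol-level requirement mentioned above.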
I guess there is also some risk in the current production architecture that a mirror server delivers malware, unless the master were to compute and compare the MD5 sum of each and every file on each and every downstream mirror. I guess I am getting too paranoid now, since nobody hosting a mirror would ever have bad intentions.
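Checking a mirror against the master is straightforward once the master publishes a manifest of expected checksums. A minimal Python sketch, assuming a hypothetical manifest that maps relative paths to MD5 digests (the hash mentioned above; `hashlib.sha256` could be swapped in). The manifest itself has to come from a trusted channel, since a mirror that can alter files could also alter a manifest it serves.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.md5()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_tree(root, manifest):
    """Compare a mirror tree against a hypothetical master manifest {relative path: md5}.

    Returns the relative paths that are missing or whose checksum differs.
    """
    bad = []
    for rel, expected in sorted(manifest.items()):
        path = Path(root) / rel
        if not path.is_file() or md5sum(path) != expected:
            bad.append(rel)
    return bad
```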
Well, I have released this for now. Does anyone want to comment or pick this up as a project?
Regards,
Florian
----- Original Message -----
From: "Tru Huynh" tru@centos.org
To: centos-mirror@centos.org
Sent: Monday, June 09, 2008 4:01 PM
Subject: [CentOS-mirror] RFC on public centos mirrors
CentOS-mirror mailing list
CentOS-mirror@centos.org
http://lists.centos.org/mailman/listinfo/centos-mirror
On Wed, 2008-06-11 at 22:18 +0200, florian@gruendler.net wrote:
Well, I have released this for now. Does anyone want to comment or pick this up as a project?
Florian,
I think you've hit the nail on the head!
We are an ISP, and going by the description in the Wikipedia article you referenced, we would be a Tier 3 or just barely a Tier 2, since we do peer with some relatively small networks. We currently have our CentOS mirror bandwidth capped at 10 Mbps. The only thing stopping us from removing that cap completely is the potential cost of serving that bandwidth to users who aren't on our network.
This is a real issue. We would like to provide our clients the very fastest, unlimited access to our mirror. However, we must limit the bandwidth that our mirror presents to users who are off our network. One idea that has rolled around in my head a bit is this:
1) Serve the mirror from 2 separate IPs: one with unlimited bandwidth, accessible only from our network ranges, and the other with a bandwidth cap, accessible by anyone.
2) Have our DNS server return the unlimited IP for our mirror name to requests from our clients, and the limited IP to everyone else.
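This two-address scheme is split-horizon DNS. A Python sketch of the decision the name server makes, with hypothetical customer ranges and addresses; in BIND the same effect is usually achieved declaratively with `view` statements and `match-clients`.

```python
import ipaddress

# Hypothetical values: the ISP's own customer ranges and the two service IPs.
CUSTOMER_RANGES = [ipaddress.ip_network(p) for p in ("198.51.100.0/24", "203.0.113.0/24")]
UNCAPPED_IP = "192.0.2.10"  # reachable only from customer ranges
CAPPED_IP = "192.0.2.11"    # rate-limited, reachable by anyone


def mirror_address(source_ip):
    """Answer for the mirror's host name, depending on who is asking."""
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in CUSTOMER_RANGES):
        return UNCAPPED_IP
    return CAPPED_IP
```

The DNS answer only steers well-behaved clients; the uncapped address still needs its own access control so outsiders cannot use it directly, which step 1 already provides.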
I like both of your ideas. I think they are both doable, and I think they have much greater potential than my simple solution. I wonder if the server freshness and hint list mentioned in your second scenario might be something that could be synced from the master to the rest of the servers along with the rest of the mirror files.
If we could build a solution around these ideas, we could have a really nice CDN. The end user could see faster speeds as a result of the mirrors willingly removing most bandwidth caps. And the mirrors should be quite happy to remove those caps, since there would be no risk of charges from increased off-network bandwidth.
Part of me does hope that in all of this we will still be able to maintain some sort of QA of the files that are being transferred. I personally like the current scenario because we have direct access to the CentOS master mirrors from our mirror. If we were to move away from granting each mirror that access, we would have to provide some sort of assurance that downstream mirrors will receive their updates in a timely fashion and that the files have not been altered.
As far as picking this up as a project, I'd be willing to discuss some of these ideas further and help with some of the scripting that might be necessary to pull this off.
Bob