[CentOS-mirror] RFC on public centos mirrors

Dear Tru

I want to give you some input on what I think is the key aspect of a content 
delivery network (CDN), which is what a mirror network in fact is:

Simply Cost!

I do not want to play down the efforts of the people who gave their 
dedication to Geo IP, which is useful for many things but not so much for 
managing the cost of a content delivery network since an IP Address is not 
really a geographical commodity. Every IP belongs to a parent, the 
autonomous system (AS). An IP can theoretically be located literally meters 
away and still be far more expensive in terms of traffic cost than an IP on 
the other side of the planet with which the network owner has established 
settlement-free peering.

Of course someone can rent a "flatrate" server with "unlimited" bandwidth. 
However, this is only a marketing concept. Every single bit needs to be 
carried by a pipe which someone pays for. The providers at both ends of a 
bidirectional data stream are interested in delivering the service to their 
paying end customer, be it the server owner (content) or the person who 
accesses the server (eyeball). I don't want to get into the discussion of 
which network type is more valuable, since this is a hot topic in the "net 
neutrality" debate. There is a small club of "Tier 1" carriers 
(http://en.wikipedia.org/wiki/Tier_1_network) sitting in the middle, not 
paying anyone for transit and only collecting money from the rest of the 
crowd for carrying traffic to networks which are peered with them but which 
the non-Tier 1 networks can't reach themselves through direct interconnection. 
Everyone else is by definition not Tier 1 and pays someone cash for traffic. 
The incentive is to avoid the Tier 1 networks in order to cut that cost. This is a 
little bit of theory for everyone who is not a carrier himself and just 
purchases upstream traffic from a provider who blends the price for transit 
and peering traffic. If you understand this, you will understand why a 
provider hosting a mirror has no problem committing 1 Gbps on peering 
routes but is afraid of having too much paid transit traffic, which could 
not only spike but permanently increase his 95% traffic quota on paid 
routes.
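
To illustrate the difference between a spike and a permanent increase, here is a rough Python sketch. It assumes the 95% quota refers to the common 95th-percentile billing scheme (sort the traffic samples of a billing period, discard the top 5%, bill the highest remaining sample). The function name and all sample values are made up for illustration.

```python
def billable_p95(samples_mbps):
    """Return the billed rate under 95th-percentile billing (nearest-rank method)."""
    ordered = sorted(samples_mbps)
    # Integer form of ceil(0.95 * n); the samples above this rank are discarded.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical traffic samples in Mbps, 100 per billing period for readability.
baseline = [100] * 100

# A short spike: only 4 of 100 samples are high, so they fall into the discarded top 5%.
spike = [100] * 96 + [900] * 4

# A sustained increase: 20 of 100 samples are high, so the billed value moves.
sustained = [100] * 80 + [300] * 20

for name, samples in [("baseline", baseline), ("spike", spike), ("sustained", sustained)]:
    print(name, billable_p95(samples))
```

In this toy data the brief spike costs nothing extra, while the sustained extra transit load raises the billed rate for the whole period.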

Let's take my mirror as an example: There is no real problem with only 20 Gb 
daily average transfer (statistics on http://mirror.silyus.net), but with 200 
servers participating in the CentOS CDN, this adds up globally to quite some 
traffic which could be engineered far better than by Geo IP round robin. If 
you look at the AS numbers of the eyeballs pulling from the mirror, there are 
more transit than peering requesters, thanks to Geo IP's unawareness of AS 
network topology 
(http://mirror.silyus.net/webalizer/usage_200806.html#TOPASNS).
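
A per-AS breakdown like the one linked above can be turned into a single transit share with a few lines of Python. This is only a sketch: the AS numbers are taken from the private-use range and the byte counts are invented, they are not the figures from the statistics page.

```python
def transit_share(bytes_by_asn, peering_asns):
    """Fraction of delivered bytes that went to ASes reached via paid transit."""
    total = sum(bytes_by_asn.values())
    if total == 0:
        return 0.0
    transit = sum(b for asn, b in bytes_by_asn.items() if asn not in peering_asns)
    return transit / total


# Hypothetical traffic per requesting AS (private-use AS numbers, invented volumes).
bytes_by_asn = {64512: 600, 64513: 300, 64514: 100}

# Hypothetical set of ASes the mirror's provider peers with settlement-free.
peering_asns = {64513}

print(transit_share(bytes_by_asn, peering_asns))
```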

I have two solutions in mind:

1. the centralized one

The domain name server only returns the IP of a mirror (round robin) to the 
requester/eyeball of the CentOS mirror URL if the operator of that mirror has 
authorized the IP range of the requester because it can be reached "locally" 
without extensive transit cost. The drawback is that this database needs to 
be kept up to date, since IP prefixes (ranges) are dynamically assigned to 
autonomous systems. If nobody wants to serve the IP of a requester because 
the provider of that eyeball does not peer for free, the eyeball will end up 
in a black hole unless a failover transit "take all" mirror is provided.
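
The selection logic of the centralized variant could look roughly like the following Python sketch. It is not how the CentOS name servers work today; the registry layout, the function names and all addresses (taken from the documentation address ranges) are assumptions for illustration only.

```python
import ipaddress
import itertools

# Hypothetical registry: mirror address -> requester prefixes its operator has authorized.
AUTHORIZED = {
    "192.0.2.10": ["198.51.100.0/24"],
    "192.0.2.20": ["198.51.100.0/25", "203.0.113.0/24"],
}

# Hypothetical "take all" transit mirror that catches requesters nobody authorized.
FALLBACK_MIRRORS = ["192.0.2.99"]

_counter = itertools.count()


def candidate_mirrors(client_ip):
    """Mirrors whose operators authorized this requester, else the fallback list."""
    addr = ipaddress.ip_address(client_ip)
    matches = [
        mirror
        for mirror, prefixes in AUTHORIZED.items()
        if any(addr in ipaddress.ip_network(prefix) for prefix in prefixes)
    ]
    return matches or FALLBACK_MIRRORS


def resolve(client_ip):
    """Pick one candidate in round-robin fashion, as the name server would."""
    mirrors = candidate_mirrors(client_ip)
    return mirrors[next(_counter) % len(mirrors)]


print(candidate_mirrors("198.51.100.5"))
print(candidate_mirrors("10.0.0.1"))
```

Keeping the fallback list non-empty is what prevents the black hole; keeping the prefix table current is the maintenance burden mentioned above.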

2. the decentralized one

The domain name server behaves as it does now, but the mirror itself bounces 
all file requests which are not "local" according to its ACL. The eyeball still 
contacts the server and causes some minor transit bandwidth overhead, but 
content delivery is denied by the access control mechanism on the mirror. 
This can be developed into some "inter-server-peer-to-peer mirror" network 
if the mirror further suggests the next mirror to be tried until a server 
accepts the request. This is a bit like the mechanism currently in use among 
telephony routers: If a prefix does not match the locally connected numbers, 
the call is routed on to the next switch (default route) until an 
authoritative switch terminates (accepts or releases) the call. We still 
need to make sure that every IP in the number space has an authoritative 
mirror server or failover default route, and the download client must be able 
to understand the "hint/redirect" to the next mirror on a protocol level. 
The mirror giving the hint should not point to mirrors which have been 
retired, so mirrors should regularly ask their peer servers whether they are 
still alive, what number space they want to serve and what their current 
update status is. With that information a mirror can decide whether it should 
still recommend a lagging server or, in case it lags itself, retrieve content 
from a peer server that has more current content than the local one.

To summarize: in this design the intelligence is transferred to the mirror 
servers, and the master only needs to seed the content to a few well-connected 
peer servers which then propagate the content to their neighbors. The problem 
of ensuring the integrity of the files on each node (and locking out zombie 
mirrors) is still unsolved 
and I am not really competent to suggest anything right now. I guess there 
is also some risk in the current production architecture that a mirror 
server delivers malware files unless the master builds and compares the 
MD5 sum of each and every file on each and every downstream mirror. I guess 
I am getting too paranoid now since all people hosting a mirror would never 
ever have bad intentions.
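
The hint/redirect chain of the decentralized variant can be sketched in Python as follows. This is a toy model, not an existing protocol or tool: the Mirror class, the download function, the mirror names and the prefixes are all hypothetical. Over HTTP the "redirect" step could for instance be an ordinary 302 response pointing to the next mirror.

```python
import ipaddress


class Mirror:
    """Toy mirror with an ACL of "local" prefixes and a hint to the next mirror."""

    def __init__(self, name, local_prefixes, next_hint=None, take_all=False):
        self.name = name
        self.local = [ipaddress.ip_network(p) for p in local_prefixes]
        self.next_hint = next_hint  # name of the mirror to suggest when bouncing
        self.take_all = take_all    # failover mirror that accepts everyone

    def handle(self, client_ip):
        addr = ipaddress.ip_address(client_ip)
        if self.take_all or any(addr in net for net in self.local):
            return ("serve", self.name)
        return ("redirect", self.next_hint)


def download(client_ip, first, mirrors, max_hops=5):
    """Follow hints until a mirror accepts; max_hops guards against redirect loops."""
    current = first
    for _ in range(max_hops):
        action, value = mirrors[current].handle(client_ip)
        if action == "serve":
            return value
        if value is None:
            break
        current = value
    raise RuntimeError("no mirror accepted the request")


# Hypothetical topology: two ACL-restricted mirrors and one "take all" failover.
mirrors = {
    "a": Mirror("a", ["198.51.100.0/24"], next_hint="b"),
    "b": Mirror("b", ["203.0.113.0/24"], next_hint="fallback"),
    "fallback": Mirror("fallback", [], take_all=True),
}

print(download("203.0.113.9", "a", mirrors))
print(download("192.0.2.1", "a", mirrors))
```

The sketch leaves out the liveness and freshness exchange between mirrors as well as any integrity check; it only shows why every chain needs to end at a mirror that accepts the request.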

Well, I have put this out there for now. Does anyone want to comment or pick 
this up as a project?

Regards, Florian

----- Original Message ----- 
From: "Tru Huynh" <tru at centos.org>
To: <centos-mirror at centos.org>
Sent: Monday, June 09, 2008 4:01 PM
Subject: [CentOS-mirror] RFC on public centos mirrors

> _______________________________________________
> CentOS-mirror mailing list
> CentOS-mirror at centos.org
> http://lists.centos.org/mailman/listinfo/centos-mirror
>