[CentOS-mirror] RFC on public centos mirrors
florian at gruendler.net
Wed Jun 11 20:18:11 UTC 2008
Dear Tru,
I want to give you some input on what I think is the key aspect of a content
delivery network (CDN), which is what a mirror network in fact is:
Simply cost!
I do not want to play down the efforts of the people who dedicated
themselves to GeoIP. It is useful for many things, but not so much for
managing the cost of a content delivery network, since an IP address is not
really a geographical commodity. Every IP belongs to a parent, the
autonomous system (AS). An IP can be located literally meters away and
still be far more expensive in terms of traffic cost than an IP on the
other side of the planet to which the network owner has established
settlement-free peering.
Of course someone can rent a "flatrate" server with "unlimited" bandwidth.
However, this is only a marketing concept. Every single bit needs to be
carried by a pipe which someone pays for. The providers at both ends of a
bi-directional data stream are likely interested in delivering the service to
their paying end customer, be it the server owner (content) or the person who
accesses the server (eyeball). I don't want to get into the discussion of which
network type is more valuable, since this is a hot topic in the "net
neutrality" debate. There is a small club of "Tier 1" carriers
(http://en.wikipedia.org/wiki/Tier_1_network) sitting in the middle, not
paying anyone for transit and only collecting money from the rest of the
crowd for carrying traffic to networks which are peered with them but which
the non-Tier 1 networks can't reach themselves through direct interconnection.
Everyone else is by definition not Tier 1 and pays someone cash for traffic.
The incentive is to avoid the Tier 1 networks in order to cut that cost. This
is a little bit of theory for everyone who is not a carrier himself and just
purchases upstream traffic from a provider who blends the price for transit
and peering traffic. If you understand this, you will understand why a
provider hosting a mirror has no problem committing 1 Gbps on peering
routes but is afraid of having too much paid transit traffic, which could
not only spike but permanently increase his 95th-percentile billed traffic on
paid routes.
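To illustrate why a permanent increase hurts more than a short spike: the 95% figure presumably refers to 95th-percentile (burstable) billing, where the top 5% of traffic samples in a billing period are discarded and the highest remaining sample is billed. The following is only a minimal sketch under that assumption; the function name and all sample values are made up for illustration and are not taken from any real provider or mirror.

```python
def percentile_95(samples_mbps):
    """Return the billable rate under nearest-rank 95th-percentile billing.

    The samples are sorted, the top 5% are ignored, and the highest
    remaining sample is returned.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Nearest-rank method: rank = ceil(0.95 * n), done in integer arithmetic.
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


# Hypothetical billing periods with 100 samples each (values in Mbps).
baseline = [10.0] * 100                    # steady 10 Mbps on paid transit
short_spike = [10.0] * 96 + [500.0] * 4    # huge spike in 4% of the samples
sustained = [10.0] * 90 + [60.0] * 10      # moderately higher load in 10%

print(percentile_95(baseline))     # 10.0
print(percentile_95(short_spike))  # 10.0, the spike falls into the discarded 5%
print(percentile_95(sustained))    # 60.0, sustained load raises the bill
```

In this toy example the 500 Mbps spike costs nothing extra, while the much smaller but longer-lasting increase multiplies the billable rate by six.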
Let's take my mirror as an example: There is no real problem with only 20 Gb
daily average transfer (statistics on http://mirror.silyus.net), but with 200
servers participating in the CentOS CDN, this adds up globally to quite some
traffic, which could be engineered far better than by GeoIP round robin. If
you look at the AS numbers of the eyeballs pulling from the mirror, there are
more transit than peering requesters, thanks to GeoIP's unawareness of AS
network topology
(http://mirror.silyus.net/webalizer/usage_200806.html#TOPASNS).
I have two solutions in mind:
1. the centralized one
The domain name server only returns a round-robin IP to the
requester/eyeball of the CentOS mirror URL if the hoster of the mirror has
authorized the IP range of the requester because it can be reached "locally"
without extensive transit cost. The drawback is that this database needs to
be kept up to date, since IP prefixes (ranges) are dynamically assigned to
autonomous systems. If nobody wants to serve the IP of a requester because
the provider of that eyeball does not peer for free, the eyeball will end up
in a black hole unless a failover "take all" transit mirror is provided.
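The selection logic of the centralized variant could be sketched roughly as follows. This is only an illustration, not a DNS server: it assumes the name server can see the requester's address, and the mirror addresses, prefixes, and function names are all hypothetical (the addresses come from the ranges reserved for documentation).

```python
import ipaddress
import random

# Hypothetical database: mirror IP -> prefixes the hoster has authorized.
AUTHORIZED = {
    "192.0.2.10": ["198.51.100.0/24"],
    "192.0.2.20": ["198.51.100.0/25", "203.0.113.0/24"],
}

# Hypothetical failover "take all" transit mirror(s).
FALLBACK_MIRRORS = ["192.0.2.99"]


def candidates(client_ip):
    """Return the mirrors that have authorized the requester's IP range.

    If no mirror wants to serve the requester, the failover mirrors are
    returned so that the eyeball does not end up in a black hole.
    """
    addr = ipaddress.ip_address(client_ip)
    matches = [
        mirror
        for mirror, prefixes in AUTHORIZED.items()
        if any(addr in ipaddress.ip_network(p) for p in prefixes)
    ]
    return matches or list(FALLBACK_MIRRORS)


def resolve(client_ip, rng=random):
    """Pick one candidate at random, standing in for DNS round robin."""
    return rng.choice(candidates(client_ip))


print(candidates("198.51.100.5"))    # both mirrors have authorized this range
print(candidates("198.51.100.200"))  # only the first mirror
print(candidates("10.1.2.3"))        # nobody: failover mirror
```

The prefix table is the part that would need regular updates, for example from the mirror hosters themselves, as noted above.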
2. the decentralized one
The domain name server behaves as it does now, but the mirror itself bounces
all file requests which are not "local" according to its ACL. The eyeball still
contacts the server and causes some minor transit bandwidth overhead, but
content delivery is denied by the access control mechanism on the mirror.
This can be developed into a kind of "inter-server peer-to-peer mirror" network
if the mirror further suggests the next mirror to be tried, until a server
accepts the request. This is a bit like the mechanism currently in use among
telephony routers: If a prefix does not match the locally connected numbers,
the call is routed on to the next switch (default route) until an
authoritative switch terminates (accepts or releases) the call. We still
need to make sure that every IP in the number space has an authoritative
mirror server or failover default route, and the download client must be able
to understand the "hint/redirect" to the next mirror on a protocol level.
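A rough sketch of this hint-following behaviour, simulated in memory rather than over a real protocol, might look like the following. The mirror names, ACL prefixes, and the "next" pointers are hypothetical; how the hint is actually carried (for instance as an HTTP redirect) is an assumption left open here.

```python
import ipaddress

# Hypothetical mirror table: each mirror has an ACL and a next-mirror hint.
MIRRORS = {
    "mirror-a": {"acl": ["198.51.100.0/24"], "next": "mirror-b"},
    "mirror-b": {"acl": ["203.0.113.0/24"], "next": "mirror-default"},
    # Failover default route: serves the whole IPv4 number space.
    "mirror-default": {"acl": ["0.0.0.0/0"], "next": None},
}


def handle_request(mirror, client_ip):
    """Mirror side: serve "local" requesters, otherwise hint at the next mirror."""
    entry = MIRRORS[mirror]
    addr = ipaddress.ip_address(client_ip)
    if any(addr in ipaddress.ip_network(p) for p in entry["acl"]):
        return ("serve", mirror)
    return ("redirect", entry["next"])


def fetch(first_mirror, client_ip, max_hops=10):
    """Client side: follow hints until a mirror accepts the request.

    Returns the accepting mirror and the list of mirrors contacted.
    Stops on a missing hint, a loop, or too many hops.
    """
    mirror = first_mirror
    visited = []
    for _ in range(max_hops):
        if mirror is None or mirror in visited:
            break
        visited.append(mirror)
        action, target = handle_request(mirror, client_ip)
        if action == "serve":
            return target, visited
        mirror = target
    raise LookupError("no mirror accepted %s after trying %s" % (client_ip, visited))


print(fetch("mirror-a", "198.51.100.1"))  # accepted by the first mirror
print(fetch("mirror-a", "203.0.113.9"))   # bounced once, then accepted
print(fetch("mirror-a", "192.0.2.1"))     # ends up at the failover mirror
```

The loop check and hop limit stand in for the requirement that every IP has an authoritative mirror or a default route; without them a misconfigured chain of hints could bounce a client forever.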
The mirror giving the hint should not point to other mirrors which have been
retired. Mirrors should therefore regularly ask their peer servers whether
they are still alive, what number space they want to serve, and what their
current update status is. With that information a mirror can decide whether it
should still recommend a lagging server or, in case it lags itself, retrieve
content from a peer server that has more current content than the local one.
To summarize: in this design the intelligence is transferred to the mirror
servers, and the master only needs to seed the content to a few well-connected
peer servers, which then propagate the content to their neighbors. The problem
of ensuring the integrity of the files on each node (and locking out zombie
mirrors) is still unsolved, and I am not really competent to suggest anything
right now. I guess there is also some risk in the current production
architecture that a mirror server delivers malware files unless the master
builds and compares the md5 sum of each and every file on each and every
downstream mirror. I guess I am getting too paranoid now, since all people
hosting a mirror would never ever have bad intentions.
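The md5 comparison mentioned above could look roughly like this. It is only a sketch: the file contents are in-memory stand-ins for what the master would read locally and download from a mirror, and the paths and function names are hypothetical.

```python
import hashlib


def md5_of(data):
    """Return the hex md5 sum of a bytes object."""
    return hashlib.md5(data).hexdigest()


def find_mismatches(master_files, mirror_files):
    """Return the sorted paths that are missing or differ on the mirror.

    Both arguments map a path to the file content as bytes.
    """
    expected = {path: md5_of(data) for path, data in master_files.items()}
    bad = []
    for path, digest in expected.items():
        data = mirror_files.get(path)
        if data is None or md5_of(data) != digest:
            bad.append(path)
    return sorted(bad)


# Hypothetical content on the master and on one downstream mirror.
master = {"os/a.rpm": b"good", "os/b.rpm": b"also good", "os/c.rpm": b"fine"}
mirror = {"os/a.rpm": b"good", "os/b.rpm": b"tampered"}

print(find_mismatches(master, mirror))  # ['os/b.rpm', 'os/c.rpm']
```

Doing this for every file on every mirror means the master has to fetch all mirrored content regularly, which is exactly the kind of traffic cost discussed above; a stronger hash such as hashlib.sha256 would drop into the same structure.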
Well, I have put this out there for now. Does anyone want to comment or pick
this up as a project?
Regards, Florian
----- Original Message -----
From: "Tru Huynh" <tru at centos.org>
To: <centos-mirror at centos.org>
Sent: Monday, June 09, 2008 4:01 PM
Subject: [CentOS-mirror] RFC on public centos mirrors