Ok... I see some of your points, but if SITE_A talks to SITE_B via the Internet and they use 1:1 NAT, then if the Internet goes down at SITE_A, it breaks the 1:1 NAT. Servers being down is different from the Internet connection being down.
I never said RRDNS should be used for failover. I said it could be used in combination with my idea.
I guess it would help to know if the web services are serving only the company, or are they serving the public/Internet?
--Todd
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Bryan J. Smith
Sent: Thursday, January 05, 2006 2:55 PM
To: CentOS mailing list
Subject: RE: Linux HA may not be the best choice in your situation. [CentOS] High Availability using 2 sites
Todd Reed treed@astate.edu wrote:
As for Bryan's message:
- You are right, to a point.
To what point? Round-robin DNS is not, and never will be, failover. And even some logic we've discussed here is rather subjective and arbitrary, even for one, specific app -- working on a corporate network (before considering across the Internet).
I'm not saying that my idea was a replacement for BGP. I even state that by saying "This is no substitution for BGP".
Again, I wish I would have _never_ said BGP. I mean an AS.
- You are correct in saying that you need to change how the world sees you (whether that be in the application or in layer-3 routing). That can be done at the application layer or at layer 3. But if you are a small company that has only 1 Internet link at your primary site and 1 Internet connection at your remote site, and you rely on the Internet for communications between the sites, then you technically do not qualify for an AS number. That makes BGP useless. That means your solution has to be somewhere in the upper layers of the OSI model.
That's why I gave a _second_ recommendation.
If you can guarantee your borders will _at_least_ be up, even if the servers behind it are down, you can implement 1-to-1 NAT at the border. I.e., if site A's servers are down, site A's border can use 1-to-1 NAT to target site B's servers.
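For what it's worth, a minimal sketch of that kind of border redirect on a Linux box acting as site A's border might look like the following (the addresses are placeholders, not anything from this thread):

    # Site A's border: send traffic aimed at the local (down) server
    # to site B's public server instead -- a rough 1-to-1 NAT.
    iptables -t nat -A PREROUTING  -d 203.0.113.10 -j DNAT --to-destination 198.51.100.20

    # Source-NAT the redirected traffic so replies come back through this border.
    iptables -t nat -A POSTROUTING -d 198.51.100.20 -j SNAT --to-source 203.0.113.1

    # Make sure the box forwards packets at all.
    echo 1 > /proc/sys/net/ipv4/ip_forward

A commercial 1-to-1 NAT appliance would do the same thing in hardware; the point is that nothing about the servers themselves has to change.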
Please recognize I'm not just giving the "high end" solution; I'm also giving a "feasible" solution for SMBs too. ;->
[ SIDE NOTE: I was _not_ the person who brought up Google either. But when someone did, I (as well as at least 1 other) showed that I wasn't off-the-mark on how Google does it either. ;-]
- My idea was more along the lines of Internet failure at the primary site and using RRDNS on top of that. If the secondary DNS server kicked in and pulled its configs from the hidden master, then the hidden record wouldn't be configured for RRDNS, because it is only used when the primary site fails.
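As a rough illustration of that hidden-master arrangement, a BIND-style sketch might look like this (all names, addresses, and file names are made up for the example):

    // named.conf excerpt on the secondary at the remote site:
    // it transfers the zone from the hidden master at the primary site.
    zone "example.com" {
        type slave;
        masters { 192.0.2.53; };   // hidden master -- not listed in the public NS records
        file "slaves/example.com.db";
    };

    ; Failover version of the zone data, used only if the primary site is gone:
    ; one plain A record pointing at the remote site, no round-robin entries.
    www.example.com.  300  IN  A  198.51.100.20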
The problem is _still_ propagation.
That's why, in the absence of your own AS, you need the failed site to redirect all traffic under the guise of 1-to-1 NAT to the site that is up. It's very simple to do, and they even have affordable devices to do so with Linux+ASICs (i.e., faster than a host-based Linux solution).
- My idea was also an application layer solution.
Last time I checked, BGP was in Layer 3.
Forget I even mentioned ASNs for a moment. I _also_ mentioned using 1-to-1 NAT between sites, and it works _well_ too.
It not only _avoids_ the propagation issue; better yet, it works _while_ propagation is still occurring.
- Yes, there are some delays when using DNS. Likewise, there are delays with BGP. Certainly BGP's delays are a lot shorter. No argument there.
Sigh, you're not getting my point at all on layer-3. You've totally missed it. It's not comparable to application-level. It's absolute.
But again, my idea is looking at it from an application-layer POV.
_Eventually_ you'd have to change the application-layer as well, _if_ the site was down awhile. But in the meantime, you _need_ to do layer-3 redirection for the immediate failover.
1-to-1 NAT does this. It's very simple. It just works.
Todd Reed treed@astate.edu wrote:
Ok... I see some of your points, but if SITE_A talks to SITE_B via the Internet and they use 1:1 NAT, then if the Internet goes down at SITE_A, it breaks the 1:1 NAT. Servers being down is different from the Internet connection being down.
Didn't I basically put everything short of a big asterisk on that in the 3-some-odd times I explained it?! ;->
I never said RRDNS should be used for failover. I said it could be used in combination with my idea.
If you have any failover, it's best to update the DNS just in case it's longer than you expect. ;->
I guess it would help to know if the web services are serving only the company, or are they serving the public/Internet?
Exactly! That's why I keep both agreeing _and_ dismissing many suggestions, because most are only feasible _if_ they are for a corporate intranet. Most are too arbitrary for the Internet.
I believe the original poster was talking about the Internet, but I could be wrong.
Bryan J. Smith wrote:
Exactly! That's why I keep both agreeing _and_ dismissing many suggestions, because most are only feasible _if_ they are for a corporate intranet. Most are too arbitrary for the Internet.
I believe the original poster was talking about the Internet, but I could be wrong.
Yes I am talking about the Internet, not an Intranet. Thanks for all your replies, especially Bryan, they've helped me see more clearly what the options are. I'd already given up on Round Robin or any other kind of DNS 'solution' before I posted, after reading this: http://homepages.tesco.net./~J.deBoynePollard/FGA/dns-round-robin-is-useless...
I don't think the difficulties and expense of getting an AS number and setting up BGP to work with our ISP will be worth it, but I'm not sure - what steps exactly are involved in doing that?
On Thu, 2006-01-05 at 18:33, Tim Edwards wrote:
I believe the original poster was talking about the Internet, but I could be wrong.
Yes I am talking about the Internet, not an Intranet. Thanks for all your replies, especially Bryan, they've helped me see more clearly what the options are. I'd already given up on Round Robin or any other kind of DNS 'solution' before I posted, after reading this: http://homepages.tesco.net./~J.deBoynePollard/FGA/dns-round-robin-is-useless...
That page seems to be written with the premises that all clients are in the same location, served by the same DNS cache (which certainly won't be the case on the Internet); that browsers don't try anything but the first address in the DNS response (which isn't true either); and that statistically distributing the load among servers isn't useful.
Les Mikesell wrote:
I'd already given up on Round Robin or any other kind of DNS 'solution' before I posted, after reading this: http://homepages.tesco.net./~J.deBoynePollard/FGA/dns-round-robin-is-useless...
That page seems to be written with the premises that all clients are in the same location, served by the same dns cache which certainly won't be the case on the internet,
Isn't it the other way around - the article is explaining why Round Robin DNS won't be effective when the clients aren't all in the same location, i.e. when they're spread around the net.
and that browsers don't try anything but the first address in the DNS response which isn't true either
I can't see where it says that. All it says is that there is no provision for ordering of records in DNS, and therefore clients will most likely disregard any ordering of records you try to impose.
and that statistically distributing the load among servers isn't useful.
I can't see where it says that either. The author suggests that people who want load balancing should use SRV records or a real load balancer (things like LVS, I assume), instead of trying to do it through RR DNS. This is not saying load balancing isn't useful.
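For reference, SRV records for a web service would look something like the following zone-file excerpt (names, weights, and priorities made up); note that ordinary web browsers generally don't consult SRV records, which is part of why the author also points at real load balancers:

    ; _service._proto.name        TTL  class  SRV  priority weight port target
    _http._tcp.www.example.com.  3600  IN     SRV  10       60     80   server-a.example.com.
    _http._tcp.www.example.com.  3600  IN     SRV  10       40     80   server-b.example.com.
    _http._tcp.www.example.com.  3600  IN     SRV  20        0     80   backup.example.org.

The two priority-10 records share the load roughly 60/40; the priority-20 record is only tried if neither of those responds - for clients that actually implement SRV.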
On Thu, 2006-01-05 at 21:22, Tim Edwards wrote:
I'd already given up on Round Robin or any other kind of DNS 'solution' before I posted, after reading this: http://homepages.tesco.net./~J.deBoynePollard/FGA/dns-round-robin-is-useless...
That page seems to be written with the premises that all clients are in the same location, served by the same dns cache which certainly won't be the case on the internet,
Isn't it the other way around - the article is explaining why Round Robin DNS won't be effective when the clients aren't all in the same location, i.e. when they're spread around the net.
No, it describes what happens when a large number of clients are behind the same local DNS server - and not very realistically at that.
and that browsers don't try anything but the first address in the DNS response which isn't true either
I can't see where it says that. All it says is that there is no provision for ordering of records in DNS, and therefore clients will most likely disregard any ordering of records you try to impose.
What really happens is that the authoritative server will rotate the order of the list on each request, but downstream DNS servers and the client itself will cache a certain order once received. With a large number of clients connecting from a large number of places, the order distribution will be essentially random. Most client apps will try the first address in the list first and many won't continue if that fails. The versions of IE that I've tested do try additional addresses although that might not happen the same way behind a web proxy.
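If you want to see the rotation for yourself, query the authoritative server directly a couple of times; with a server that rotates answers (BIND's usual cyclic ordering), they come back in a shifting order. Something like this, with made-up names and addresses:

    $ dig @ns1.example.com www.example.com A +short
    192.0.2.10
    198.51.100.20
    $ dig @ns1.example.com www.example.com A +short
    198.51.100.20
    192.0.2.10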
and that statistically distributing the load among servers isn't useful.
I can't see where it says that either. The author suggests that people who want load balancing should use SRV records or a real load balancer (things like LVS, I assume), instead of trying to do it through RR DNS. This is not saying load balancing isn't useful.
SRV records have their advantages including the ability to specify preferences and ports for a server. However, multiple A records work reasonably well to distribute the connections from a large number of clients. With applications like IE that will retry using the additional addresses, you also get failover. It isn't perfect and doesn't work even for all http apps (note the occasional complaint here about yum not connecting to the mirror.centos.org repository when only a single site is down) but it will keep a lot of web browsers happy when your main site is down.
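Concretely, the multiple-A-record setup is nothing more than something like this in the zone (placeholder addresses; the short TTL is there so a dead address can be dropped reasonably quickly):

    ; Two A records for the same name; clients receive both, and browsers
    ; that retry will fall back to the other one if the first doesn't answer.
    www.example.com.  300  IN  A  192.0.2.10      ; primary site
    www.example.com.  300  IN  A  198.51.100.20   ; secondary site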
Les Mikesell wrote:
No, it describes what happens when a large number of clients are behind the same local DNS server - and not very realistically at that.
It describes what happens with a typical ISP where all the users are looking up addresses using the ISP's caching nameservers; this is not a local network. If it was a local network I was dealing with, I could just have all the clients resolve off the one DNS server and there'd be no propagation delay. The problem with the Internet is the propagation delay out to all those separate ISPs' caching nameservers.
What really happens is that the authoritative server will rotate the order of the list on each request, but downstream DNS servers and the client itself will cache a certain order once received. With a large number of clients connecting from a large number of places, the order distribution will be essentially random. Most client apps will try the first address in the list first and many won't continue if that fails. The versions of IE that I've tested do try additional addresses although that might not happen the same way behind a web proxy.
Yes, so because the order is disregarded, with enough clients it will essentially be random. It's better than just having the single address, I suppose, but it's not necessarily redundant. It'd be good to know whether it's just IE that does this, or whether other web browsers on other platforms do too.
On Thu, 2006-01-05 at 22:41, Tim Edwards wrote:
Les Mikesell wrote:
No, it describes what happens when a large number of clients are behind the same local DNS server - and not very realistically at that.
It describes what happens with a typical ISP where all the users are looking up addresses using the ISP's caching nameservers; this is not a local network. If it was a local network I was dealing with, I could just have all the clients resolve off the one DNS server and there'd be no propagation delay. The problem with the Internet is the propagation delay out to all those separate ISPs' caching nameservers.
If all of your users connect through the same ISP, perhaps you should consider moving the server(s) to a data center provided by that ISP to reduce the chances they would be unreachable. However, even when everyone uses the same DNS cache a new lookup should happen every time the TTL expires so unless everyone connects at the same time you should still get random distribution. If the ISP's DNS server doesn't observe the DNS TTL (you can watch it count down with 'dig'), find a different ISP or hand-configure the clients to use servers that work.
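One way to check whether an ISP's cache honours the TTL is simply to repeat the query against it and watch the TTL column fall between runs, e.g. (made-up resolver name):

    $ dig @resolver.example-isp.net www.example.com A
    ;; ANSWER SECTION:
    www.example.com.   287   IN   A   192.0.2.10

    $ dig @resolver.example-isp.net www.example.com A    # run again a minute later
    ;; ANSWER SECTION:
    www.example.com.   227   IN   A   192.0.2.10

If the TTL never counts down, or stays pinned at some large value regardless of what your zone specifies, the cache is ignoring your TTL and any DNS-based failover will be correspondingly slow.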