[CentOS] High Availability using 2 sites

Thu Jan 5 19:01:29 UTC 2006
Bryan J. Smith <thebs413 at earthlink.net>

Les Mikesell <lesmikesell at gmail.com> wrote:
> The 'round-robin' concept just means that the server will
> rotate the order of the addresses in the answer.  All
> addresses are still visible to the client and in the
> caches.
> Try 'nslookup www.ibm.com' to see the effect of multiple
> A records for the same name.

Yes, I know how it works.  What I'm saying is that I don't
think the Windows resolver, before requests even get to MS IE,
works as you believe.  At least not in an Internet
environment.  The Windows resolver is very, very different
from most UNIX resolvers, including its use of a "hold down"
not just for failed resolution, but for failed access.
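For readers unfamiliar with the idea, a "hold down" can be pictured as a small negative cache: once an address fails, the client skips it for a while. This is a hypothetical Python model, not the Windows resolver's actual algorithm; the class name `HoldDownCache`, the 60-second default, and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable, Dict, List


class HoldDownCache:
    """Hypothetical client-side 'hold down': an address that recently
    failed is skipped until its hold-down period expires.
    The 60-second default is an illustrative assumption only."""

    def __init__(self, hold_down: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_down = hold_down
        self.clock = clock
        self._failed_at: Dict[str, float] = {}  # address -> time of last failure

    def mark_failed(self, address: str) -> None:
        # Remember when this address last failed (resolution or access).
        self._failed_at[address] = self.clock()

    def usable(self, addresses: List[str]) -> List[str]:
        # Drop addresses still inside their hold-down window; keep DNS order.
        now = self.clock()
        return [a for a in addresses
                if now - self._failed_at.get(a, float("-inf")) >= self.hold_down]
```

In this model, after `mark_failed("192.0.2.10")` the client offers only the remaining address until the window expires, even if the failed site has already come back. That is exactly the kind of client-side state that makes DNS-based failover hard to predict.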

> IE will try them all.  Try setting up multiple A records
> in your DNS with one pointing to a working web server and
> one not and see if you even notice a difference when
> connecting to that name.

Furthermore, I made the additional point that I think you're
mixing up some attributes of DNS with those of Active Directory
Server (ADS) integrated DNS.

This is the Windows resolver at work, not so much MS IE,
although ADS-integrated DNS and ADS-integrated applications
like MS IE do some interesting things very _differently_ and
_separately_ from how the Windows resolver handles _Internet_
addresses.  ;->

> On the other hand, if you've given it a single
> IP address in the first DNS lookup, then change the DNS
> response you'll have to close all instances of IE to make it
> pick up the change.

Again, there's a lot of logic in the Windows resolver at work
that you're not considering.  And then there are resolution
issues, both in the Windows resolver and in other
applications, that work very differently than in MS IE.

> No, I mean multiple A records.

But on what server?

A true BIND (or similar) DNS server, or a Windows DNS Server?

> Most apps are dumb and only try the first one in the list
> returned so the round robin rotation on the server side gives
> statistical load balancing but apps other than web browsers
> tend to fail if the first address doesn't respond.

I think you're mixing up some behaviors that MS IE doesn't
handle, but the Windows resolver does.  And then there are
ADS considerations as well.
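The behavior Les describes, where the server rotates the order but returns every address and a naive client only ever tries the first one, can be modeled in a few lines. This is a hypothetical sketch: `RoundRobinAuthority`, `first_only_connect`, and the RFC 5737 documentation addresses are assumptions for illustration, not any real DNS server's code.

```python
from collections import deque
from typing import Callable, List, Optional


class RoundRobinAuthority:
    """Toy authoritative server: every answer contains ALL A records,
    but the order rotates by one position per query."""

    def __init__(self, records: List[str]) -> None:
        self._records = deque(records)

    def answer(self) -> List[str]:
        reply = list(self._records)
        self._records.rotate(-1)  # the next query starts with the next address
        return reply


def first_only_connect(addresses: List[str],
                       is_up: Callable[[str], bool]) -> Optional[str]:
    # Models a "dumb" client: it tries only addresses[0] and gives up.
    return addresses[0] if is_up(addresses[0]) else None
```

With two addresses and the first one down, the rotation spreads queries evenly, so roughly every other lookup hands a first-only client the dead address and that client fails, which is the "statistical load balancing, but fragile" trade-off in the quoted text.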

> F5 uses a 30 second TTL by default on responses that can
> change dynamically.  It works well enough through normal
> caches but apps normally keep their first answer until
> you restart them.

But there is a lot of arbitrary caching and resolution
between their authority and your end use.  That's always
going to be an issue.
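To illustrate the gap between a cache that honors a short TTL (Les cites 30 seconds for F5) and an application that keeps its first answer until restarted, here is a hypothetical Python sketch. `TtlCache`, `PinnedApp`, and the injected `upstream` and `clock` functions are assumptions made for illustration, not any real resolver's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical upstream lookup: name -> (addresses, ttl_seconds)
Lookup = Callable[[str], Tuple[List[str], float]]


class TtlCache:
    """Caching resolver that honors the TTL: it re-queries once the entry expires."""

    def __init__(self, upstream: Lookup, clock: Callable[[], float]) -> None:
        self.upstream = upstream
        self.clock = clock
        self._entries: Dict[str, Tuple[List[str], float]] = {}  # name -> (addrs, expiry)

    def resolve(self, name: str) -> List[str]:
        now = self.clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]
        addrs, ttl = self.upstream(name)
        self._entries[name] = (addrs, now + ttl)
        return addrs


class PinnedApp:
    """Application that resolves a name once and reuses that answer until restarted."""

    def __init__(self, resolver: TtlCache) -> None:
        self.resolver = resolver
        self._pinned: Dict[str, List[str]] = {}

    def address_for(self, name: str) -> List[str]:
        if name not in self._pinned:
            self._pinned[name] = self.resolver.resolve(name)
        return self._pinned[name]
```

If the authority switches from one site to another, the cache picks up the change about 30 seconds later, but `PinnedApp` keeps returning the old address indefinitely; only a "restart" (a new `PinnedApp`) sees the new answer.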

> On the contrary, the app is the best place to deal with it
> if you can.  That is, always return all possible IP
> addresses in the DNS query (or at least all working sites)
> and let the app walk through the list until it gets a
> connection that works.

Again, that's arbitrary: not only can you _not_ trust the
apps to work that way, but worse yet, there's a lot of
caching and resolution between you, the authority, and the
end system.

If you're going directly to the authority (especially if you
are the authority), then yeah, it can and will work.  But for
any arbitrary Internet user, there is a lot left to chance,
and many layers between the authority and them.
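The "walk the list" approach Les argues for looks roughly like the sketch below. `connect_any` is a hypothetical helper, and the connect function is injected so the logic can be exercised without a network. For what it's worth, Python's standard `socket.create_connection((host, port))` already loops over every `getaddrinfo` result this way. The sketch only helps if the client actually receives all the addresses, which is Bryan's objection about intermediate caches.

```python
import socket
from typing import Callable, List, Optional, Tuple, TypeVar

Address = Tuple[str, int]
Conn = TypeVar("Conn")


def connect_any(addresses: List[Address],
                connect: Callable[[Address], Conn]) -> Conn:
    """Try each address in the order DNS returned it and return the first
    connection that succeeds; re-raise the last error if every one fails."""
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            return connect(addr)
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    if last_error is None:
        raise OSError("no addresses to try")
    raise last_error


def tcp_connect(addr: Address) -> socket.socket:
    # Real-network variant: one TCP attempt per address with a short timeout.
    return socket.create_connection(addr, timeout=3.0)
```

For example, `connect_any([("192.0.2.10", 80), ("198.51.100.20", 80)], tcp_connect)` would fall through to the second site if the first cannot be reached. The per-address timeout is an assumed value; a long one makes every failover slow.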

An IP address is the only guarantee.  That's why people get AS
(Autonomous System) numbers.  You have to appear to be a
single point from the standpoint of the Internet, even if
you're getting your connections from 2-3 different providers.

> I have quite a bit of experience with this and that approach
> is even better than trying to juggle DNS dynamically except
> for the case where you want to force clients to one location
> or the other.  For example, you might temporarily have local
> routing problems at some location that make it impossible to
> connect to one site or the other that no other test could
> detect, and if the app has both IP addresses it can still get
> to the one that works.

Yes, that works when _you_ can _guarantee_ that all clients
will talk _directly_ to the authority, or when you control the
intermediate caches/non-authoritative servers and can
guarantee they adhere to the TTL.  That's why it works for
intranets as well as Internet networks _you_ control.
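The point about intermediate caches can be made concrete: a cache the domain owner doesn't control may hold an answer longer than the authority's TTL, for instance because some caching resolvers can be configured with a minimum cache TTL. The sketch below is hypothetical; `FlooredCache`, `effective_ttl`, and the 300-second floor are illustrative assumptions, not the behavior of any particular DNS server.

```python
from typing import Callable, Dict, List, Tuple


def effective_ttl(authority_ttl: float, min_ttl: float = 0.0,
                  max_ttl: float = 86400.0) -> float:
    # Raise short TTLs to the cache's floor and cut long ones to its ceiling.
    return max(min_ttl, min(authority_ttl, max_ttl))


class FlooredCache:
    """Intermediate cache outside the domain owner's control: it ignores
    TTLs shorter than its own (hypothetical) min_ttl."""

    def __init__(self, upstream: Callable[[str], Tuple[List[str], float]],
                 clock: Callable[[], float], min_ttl: float) -> None:
        self.upstream = upstream
        self.clock = clock
        self.min_ttl = min_ttl
        self._entries: Dict[str, Tuple[List[str], float]] = {}  # name -> (addrs, expiry)

    def resolve(self, name: str) -> List[str]:
        now = self.clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # still "fresh" by this cache's own rules
        addrs, ttl = self.upstream(name)
        self._entries[name] = (addrs, now + effective_ttl(ttl, self.min_ttl))
        return addrs
```

With a 30-second authority TTL and a 300-second floor, clients behind this cache keep getting the old site for up to five minutes after a failover, no matter how carefully the authority sets its TTL.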

But everything changes when you have people who don't access
the authority of the domain.  And relying on an application
is rather arbitrary, especially given how I've seen both the
Windows resolver and MS IE behave.

> However, it only works for web apps and ones where
> you write the client yourself. The standard library
> 'connect' library routines will try one address and give up.

Yes, which is why you can't trust it.
Even if you do write it, you're making assumptions.
What if the service is not acting like you assume?
DNS does not behave the way it seems to from the standpoint of
different utilities (let alone versions), and, to make matters
worse, Microsoft's ADS-integrated DNS works very, very
differently.


-- 
Bryan J. Smith     Professional, Technical Annoyance                      b.j.smith at ieee.org      http://thebs413.blogspot.com
----------------------------------------------------
*** Speed doesn't kill, difference in speed does ***