[CentOS] High Availability using 2 sites

Thu Jan 5 21:29:06 UTC 2006
Les Mikesell <lesmikesell at gmail.com>

On Thu, 2006-01-05 at 14:21, Bryan J. Smith wrote:

> It's very clear both you and I are talking about 2 entirely
> different things.  I don't disagree with many of the concepts
> you are covering, I know how round robin DNS works.  But how
> these concepts work with respect to high availability is what
> I'm taking major issue with.
> 
> > The DNS server is irrelevant here.
> 
> It's _very_relevant_ if MS-RPC calls are being used and
> resolution changes from standard DNS at the _client_!  That
> was my point!

Then we agree.  Don't change DNS. 

> > In the dynamic scenario, you have a possible problem of
> > cache admins configuring to use a minimum time of their
> > own choice rather than following the spec, but that is
> > rare.  And it doesn't affect an unchanging list.
> 
> Sigh, you're picking and choosing the context you wish to
> discuss.  When you're providing server failover, you can't
> rely on applications or DNS, but you must make the IP appear
> as the same.

Or let the client connect to its choice of multiple IPs,
which can be in different locations.

> On one site, that is doable with NAT -- be it 1-to-1 or
> destination, with additional considerations.  Across sites
> you have to get far more involved.  It, of course, assumes
> you're using stateless sessions (like HTTP), and changes
> radically (and NAT won't work) if you are using stateful
> sessions (like RPC, NFS, etc...).

The client app needs to know how to pick up after a
failed connection if you want it to be transparent to
the user.  With stateless http that just means that you
make another connection.  Stateful sessions can sort of
be made to work if you mirror the session data between
sites, but that's probably going to break along with whatever
takes the one site offline anyway.  With anything else, things
will break unless the client takes the necessary steps to
get back in sync.  That's why this logic is best included
in the client app along with the reconnect logic that
tries the other address(es) that DNS provided.  Even if
you pretend that some other machine had the same address,
most apps aren't going to be graceful about restarting their
broken connections.
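
A minimal Python sketch of that "try the other addresses" logic,
for illustration only.  The function names and the injectable
connect hook are assumptions made for this example, not code from
any real client.  It asks the resolver for every address and tries
each one in the order returned until one accepts a connection.

```python
import socket


def resolve_all(host, port):
    """Return every (family, sockaddr) pair the resolver gives for host:port."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def _tcp_connect(family, sockaddr, timeout):
    """Open a TCP connection to one address, closing the socket on failure."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_first_working(addresses, timeout=3.0, connect=None):
    """Try each address in order; return (connection, sockaddr) for the first success.

    `connect` is a hypothetical hook so the walk can be exercised without a network.
    """
    if connect is None:
        connect = _tcp_connect
    errors = []
    for family, sockaddr in addresses:
        try:
            return connect(family, sockaddr, timeout), sockaddr
        except OSError as exc:
            # Remember the failure and move on to the next address.
            errors.append((sockaddr, exc))
    raise ConnectionError(f"no address responded: {errors!r}")
```

Typical use would be connect_first_working(resolve_all(name, port)).
Python's own socket.create_connection() does a similar walk over
the getaddrinfo() results, so in practice the hand-written loop is
mostly useful when you want control over ordering or logging.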

> You are _not_ going to address it with DNS.

You can if you always offer distributed locations and
let the client choose the address.

> > If you write the app you can trust it to work the way you
> > wrote it and you don't have to worry about anyone's cache.
> > That's why I suggest doing it that way.  Always give out
> > multiple IP addresses and don't change DNS.  Write the app
> > to walk the list of returned addresses itself if the first
> > one it tries doesn't respond.
> 
> We're talking about web services spread across 2 sites.
> What the heck does this context have to do with it?

Web browsers already do that.

> > Not true for the case of supplying multiple A records that
> > don't change.  The DNS servers/resolvers may change the
> > order of the list but nothing else.
> 
> Again, you're continuing to make the assumption on the
> applications used, and that they magically handle this logic
> as you want them to arbitrarily do so.

If you write the app you can make it work that way.  I agree
that there are a lot of ways it can go wrong.  Ours does it
right, so it can be done...

> > If you can find a repeatable case where IE does the wrong
> > thing with multiple A records where some work and some
> > don't, please let me know.  I don't claim to understand how
> > it works but it seems very robust in those circumstances.
> 
> And I would differ on that assessment, very much so.

And that repeatable case I asked for would be???

> > How can DNS not work according to the specifications at
> > least at the 'A' record level?

> Sigh, I'm not opening up that can of worms (don't get me
> started ;-).

How would any service work over the internet if you
can't resolve A records?

> I also think you're referring to extended operations of ADS, and
> not DNS, with MS IE.  When you think you're just doing simple
> DNS resolution, there are MS-RPC calls being made if you have
> ADS for your DNS and MS IE for your client.

I'm not sure what you are talking about. We have two
colo sites with an assortment of web and proprietary
services.  No ADS in sight.  I have F5 3dns boxes as
the primary DNS servers but normally let them give out
both addresses for all services, all the time.  IE mostly
just works. Our own client software takes care of failover
using the addresses supplied by DNS. It has its own heartbeat
on the server connection and will reconnect anytime
it notices a problem with the connection, trying every address
in the list.  When it reconnects it refreshes certain things
from the new server connection.  If a site goes completely
offline, the F5 will remove the address from the DNS list,
but that is mostly irrelevant to our own software, which would
ignore the failing address anyway.
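
The heartbeat-and-reconnect behaviour described above could be
sketched roughly like this in Python.  This is not our actual
client code; the class name, the three callables and the choice to
try the failed address last are all assumptions made for the
example.

```python
class FailoverClient:
    """Keep one live connection, moving to another address when the heartbeat fails."""

    def __init__(self, addresses, connect, ping, refresh):
        self.addresses = list(addresses)  # every address DNS handed out
        self.connect = connect            # connect(addr) -> conn, raises OSError
        self.ping = ping                  # ping(conn), raises OSError if the link is dead
        self.refresh = refresh            # refresh(conn), reloads state from the new server
        self.conn = None
        self.current = None

    def reconnect(self):
        """Try every address, leaving the one that just failed until last."""
        candidates = [a for a in self.addresses if a != self.current]
        if self.current is not None:
            candidates.append(self.current)
        for addr in candidates:
            try:
                conn = self.connect(addr)
            except OSError:
                continue  # this address is down, try the next one
            self.conn, self.current = conn, addr
            self.refresh(conn)  # get back in sync with whichever server answered
            return addr
        self.conn = None
        raise ConnectionError("all addresses failed")

    def heartbeat(self):
        """Check the connection; reconnect if it is missing or not answering."""
        if self.conn is None:
            return self.reconnect()
        try:
            self.ping(self.conn)
            return self.current
        except OSError:
            return self.reconnect()
```

Calling heartbeat() on a timer is enough to drive it.  Nothing
here depends on DNS changing: a dead address simply fails the
connect attempt and gets skipped.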

> As I mentioned before, I purposely have to hack the Windows
> registry (typically pushed via GPOs) just to get MS IE to
> stop doing some really stupid things on an intranet.  I
> seriously doubt it works so perfectly as you describe over
> the Internet with its resolution -- quite the opposite.

Try it.  If you are resolving names with NetBIOS you might
see something different.  Put a name in DNS that doesn't
exist anywhere else to test it.
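
One quick way to see what a client machine actually gets back for
such a test name is a few lines of Python.  This is only a sketch
and the helper names are made up for the example.  Note that
getaddrinfo() goes through the system resolver, so hosts files or
other name services configured on the machine can affect the
result, which is part of what the test is meant to show.

```python
import socket


def unique_addresses(infos):
    """Collapse getaddrinfo() results to distinct IP strings, keeping their order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def lookup(name):
    """Ask the system resolver for every address it will give out for name."""
    return unique_addresses(socket.getaddrinfo(name, None, type=socket.SOCK_STREAM))
```

Running lookup() on the test name a few times should show the same
set of addresses each time, possibly in a different order.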

> The "hold downs" on various things are my biggest issue. 
> Especially when it comes to non-availability.

Being able to get all the addresses from multiple A
records doesn't have anything to do with hold downs.

-- 
  Les Mikesell
    lesmikesell at gmail.com