[CentOS] how to increase DNS reliability?

Thu Jul 25 14:21:41 UTC 2019
hw <hw at gc-24.de>

On 7/25/19 3:48 PM, rainer at ultra-secure.de wrote:
> On 2019-07-25 15:41, hw wrote:
>> On 7/25/19 2:53 PM, rainer at ultra-secure.de wrote:
>>> On 2019-07-25 14:51, hw wrote:
>>>> Hi,
>>>> how can DNS reliability, as experienced by clients on the LAN who are
>>>> sending queries, be increased?
>>>> Would I have to set up some sort of cluster consisting of several
>>>> servers all providing DNS services which is reachable under a single
>>>> IP address known to the clients?
>>>> Just setting up several name servers and making them known to the clients
>>>> so that they switch automatically isn't a good solution, because the
>>>> clients wait out their timeouts, and users lacking even the most basic
>>>> knowledge inevitably panic when the first name server does not answer
>>>> queries.
>>> Run a local cache (unbound) and enter all your local resolvers as upstreams.
>> That can fail just as well, or be even worse when the clients can't switch
>> over anymore.  I have that, but I avoid using it for some clients because
>> it takes a while for the cache to get updated when I make changes.
>> However, if that cache fails, chances are that the internet connection is also
>> down, in which case it can be troublesome to even get local host names resolved.
>> When that happens, trouble is to be expected.
> Anything else is - IMHO - much more work, much more complicated

That's what I was thinking.  Perhaps it is better to live with a main server and
one or two slaves so the clients can keep their alternatives.

But still ...  There's got to be a better way ...

> and much more likely to fail, in a more spectacular way.
> Especially all those keepalive "solutions".

You mean like probing whether the DNS server is still responsive and somehow
switching over when it's not?  I never tried, though it stands to reason that
more complicated setups tend to be less reliable.

Yet it reminds me that I could actually check the name servers and dispatch a message
when one fails, as I'm already doing for a couple of other things.  That would suffice
and wouldn't introduce more possibilities of failure to name resolution.
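
A rough sketch of such a check, assuming Python 3 and nothing beyond the
standard library.  The server addresses, the test name "localhost" and the
notify() helper are placeholders for illustration, not anything from my actual
setup; notify() only prints and would have to be wired to whatever alerting is
already in place.  Note that any well-formed reply counts as "responsive"
here, even an error reply, since the point is only to see whether the server
answers at all.

```python
import socket
import struct
import sys


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def is_answer(data, qid):
    """Return True if data looks like a DNS response carrying our query ID."""
    if len(data) < 12:
        return False
    rid, flags = struct.unpack("!HH", data[:4])
    # Bit 15 of the flags word is QR: 1 means response.
    return rid == qid and bool(flags & 0x8000)


def probe(server, name="localhost", timeout=2.0, port=53):
    """Return True if server replies to a UDP query within timeout seconds."""
    qid = 0x1234
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, port))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    return is_answer(data, qid)


def failed_servers(servers, probe_fn=probe):
    """Return the servers that did not answer, in the order given."""
    return [s for s in servers if not probe_fn(s)]


def notify(failed):
    """Placeholder: replace with mail or whatever alerting is in use."""
    for server in failed:
        print(f"name server {server} is not answering", file=sys.stderr)


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range; use your own.
    notify(failed_servers(["192.0.2.1", "192.0.2.2"]))
```

Run from cron every few minutes, this stays entirely outside the resolution
path, so a bug in the check cannot break name resolution itself.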

> I have found that I need to restart unbound if all upstreams had failed.