[CentOS] how to increase DNS reliability?

Thu Jul 25 20:10:34 UTC 2019
hw <hw at gc-24.de>

On 7/25/19 5:14 PM, Nataraj wrote:
> On 7/25/19 6:48 AM, rainer at ultra-secure.de wrote:
>> On 2019-07-25 15:41, hw wrote:
>>> On 7/25/19 2:53 PM, rainer at ultra-secure.de wrote:
>>>> On 2019-07-25 14:51, hw wrote:
>>>>> Hi,
>>>>>
>>>>> how can DNS reliability, as experienced by clients on the LAN who are
>>>>> sending queries, be increased?
>>>>>
>>>>> Would I have to set up some sort of cluster consisting of several
>>>>> servers all providing DNS services which is reachable under a single
>>>>> IP address known to the clients?
>>>>>
>>>>> Just setting up several name servers and making them known to the
>>>>> clients, so that the clients switch over automatically, isn't a good
>>>>> solution because the clients wait out their timeouts, and users
>>>>> lacking even the most basic knowledge inevitably panic when the first
>>>>> name server does not answer queries.
>>>>
>>>> Run a local cache (unbound) and enter all your local resolvers as
>>>> upstreams.
>>>
>>> That can fail just as well, or be even worse when the clients can't
>>> switch over anymore.  I have that and avoid using it for some clients
>>> because it takes a while for the cache to get updated when I make
>>> changes.
>>>
>>> However, if that cache fails, chances are that the internet connection
>>> is also down, in which case it can be troublesome to even get local
>>> host names resolved.  When that happens, trouble is to be expected.
>>
>>
>> Anything else is - IMHO - much more work, much more complicated and
>> much more likely to fail, in a more spectacular way.
>> Especially all those keepalive "solutions".
>>
>> I have found that I need to restart unbound if all upstreams have failed.
> 
> 
> Configure all dns servers as primary slaves (plus 1 primary master) for
> your own domains.  I have never seen problems with resolution of local
> dns domains when the Internet was down.

The trouble I had seemed to come from the TTL for the local names being
too short, and from DNS being designed to generally query the root servers
rather than stick to local information.

> Depending on the size of your network, you can run a caching server on
> each host (configured as a primary slave for your own domains) and then
> configure that local server to use forwarders.  When you use multiple
> forwarders the local server does not have to wait for timeouts before
> querying another server.  Then you just run 2 or more servers to use for
> forwarding.  Use "forward only" to force all local servers to use only
> forwarding (for security and caching reasons).  Much simpler than using
> keepalived.

Hm.  I thought about something like that, but without separating the
local slaves from the forwarders they use.  I will probably do that; it
seems like the most reasonable solution, and I should have at least one
forwarder anyway so as not to leak information to the internet-only
VLANs.  It would be an improvement in several ways and give better
reliability.
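
For reference, a minimal sketch of what such a per-host setup could look
like in named.conf, assuming BIND 9 as shipped with CentOS 7.  The zone
name, the addresses and the file path are placeholders I made up for
illustration, not anything from this network:

```
// named.conf on each host (sketch; all names and addresses are placeholders)
options {
        listen-on { 127.0.0.1; };       // serve only this host
        allow-query { localhost; };
        recursion yes;

        // the two (or more) servers that do the actual forwarding
        forwarders { 192.0.2.10; 192.0.2.11; };
        forward only;                   // never recurse to the internet directly
};

// local copy of our own zone, transferred from the primary master
zone "example.lan" {
        type slave;
        masters { 192.0.2.1; };
        file "slaves/example.lan.db";
};
```

The master would also have to allow zone transfers to each of these
hosts (allow-transfer), which I have left out here.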

It doesn't really help those clients I cannot run name servers on, though.

> In recent years I *have not had any* problems with bind9 or
> powerdns crashing.
>
> As far as using the ISC server vs powerdns, you may want to check on
> people's recent experiences.  There was a time when many thought powerdns
> had much better performance and fewer security issues.  For various
> reasons I've seen some people, including myself, switch back to ISC
> bind9.  I switched about 1.5 years ago because I was getting better
> performance from bind9.  You may want to check out other people's
> experiences before switching to powerdns.

Bind has been around for ages, and I haven't had any issues with it in
the last 25 years or so.  Just set it up and let it do its thing, and it
does.

If there were performance problems, I imagine they would be more likely 
due to insufficient internet bandwidth.  Perhaps it would take all the 
DNS queries that come up during a week or even a month to arrive within 
a second before any performance considerations become relevant ...