Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
On 2019-07-25 14:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Run a local cache (unbound) and enter all your local resolvers as upstreams.
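A minimal unbound.conf sketch of that setup might look like this (192.0.2.10 and 192.0.2.11 stand in for the existing LAN resolvers; they are placeholders, not addresses from this thread):

server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
forward-zone:
    name: "."
    forward-addr: 192.0.2.10
    forward-addr: 192.0.2.11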
On 7/25/19 2:53 PM, rainer@ultra-secure.de wrote:
On 2019-07-25 14:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Run a local cache (unbound) and enter all your local resolvers as upstreams.
That can fail just as well --- or be even worse when the clients can't switch over anymore. I have that, and I avoid using it for some clients because it takes a while for the cache to get updated when I make changes.
However, if that cache fails, chances are that the internet connection is also down in which case it can be troublesome to even get local host names resolved. When that happens, trouble is to be expected.
On 2019-07-25 15:41, hw wrote:
On 7/25/19 2:53 PM, rainer@ultra-secure.de wrote:
On 2019-07-25 14:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Run a local cache (unbound) and enter all your local resolvers as upstreams.
That can fail just as well --- or be even worse when the clients can't switch over anymore. I have that and am avoiding to use it for some clients because it takes a while for the cache to get updated when I make changes.
However, if that cache fails, chances are that the internet connection is also down in which case it can be troublesome to even get local host names resolved. When that happens, trouble is to be expected.
Anything else is - IMHO - much more work, much more complicated and much more likely to fail, in a more spectacular way. Especially all those keepalive "solutions".
I have found that I need to restart unbound if all upstreams have failed.
On 7/25/19 3:48 PM, rainer@ultra-secure.de wrote:
On 2019-07-25 15:41, hw wrote:
On 7/25/19 2:53 PM, rainer@ultra-secure.de wrote:
On 2019-07-25 14:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Run a local cache (unbound) and enter all your local resolvers as upstreams.
That can fail just as well --- or be even worse when the clients can't switch over anymore. I have that and am avoiding to use it for some clients because it takes a while for the cache to get updated when I make changes.
However, if that cache fails, chances are that the internet connection is also down in which case it can be troublesome to even get local host names resolved. When that happens, trouble is to be expected.
Anything else is - IMHO - much more work, much more complicated
That's what I was thinking. Perhaps it is better to live with a main server and one or two slaves so the clients can keep their alternatives.
But still ... There's got to be a better way ...
and much more likely to fail, in a more spectacular way. Especially all those keepalive "solutions".
You mean like probing if the DNS server is still responsive and somehow switching over when it's not? I never tried, though it is evident that more complicated things may tend to be less reliable.
Yet it reminds me that I could actually check the name servers and dispatch a message when one fails, as I'm already doing for a couple of other things. That would suffice and doesn't introduce more possibilities of failure into name resolution.
I have found that I need to restart unbound if all upstreams had failed.
On 7/25/19 6:48 AM, rainer@ultra-secure.de wrote:
On 2019-07-25 15:41, hw wrote:
On 7/25/19 2:53 PM, rainer@ultra-secure.de wrote:
On 2019-07-25 14:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Run a local cache (unbound) and enter all your local resolvers as upstreams.
That can fail just as well --- or be even worse when the clients can't switch over anymore. I have that and am avoiding to use it for some clients because it takes a while for the cache to get updated when I make changes.
However, if that cache fails, chances are that the internet connection is also down in which case it can be troublesome to even get local host names resolved. When that happens, trouble is to be expected.
Anything else is - IMHO - much more work, much more complicated and much more likely to fail, in a more spectacular way. Especially all those keepalive "solutions".
I have found that I need to restart unbound if all upstreams had failed.
Configure all dns servers as primary slaves (plus 1 primary master) for your own domains. I have never seen problems with resolution of local dns domains when the Internet was down.
Depending on the size of your network, you can run a caching server on each host (configured as a primary slave for your own domains) and then configure that local server to use forwarders. When you use multiple forwarders the local server does not have to wait for timeouts before querying another server. Then you just run 2 or more servers to use for forwarding. Use forward-only to force all local servers to use only forwarding (for security and caching reasons). Much simpler than using keepalived. In recent years I *have not had any* problems with bind9 or powerdns crashing.
As far as using the ISC server vs powerdns, you may want to check on people's recent experiences. There was a time when many thought powerdns had much better performance and fewer security issues. For various reasons I've seen some people, including myself, switch back to ISC bind9. I switched about 1.5 years ago because I was getting better performance from bind9. You may want to check out other people's experience before switching to powerdns.
Nataraj
On 7/25/19 8:14 AM, Nataraj wrote:
[...]
Configure all dns servers as primary slaves (plus 1 primary master) for your own domains. I have never seen problems with resolution of local dns domains when the Internet was down.
I meant to say:
Configure all dns servers as secondary/slaves (one should be the primary master) for your own domains. This means that all of your servers are authoritative for your own domains, so they cannot fail on local dns lookups due to Internet problems.
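For illustration, the zone definitions behind that advice might look roughly like this in BIND 9 (zone name and addresses are placeholders, not values from this thread):

// on the primary master (192.0.2.10)
zone "example.internal" {
    type master;
    file "example.internal.zone";
    allow-transfer { 192.0.2.11; 192.0.2.12; };
    also-notify    { 192.0.2.11; 192.0.2.12; };
};

// on every other server (secondary/slave)
zone "example.internal" {
    type slave;
    file "slaves/example.internal.zone";
    masters { 192.0.2.10; };
};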
Depending on the size of your network, you can run a caching server on each host (configured as a primary slave for your own domains) and then configure that local server to use forwarders. When you use multiple forwarders the local server does not have to wait for timeouts before querying another server. Then you just run 2 or more servers to use for forwarding. Use forward-only to force all local servers to use only forwarding (for security and caching reasons). Much simpler than using keepalived. In recent years I *have not had any* problems with bind9 or powerdns crashing.
As far as using the ISC server vs powerdns, you may want to check on peoples recent experiences. There was a time when many thought powerdns had much better performance and fewer security issues. For various reasons I've seen some people including myself, switch back to ISC bind9. I switched about 1.5 years ago because I was getting better performance from bind9. You may want to check out other peoples experience before switching to powerdns.
Nataraj
On 7/25/19 7:10 PM, Nataraj wrote:
[...]
I meant to say:
Configure all dns servers as secondary/slaves (one should be the primary master) for your own domains. Thos means that all of your servers are authoritative for your own domains, so they cannot fail on local dns lookups due to Internet problems.
Ah!?
When I had it happen a couple years ago and wondered why even local names couldn't be resolved (which didn't make sense to me because the server would always know about them from the zone files), I was told that nothing could be done about it because DNS is designed to do lookups no matter what.
However, that was a server acting as both a local master and as a forwarder. If what you say is true, I would now understand this much better --- and I'd need to change my setup.
On 7/25/19 5:14 PM, Nataraj wrote:
[...]
Configure all dns servers as primary slaves (plus 1 primary master) for your own domains. I have never seen problems with resolution of local dns domains when the Internet was down.
It seemed to have to do with the TTL for the local names being too short and DNS being designed to generally query root servers rather than sticking to their local information.
Depending on the size of your network, you can run a caching server on each host (configured as a primary slave for your own domains) and then configure that local server to use forwarders. When you use multiple forwarders the local server does not have to wait for timeouts before querying another server. Then you just run 2 or more servers to use for forwarding. Use forward-only to force all local servers to use only forwarding (for security and caching reasons). Much simpler than using keepalived.
Hm. I thought about something like that, but without the separation into local slaves using forwarders and the forwarders. I will probably do that; it seems like the most reasonable solution, and I should have at least one forwarder anyway so as not to leak information to the internet-only VLANs. It would be an improvement in several ways and give better reliability.
It doesn't really help those clients I can not run name servers on, though.
In recent years I *have not had any* problems with bind9 or powerdns crashing.
As far as using the ISC server vs powerdns, you may want to check on peoples recent experiences. There was a time when many thought powerdns had much better performance and fewer security issues. For various reasons I've seen some people including myself, switch back to ISC bind9. I switched about 1.5 years ago because I was getting better performance from bind9. You may want to check out other peoples experience before switching to powerdns.
Bind has been around for ages, and I've never had any issues with it for the last 25 years or so. Just set it up and let it do its thing, and it does.
If there were performance problems, I imagine they would be more likely due to insufficient internet bandwidth. Perhaps it would take all the DNS queries that come up during a week or even a month to arrive within a second before any performance considerations become relevant ...
On 7/25/19 1:10 PM, hw wrote:
Configure all dns servers as primary slaves (plus 1 primary master) for your own domains. I have never seen problems with resolution of local dns domains when the Internet was down.
It seemed to have to do with the TTL for the local names being too short and DNS being designed to generally query root servers rather than sticking to their local information.
It has nothing to do with the TTL. The TTL does not cause expiration in an authoritative server; TTLs only affect caching servers. The primary master gets changed when you edit the local zone database. The secondary slave gets updated when the serial number in the SOA record on the primary master gets bumped. You must either do that manually or use a zone database management tool that does it for you.
If a dns server is configured as a primary master or a secondary slave for a domain, then it is authoritative for that domain and does not require queries to any other server on your network or on the Internet. The difference between a primary master and a secondary slave is that the primary master is where you edit the zone records and the secondary slave replicates the zone database from the primary master. Even if the primary master goes down, the secondary slave still has a copy of the zone files in its disk files (or other database format that you configure) and will serve them flawlessly.
One way to see if a server is properly configured as authoritative for a domain is:
nataraj@pygeum:~$ dig mydomain.com. soa @127.0.0.1
; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> mydomain.com. soa @127.0.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52104
;; flags: qr *aa* rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 4

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 64f402c0c22d57aa2bbb10fc5d3a340d8c19377b924d01c2 (good)
;; QUESTION SECTION:
;mydomain.com.            IN    SOA

;; ANSWER SECTION:
Mydomain.Com.      14400  IN    SOA   ns1.mydomain.com. postmaster.Mydomain.COM. 2019072505 1200 600 15552000 14400

;; AUTHORITY SECTION:
Mydomain.Com.      14400  IN    NS    ns1.Mydomain.Com.
Mydomain.Com.      14400  IN    NS    ns2.Mydomain.Com.
Mydomain.Com.      14400  IN    NS    ns3.Mydomain.com.

;; ADDITIONAL SECTION:
ns1.mydomain.com.  14400  IN    A     8.8.8.8

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Thu Jul 25 15:58:21 PDT 2019
;; MSG SIZE  rcvd: 243
The AA flag in the flags section tells you that you have queried a dns server that is authoritative for the domain that you queried. If it doesn't have the AA flag then you have not properly set up the primary master or secondary slave for that domain.
If your masters and slaves are all configured correctly for a domain then they will all have the same serial number in the SOA record (and the same results for any query in that domain). If they don't, then something is wrong and your zone transfers are not occurring properly.
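A quick way to compare the serials is a loop like this (addresses and zone name are placeholders):

for ns in 192.0.2.10 192.0.2.11 192.0.2.12; do
    printf '%s: ' "$ns"
    # the third field of the +short SOA answer is the serial
    dig +short example.internal. SOA @"$ns" | awk '{print $3}'
done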
Depending on the size of your network, you can run a caching server on each host (configured as a primary slave for your own domains) and then configure that local server to use forwarders. When you use multiple forwarders the local server does not have to wait for timeouts before querying another server. Then you just run 2 or more servers to use for forwarding. Use forward-only to force all local servers to use only forwarding (for security and caching reasons). Much simpler than using keepalived.
Hm. I thought about something like that, but without the separation into local slaves using forwarders and the forwarders. I will probably do that; it seems like the most reasonable solution, and I should have at least one forwarder anyway so as not to leak information to the internet-only VLANs. It would be an improvement in several ways and give better reliability.
The local server can have forward-only either on or off. If off, it will go out directly to the Internet if it does not receive a response from a forwarder. Using forward only and putting your forwarders on a separate network away from your inside network means that if there is a security hole in the nameserver, your inside hosts are less likely to be compromised. You could also configure your ISP's, Google's, or other public recursive name servers as forwarders if you don't want to run your own.
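As a sketch, the relevant named.conf options on such a local server could look like this (the forwarder addresses are placeholders):

options {
    directory "/var/named";
    recursion yes;
    allow-query { 127.0.0.1; ::1; };
    forward only;                           // never recurse to the Internet directly
    forwarders { 192.0.2.20; 192.0.2.21; }; // the two forwarding servers
};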
It doesn't really help those clients I can not run name servers on, though.
In recent years I *have not had any* problems with bind9 or powerdns crashing.
As far as using the ISC server vs powerdns, you may want to check on peoples recent experiences. There was a time when many thought powerdns had much better performance and fewer security issues. For various reasons I've seen some people including myself, switch back to ISC bind9. I switched about 1.5 years ago because I was getting better performance from bind9. You may want to check out other peoples experience before switching to powerdns.
Bind has been around for ages, and I've never had any issues with it for the last 25 years or so. Just set it up and let it do its thing, and it does.
If there were performance problems, I imagine they would be more likely due to insufficient internet bandwidth. Perhaps it would take all the DNS queries that come up during a week or even a month to arrive within a second before any performance considerations become relevant ...
Exactly, a simple bind9 configuration is adequate unless you run an application with huge numbers of DNS queries.
On 7/25/19 4:31 PM, Nataraj wrote:
It doesn't really help those clients I can not run name servers on, though.
Another alternative is to look at the multicast dns (mdns) protocol. I have no experience with it, so I can't say very much, but I know it exists. I'm pretty sure it's implemented in the avahi daemon, so it may just be an issue of enabling it on the client. If your client supports it then I would think that all you have to do is enable it.
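For what it's worth, on a typical Linux client this usually comes down to installing avahi/nss-mdns and having a hosts line like the following in /etc/nsswitch.conf (a sketch, not something verified in this thread):

hosts: files mdns_minimal [NOTFOUND=return] dns

A quick test once avahi-daemon is running, with a made-up host name:

avahi-resolve --name someprinter.local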
Nataraj
On Jul 25, 2019, at 5:42 PM, Nataraj incoming-centos@rjl.com wrote:
On 7/25/19 4:31 PM, Nataraj wrote:
It doesn't really help those clients I can not run name servers on, though.
Another alternative is to look at the multicast dns (mdns) protocol.
That’s for allowing a device to self-advertise its own name, along with other things, like available services. If you have such devices, then configuring the other machines on the network to pay attention to such advertisements allows them to see the new names and services when they appear.
…And much more importantly, when they *disappear*, since many ZeroConf/Bonjour/Avahi/mDNS speaking devices are mobile and aren’t always available.
This protocol is one common way for network printers to advertise their services, for example. (The other common way is SMB/CIFS.)
I'm pretty sure it's inplemented in avahi daemon
Yes, that’s an implementation of mDNS for POSIX type systems.
If your client supports it then I would think that all you have to do is enable it.
I’m not sure how this is relevant here. For mDNS to be the solution to the OP’s problems, he’d have to also have mDNS multicasts going out advertising services, so the Avahi daemon would have something to offer when a compatible program comes along looking for services to connect to.
I suppose you could use mDNS in datacenter type environments, but it’s a long way away from the protocol’s original intent.
You could imagine a load balancer that paid attention to mDNS advertisements to decide who’s available at the moment. But I don’t know of any such implementation.
This brings up one of the caveats for (at least ISC) DNS: if the master goes down, the slaves will take over for a time but will eventually stop serving the domains of the master if it remains down too long. If my (sometimes faulty) memory serves me well it is in the three day range (but configurable), which is ample time unless the problem occurs early in a holiday weekend and the notification/escalation process isn't what it should be (Murphy's Law)...
On 26/07/2019 14:45, Leroy Tennison wrote:
This brings up one of the caveats for (at least ISC) DNS, if the master goes down the slaves will take over for a time but eventually will stop serving for the domains of the master if it remains down too long. If my (sometimes faulty) memory serves me well it is in the three day range (but configurable) which is ample time unless the problem occurs early in a holiday weekend and and the notification/escalation process isn't what it should be (Murphey's Law)...
The value you refer to is the SOA record _expire_ value for a zone; I believe it should be set to between 14 and 28 days.
On 7/26/19 6:52 AM, Giles Coochey wrote:
On 26/07/2019 14:45, Leroy Tennison wrote:
This brings up one of the caveats for (at least ISC) DNS, if the master goes down the slaves will take over for a time but eventually will stop serving for the domains of the master if it remains down too long. If my (sometimes faulty) memory serves me well it is in the three day range (but configurable) which is ample time unless the problem occurs early in a holiday weekend and and the notification/escalation process isn't what it should be (Murphey's Law)...
The value you refer to is the SOA record _expire_ value for a zone, I believe is should be set to between 14 and 28 days.
https://en.wikipedia.org/wiki/SOA_record
If you administer the secondary slave servers, there is no reason not to use a very large number, 30 days or more for the SOA expiration. Only reason to use a lower number would be if you don't have control over the slave servers and don't want to have old zone files that you can't update.
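For illustration, an SOA along those lines (the names are placeholders; 2592000 seconds is 30 days):

example.internal.  IN  SOA  ns1.example.internal. hostmaster.example.internal. (
        2019072601   ; serial
        3600         ; refresh
        900          ; retry
        2592000      ; expire - how long slaves keep answering without the master
        3600 )       ; negative-caching TTL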
Another alternative, which many people did for years in the early days when zone transfers were unreliable, is to use a script which replicates the entire DNS configuration to the secondaries and then run all the servers as primary masters. If the script is written cleanly, you can then edit the zone on any server and rsync it to the other servers. Main thing is to prevent multiple people applying updates simultaneously.
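A sketch of such a push script (paths, host names and the lock directory are made up for illustration):

#!/bin/sh
# push the zone files to the other "masters" and reload them
ZONEDIR=/var/named/zones
LOCK=/var/lock/dns-push

# crude guard against two people pushing at the same time
if ! mkdir "$LOCK" 2>/dev/null; then
    echo "another push is already running" >&2
    exit 1
fi
trap 'rmdir "$LOCK"' EXIT

for host in ns2.example.internal ns3.example.internal; do
    rsync -a --delete "$ZONEDIR/" "$host:$ZONEDIR/" && ssh "$host" rndc reload
done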
Nataraj
On 26/07/2019 17:35, Nataraj wrote:
If you administer the secondary slave servers, there is no reason not to use a very large number, 30 days or more for the SOA expiration. Only reason to use a lower number would be if you don't have control over the slave servers and don't want to have old zone files that you can't update.
Another alternative, which many people did for years in the early days when zone transfers were unreliable, is to use a script which replicates the entire DNS configuration to the secondaries and then run all the servers as primary masters. If the script is written cleanly, you can then edit the zone on any server and rsync it to the other servers. Main thing is to prevent multiple people applying updates simultaneously.
Nataraj
PowerDNS supports MySQL backends for the zone files, so one way that they can work is in Native mode, as an alternative to Master / Slave, in which the replication and information resilience is handled by the backend (e.g. a MySQL cluster), and the servers just read the zone from the database, with no need to perform zone transfers at all. The expire timer in the SOA record then becomes pretty defunct, although if you export your zones to non-PowerDNS servers, e.g. bind, then they take effect.
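A sketch of what that looks like in pdns.conf (the database host and credentials are placeholders):

# every authoritative server points at the same (replicated) database
launch=gmysql
gmysql-host=db.example.internal
gmysql-dbname=pdns
gmysql-user=pdns
gmysql-password=secret
# no master/slave statements: replication happens in the database,
# so no zone transfers and no SOA-expire handling between servers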
If you don't want multiple DNS server entries on the client then a master and (possibly multiple) slave server configuration can be set up (I'm assuming ISC DNS - their solution to redundancy/failover is master and slave servers, this may be the way it is with all DNS). keepalived can be used for fail over and will present a single IP address (which the clients would use) shared among the servers. haproxy or alternatives might be another fail over option. Each technology has its own learning curve (and doing this will require at least two) and caveats. In particular systemd doesn't appear to play well with technologies creating IP addresses it doesn't manage. The version of keepalived we're using also has its own nasty quirk as well where it comes up assuming it is master until discovered otherwise, this is true even if it is configured as backup. In most cases this is probably either a non-issue (no scripts being used) or a minor annoyance. But if you're using scripts triggered by keepalived which make significant (and possibly conflicting) changes to the environment then you'll need to embed "intelligence" in them to wait until final state is reached or test state before acting or some other option.
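For reference, a minimal keepalived sketch of that idea (the VIP, interface and priority are placeholders; the quirks described above still apply):

vrrp_script chk_named {
    script "/usr/bin/dig +time=1 +tries=1 @127.0.0.1 localhost."
    interval 5
    fall 2
    rise 2
}

vrrp_instance DNS_VIP {
    state BACKUP
    interface eth0
    virtual_router_id 53
    priority 100              # lower on the other node
    virtual_ipaddress {
        192.0.2.53/24         # the single address the clients would use
    }
    track_script {
        chk_named
    }
}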
I'm about to do an overhaul of the DNS service at work and my plan is to use powerdns recursor + dnsdist + keepalived.
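For context, a minimal dnsdist front end for such a plan could look roughly like this (addresses are placeholders):

-- /etc/dnsdist/dnsdist.conf
setLocal("192.0.2.53:53")              -- the address handed out to the clients
newServer({address="192.0.2.10:53"})   -- pdns-recursor #1
newServer({address="192.0.2.11:53"})   -- pdns-recursor #2
setServerPolicy(firstAvailable)        -- use the first healthy backend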
--- Sent from the Delta quadrant using Borg technology!
On 2019-07-25 14:28, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client then a master and (possibly multiple) slave server configuration can be set up (I'm assuming ISC DNS - their solution to redundancy/failover is master and slave servers, this may be the way it is with all DNS). keepalived can be used for fail over and will present a single IP address (which the clients would use) shared among the servers. haproxy or alternatives might be another fail over option. Each technology has its own learning curve (and doing this will require at least two) and caveats. In particular systemd doesn't appear to play well with technologies creating IP addresses it doesn't manage. The version of keepalived we're using also has its own nasty quirk as well where it comes up assuming it is master until discovered otherwise, this is true even if it is configured as backup. In most cases this is probably either a non-issue (no scripts being used) or a minor annoyance. But if you're using scripts triggered by keepalived which make significant (and possibly conflicting) changes to the environment then you'll need to embed "intelligence" in them to wait until final state is reached or test state before acting or some other option.
On 7/25/19 3:28 PM, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client
I'm ok with them; the only problem is that the clients take their timeouts when a server is unreachable, and users panic.
then a master and (possibly multiple) slave server configuration can be set up (I'm assuming ISC DNS - their solution to redundancy/failover is master and slave servers, this may be the way it is with all DNS).
Yes, bind9, and I've set up a master and a slave. The router forwards requests to them on behalf of those clients that use the router as a name server, while other clients know the master and slave, but not the router, as name servers.
There was a failure a while ago (IIRC because of a UPS causing a server to shut down when the battery failed the self test), and things didn't quite work anymore with the master server being unreachable.
This is why I have a problem with the clients knowing multiple servers: the very setup is intended to keep things working during an outage, and yet it doesn't help.
keepalived can be used for fail over and will present a single IP address (which the clients would use) shared among the servers. haproxy or alternatives might be another fail over option.
Thanks, I'll look into that! I've been searching for "dns proxy" and no useful results came up ...
Each technology has its own learning curve (and doing this will require at least two) and caveats. In particular systemd doesn't appear to play well with technologies creating IP addresses it doesn't manage. The version of keepalived we're using also has its own nasty quirk as well where it comes up assuming it is master until discovered otherwise, this is true even if it is configured as backup. In most cases this is probably either a non-issue (no scripts being used) or a minor annoyance. But if you're using scripts trigger ed by keepalived which make significant (and possibly conflicting) changes to the environment then you'll need to embed "intelligence" in them to wait until final state is reached or test state before acting or some other option.
I consider myself warned :)
On Thu, 25 Jul 2019, hw wrote:
On 7/25/19 3:28 PM, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client
I'm ok with them, only the problem is that the clients take their timeouts when a server is unreachable, and users panic.
On Linux systems, you can set the timeout in /etc/resolv.conf, e.g.,
# I think the default nameserver timeout is 5; use rotate
# option if you prefer round-robin queries rather than
# always using the first-listed first
nameserver 10.11.12.13 timeout:2 rotate
nameserver 10.11.12.14 timeout:2 rotate
I'll admit that I'm not sure if those options are configurable on Mac and/or Windows workstations.
On Thu, Jul 25, 2019 at 11:00 AM Paul Heinlein heinlein@madboa.com wrote:
On Thu, 25 Jul 2019, hw wrote:
On 7/25/19 3:28 PM, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client
I'm ok with them, only the problem is that the clients take their timeouts when a server is unreachable, and users panic.
On Linux systems, you can set the timeout in /etc/resolv.conf, e.g.,...
Windows will 'rotate' the list of NS servers if the top one times out, so next time it will use the first alternate.... and if that times out, it will start using the next alternate, etc.
On 25.07.2019 at 19:58, Paul Heinlein heinlein@madboa.com wrote:
On Thu, 25 Jul 2019, hw wrote:
On 7/25/19 3:28 PM, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client
I'm ok with them, only the problem is that the clients take their timeouts when a server is unreachable, and users panic.
On Linux systems, you can set the timeout in /etc/resolv.conf, e.g.,
# I think the default nameserver timeout is 5; use rotate
# option if you prefer round-robin queries rather than
# always using the first-listed first
nameserver 10.11.12.13 timeout:2 rotate
nameserver 10.11.12.14 timeout:2 rotate
IMO such entries are done via "options" ...
yum install man-pages ; man resolv.conf
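That is, roughly (same servers as in the example above, with the options on their own line as the man page describes):

nameserver 10.11.12.13
nameserver 10.11.12.14
options timeout:2 rotate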
-- LF
On 7/25/19 7:58 PM, Paul Heinlein wrote:
On Thu, 25 Jul 2019, hw wrote:
On 7/25/19 3:28 PM, Leroy Tennison wrote:
If you don't want multiple DNS server entries on the client
I'm ok with them, only the problem is that the clients take their timeouts when a server is unreachable, and users panic.
On Linux systems, you can set the timeout in /etc/resolv.conf, e.g.,
# I think the default nameserver timeout is 5; use rotate
# option if you prefer round-robin queries rather than
# always using the first-listed first
nameserver 10.11.12.13 timeout:2 rotate
nameserver 10.11.12.14 timeout:2 rotate
I'll admit that I'm not sure if those options are configurable on Mac and/or Windows workstations.
It was those showing problems.
A timeout of only 5 seconds isn't so long that I would expect any problems. What do I need to put into the ifcfg files, or tell nmcli, to set these options?
On Thu, 25 Jul 2019, hw wrote:
On Linux systems, you can set the timeout in /etc/resolv.conf, e.g.,
# I think the default nameserver timeout is 5; use rotate
# option if you prefer round-robin queries rather than
# always using the first-listed first
nameserver 10.11.12.13 timeout:2 rotate
nameserver 10.11.12.14 timeout:2 rotate
I'll admit that I'm not sure if those options are configurable on Mac and/or Windows workstations.
It was those showing problems.
Only 5 seconds isn't long enough that I would expect any problems. What do I need to put into the ifcf files or tell nmcli to set these options?
If you're using dhclient to manage addresses, then you can add the RES_OPTIONS variable to /etc/sysconfig/network:
# /etc/sysconfig/network
RES_OPTIONS="timeout:2 rotate"
Or, with even less patience:
RES_OPTIONS="timeout:1 attempts:1 rotate"
Grep for RES_OPTIONS in /sbin/dhclient-script for the gory details.
On 25/07/2019 13:51, hw wrote:
Hi,
how can DNS reliability, as experienced by clients on the LAN who are sending queries, be increased?
Would I have to set up some sort of cluster consisting of several servers all providing DNS services which is reachable under a single IP address known to the clients?
Just setting up several name servers and making them known to the clients for the clients to automatically switch isn't a good solution because the clients take their timeouts and users lacking even the most basic knowledge inevitably panic when the first name server does not answer queries.
Sounds like you're performing maintenance on your servers
(a) too often (b) during office / peak hours
You could load balance multiple servers (using lots of available load-balancing technologies) to allow you to perform maintenance at certain times, but it has its own issues.
I've recently been looking at PowerDNS, which separates the recursor and the authoritative server into two distinct packages. I'm just running the authoritative server as a master, and keeping my old bind/named servers as recursors / slaves. It's a home office network, but I only have issues when I'm tinkering, and if I were to be doing this kind of work in a larger commercial environment, then I would not be doing DNS server maintenance while others were relying on them.
For much of the back end infrastructure I use IP addresses rather than DNS names in their configuration, just to take DNS issues out of the equation completely.
On 7/25/19 4:07 PM, Giles Coochey wrote:
[...]
Sounds like you're performing maintenance on your servers
(a) too often (b) during office / peak hours
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left, in which case the server figures it needs to shut down. I wanted better UPSs ...
You could load balance multiple servers (using lots of available load-balancing technologies) to allow you to perform maintenance at certain times, but it has its own issues.
Load balancing or clustering? At least clustering seems not entirely trivial to do.
I've recently been looking at PowerDNS, which separates the recursor and the authoritative server into two distinct packages. I'm just running the authoritative server as a master, and keeping my old bind/named servers as recursors / slaves.
This can be done with bind; why would it require something called PowerDNS?
It's a home office network, but I only have issues when I'm tinkering, and if I were to be doing this kind of work in a larger commercial environment, then I would not be doing DNS server maintenance while others were relying on them.
The maintenance didn't cause any problems. You can edit the configuration just fine and restart the server when done ... :)
For much of the back end infrastructure I use IP addresses rather than DNS names in their configuration, just to take DNS issues out of the equation completely.
I think this is a very bad idea because it causes lots of work and is likely to cause issues. What if you, for example, migrate remote logging to another server? All the time, you have to document every place where you put an IP address; you have to keep the documentation always up to date and then change the address at every place when you make a change. Forget one place, and things break.
But when you use names instead of addresses, like 'log.example.com', you only need to make a single change in a single place, such as altering the address in your name server config.
DNS can be difficult to get right, though it's not all that difficult, and once it's working, there aren't really any issues other than that a server can become unreachable.
hw wrote:
On 7/25/19 4:07 PM, Giles Coochey wrote:
<snip>
Sounds like you're performing maintenance on your servers
(a) too often (b) during office / peak hours
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left in wich case the server figures it needs to shut down. I wanted better UPSs ...
<snip>
Change that. Are you using apcupsd? You can set the config from SHUTDOWN=/sbin/shutdown to /bin/false. Then, the next time you see the UPS, change the battery. If it's just started to complain, it's not dead yet!
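For reference, that is the SHUTDOWN line in apcupsd's apccontrol script (the path below is the usual location, but it may differ by distribution):

# /etc/apcupsd/apccontrol
#SHUTDOWN=/sbin/shutdown    # default: a "battery low" event powers the box off
SHUTDOWN=/bin/false         # ignore the event until the battery is replaced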
Works for me with all of our mostly APC SmartUPS 3000 rackmounts.
mark
On 7/25/19 9:11 PM, mark wrote:
hw wrote:
On 7/25/19 4:07 PM, Giles Coochey wrote:
<snip>
Sounds like you're performing maintenance on your servers

(a) too often (b) during office / peak hours

I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left, in which case the server figures it needs to shut down. I wanted better UPSs ...

<snip>
Change that. Are you using apcupsd? You can set the config from SHUTDOWN=/sbin/shutdown to /bin/false. Then, the next time you see the UPS, change the battery. If it's just started to complain, it's not dead yet!
Works for me with all of our mostly APC SmartUPS 3000 rackmounts.
I don't remember which UPS it was, either the crappy one for which a replacement battery was already waiting to be put in, or the normal one that already had a new battery in it which is either broken or doesn't get charged ...
That way, not everything goes dark even when Murphy comes along. I have generally deprecated all non-rackmount UPSs, and being able to change batteries without an outage has become a requirement.
On Thu, Jul 25, 2019 at 10:32 AM hw hw@gc-24.de wrote:
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left in wich case the server figures it needs to shut down. I wanted better UPSs ...
critical infrastructure servers should have redundant PSUs, on separate UPSs.
In my last rack builds, I had two Eaton PowerWare 7 kVA 4U UPSs in the bottom of each rack. One fed the left-side PDUs, the other fed the right-side PDUs, and each server had redundant PSUs, one plugged into each PDU.
Those Eaton UPSs had hot-swappable batteries, too.
John Pierce wrote:
On Thu, Jul 25, 2019 at 10:32 AM hw hw@gc-24.de wrote:
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left in wich case the server figures it needs to shut down. I wanted better UPSs ...
critical infrastructure servers should have redudant PSUs, on seperate UPSs.
my last rack builds, I had 2 Eaton PowerWare 7KVA 4U UPS's in the bottom of each rack. one fed the left side PDUs, the other fed the right side PDUs, and each server had redundant PSU's, one plugged into each PDU. those Eaton UPS's had hotswappable batteries, too.
*shrug* All UPSes have hot-swappable batteries. Mine beep while you disconnect the batteries, pull out the sled, replace all 8, shove it back in, and reconnect, and then it shuts up.
For those that haven't done it, though, DO NOT BELIEVE WHAT ANYONE SAYS, DO NOT USE *ANYTHING* BUT HR (high rate) batteries in a UPS (maybe in a home one, but...). With ordinary batteries, an APC, for example, simply stays red and insists that you still need to change them. *Good* battery vendors know this.
On 7/25/19 9:39 PM, John Pierce wrote:
On Thu, Jul 25, 2019 at 10:32 AM hw hw@gc-24.de wrote:
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left in wich case the server figures it needs to shut down. I wanted better UPSs ...
critical infrastructure servers should have redudant PSUs, on seperate UPSs.
right, with hot swappable batteries ...
my last rack builds, I had 2 Eaton PowerWare 7KVA 4U UPS's in the bottom of each rack. one fed the left side PDUs, the other fed the right side PDUs, and each server had redundant PSU's, one plugged into each PDU.
those Eaton UPS's had hotswappable batteries, too.
... like this
On 25/07/2019 20:39, John Pierce wrote:
On Thu, Jul 25, 2019 at 10:32 AM hw hw@gc-24.de wrote:
I can't help it when the primary name server goes down because the UPS fails the self test and tells the server it has 2 minutes or so left in wich case the server figures it needs to shut down. I wanted better UPSs ...
critical infrastructure servers should have redudant PSUs, on seperate UPSs.
Separate DNS servers must be on a different subnet according to RFC2182 (https://tools.ietf.org/html/rfc2182):
Secondary servers must be placed at both topologically and geographically dispersed locations on the Internet, to minimise the likelihood of a single failure disabling all of them.
I know that UPSs are physical, and subnets are logical, but the reasoning behind the requirement is due to having to be on a different infrastructure.
On 25/07/2019 22:17, Giles Coochey wrote:
Separate DNS servers must be on a different subnet according to RFC2182 (https://tools.ietf.org/html/rfc2182):
Secondary servers must be placed at both topologically and geographically dispersed locations on the Internet, to minimise the likelihood of a single failure disabling all of them.
I know that UPSs are physical, and subnets are logical, but the reasoning behind the requirement is due to having to be on a different infrastructure.
Shock horror, replying to my own post, but in cloud cluster environments, you might consider anti-affinity rules to prevent multiple name servers going down at the same time due to a cluster node failure (i.e. rules to ensure that hypervisors keep different name servers on different hosts).
I know it doesn't help the OP, who was looking for cluster-based solutions, but the same applies if you're using load-balancing virtual appliances to host the name server IPs.