DNS lookup delay with CentOS & Postfix


Steve Lindemann

25 Jul 2012

8:57 p.m.

I'm a bit baffled by this and I'm looking for ideas...

background:
- two DNS servers (ns1 & ns2) (64-bit CentOS 5.8)
- one email server (64-bit CentOS 5.8 & postfix 2.3.3)
- one nagios server (64-bit CentOS 5.8 & nagios 3.3.1)

situation:
- all servers configured to use both DNS servers for lookups
- ns1 server down for hardware problem
- nagios alerts that smtp on email server taking longer than 2 seconds to respond
- nagios alert for smtp on email server clears when ns1 returns to service
- when I use dig from the email server command line there is no problem or delay when ns1 is offline. It worked without a hitch using ns2.

Anyone have any ideas for why nagios would have trouble testing smtp on the email server when the primary dns goes offline? I'm not even sure where to look or who else would make sense to ask the question of on this one. I'd appreciate any insight anyone out there has on this.

-- Steve


Tom Brown

25 Jul

9:21 p.m.

Does dig use libresolv or read directly from resolv.conf? Also, do you have a timeout configured in resolv.conf, or are you relying on the OS default?


Steve Lindemann

9:47 p.m.

On 7/25/2012 3:21 PM, Tom Brown wrote:

...

Does dig use libresolv or read directly from resolv.conf? Also, do you have a timeout configured in resolv.conf, or are you relying on the OS default?

dig uses resolv.conf and no timeouts are configured there. I don't know where the OS would have a default configured or what it is. Another reply indicated there would be a 5 second delay. That seems a bit high to me.

I used dig from the email svr command line with the primary DNS svr up and (naturally) it pulled from there as normal. Then I downed the primary DNS svr, saw the nagios check fail and tried again. The same dig lookup was actually faster and pulled from the secondary DNS svr just fine. And, again, the nagios alert cleared as soon as the primary DNS svr was back online.

For both tests I used: dig mx google.com

-- Steve

Tom Brown

9:58 p.m.


I would always set a timeout in your resolv.conf rather than relying on the OS default.

Set it to 1 second and test again to see if there is any difference.
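
To make the suggestion concrete, here is a minimal Python sketch that reads resolv.conf-style text and reports the timeout and attempts a glibc-style resolver would end up with. The defaults of 5 seconds and 2 attempts are the ones documented in resolv.conf(5); the function name and the sample file contents (documentation-range addresses) are hypothetical, and the parsing is deliberately simplified.

```python
# Sketch: work out the effective resolver timeout from resolv.conf text.
# Defaults follow resolv.conf(5); everything else here is illustrative.

DEFAULT_TIMEOUT = 5   # seconds, used when no "options timeout:N" is present
DEFAULT_ATTEMPTS = 2  # used when no "options attempts:N" is present


def effective_resolver_settings(conf_text):
    """Return (nameservers, timeout, attempts) parsed from resolv.conf text."""
    nameservers = []
    timeout = DEFAULT_TIMEOUT
    attempts = DEFAULT_ATTEMPTS
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "timeout" and value.isdigit():
                    timeout = int(value)
                elif name == "attempts" and value.isdigit():
                    attempts = int(value)
    return nameservers, timeout, attempts


# Hypothetical resolv.conf with the 1-second timeout suggested above.
SAMPLE = """\
nameserver 192.0.2.1
nameserver 192.0.2.2
options timeout:1 attempts:2
"""

print(effective_resolver_settings(SAMPLE))
# -> (['192.0.2.1', '192.0.2.2'], 1, 2)
```

Without the `options` line the same call reports a timeout of 5, which is the OS default being discussed.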

Steve Lindemann

10:29 p.m.

On 7/25/2012 3:58 PM, Tom Brown wrote:


I would always set a timeout in your resolv.conf rather than relying on the OS default.

Set it to 1 second and test again to see if there is any difference.

And that sounds like the best solution so far. I hadn't considered that... haven't looked at that file in ages.

I do like knowing why something doesn't work, but I'm good with just getting it to work too. I'll give this a try, thanks!

-- Steve

Robert Spangler

26 Jul

10:14 p.m.

On Wednesday 25 July 2012 17:47, the following was written:

...

I used dig from the email svr command line with the primary DNS svr up and (naturally) it pulled from there as normal. Then I downed the primary DNS svr, saw the nagios check fail and tried again. The same dig lookup was actually faster and pulled from the secondary DNS svr just fine. And, again, the nagios alert cleared as soon as the primary DNS svr was back online.

I believe you saw a faster response because the second query was answered from information cached by the first lookup, not because the second server is faster.

To verify this, look at the TTL values in the responses: a cached answer shows a TTL that has counted down from the original value.
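
As a rough illustration of that check, the Python sketch below pulls the TTL column out of dig-style answer lines and flags a second response whose TTL is lower than the first. The function names and the sample lines (example.com, TTLs of 600 and 571) are made up for illustration; they are not output from the servers in this thread.

```python
# Sketch: compare TTLs between two dig outputs to spot a cached answer.

def answer_ttls(dig_output):
    """Return {(name, rtype): ttl} for record lines in dig output text."""
    ttls = {}
    for line in dig_output.splitlines():
        fields = line.split()
        # Record lines look like: name TTL class type rdata...
        # Lines starting with ';' are dig comments and section headers.
        if line.startswith(";") or len(fields) < 5:
            continue
        name, ttl, rclass, rtype = fields[:4]
        if ttl.isdigit() and rclass == "IN":
            ttls[(name, rtype)] = int(ttl)
    return ttls


def looks_cached(first, second):
    """True if any record in `second` has a lower TTL than in `first`."""
    return any(key in first and second[key] < first[key] for key in second)


# Hypothetical answer lines from two runs of the same query.
FIRST_RUN = "example.com.\t\t600\tIN\tMX\t10 mail.example.com.\n"
SECOND_RUN = "example.com.\t\t571\tIN\tMX\t10 mail.example.com.\n"

print(looks_cached(answer_ttls(FIRST_RUN), answer_ttls(SECOND_RUN)))
# -> True
```

If several records share the same name and type, this sketch keeps only the last TTL seen, which is enough for a quick comparison.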

-- Regards Robert Linux The adventure of a lifetime. Linux User #296285 Get Counted http://linuxcounter.net/

Dennis Jacobfeuerborn

25 Jul

9:25 p.m.


The default timeout for a DNS lookup is usually 5 seconds, so the system will try ns1, time out after 5 seconds, and only then use ns2.
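
The arithmetic can be sketched in a few lines of Python. This is a simplified model, not the resolver's actual code: it assumes servers are tried in the order listed and that each unreachable one costs one full timeout before the next is tried. The function name and the server labels are hypothetical.

```python
# Sketch: time spent waiting on dead nameservers before a live one is asked.

def first_pass_delay(nameservers, down, timeout=5):
    """Seconds waited before the first reachable server is queried.

    Returns None if every listed server is down.
    """
    waited = 0
    for server in nameservers:
        if server not in down:
            return waited
        waited += timeout  # one full timeout per unreachable server
    return None


servers = ["ns1", "ns2"]
print(first_pass_delay(servers, down={"ns1"}))             # -> 5
print(first_pass_delay(servers, down={"ns1"}, timeout=1))  # -> 1
print(first_pass_delay(servers, down=set()))               # -> 0
```

Under this model, a single lookup with ns1 down waits 5 seconds with the default timeout, which is already over a 2-second check threshold, while a 1-second timeout stays under it.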

Regards, Dennis

Les Mikesell

9:31 p.m.

On Wed, Jul 25, 2012 at 4:25 PM, Dennis Jacobfeuerborn dennisml@conversis.de wrote:


The default timeout for a DNS lookup is usually 5 seconds so the system will try ns1, time out after 5 seconds and then use ns2.

Yes, a delay is normal when the first DNS server is down. You might want to run a caching nameserver on your email server (and perhaps others) so you don't wait on lookups for addresses that are already cached. The caching servers can use the main ones as forwarders if necessary.

-- Les Mikesell lesmikesell@gmail.com

John R Pierce

9:55 p.m.

On 07/25/12 1:57 PM, Steve Lindemann wrote:

...

Anyone have any ideas for why nagios would have trouble testing smtp on the email server when the primary dns goes offline? I'm not even sure where to look or who else would make sense to ask the question of on this one. I'd appreciate any insight anyone out there has on this.

DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. Could it be that 53/udp is being lost/blocked between this host and your ns1?

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Steve Lindemann

10:23 p.m.

On 7/25/2012 3:55 PM, John R Pierce wrote:


DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. Could it be that 53/udp is being lost/blocked between this host and your ns1?

Good question, but unlikely... all the servers are in the same DMZ and sit on the same switch.

-- Steve

Joseph L. Casale

26 Jul

1:27 a.m.

...

DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. Could it be that 53/udp is being lost/blocked between this host and your ns1?

Unfortunately that is a common misconception.

TCP is used far more often than the "only" stated above, for example when a response exceeds the UDP message size limit.

Bottom line: both UDP and TCP are needed on port 53, not just for zone transfers.
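
For anyone wondering how a client knows to switch, the server sets the TC ("truncated") bit in the 12-byte DNS header when the answer did not fit in the UDP message, and the client is then expected to retry over TCP. A minimal Python sketch of that check follows; the function name is hypothetical and the two sample headers are hand-built, not captured traffic.

```python
# Sketch: detect the TC (truncated) bit in a raw DNS response header.
import struct

TC_FLAG = 0x0200  # TC bit within the 16-bit flags field (RFC 1035)


def is_truncated(response):
    """Return True if a raw DNS response has the TC bit set."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes; response too short")
    _msg_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


# Hand-built headers: id, flags, then four zero/one section counts.
normal = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)     # QR + RD + RA
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)  # same plus TC

print(is_truncated(normal))     # -> False
print(is_truncated(truncated))  # -> True
```

A resolver that sees `True` here and cannot reach the server on 53/tcp never gets the full answer, which is why blocking TCP breaks more than zone transfers.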

jlc

David McGuffey

1:40 a.m.

On Jul 25, 2012, at 21:27, "Joseph L. Casale" jcasale@activenetwerx.com wrote:


Unfortunately that is a common misconception.

TCP is used far more often than the "only" stated above, for example when a response exceeds the UDP message size limit.

Bottom line: both UDP and TCP are needed on port 53, not just for zone transfers.

Except that the malware guys have figured out how to abuse port 53. The security recommendation is to block TCP on port 53 unless you're running a DNS server, and also to block oversize UDP packets on port 53.

Dave M

Tris Hoar

12:21 p.m.

On 26/07/2012 02:40, David McGuffey wrote:


Except that the malware guys have figured out how to abuse port 53. The security recommendation is to block TCP on port 53 unless you're running a DNS server, and also to block oversize UDP packets on port 53.

Blocking oversize UDP packets is a very bad idea. EDNS is used for a lot of lookups these days because of DNSSEC, so blocking oversize UDP packets will force you to fall back to TCP for many of your DNS queries.


Tris

