I'm a bit baffled by this and I'm looking for ideas...
background: two DNS servers (ns1 & ns2) (64bit CentOS 5.8), one email server (64bit CentOS 5.8 & postfix 2.3.3), one nagios server (64bit CentOS 5.8 & nagios 3.3.1)
situation:
- all servers configured to use both DNS servers for lookups
- ns1 server down for hardware problem
- nagios alerts that smtp on email server is taking longer than 2 seconds to respond
- nagios alert for smtp on email server clears when ns1 returns to service
- when I use dig from the email server command line there is no problem or delay when ns1 is offline. It worked without a hitch using ns2.
Anyone have any ideas for why nagios would have trouble testing smtp on the email server when the primary dns goes offline? I'm not even sure where to look or who else would make sense to ask the question of on this one. I'd appreciate any insight anyone out there has on this.
Does dig use libresolv or read directly from resolv.conf? Also do you have a timeout configured in resolv.conf or are you relying on the os default?
On 7/25/2012 3:21 PM, Tom Brown wrote:
Does dig use libresolv or read directly from resolv.conf? Also do you have a timeout configured in resolv.conf or are you relying on the os default?
dig uses resolv.conf and no timeouts are configured there. I don't know where the OS would have a default configured or what it is. Another reply indicated there would be a 5 second delay. That seems a bit high to me.
I used dig from the email svr command line with the primary DNS svr up and (naturally) it pulled from there as normal. Then I downed the primary DNS svr, saw the nagios check fail and tried again. The same dig lookup was actually faster and pulled from the secondary DNS svr just fine. And, again, the nagios alert cleared as soon as the primary DNS svr was back online.
For both tests I used: dig mx google.com
On 7/25/2012 3:58 PM, Tom Brown wrote:
I would always set a timeout in your resolv.conf rather than relying on the OS default.
Set that to 1 second and test again to see if there is any difference.
And that sounds like the best solution so far. I hadn't considered that... haven't looked at that file in ages.
I do like knowing why something doesn't work, but I'm good with just getting it to work too. I'll give this a try, thanks! -- Steve
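For anyone who wants to see what the resolver will actually use, here is a rough Python sketch that reads resolv.conf-style text and reports the nameservers plus the effective timeout and attempts. The addresses are made-up documentation addresses standing in for ns1 and ns2, and the fallback values (timeout 5, attempts 2) are the defaults described in the resolv.conf(5) man page; treat it as an illustration, not as how the resolver library itself parses the file.

```python
# Hypothetical resolv.conf contents; 192.0.2.x are documentation addresses,
# not the real ns1/ns2 from this thread.
SAMPLE_RESOLV_CONF = """\
# ns1 and ns2
nameserver 192.0.2.1
nameserver 192.0.2.2
options timeout:1 attempts:2
"""


def parse_resolv_conf(text):
    """Return (nameservers, options) from resolv.conf-style text.

    Options not present in the file fall back to the documented
    defaults: timeout 5 seconds, 2 attempts.
    """
    servers = []
    options = {"timeout": 5, "attempts": 2}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are ignored.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            servers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name in options and value.isdigit():
                    options[name] = int(value)
    return servers, options


if __name__ == "__main__":
    servers, options = parse_resolv_conf(SAMPLE_RESOLV_CONF)
    print(servers, options)
```

On a real host you would pass it the contents of /etc/resolv.conf instead of the sample string.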
On Wednesday 25 July 2012 17:47, the following was written:
I used dig from the email svr command line with the primary DNS svr up and (naturally) it pulled from there as normal. Then I downed the primary DNS svr, saw the nagios check fail and tried again. The same dig lookup was actually faster and pulled from the secondary DNS svr just fine. And, again, the nagios alert cleared as soon as the primary DNS svr was back online.
I believe the reason you noticed a faster response is that the second query used cached information from the first lookup, not that the second server is or was faster.
To verify this, look at the TTL values in the responses.
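One way to do that TTL comparison without eyeballing it: the sketch below pulls the TTL column out of two saved dig outputs and reports whether every shared record's TTL went down between runs, which usually points at a cached answer. The two sample answers are invented for illustration and the function names are mine; it assumes the usual dig answer layout of name, TTL, class, type, data.

```python
def answer_ttls(dig_output):
    """Map (name, type, data) -> TTL for each record line in dig output."""
    ttls = {}
    for line in dig_output.splitlines():
        line = line.strip()
        # dig prefixes headers and comments with ';'
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        if len(fields) >= 5 and fields[1].isdigit():
            name, ttl, _cls, rtype = fields[:4]
            ttls[(name, rtype, " ".join(fields[4:]))] = int(ttl)
    return ttls


def looks_cached(first_output, second_output):
    """True if every record seen in both runs has a lower TTL the second time."""
    first = answer_ttls(first_output)
    second = answer_ttls(second_output)
    shared = first.keys() & second.keys()
    return bool(shared) and all(second[k] < first[k] for k in shared)


# Invented sample answers, not real output from the servers in this thread.
FIRST_RUN = """\
;; ANSWER SECTION:
google.com.\t\t600\tIN\tMX\t10 aspmx.l.google.com.
"""
SECOND_RUN = """\
;; ANSWER SECTION:
google.com.\t\t587\tIN\tMX\t10 aspmx.l.google.com.
"""

if __name__ == "__main__":
    print(looks_cached(FIRST_RUN, SECOND_RUN))
```

If the TTL is identical on both runs, the answer was more likely fetched fresh each time.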
The default timeout for a DNS lookup is usually 5 seconds so the system will try ns1, time out after 5 seconds and then use ns2.
Regards, Dennis
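To put numbers on that, here is a deliberately simplified Python model of the fallback: servers are tried in order, and each dead one costs a full timeout before the next is asked. The function name and the up/down lists are illustrative only, and a real resolver may adjust timeouts between retry rounds, so take it as a rough estimate rather than exact resolver behaviour.

```python
from typing import Optional, Sequence


def delay_before_answer(servers_up: Sequence[bool],
                        timeout: float = 5.0,
                        attempts: int = 2) -> Optional[float]:
    """Seconds spent waiting on dead servers before a live one is reached.

    servers_up lists the nameservers in resolv.conf order (True = answering).
    Returns None if no server answers in any round.
    """
    waited = 0.0
    for _ in range(attempts):
        for up in servers_up:
            if up:
                return waited
            waited += timeout  # a dead server costs one full timeout
    return None


if __name__ == "__main__":
    nagios_limit = 2.0  # the 2 second smtp threshold from the original post
    for label, state, timeout in [
        ("ns1 up, default timeout", [True, True], 5.0),
        ("ns1 down, default timeout", [False, True], 5.0),
        ("ns1 down, timeout:1", [False, True], 1.0),
    ]:
        delay = delay_before_answer(state, timeout=timeout)
        verdict = "over" if delay is not None and delay > nagios_limit else "within"
        print(f"{label}: {delay}s of DNS wait, {verdict} the nagios limit")
```

With the defaults, a single lookup that hits the dead ns1 first already costs more than the 2 seconds nagios allows.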
On Wed, Jul 25, 2012 at 4:25 PM, Dennis Jacobfeuerborn dennisml@conversis.de wrote:
The default timeout for a DNS lookup is usually 5 seconds so the system will try ns1, time out after 5 seconds and then use ns2.
Yes, a delay is normal when the 1st dns server is down. You might want to run a caching nameserver on your email server (and perhaps others) so you don't wait for cached addresses. The caching servers can use the main ones as forwarders if necessary.
On 07/25/12 1:57 PM, Steve Lindemann wrote:
Anyone have any ideas for why nagios would have trouble testing smtp on the email server when the primary dns goes offline? I'm not even sure where to look or who else would make sense to ask the question of on this one. I'd appreciate any insight anyone out there has on this.
DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. could it be 53/udp is being lost/blocked between this host and your ns1 ?
On 7/25/2012 3:55 PM, John R Pierce wrote:
DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. could it be 53/udp is being lost/blocked between this host and your ns1 ?
good question but unlikely... all the servers are in the same dmz and sit on the same switch. -- Steve
DNS lookups default to using 53/udp, and only use 53/tcp for zone transfers. could it be 53/udp is being lost/blocked between this host and your ns1 ?
Unfortunately that is a common misconception.
TCP is used far more often than "only" for zone transfers, for example when a response is too large to fit in a UDP reply.
Bottom line is both ports are needed, not just for zone xfers.
jlc
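For the curious, the signal that triggers the switch to TCP is the TC (truncated) bit in the DNS header flags: a client that sees it set on a UDP reply is expected to retry over TCP. Below is a small Python sketch that checks the bit in a raw message; the header builder and the flag values in the demo are only there to make the example self-contained.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message header has the TC bit set."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def build_header(msg_id: int, flags: int) -> bytes:
    """Build a bare 12-byte header (1 question, no records) for testing."""
    return struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)


if __name__ == "__main__":
    normal_reply = build_header(0x1234, 0x8180)     # QR, RD, RA
    truncated_reply = build_header(0x1234, 0x8380)  # QR, TC, RD, RA
    print(is_truncated(normal_reply), is_truncated(truncated_reply))
```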
On Jul 25, 2012, at 21:27, "Joseph L. Casale" jcasale@activenetwerx.com wrote:
Unfortunately that is a common misconception.
TCP is used far more often than "only" for zone transfers, for example when a response is too large to fit in a UDP reply.
Bottom line is both ports are needed, not just for zone xfers.
Except that the malware guys have figured out how to abuse port 53. Security recommendation is to block TCP unless you're running a DNS server. And also block oversize port 53 UDP packets.
Dave M
On 26/07/2012 02:40, David McGuffey wrote:
Except that the malware guys have figured out how to abuse port 53. Security recommendation is to block TCP unless you're running a DNS server. And also block oversize port 53 UDP packets.
Blocking oversize UDP packets is a very bad idea. EDNS is used for a lot of lookups these days due to DNSSEC, so blocking oversize UDP packets will force many of your DNS requests onto TCP.
Tris
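A toy Python model of that trade-off, in case it helps: without EDNS a UDP reply is capped at 512 bytes, with EDNS the client advertises a larger buffer, and a firewall that drops big UDP packets effectively pulls the cap back down. The function name, the 1200-byte reply and the 4096-byte buffer are illustrative assumptions, and the model ignores the timeouts and retries you would see in practice before the fallback happens.

```python
CLASSIC_UDP_LIMIT = 512  # maximum UDP payload for DNS without EDNS


def transport_for(response_size, edns_bufsize=None, firewall_udp_limit=None):
    """Return "udp" or "tcp" for a reply of response_size bytes.

    edns_bufsize: UDP payload size the client advertises via EDNS, if any.
    firewall_udp_limit: largest port 53 UDP payload a (hypothetical)
    firewall lets through, if any.
    """
    limit = CLASSIC_UDP_LIMIT
    if edns_bufsize is not None:
        limit = max(edns_bufsize, CLASSIC_UDP_LIMIT)
    if firewall_udp_limit is not None:
        limit = min(limit, firewall_udp_limit)
    return "udp" if response_size <= limit else "tcp"


if __name__ == "__main__":
    signed_reply = 1200  # hypothetical size of a DNSSEC-signed answer
    print(transport_for(signed_reply, edns_bufsize=4096))
    print(transport_for(signed_reply, edns_bufsize=4096, firewall_udp_limit=512))
```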