[CentOS] NFS / DNS problem

Tue Aug 14 23:49:39 UTC 2007
Simone <dezmodue at gmail.com>

Hi all,

Today we had a strange problem that took down our website. We
understand what happened but not why, so I am hoping someone has seen
this before.

We have our web servers (web1 web2 web3 ..... web10) mounting an NFS
share (/export/data) from server nfs1. On the web server side we use
autofs in the format nfs-dedicated:/export/data, where nfs-dedicated is
an alias in our internal DNS servers pointing to server nfs1. We run a
primary and a secondary DNS (BIND) server, ns0 and ns1, authoritative
for our zones, and our web servers have both configured in
/etc/resolv.conf.
Today we had to run some upgrades on the DNS servers (BIOS, firmware,
etc.), so we took down ns0, and with it our website went down.
All the NFS shares disappeared from the web servers (the logs show
mount/unmount requests timing out), but at the same time the logs on
nfs1 show mount and unmount requests coming from the web servers and
no errors.

As soon as ns0 was back up, everything returned to normal. Minutes
later we took down ns1 for maintenance, and it had no impact on the
website.

dig nfs-web gives exactly the same results whether the query is sent to
ns0 or to ns1 (e.g. dig @ns0 nfs-web).
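
The comparison can be scripted along these lines. This is a minimal
sketch, assuming dig is installed and that ns0 and ns1 are resolvable
names on the machine running it; the helper name compare_ns and the
+time/+tries values are illustrative choices, not part of our setup.

```shell
#!/bin/sh
# compare_ns NAME SERVER_A SERVER_B
# Ask two nameservers for the same name and report whether the
# answers match. compare_ns is a made-up helper name.
compare_ns() {
    name=$1
    # +short prints only the answer data; sort makes the comparison
    # independent of record order. +time/+tries keep an unresponsive
    # server from stalling the check for long.
    a=$(dig +short +time=2 +tries=1 "@$2" "$name" | sort)
    b=$(dig +short +time=2 +tries=1 "@$3" "$name" | sort)
    # Note: two empty answers also compare equal.
    if [ "$a" = "$b" ]; then
        echo "same"
    else
        echo "different"
        return 1
    fi
}
```

Called as compare_ns nfs-web ns0 ns1, it prints "same" or "different".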

Back at the office we tried to reproduce the same scenario by
configuring iptables on web3 to block traffic to ns0, but the server
(web3) kept working fine, falling back to ns1 for name resolution (as
you would expect).
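
Roughly, the test on web3 amounts to rules like the following. This is
a sketch rather than the exact commands used: the helper names
block_dns and unblock_dns are made up, 192.0.2.53 is a placeholder for
the address of ns0, and the commands need root.

```shell
#!/bin/sh
# Simulate "ns0 is unreachable for DNS" on one web server by dropping
# outgoing DNS traffic to it. block_dns/unblock_dns are made-up helper
# names; pass the nameserver address as the first argument.
block_dns() {
    # DNS uses both UDP and TCP on port 53, so drop both.
    iptables -I OUTPUT -d "$1" -p udp --dport 53 -j DROP
    iptables -I OUTPUT -d "$1" -p tcp --dport 53 -j DROP
}

unblock_dns() {
    # Delete exactly the rules added by block_dns.
    iptables -D OUTPUT -d "$1" -p udp --dport 53 -j DROP
    iptables -D OUTPUT -d "$1" -p tcp --dport 53 -j DROP
}
```

For example: block_dns 192.0.2.53, exercise the mounts, then
unblock_dns 192.0.2.53. These rules only block DNS traffic to port 53;
everything else on ns0 stays reachable, which may or may not match what
happens when the machine itself is down.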

Has anybody seen this happen before? Any comments or suggestions would
be much appreciated.

Thanks

Simone