boot-time NFS mount failures


Tilman Schmidt

19 Apr 2012

5:12 p.m.

Hello List,

I have a problem with a CentOS 5 server running Oracle DBMS with the transaction logs going to an NFS share on our CentOS 6/Bacula backup server. The Oracle server has this in its /etc/fstab file:

backup:/home/backup/Oracle /backup_nfs nfs hard,intr,noexec,rsize=32768,wsize=32768

The backup server, in its /etc/exports:

/home/backup/Oracle -rw,async,all_squash,anonuid=133 cen5-db-01

This works quite well, except that when all the servers are rebooted together (such as after a power outage) the Oracle server fails to mount the NFS share, but goes ahead to start Oracle anyway. Oracle then either complains that it cannot find its logs or, worse, starts to write them to the local disk.

The last time this happened, I found a message on the console:

mount: can't get address for backup

So it seems that the failure was caused by the nameserver not being available yet. Unfortunately that message isn't saved to any logfile, so I cannot say if it was the same the previous times.

How can I make sure that the system startup does not proceed until the NFS share is available and mounted successfully? From R'ingTFM I got the impression that this should be the default behaviour, but apparently it isn't. I don't know if hardcoding the backup server's IP address into the Oracle server's /etc/hosts file would help, but I would really like to avoid that, anyway.

Thanks in advance for any hints, Tilman


Veli-Pekka Kestilä

19 Apr 2012

5:30 p.m.

On 19.4.2012 20:12, Tilman Schmidt wrote:

...

Hello List,

backup:/home/backup/Oracle /backup_nfs nfs

The last time this happened, I found a message on the console:

mount: can't get address for backup

So it seems that the failure was caused by the nameserver not being available yet. Unfortunately that message isn't saved to any logfile, so I cannot say if it was the same the previous times.

You could put the IP address in fstab instead of the server name, so that no name lookup is needed, or you could put the IP address and name in /etc/hosts.

...

How can I make sure that the system startup does not proceed until the NFS share is available and mounted successfully? From R'ingTFM I got the impression that this should be the default behaviour, but apparently it isn't. I don't know if hardcoding the backup server's IP address into the Oracle server's /etc/hosts file would help, but I would really like to avoid that, anyway.

Where is the nameserver? On the backup server, on the Oracle server, or on a completely separate machine?
- If it is on the Oracle server, you could make it start before Oracle in init.
- If it is on a separate machine, you could write your own init script that tests that name resolution works and runs the Oracle startup only after that.

I would put the IP address in /etc/hosts if the backup server has a fixed IP address. If not, a special init script could do the trick.

-vpk
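The name resolution test suggested above could look roughly like the following. This is only a sketch: the helper name wait_until and the retry budget are assumptions, not something from the thread. getent hosts is used for the lookup because it goes through the same resolver configuration (/etc/hosts, DNS) as other programs on the machine.

```shell
#!/bin/sh
# wait_until TRIES DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 as soon as COMMAND succeeds,
# 1 if it never does.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the final failed attempt.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use in an init script that runs before netfs. The host name
# "backup" comes from the fstab entry; 60 tries of 5 seconds each is an
# arbitrary assumption:
#   wait_until 60 5 getent hosts backup || echo "cannot resolve backup" >&2
```

This only covers the "can't get address" case. It says nothing about whether the NFS server itself is reachable or exporting the share.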

Tilman Schmidt

20 Apr 2012

2:27 p.m.

Am 19.04.2012 19:30, schrieb Veli-Pekka Kestilä:

...

On 19.4.2012 20:12, Tilman Schmidt wrote:

...
backup:/home/backup/Oracle /backup_nfs nfs

The last time this happened, I found a message on the console:

mount: can't get address for backup

So it seems that the failure was caused by the nameserver not being available yet. Unfortunately that message isn't saved to any logfile, so I cannot say if it was the same the previous times.

You could set in fstab ipaddress instead of the server name, so there is no need for name lookup or you can put the ip and name in /etc/hosts

So you say it's only the name lookup failure that's causing startup to proceed without the NFS mount? All other failures like host unreachable or NFS port not open would cause the system to wait and retry?

...

Where the nameserver is? On the backup server or on the oracle (or on completely separate machine?)

Separate machine.

...

If on separate machine you could write your own init script which tests that the name resolution works and runs the oracle startup after that.

It would have to go before the netfs service I think. That's the one which does the NFS mount. The oracle startup script runs after netfs, so all would be fine if netfs wouldn't exit without having mounted the NFS shares.

...

I would put the ip in hosts if the backup server has fixed ip-address. If not then making special init-script could be the trick.

My concern is that there are other possible failure modes besides the name lookup. What happens if the IP address is available (hardcoded or via name resolution) but the NFS server is offline? What if the NFS server machine is online (say, pingable) but the NFS service doesn't listen (yet)? I have to make sure that in all these cases the Oracle processes do not get started until the NFS mount is available.

Thanks, Tilman

-- Tilman Schmidt Phoenix Software GmbH Bonn, Germany

Veli-Pekka Kestilä

3:16 p.m.

On 20.4.2012 17:27, Tilman Schmidt wrote:

...

Am 19.04.2012 19:30, schrieb Veli-Pekka Kestilä:

...
On 19.4.2012 20:12, Tilman Schmidt wrote:

...
backup:/home/backup/Oracle /backup_nfs nfs

The last time this happened, I found a message on the console:

mount: can't get address for backup

So it seems that the failure was caused by the nameserver not being available yet. Unfortunately that message isn't saved to any logfile, so I cannot say if it was the same the previous times.

You could set in fstab ipaddress instead of the server name, so there is no need for name lookup or you can put the ip and name in /etc/hosts

So you say it's only the name lookup failure that's causing startup to proceed without the NFS mount? All other failures like host unreachable or NFS port not open would cause the system to wait and retry?

...

If on separate machine you could write your own init script which tests that the name resolution works and runs the oracle startup after that.

It would have to go before the netfs service I think. That's the one which does the NFS mount. The oracle startup script runs after netfs, so all would be fine if netfs wouldn't exit without having mounted the NFS shares.

...
I would put the ip in hosts if the backup server has fixed ip-address. If not then making special init-script could be the trick.

My concern are possible other failure modes besides the name lookup. What happens if the IP address is available (hardcoded or via name resolution) but the NFS server is offline? What if the NFS server machine is online (say, pingable) but the NFS service doesn't listen (yet)? I have to make sure that in all these cases the Oracle processes do not get started until the NFS mount is available.

As you say, there is a multitude of things that can keep the NFS mount from working. So if you want to be sure, you should really invest in writing the init script. The way I would do it is to put it after netfs, replacing Oracle's original init script.

It would then run all the tests needed to see whether the NFS share is mounted correctly and ready to use. (It could even work around some of the problems, for example by trying to remount the NFS mounts.) It would call the original Oracle init script once everything works. It could also do this in the background and let the rest of the system boot, so that you can log in and troubleshoot if necessary.

-vpk
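A minimal sketch of such a wrapper, under stated assumptions: the original Oracle init script has been renamed to /etc/init.d/oracle.orig (a hypothetical path), the mount point is the /backup_nfs from the fstab entry in this thread, and the retry budget of 120 tries at 5 second intervals is arbitrary. The function names are made up for illustration. The check reads the kernel's mount table instead of trusting the exit status of netfs.

```shell
#!/bin/sh
# Hypothetical wrapper init script, meant to run after netfs in place of
# the original Oracle init script. Both values below are assumptions.
MOUNTPOINT=/backup_nfs
ORACLE_INIT=/etc/init.d/oracle.orig

# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears in the mount table with an nfs* type.
is_nfs_mounted() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# wait_for_nfs MOUNTPOINT TRIES DELAY
# Re-attempts the fstab mount until it is present or TRIES is used up.
wait_for_nfs() {
    n=$2
    while [ "$n" -gt 0 ]; do
        is_nfs_mounted "$1" && return 0
        mount "$1" >/dev/null 2>&1    # takes device and options from /etc/fstab
        is_nfs_mounted "$1" && return 0
        n=$((n - 1))
        sleep "$3"
    done
    return 1
}

# Nothing happens unless the script is called with an init-style argument.
case "${1:-}" in
    start)
        if wait_for_nfs "$MOUNTPOINT" 120 5; then
            "$ORACLE_INIT" start
        else
            echo "$MOUNTPOINT is not mounted, not starting Oracle" >&2
            exit 1
        fi
        ;;
    stop|status)
        "$ORACLE_INIT" "$1"
        ;;
esac
```

As written, the start action blocks the boot sequence until the share is mounted or the retries run out. Running the start branch in the background instead would let the rest of the system come up for troubleshooting, at the cost of Oracle starting later than the other services.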

Tilman Schmidt

4:03 p.m.

Am 19.04.2012 19:30, schrieb Veli-Pekka Kestilä:

...

You could set in fstab ipaddress instead of the server name, so there is no need for name lookup or you can put the ip and name in /etc/hosts

Tried that now and it didn't help. When the backup server is down, the message is "no route to host" where it previously said "can't get address", but the startup sequence still proceeds without waiting for the NFS mount to appear.

-- Tilman Schmidt Phoenix Software GmbH Bonn, Germany

Vahan Yerkanian

19 Apr 2012

9:10 p.m.

On Apr 19, 2012, at 9:12 PM, Tilman Schmidt wrote:

...

Hello List,

I have a problem with a CentOS 5 server running Oracle DBMS with the transaction logs going to an NFS share on our CentOS 6/Bacula backup server. The Oracle server has this in its /etc/fstab file:

backup:/home/backup/Oracle /backup_nfs nfs hard,intr,noexec,rsize=32768,wsize=32768

Just add _netdev to the mount options.

From man:

_netdev The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).

HTH, Vahan

Tilman Schmidt

20 Apr 2012

3:03 p.m.

Am 19.04.2012 23:10, schrieb Vahan Yerkanian:

...

...
backup:/home/backup/Oracle /backup_nfs nfs hard,intr,noexec,rsize=32768,wsize=32768

Just add _netdev to the mount options.

...
From man:

_netdev The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).

That doesn't seem to have any effect. In fact, I had found references to that option on the web, but they seemed to agree that it didn't apply to NFS mounts because CentOS 5 already takes care to do these after the network is up. To confirm, /etc/init.d/netfs contains the lines

...

NFSFSTAB=`LC_ALL=C awk '!/^#/ && $3 ~ /^nfs/ && $3 != "nfsd" && $4 !~ /noauto/ { print $2 }' /etc/fstab`
SMBFSTAB=`LC_ALL=C awk '!/^#/ && $3 == "smbfs" && $4 !~ /noauto/ { print $2 }' /etc/fstab`
CIFSFSTAB=`LC_ALL=C awk '!/^#/ && $3 == "cifs" && $4 !~ /noauto/ { print $2 }' /etc/fstab`
NCPFSTAB=`LC_ALL=C awk '!/^#/ && $3 == "ncpfs" && $4 !~ /noauto/ { print $2 }' /etc/fstab`
NETDEVFSTAB=`LC_ALL=C awk '!/^#/ && $4 ~/_netdev/ && $4 !~ /noauto/ { print $1 }' /etc/fstab`

for finding the fstab entries to process, and the section processing NETDEVFSTAB explicitly excludes the fstypes "nfs,nfs4,smbfs,cifs,ncpfs,gfs" from its mount command.
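To see which entries the NFSFSTAB expression picks up, the same awk program can be run against a sample file. The sketch below wraps it in a function; the function name and the sample fstab lines other than the /backup_nfs entry are made up for illustration.

```shell
#!/bin/sh
# netfs_nfs_mountpoints FSTAB
# Prints the mount points selected by the NFSFSTAB expression quoted
# above, reading FSTAB instead of /etc/fstab.
netfs_nfs_mountpoints() {
    LC_ALL=C awk '!/^#/ && $3 ~ /^nfs/ && $3 != "nfsd" && $4 !~ /noauto/ { print $2 }' "$1"
}

# Example with a throwaway sample file (contents are illustrative):
#   cat > /tmp/fstab.sample <<'EOF'
#   /dev/sda1 / ext3 defaults 1 1
#   backup:/home/backup/Oracle /backup_nfs nfs hard,intr,noexec,_netdev
#   backup:/srv/other /mnt/other nfs noauto
#   EOF
#   netfs_nfs_mountpoints /tmp/fstab.sample
```

The selection depends only on the filesystem type and on noauto being absent, so an nfs entry is picked up by this expression whether or not _netdev is among its options.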

The netfs service is started after the network service, so networking on the machine itself is up by that time. The problem AFAICS is that other servers on the network which are needed for the mount to succeed (the NFS server itself and possibly a nameserver) aren't up yet.

Thanks, Tilman

-- Tilman Schmidt Phoenix Software GmbH Bonn, Germany
