NFS automount failure

Trevor Cooper

27 Apr 2010

8:51 p.m.

I'm having a problem with automount (autofs) from a server running CentOS 5.4 to clients (example is CentOS 5.4). Client pulls automount maps from NIS. THIS particular server is also used for login so it is NIS bound as well (other servers are NOT).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NIS Server Side... (content shortened)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ ypcat -k auto.master | sort | grep auto.mdkm1
/storage/mdkm1 auto.mdkm1 --timeout=600

$ ypcat -k auto.mdkm1 | sort
1 137.110.179.254:/mdkm1/1
. . .
10 137.110.179:254:/mdkm1/10

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NFS Server Side... (content shortened)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ cat /etc/exports
/mdkm1/1 ###.###.172.0/25(rw,sync) ###.###.179.192/26(rw,async)
. . .
/mdkm1/10 ###.###.172.0/25(rw,sync) ###.###.179.192/26(rw,async)

$ cat /etc/fstab
/dev/MDKM1/MDKM1-1 /mdkm1/1 ext3 defaults 1 2
. . .
/dev/MDKM1/MDKM1-10 /mdkm1/10 ext3 defaults 1 2

# mount
/dev/mapper/MDKM1-MDKM1--1 on /mdkm1/1 type ext3 (rw)
. . .
/dev/mapper/MDKM1-MDKM1--10 on /mdkm1/10 type ext3 (rw)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NIS/NFS Client Side...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ cat /etc/auto.master | egrep -v "^#"
/misc /etc/auto.misc
/net -hosts
+auto.master

# service autofs status
automount (pid 3714) is running...

$ ls /storage/mdkm1/1
kmdev lost+found

$ ls /storage/mdkm1/10    <--- waits here forever; [CTRL-C] to abort
ls: /storage/mdkm1/10: Interrupted system call

*NOTE: Above command never completes and will hang the shell from multiple clients.
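
A hang like this can be probed without tying up the shell. The following is only a minimal sketch and not part of the original post: it assumes the GNU coreutils `timeout` command is installed (it may not be on a stock CentOS 5 system), and `probe_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# probe_dir DIR [SECONDS]: try to list DIR, giving up after SECONDS (default 10).
# Assumption: GNU coreutils `timeout` is available. probe_dir is a made-up name.
probe_dir() {
    dir=$1
    limit=${2:-10}
    status=0
    # `timeout` exits with 124 when it had to stop the command.
    timeout "$limit" ls "$dir" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "ok: $dir" ;;
        124) echo "timed out after ${limit}s: $dir" ;;
        *)   echo "failed (exit $status): $dir" ;;
    esac
    return "$status"
}
```

For example, `probe_dir /storage/mdkm1/10 15` would report a timeout rather than block, provided the stuck `ls` still responds to the signal that `timeout` sends.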

# mkdir -p /mnt/mdkm1/10
# mount -t nfs ###.###.179.254:/mdkm1/10 /mnt/mdkm1/10

$ ls /mnt/mdkm1/10
kmdev lost+found

# umount /mnt/mdkm1/10

NOTE: During mount/umount the NFS server logs show the requests for the automount of mdkm1/1 AND the manual mount of mdkm1/10. The server log never shows an attempt to mount mdkm1/10 in the 'stalled' automount attempt. For example...

Apr 27 13:14:50 rilkm01 mountd[8002]: authenticated mount request from ###.###.172.70:730 for /mdkm1/1 (/mdkm1/1)

Apr 27 13:15:24 rilkm01 mountd[8002]: authenticated unmount request from ###.###.172.70:771 for /mdkm1/1 (/mdkm1/1)

Apr 27 13:16:17 rilkm01 mountd[8002]: authenticated mount request from ###.###.172.70:785 for /mdkm1/10 (/mdkm1/10)

Apr 27 13:16:31 rilkm01 mountd[8002]: authenticated unmount request from ###.###.172.70:794 for /mdkm1/10 (/mdkm1/10)
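
To confirm which exports the server actually saw requests for, the mountd lines can be filtered per export path. This is a rough sketch rather than anything from the original post: `mount_requests` is a hypothetical helper, and the log location is an assumption (on this system it would likely be /var/log/messages).

```shell
#!/bin/sh
# mount_requests LOGFILE EXPORT: print mountd mount and unmount request lines
# for EXPORT. mount_requests is a made-up helper name.
mount_requests() {
    log=$1
    export_path=$2
    # "mount request from" also matches "unmount request from".
    # The trailing " (" keeps /mdkm1/1 from also matching /mdkm1/10.
    grep 'mountd\[' "$log" \
        | grep -F "mount request from" \
        | grep -F " for $export_path ("
}
```

Running `mount_requests /var/log/messages /mdkm1/10` right after a stalled `ls` would show whether the automount attempt ever reached the server.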

I've restarted the Server NFS services, Client autofs services AND the systems as a whole multiple times to no avail.

Any ideas?

Thanks, Trevor Cooper

--
Trevor Cooper, M.Sc.
Data Systems Programmer / System Administrator
University of California, San Diego
Multimodal Imaging Laboratory
8950 Villa La Jolla Dr., Suite C101
La Jolla, CA 92037
Phone: (858) 534-8259
Fax: (858) 534-1078

Agile Aspect

27 Apr

9:32 p.m.

On Tue, Apr 27, 2010 at 1:51 PM, Trevor Cooper tcooper@ucsd.edu wrote:

...

I'm having a problem with automount (autofs) from a server running CentOS 5.4 to clients (example is CentOS 5.4). Client pulls automount maps from NIS. THIS particular server is also used for login so it is NIS bound as well (other servers are NOT).

I inherited a similar problem but different environment, namely, SUSE, CentOS 4, and CentOS 5 - with SUSE the primary NIS server.

In the end, in order to get everyone to play nicely, I ended up using the following in the /etc/auto.master file

#+auto.master
/home yp:auto.home
/usr/grid yp:auto.grid

etc.

-- Enjoy global warming while it lasts.

Trevor Cooper

28 Apr

7:11 p.m.

On 04/27/2010 02:32 PM, Agile Aspect wrote:

...

I inherited a similar problem but different environment, namely, SUSE, CentOS 4, and CentOS 5 - with SUSE the primary NIS server.

In the end, in order to get everyone to play nicely, I ended up using the following in the /etc/auto.master file

#+auto.master
/home yp:auto.home
/usr/grid yp:auto.grid

etc.

Since your suggestion for changing /etc/auto.master referenced #+auto.master I assumed you were suggesting the change for a NFS client.

Trying to solve the issue this way doesn't seem to work.

On the NFS Server (also a NIS bound NFS client) I made the suggested changes and reloaded the automount maps.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the directory tree of the exported filesystem(s) LOCAL to the NFS server...

. . .

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the same command looking at the NFS automounted directories...

. . .

|-- 10 [error opening dir]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Still NO access to /storage/mdkm1/10 from a client and in THIS example the client IS the NFS server talking to itself.

The filesystem is ext3 on a VG/LV from a RAID5 storage enclosure. The filesystem has been umounted, fsck'd and remounted. ALL other VG/LVs from the same enclosure (i.e. /mdkm[1-9]) are fine.

Still scratching the head on this one...

Trevor Cooper

Agile Aspect

10:46 p.m.

Please post

/bin/ls -ld /mdkm1/*

And if I understand you correctly, there are 10 file systems mounted locally on this machine and you're only having trouble accessing file system "10" when it's mounted via autofs?

-- Enjoy global warming while it lasts.

Trevor Cooper

29 Apr

9:30 p.m.

On 04/28/2010 03:46 PM, Agile Aspect wrote:

...

Please post
/bin/ls -ld /mdkm1/*

[root@******* ~]# /bin/ls -ldn /mdkm1/*
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/1
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/10
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/2
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/3
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/4
drwxr-xr-x 5 4000 4000 4096 Dec 11 02:38 /mdkm1/5
drwxr-xr-x 5 4000 4000 4096 Dec 11 02:38 /mdkm1/6
drwxr-xr-x 5 4000 4000 4096 Feb 4 22:09 /mdkm1/7
drwxr-xr-x 5 4000 4000 4096 Apr 5 13:51 /mdkm1/8
drwxr-xr-x 4 4000 4000 4096 Jun 12 2009 /mdkm1/9
-rw-r--r-- 1 4000 4000 0 Sep 18 2009 /mdkm1/testfile_mdkm1

...

And if I understand you correctly, there are 10 file systems mounted locally on this machine and you're only having trouble accessing file system "10" when it's mounted via autofs?

Ten file systems mounted under /mdkm1/ (of course there are others) and to be clear, /mdkm1/10 will mount 'manually' without problems but will not mount at all via autofs (nothing seen at the NFS server in the logs at all).

Agile Aspect

11:02 p.m.

On Thu, Apr 29, 2010 at 2:30 PM, Trevor Cooper tcooper@ucsd.edu wrote:

...

I'll presume you've grep'd for 'kernel' and 'nfs' in /var/log/messages and the output doesn't suggest a problem with either.

And that RPCNFSDCOUNT in

/etc/sysconfig/nfs

has been changed from its default value of 8 - if not I would double it and restart nfs.

Then I'd try the following if you haven't already

/usr/sbin/exportfs -rv
/etc/init.d/nfs restart

If nothing changes, then look at the files rmtab, etab, and xtab (the latter may be empty) in

/var/lib/nfs

and if the rpc.statd daemon is running, in

/var/lib/nfs/statd/sm

and see if you spot anything screwy when you access the mounts.

And if nothing works, I'd try renaming the filesystem from '10' to 'ten' - you could be tripping on a bug.

-- Enjoy global warming while it lasts.

Trevor Cooper

30 Apr

1:40 a.m.

On 04/29/2010 04:02 PM, Agile Aspect wrote:

...

I'll presume you've grep'd for 'kernel' and 'nfs' in /var/log/messages and the output doesn't suggest a problem with either.

Nothing obvious... it looks like all file systems are mounting at boot time...

Apr 27 00:06:04 rilkm01 kernel: EXT3 FS on dm-0, internal journal
Apr 27 00:06:04 rilkm01 kernel: EXT3 FS on dm-1, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-2, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-3, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-4, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-5, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-6, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-7, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-8, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-9, internal journal

I DID apply system updates during 'maintenance' after the machine was taken down by an overzealous user...

Apr 26 23:25:13 rilkm01 yum: Updated: nfs-utils-lib-1.0.8-7.6.el5.x86_64
Apr 26 23:32:30 rilkm01 yum: Updated: 1:nfs-utils-1.0.9-42.el5.x86_64
. . .
Apr 27 00:06:10 rilkm01 kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Apr 27 00:06:10 rilkm01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory

...but see last comment.

...

And that RPCNFSDCOUNT in
 /etc/sysconfig/nfs
has been changed from its default value of 8 - if not I would double it and restart nfs.

$ cat /etc/init.d/nfs | egrep "RPCNFSDCOUNT"
[ -z "$RPCNFSDCOUNT" ] && RPCNFSDCOUNT=32
daemon rpc.nfsd $RPCNFSDARGS $RPCNFSDCOUNT

[root@rilkm01 tcooper-local]# /sbin/service nfs status
rpc.mountd (pid 8002) is running...
nfsd (pid 7999 7998 7997 7996 7995 7994 7993 7992 7991 7990 7989 7988 7987 7986 7985 7984 7983 7982 7981 7980 7979 7978 7977 7976 7975 7974 7973 7972 7971 7970 7969 7968) is running...
rpc.rquotad (pid 7890) is running...
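
The status listing above shows 32 nfsd pids (7968 through 7999), which matches the fallback of 32 in the init script. Counting them by eye is error-prone, so here is a small sketch that counts the pids on the nfsd line. `count_nfsd_threads` is a hypothetical helper, and it assumes the status output has the same shape as the listing in this message.

```shell
#!/bin/sh
# count_nfsd_threads: read `service nfs status` output on stdin and print the
# number of pids listed on the "nfsd (pid ...)" line.
# count_nfsd_threads is a made-up helper name.
count_nfsd_threads() {
    awk '/^nfsd \(pid / {
        n = 0
        for (i = 3; i <= NF; i++) {
            gsub(/[^0-9]/, "", $i)   # drop the trailing ")" and non-numeric words
            if ($i != "") n++
        }
        print n
    }'
}
```

Usage would be along the lines of `/sbin/service nfs status | count_nfsd_threads`.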

...

Then I'd try the following if you haven't already
   /usr/sbin/exportfs -rv

# /usr/sbin/exportfs -rv
exporting 137.110.172.0/25:/mdkm1/10
exporting 137.110.179.192/26:/mdkm1/10
exporting 137.110.172.0/25:/mdkm1/1
exporting 137.110.179.192/26:/mdkm1/1
exporting 137.110.172.0/25:/mdkm1/2
exporting 137.110.179.192/26:/mdkm1/2
exporting 137.110.172.0/25:/mdkm1/3
exporting 137.110.179.192/26:/mdkm1/3
exporting 137.110.172.0/25:/mdkm1/4
exporting 137.110.179.192/26:/mdkm1/4
exporting 137.110.172.0/25:/mdkm1/5
exporting 137.110.179.192/26:/mdkm1/5
exporting 137.110.172.0/25:/mdkm1/6
exporting 137.110.179.192/26:/mdkm1/6
exporting 137.110.172.0/25:/mdkm1/7
exporting 137.110.179.192/26:/mdkm1/7
exporting 137.110.172.0/25:/mdkm1/8
exporting 137.110.179.192/26:/mdkm1/8
exporting 137.110.172.0/25:/mdkm1/9
exporting 137.110.179.192/26:/mdkm1/9
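
In the output above /mdkm1/10 is exported to the same two networks as every other path. A quick way to check that mechanically is to count export lines per path. This is a sketch under the assumption that each line has the form `exporting CLIENT:PATH` with no colon in the client part; `exports_per_path` is a hypothetical helper name.

```shell
#!/bin/sh
# exports_per_path: read `exportfs -rv` output on stdin and print
# "COUNT PATH" for each exported path.
# exports_per_path is a made-up helper name.
exports_per_path() {
    awk '$1 == "exporting" {
        pos = index($2, ":")          # first colon separates client from path
        path = substr($2, pos + 1)
        count[path]++
    }
    END { for (p in count) print count[p], p }' | sort -k2
}
```

With `/usr/sbin/exportfs -rv | exports_per_path`, a path exported to fewer networks than its siblings would stand out by its count.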

...

   /etc/init.d/nfs restart

[root@rilkm01 tcooper-local]# /sbin/service nfs restart
Shutting down NFS mountd: [ OK ]
Shutting down NFS daemon: [ OK ]
Shutting down NFS quotas: [ OK ]
Shutting down NFS services: [ OK ]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]

...

If nothing changes, then look at the files rmtab, etab, and xtab (the latter may be empty) in
  /var/lib/nfs

All look 'normal' (compared to another server)

...

and if the rpc.statd daemon is running, in
  /var/lib/nfs/statd/sm
and see if you spot anything screwy when you access the mounts.

Also looks 'normal' (compared to another server)

...

And if nothing works, I'd try renaming the filesystem from '10' to 'ten' - you could be tripping on a bug.

Attempted with the same fate...

I DO have other servers, same OS, same hardware RAID, same file system layout, same exports without any issues automounting /<parent>/10

I'm starting to wonder about a strange parse error on the exports file and/or automount map file(s).
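
One way to rule a malformed map entry in or out is to check that every location has the form host:/path. The following is a rough sketch, not something run in this thread: `check_map_locations` is a hypothetical helper, and it assumes simple entries whose last field is a single location, like those in the original post (no multi-mount or replicated-server syntax). As posted, the entry for key 10 reads 137.110.179:254:/mdkm1/10, with a colon where the other entries have a dot. If the real NIS map contains that, rather than it being a slip made while writing the message, this check would flag it.

```shell
#!/bin/sh
# check_map_locations: read "key [options] location" lines on stdin and print
# any entry whose location is not of the form host:/path with a single colon.
# check_map_locations is a made-up helper name.
check_map_locations() {
    awk 'NF >= 2 {
        loc = $NF    # assumption: the last field is the only location
        if (loc !~ "^[^:/]+:/[^:]*$")
            print "suspicious: " $0
    }'
}
```

Usage would be along the lines of `ypcat -k auto.mdkm1 | check_map_locations`. No output means every entry parsed as host:/path.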

Going to look into running automount in a 'debug' mode to see what its output might be.

Trevor

Ross Walker

29 Apr

11:57 p.m.

On Apr 29, 2010, at 5:30 PM, Trevor Cooper tcooper@ucsd.edu wrote:

...

What does the map look like?

It sounds like '10' might be interpreted as '1'.

-Ross

Trevor Cooper

30 Apr

1:25 a.m.

On 04/29/2010 04:57 PM, Ross Walker wrote:

...

On Apr 29, 2010, at 5:30 PM, Trevor Cooper tcooper@ucsd.edu wrote:

...

What does the map look like?

It sounds like '10' might be interpreted as '1'.

Not sure what you're asking for that isn't in the OP? Can you elaborate?

Thanks, Trevor

...

-Ross

CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
