[CentOS] NFS automount failure

On 04/29/2010 04:02 PM, Agile Aspect wrote:
> On Thu, Apr 29, 2010 at 2:30 PM, Trevor Cooper<tcooper at ucsd.edu>  wrote:
>> On 04/28/2010 03:46 PM, Agile Aspect wrote:
>>> Please  post
>>>
>>>      /bin/ls -ld /mdkm1/*
>>>
>>
>> [root@******* ~]# /bin/ls -ldn /mdkm1/*
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/1
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/10
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/2
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/3
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/4
>> drwxr-xr-x 5 4000 4000 4096 Dec 11 02:38 /mdkm1/5
>> drwxr-xr-x 5 4000 4000 4096 Dec 11 02:38 /mdkm1/6
>> drwxr-xr-x 5 4000 4000 4096 Feb  4 22:09 /mdkm1/7
>> drwxr-xr-x 5 4000 4000 4096 Apr  5 13:51 /mdkm1/8
>> drwxr-xr-x 4 4000 4000 4096 Jun 12  2009 /mdkm1/9
>> -rw-r--r-- 1 4000 4000    0 Sep 18  2009 /mdkm1/testfile_mdkm1
>>
>>> And if I understand you correctly, there are 10 file systems mounted
>>> locally on this machine and you're only having trouble accessing file
>>> system "10" when it's mounted via autofs?
>>
>> Ten file systems mounted under /mdkm1/ (of course there are others) and
>> to be clear, /mdkm1/10 will mount 'manually' without problems but will
>> not mount at all via autofs (nothing seen at the NFS server in the logs
>> at all).
>>
>
> I'll presume you've grep'd for 'kernel' and 'nfs' in /var/log/messages
> and the output doesn't suggest a problem with either.

Nothing obvious... it looks like all file systems are mounting at boot
time...

Apr 27 00:06:04 rilkm01 kernel: EXT3 FS on dm-0, internal journal
Apr 27 00:06:04 rilkm01 kernel: EXT3 FS on dm-1, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-2, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-3, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-4, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-5, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-6, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-7, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-8, internal journal
Apr 27 00:06:05 rilkm01 kernel: EXT3 FS on dm-9, internal journal

I DID apply system updates during 'maintenance' after the machine was
taken down by an overzealous user...

Apr 26 23:25:13 rilkm01 yum: Updated: nfs-utils-lib-1.0.8-7.6.el5.x86_64
Apr 26 23:32:30 rilkm01 yum: Updated: 1:nfs-utils-1.0.9-42.el5.x86_64
.
.
.
Apr 27 00:06:10 rilkm01 kernel: Installing knfsd (copyright (C) 1996 okir at monad.swb.de).
Apr 27 00:06:10 rilkm01 kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory

...but see last comment.

> And that RPCNFSDCOUNT in
>
>      /etc/sysconfig/nfs
>
> has been changed from its default value of 8 - if not I would double
> it and restart nfs.

$ cat /etc/init.d/nfs | egrep "RPCNFSDCOUNT"
          [ -z "$RPCNFSDCOUNT" ] && RPCNFSDCOUNT=32
          daemon rpc.nfsd $RPCNFSDARGS $RPCNFSDCOUNT

[root@rilkm01 tcooper-local]# /sbin/service nfs status
rpc.mountd (pid 8002) is running...
nfsd (pid 7999 7998 7997 7996 7995 7994 7993 7992 7991 7990 7989 7988
7987 7986 7985 7984 7983 7982 7981 7980 7979 7978 7977 7976 7975 7974
7973 7972 7971 7970 7969 7968) is running...
rpc.rquotad (pid 7890) is running...
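
To check the nfsd thread count against RPCNFSDCOUNT without counting PIDs by eye, something like the following might help. It is only a sketch: the function name count_nfsd_pids is made up for illustration, and it assumes the status output keeps the "nfsd (pid ...)" shape shown above.

```shell
# Hypothetical helper: count the PIDs listed for nfsd in the output of
# "service nfs status" read from stdin.
count_nfsd_pids() {
    # Join wrapped lines, keep only the PID list inside "nfsd (pid ...)",
    # then count the whitespace-separated numbers.
    tr '\n' ' ' |
        sed -n 's/.*nfsd (pid \([0-9 ]*\)).*/\1/p' |
        wc -w | tr -d ' '
}
```

Usage would be "/sbin/service nfs status | count_nfsd_pids". For the status output above it should report 32, matching the RPCNFSDCOUNT=32 fallback in the init script.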

> Then I'd try the following if you haven't already
>
>        /usr/sbin/exportfs -rv

# /usr/sbin/exportfs -rv
exporting 137.110.172.0/25:/mdkm1/10
exporting 137.110.179.192/26:/mdkm1/10
exporting 137.110.172.0/25:/mdkm1/1
exporting 137.110.179.192/26:/mdkm1/1
exporting 137.110.172.0/25:/mdkm1/2
exporting 137.110.179.192/26:/mdkm1/2
exporting 137.110.172.0/25:/mdkm1/3
exporting 137.110.179.192/26:/mdkm1/3
exporting 137.110.172.0/25:/mdkm1/4
exporting 137.110.179.192/26:/mdkm1/4
exporting 137.110.172.0/25:/mdkm1/5
exporting 137.110.179.192/26:/mdkm1/5
exporting 137.110.172.0/25:/mdkm1/6
exporting 137.110.179.192/26:/mdkm1/6
exporting 137.110.172.0/25:/mdkm1/7
exporting 137.110.179.192/26:/mdkm1/7
exporting 137.110.172.0/25:/mdkm1/8
exporting 137.110.179.192/26:/mdkm1/8
exporting 137.110.172.0/25:/mdkm1/9
exporting 137.110.179.192/26:/mdkm1/9
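
A quick way to confirm that every file system is exported to both networks is to count the "exporting" lines per path. This is a rough sketch, not a tested tool: summarize_exports is a hypothetical name, and it assumes IPv4 clients (an IPv6 address would confuse the split on ":").

```shell
# Hypothetical helper: read "exportfs -rv" output on stdin and print
# "<count> <path>" for each exported path.
summarize_exports() {
    awk '
        $1 == "exporting" {
            n = split($2, parts, ":")   # client:path, path is the last part
            count[parts[n]]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -k2
}
```

Usage would be "/usr/sbin/exportfs -rv | summarize_exports". With the output above, each of the ten /mdkm1 paths should show a count of 2, so /mdkm1/10 is exported the same way as the others.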

>        /etc/init.d/nfs restart

[root@rilkm01 tcooper-local]# /sbin/service nfs restart
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

>
> If nothing changes, then look at the files rmtab, etab, xtab (the latter
> may be empty) in
>
>       /var/lib/nfs

All look 'normal' (compared to another server)

>
> and if the rpc.statd daemon is running, in
>
>       /var/lib/nfs/statd/sm
>
> and see if you spot anything screwy when you access the mounts.

Also looks 'normal' (compared to another server)

>
> And if nothing works, I'd try renaming the filesystem from '10' to
> 'ten' - you could be tripping on a bug..

Attempted with the same fate...

I DO have other servers (same OS, same hardware RAID, same file system
layout, same exports) that automount /<parent>/10 without any issues.

I'm starting to wonder about a strange parse error on the exports file 
and/or automount map file(s).
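
One cheap way to look for that kind of parse problem is to scan the files for characters that are invisible in an editor. The sketch below is an assumption about what might be wrong, not a diagnosis: find_suspect_lines is a hypothetical helper, and it only flags carriage returns and trailing whitespace.

```shell
# Hypothetical helper: report lines in the given files that contain a
# carriage return or end in spaces/tabs.
find_suspect_lines() {
    awk '
        /\r/      { printf "%s:%d: carriage return\n", FILENAME, FNR }
        /[ \t]+$/ { printf "%s:%d: trailing whitespace\n", FILENAME, FNR }
    ' "$@"
}
```

It could be run as "find_suspect_lines /etc/exports" on the server and against /etc/auto.master, plus whichever map file it references, on the client. No output means neither condition was found.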

Going to look into running automount in 'debug' mode to see what its
output might be.
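
A minimal sketch of such a debug run, assuming autofs 5, where automount accepts -f (stay in the foreground), -v (verbose) and -d (debug). The run and automount_debug names are hypothetical, and DRY_RUN is a convenience added here so the commands can be previewed first.

```shell
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: autofs is managed by the stock init script.
automount_debug() {
    run /sbin/service autofs stop   # stop the daemon started at boot
    run automount -f -v -d          # foreground, verbose, debug logging
}
```

This would be run as root on a client where the automount fails; accessing the automounted path from a second terminal should then show the lookup and mount attempt in the first one. Restart the service with "/sbin/service autofs start" afterwards.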

Trevor