[CentOS] Mount/automount fails with krb5-enabled nfs4

Thu Oct 21 13:09:14 UTC 2010
Hans Persson <hans at ifm.liu.se>

I have a problem that is driving me crazy. Our nfs server is running
Solaris. Most clients mount directories from it with no problems, but
not all. All clients that have problems run CentOS (5.4 and 5.5). I've
found one or two of each version that fail, but also a couple of each
version that work.

User home directories are mounted via autofs, but that doesn't seem to
make any difference: the same problem appears when I try to mount
manually. Kerberos is used for authentication.

When I try to mount a directory manually I get this:

    # mount -vvvv -t nfs4 -o sec=krb5 \
        triangulum.ifm.liu.se:/export/users/hans /mnt
    mount: pinging: prog 100003 vers 4 prot tcp port 2049
    mount.nfs4: Permission denied

I get this in /var/log/messages:

    Oct 15 15:15:12 pc13287 rpc.gssd[2780]: rpcsec_gss: 
        gss_init_sec_context: (major) Unspecified GSS failure. 
        Minor code may provide more information - (minor) Unknown 
        code krb5 60 
    Oct 15 15:15:12 pc13287 rpc.gssd[2780]: WARNING: Failed to create
        krb5 context for user with uid 0 with any credentials cache for
        server triangulum.ifm.liu.se 

The machines that can mount the disk differ slightly in what they log.
Some log nothing, others this:

    Oct 19 13:26:01 pc14113 rpc.gssd[2793]: ERROR: GSS-API: error in
        gss_acquire_cred(): Unspecified GSS failure. Minor code may 
        provide more information - Unknown code krb5 195 
    Oct 19 13:26:01 pc14113 rpc.gssd[2793]: WARNING: Failed to create
        krb5 context for user with uid 121 for server 
        triangulum.ifm.liu.se 

Note that the first line still logs an error, but a different one. In
the second line, the uid of the user changes from 0 (I'm logged in as
root when doing both tests) to 121 (which is the uid of the user owning
the home directory I'm trying to mount in both cases). Perhaps this is a
clue, but I don't know what it is trying to tell me.

I can't find any relevant differences in configuration. I've compared
the files in /etc on a working and a non-working machine and found
nothing relevant in /etc/sysconfig/nfs, /etc/hosts, /etc/idmapd.conf,
/etc/krb5.conf, /etc/host.conf, /etc/nsswitch.conf, /etc/resolv.conf
and others.
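
A comparison of this kind can be sketched roughly as follows. This is a
minimal sketch, not the exact commands used: compare_etc is a
hypothetical helper, and it assumes that copies of /etc from a working
and a non-working client have already been placed in two local
directories.

```shell
# compare_etc GOOD_ROOT BAD_ROOT FILE...
# Hypothetical helper: for each FILE (a path relative to both roots),
# report whether the two copies are identical.
compare_etc() {
    good_root=$1
    bad_root=$2
    shift 2
    for f in "$@"; do
        # diff -q exits 0 only when both files exist and are identical
        if diff -q "$good_root/$f" "$bad_root/$f" >/dev/null 2>&1; then
            echo "same: $f"
        else
            # the file differs, or is missing on one side
            echo "DIFF: $f"
        fi
    done
}
```

With the two copies in, say, /tmp/etc-good and /tmp/etc-bad (assumed
paths), it would be called as
`compare_etc /tmp/etc-good /tmp/etc-bad sysconfig/nfs hosts idmapd.conf krb5.conf`.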

SELinux is not running.

This is what the keytab looks like on both working and non-working
machines:

    # klist -k -e
    Keytab name: FILE:/etc/krb5.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
       3 host/pc13287.ad.ifm.liu.se@IFM.LIU.SE (DES cbc mode with RSA-MD5)
       3 nfs/pc13287.ad.ifm.liu.se@IFM.LIU.SE (DES cbc mode with RSA-MD5)

I have a yp master and a yp slave, but there are both working and
non-working clients connected to both of them.

There is plenty of space in /tmp and it is writable by all.

Among the total set of clients there are multiple versions of nfs-utils
and kernel used, but I can pick a set of one working and one non-working
client that have the same versions for both (nfs-utils-1.0.9-47.el5_5
and kernel-2.6.18-194.17.1.el5) so that doesn't appear to be the
problem. I've tried yum reinstall on the nfs-utils package, to no effect.
That doesn't work for the kernel package, but I've compared the md5 sums
of the gss modules on a working and a non-working machine and found no
differences.
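
The checksum comparison can be sketched like this. It is an
illustration under assumptions, not the exact procedure used:
module_sums is a hypothetical helper, and matching the modules by
"gss" in the file name is an assumption about how they are named.

```shell
# module_sums DIR
# Hypothetical helper: print an md5 checksum line for every regular file
# under DIR whose name contains "gss", with paths relative to DIR and
# sorted by path so that output from two machines can be diffed.
module_sums() {
    (
        cd "$1" || exit 1
        # -exec ... {} + runs md5sum only if at least one file matched
        find . -type f -name '*gss*' -exec md5sum {} + | sort -k 2
    )
}
```

Running `module_sums "/lib/modules/$(uname -r)"` on each client and
saving the output to a file gives two lists that can be compared with
diff; identical lists mean the gss modules match.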

Obviously, I need to check something else, but what? Please help!

Hans