[CentOS] NFS4 issue

Mon Nov 23 05:00:40 UTC 2009
Philip Manuel <phil at zomojo.com>


Philip Manuel wrote:
> We are running kernel 2.6.18-164.6.1.el5 and exporting 3 AoE-provided 
> ext4 directories. For a couple of weeks a small number of users used 
> the system with no issues. Today we added 7 users, and the system 
> crashed and has not performed correctly since.
>
> Nov 23 10:20:03 sulphur rpc.idmapd[5199]: nfsdcb: id '-2' too big!
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: Setting version failed: 
> errno 16 (Device or resource busy)
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: unable to bind UPD socket: 
> errno 98 (Address already in use)
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): 
> cache `nfsd4_files': Can't free all objects
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88645efd>] 
> :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88645f55>] 
> :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88646ecb>] 
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630570>] 
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: BUG: warning at 
> fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G     )
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88645f55>] 
> :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88646ecb>] 
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630570>] 
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): 
> cache `nfsd4_delegations': Can't free all objects
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88645efd>] 
> :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88646ecb>] 
> :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630570>] 
> :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] 
> :nfsd:nfsd+0x0/0x2cb                   
> Nov 23 10:42:26 sulphur kernel: BUG: warning at 
> fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G     )
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88646ecb>] 
> :nfsd:nfs4_state_shutdown+0x17e/0x18a  
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630570>] 
> :nfsd:nfsd_last_thread+0x45/0x76       
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff88630856>] 
> :nfsd:nfsd+0x2b5/0x2cb                 
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] 
> :nfsd:nfsd+0x0/0x2cb                   
> Nov 23 10:42:26 sulphur kernel:  [<ffffffff886305a1>] 
> :nfsd:nfsd+0x0/0x2cb                   
> Nov 23 10:42:26 sulphur kernel: nfsd: last server has 
> exited                                 
> Nov 23 10:42:26 sulphur kernel: nfsd: unexporting all 
> filesystems                            
> Nov 23 10:42:44 sulphur kernel: kmem_cache_create: duplicate cache 
> nfsd4_files               
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff88646f29>] 
> :nfsd:nfs4_state_start+0x52/0x18f      
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff886303ae>] 
> :nfsd:nfsd_svc+0x6c/0x1e9              
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff88630f8e>] 
> :nfsd:write_threads+0x0/0xa9           
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff88630ffd>] 
> :nfsd:write_threads+0x6f/0xa9          
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff88630f8e>] 
> :nfsd:write_threads+0x0/0xa9           
> Nov 23 10:42:44 sulphur kernel:  [<ffffffff88630d59>] 
> :nfsd:nfsctl_transaction_write+0x42/0x77
> Nov 23 10:42:44 sulphur nfsd[27369]: nfssvc: Cannot allocate memory
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: Setting version failed: 
> errno 16 (Device or resource busy)
>
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: unable to bind UPD socket: 
> errno 98 (Address already in use)
>
> The log above shows the original problem, followed by my attempts to 
> restart nfsd; eventually I had to reboot the server. Since then it has 
> been behaving bizarrely: it runs for about 5 minutes and then stops, and 
> after a restart it runs for a while and then stops again.
> Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as 
> the NFSv4 state recovery directory
> Nov 23 11:17:02 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
> Nov 23 11:29:01 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as 
> the NFSv4 state recovery directory
> Nov 23 11:29:08 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
> Nov 23 11:32:03 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as 
> the NFSv4 state recovery directory
> Nov 23 11:32:34 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
> Nov 23 11:41:58 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as 
> the NFSv4 state recovery directory
> Nov 23 11:42:03 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
> Nov 23 11:47:20 sulphur kernel: nfsd: unexporting all filesystems
>
> I haven't found any report of an issue with the "nfsdcb: id '-2' too 
> big!" message, and I don't know what it means either.
>
> On the console we are seeing loads of these messages:-
>
> kernel: NFSD: preprocess_seqid_op: magic stateid!
>
> Again I don't know what this means or the implications of this message.
>
> Any suggestions would be welcome.
>
> At the moment we are up with two users migrated back to the old servers.
>
> Thanks
>
> Phil.

Just a quick update: 4 hours later, the message

kernel: NFSD: preprocess_seqid_op: magic stateid!

has stopped appearing. Now to find out why.
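
To put numbers on the start/stop cycle in the quoted log, here is a minimal sketch that pairs each "NFSD: Using /var/lib/nfs/v4recovery" line with the next "nfsd: last server has exited" line and reports how long nfsd stayed up. The function names are made up for illustration, the log lines are assumed to be unwrapped (one syslog entry per line), and the year 2009 is an assumption taken from the date of this post, since syslog timestamps omit it.

```python
from datetime import datetime

# Markers copied from the kernel messages in the quoted log.
START_MARK = "NFSD: Using /var/lib/nfs/v4recovery"
STOP_MARK = "nfsd: last server has exited"


def parse_time(line, year=2009):
    # Syslog timestamps carry no year; 2009 is an assumption (date of this post).
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def nfsd_uptimes(lines):
    """Return (start, stop, seconds) for each nfsd start/exit pair."""
    runs = []
    started = None
    for line in lines:
        if START_MARK in line:
            started = parse_time(line)
        elif STOP_MARK in line and started is not None:
            stopped = parse_time(line)
            seconds = int((stopped - started).total_seconds())
            runs.append((started, stopped, seconds))
            started = None
    return runs


# Start/exit lines from the log above, unwrapped to one entry per line.
LOG = """\
Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
""".splitlines()

for start, stop, secs in nfsd_uptimes(LOG):
    print(f"{start:%H:%M:%S} -> {stop:%H:%M:%S}  up {secs // 60}m {secs % 60}s")
```

Run against a full /var/log/messages, the same function would show whether the uptimes are shrinking or just irregular, which may help correlate the exits with client activity.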

Thanks