Philip Manuel wrote:
> We are running kernel 2.6.18-164.6.1.el5 and exporting 3 AoE-provided
> ext4 directories. For a couple of weeks a small number of users used
> the system with no issues. Today we added 7 users, the system crashed,
> and it has not performed correctly since.
>
> Nov 23 10:20:03 sulphur rpc.idmapd[5199]: nfsdcb: id '-2' too big!
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: Setting version failed: errno 16 (Device or resource busy)
> Nov 23 10:42:25 sulphur nfsd[27306]: nfssvc: unable to bind UPD socket: errno 98 (Address already in use)
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): cache `nfsd4_files': Can't free all objects
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>] :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>] :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645f55>] :nfsd:nfsd4_free_slabs+0x1c/0x33
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: slab error in kmem_cache_destroy(): cache `nfsd4_delegations': Can't free all objects
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88645efd>] :nfsd:nfsd4_free_slab+0x11/0x4d
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: BUG: warning at fs/nfsd/nfs4state.c:1016/nfsd4_free_slab() (Tainted: G )
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88646ecb>] :nfsd:nfs4_state_shutdown+0x17e/0x18a
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630570>] :nfsd:nfsd_last_thread+0x45/0x76
> Nov 23 10:42:26 sulphur kernel: [<ffffffff88630856>] :nfsd:nfsd+0x2b5/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: [<ffffffff886305a1>] :nfsd:nfsd+0x0/0x2cb
> Nov 23 10:42:26 sulphur kernel: nfsd: last server has exited
> Nov 23 10:42:26 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 10:42:44 sulphur kernel: kmem_cache_create: duplicate cache nfsd4_files
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88646f29>] :nfsd:nfs4_state_start+0x52/0x18f
> Nov 23 10:42:44 sulphur kernel: [<ffffffff886303ae>] :nfsd:nfsd_svc+0x6c/0x1e9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>] :nfsd:write_threads+0x0/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630ffd>] :nfsd:write_threads+0x6f/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630f8e>] :nfsd:write_threads+0x0/0xa9
> Nov 23 10:42:44 sulphur kernel: [<ffffffff88630d59>] :nfsd:nfsctl_transaction_write+0x42/0x77
> Nov 23 10:42:44 sulphur nfsd[27369]: nfssvc: Cannot allocate memory
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: Setting version failed: errno 16 (Device or resource busy)
> Nov 23 10:43:55 sulphur nfsd[27495]: nfssvc: unable to bind UPD socket: errno 98 (Address already in use)
>
> The log above shows the original problem, followed by my attempts to
> restart nfsd; eventually I had to reboot the server. Since then it has
> behaved erratically: nfsd runs for 5 minutes and then stops, and after
> each restart it runs for a while and then stops again.
>
> Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Nov 23 11:17:02 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
> Nov 23 11:29:01 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Nov 23 11:29:08 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
> Nov 23 11:32:03 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Nov 23 11:32:34 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
> Nov 23 11:41:58 sulphur kernel: nfsd: unexporting all filesystems
> Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> Nov 23 11:42:03 sulphur rpc.idmapd[8178]: nfsdcb: id '-2' too big!
> Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
> Nov 23 11:47:20 sulphur kernel: nfsd: unexporting all filesystems
>
> I haven't found any reports of the "nfsdcb: id '-2' too big!" message,
> and I don't know what it means either.
>
> On the console we are seeing many of these messages:
>
> kernel: NFSD: preprocess_seqid_op: magic stateid!
>
> Again, I don't know what this means or what its implications are.
>
> Any suggestions would be welcome.
>
> At the moment we are up and running, with two users migrated back to
> the old servers.
>
> Thanks
>
> Phil.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>

Just a quick update: 4 hours later, the "kernel: NFSD: preprocess_seqid_op: magic stateid!" message has stopped. Now to find out why.

Thanks
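To put numbers on the "runs for a while and then stops" pattern, the start and stop lines in the second log excerpt can be paired up to measure how long nfsd stayed up each time. This is a minimal Python sketch, not part of any NFS tooling. It assumes the syslog lines are exactly as quoted, that each "NFSD: Using /var/lib/nfs/v4recovery" line marks a start and each "nfsd: last server has exited" line marks a stop; the helper names parse_time and uptimes are hypothetical.

```python
from datetime import datetime

# Start/stop lines copied from the second log excerpt (other lines omitted).
LOG = """\
Nov 23 11:04:46 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:29:01 sulphur kernel: nfsd: last server has exited
Nov 23 11:29:08 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:32:03 sulphur kernel: nfsd: last server has exited
Nov 23 11:32:34 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:41:58 sulphur kernel: nfsd: last server has exited
Nov 23 11:42:03 sulphur kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 23 11:47:20 sulphur kernel: nfsd: last server has exited
"""

# Assumed markers for "nfsd started" and "nfsd stopped".
START = "NFSD: Using /var/lib/nfs/v4recovery"
STOP = "nfsd: last server has exited"


def parse_time(line):
    """Parse the 15-character syslog timestamp ("Nov 23 11:04:46").

    Syslog omits the year, so a dummy year is prepended; it cancels out
    when two timestamps from the same day are subtracted.
    """
    return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")


def uptimes(log):
    """Return run lengths in seconds from each start to the next stop."""
    runs = []
    started = None
    for line in log.splitlines():
        if START in line:
            started = parse_time(line)
        elif STOP in line and started is not None:
            runs.append(int((parse_time(line) - started).total_seconds()))
            started = None
    return runs


if __name__ == "__main__":
    for secs in uptimes(LOG):
        print(f"nfsd stayed up {secs // 60}m{secs % 60:02d}s")
```

Against this excerpt it reports runs of 24m15s, 2m55s, 9m24s and 5m17s, so the time between restarts varies from under 3 minutes to about 24 minutes rather than being a fixed interval.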