What is the best approach when an NFS mount hangs on a client but the server is OK? My mount options are rw,bg,soft,intr,rsize=32768,wsize=32768, but whatever it was doing was not interruptible, and the machine would not shut down.
There were some messages like these on the console and in /var/log/messages:
Oct 15 09:08:32 dev-ngf-l-01 kernel: INFO: task gnome-settings-:19169 blocked for more than 120 seconds.
Oct 15 09:08:32 dev-ngf-l-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Is this a bug, or is there a way to avoid it?
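When checking which options are actually in effect on a client, it can help to read them back from the kernel's mount table rather than from fstab. The following is a minimal sketch, assuming the standard /proc/mounts field layout (device, mount point, filesystem type, options, dump, pass). The function name and the sample line are hypothetical; the sample reuses the options quoted above with a made-up server and export path, and real /proc/mounts output will look different.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to the list of its mount options.

    mounts_text is text in /proc/mounts format, one mount per line.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        if fields[2] not in ("nfs", "nfs4"):
            continue
        result[fields[1]] = fields[3].split(",")
    return result


# Hypothetical sample line: server name and export path are invented.
SAMPLE_MOUNTS = (
    "server:/export/home /home nfs "
    "rw,bg,soft,intr,rsize=32768,wsize=32768 0 0\n"
    "/dev/sda1 / ext4 rw 0 0\n"
)

options = nfs_mount_options(SAMPLE_MOUNTS)
```

On a live client the same function could be fed the contents of /proc/mounts, for example `nfs_mount_options(open("/proc/mounts").read())`.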
Did you also check /var/log/messages on the NFS server side?
I had some NFS trouble with lockd some time ago, and it was a firewall problem on the client:
Try:
- Log on to the NFS server and check in /var/log/messages which client is responsible for the problem (it could be a client other than yours).
- On that client, stop iptables (service iptables stop) and check whether the problem still exists.
In my configs, client iptables fully trust my NFS server.
Patrick
On Wed, Oct 16, 2013 at 7:05 AM, Patrick Begou Patrick.Begou@legi.grenoble-inp.fr wrote:
Did you also check /var/log/messages on the NFS server side?
I had some NFS trouble with lockd some time ago, and it was a firewall problem on the client:
No jumbo frames, no firewalling, no server side issues. This is a lab setup with one server holding home directories and about 10 other hosts and VMs mounting it as /home. There is heavy network testing on some of the servers, but the NFS connection runs over a different interface/subnet. I think the issue is triggered by a user running NX/freenx sessions on multiple hosts and something Gnome is trying to lock in the common home directory. Regardless, it is a kernel hang, to the point that I had to pull the plug to get the machine to shut down. And now one user (perhaps the only one with Gnome sessions on multiple hosts) has things hanging again - even an ssh login by this user hangs with this in the logs:
Oct 16 09:24:25 dev-l-01 kernel: INFO: task bash:20785 blocked for more than 120 seconds.
Oct 16 09:24:25 dev-l-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 16 09:24:25 dev--01 kernel: bash D 0000000000000008 0 20785 20784 0x00000080
Oct 16 09:24:25 dev-l-01 kernel: ffff882066ecfba8 0000000000000082 0000000000000000 ffff881064998740
Oct 16 09:24:25 dev-l-01 kernel: ffff882066ecfb28 ffffffff8119b30a ffff881065d12200 ffff881064998740
Oct 16 09:24:25 dev-l-01 kernel: ffff8820665ef058 ffff882066ecffd8 000000000000fb88 ffff8820665ef058
Oct 16 09:24:25 dev-l-01 kernel: Call Trace:
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff8119b30a>] ? dput+0x9a/0x150
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff8150f78e>] __mutex_lock_slowpath+0x13e/0x180
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff8150f62b>] mutex_lock+0x2b/0x50
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff811907ab>] do_lookup+0x11b/0x230
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81190ff4>] __link_path_walk+0x734/0x1030
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81143867>] ? handle_pte_fault+0xf7/0xb50
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81191b7a>] path_walk+0x6a/0xe0
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81191d4b>] do_path_lookup+0x5b/0xa0
Oct 16 09:24:25 dev--l-01 kernel: [<ffffffff811929d7>] user_path_at+0x57/0xa0
Oct 16 09:24:25 dev--l-01 kernel: [<ffffffff81186d8c>] vfs_fstatat+0x3c/0x80
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81186efb>] vfs_stat+0x1b/0x20
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff81186f24>] sys_newstat+0x24/0x50
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff810dc937>] ? audit_syscall_entry+0x1d7/0x200
Oct 16 09:24:25 dev--l-01 kernel: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
Oct 16 09:24:25 dev-l-01 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
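For anyone sifting through a larger /var/log/messages for the same pattern, here is a minimal sketch that pulls the task name, PID, and timeout out of the "blocked for more than N seconds" lines. It assumes the message format seen in the excerpts in this thread; the function and variable names are hypothetical, and the sample text is just the two INFO lines quoted in this thread.

```python
import re

# Matches the kernel hung-task report as it appears in the logs above,
# e.g. "INFO: task bash:20785 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) tuples found in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]


SAMPLE_LOG = (
    "Oct 15 09:08:32 dev-ngf-l-01 kernel: INFO: task gnome-settings-:19169 "
    "blocked for more than 120 seconds.\n"
    "Oct 16 09:24:25 dev-l-01 kernel: INFO: task bash:20785 "
    "blocked for more than 120 seconds.\n"
)

found = hung_tasks(SAMPLE_LOG)
```

Grouping the results by task name or PID would show whether the hangs are limited to one user's processes, as suspected above.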