[CentOS] nfs causes Centos 7.7 system to hang

Nikolaos Milas

nmilas at noa.gr
Thu Dec 26 10:15:41 UTC 2019


Hello,

MERRY CHRISTMAS to all in list!

After I upgraded to latest: CentOS Linux release 7.7.1908 (Core) I am 
facing nfs crashes which cause the system to hang frequently.

This is caused by cp to nfs-mounted shares.

Below is dmesg output; you will see call traces. These cause system to 
gradually overload:

[root at hesperia1 ~]# top
top - 10:09:40 up 10:16,  1 user,  load average: 53.66, 54.13, 52.98
Tasks: 475 total,   2 running, 436 sleeping,   0 stopped,  37 zombie
%Cpu(s):  0.1 us,  0.6 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi, 0.0 si,  
0.1 st
KiB Mem :  3879928 total,   813504 free,  1733216 used,  1333208 buff/cache
KiB Swap:  4063228 total,  4062708 free,      520 used.  1797264 avail Mem

and finally hangs showing messages (which I have not recorded precisely) 
in the CLI login screen like "System out of memory". Then I have to reboot.

I tried to downgrade nfs-utils and rpcbind to earlier versions (in case 
there is a bug in latest ones), but I couldn't:

    [root at hesperia1 ~]# yum downgrade rpcbind-0.2.0-47 nfs-utils-1.3.0-61
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
      * base: ftp.ntua.gr
      * epel: mirrors.daticum.com
      * extras: ftp.ntua.gr
      * updates: ftp.ntua.gr
    No package rpcbind-0.2.0-47 available.
    No package nfs-utils-1.3.0-61 available.
    Error: Nothing to do

Do you know if this is a known bug? (I couldn't find something in my 
searches.)

Can you suggest a solution / workaround? I am facing this problem 
constantly and the system has become unstable.

[root at hesperia1 ~]# dmesg
...
[   26.525584] RPC: Registered named UNIX socket transport module.
[   26.525588] RPC: Registered udp transport module.
[   26.525590] RPC: Registered tcp transport module.
[   26.525592] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   27.103268] type=1305 audit(1577317991.465:3): audit_pid=836 old=0 
auid=4294967295 ses=4294967295 res=1
[   29.384755] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   29.629701] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   30.773604] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   31.977476] FS-Cache: Loaded
[   32.170682] FS-Cache: Netfs 'nfs' registered for caching
[   53.671997] random: crng init done
[26399.967630] INFO: task cp:14560 blocked for more than 120 seconds.
[26399.967984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[26399.968202] cp              D ffff89373fc9ac80     0 14560 14053 
0x00000080
[26399.968211] Call Trace:
[26399.968272]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[26399.968321]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[26399.968345]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[26399.968375]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[26399.968395]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[26399.968404]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[26399.968410]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[26399.968421]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[26399.968445]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[26399.968475]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[26399.968481]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[26399.968487]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[26399.968492]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[26399.968497]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[26399.968501]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[26399.968505]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[26399.968530]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[26399.968534]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[26399.968544]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[27479.967352] INFO: task cp:4656 blocked for more than 120 seconds.
[27479.967414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[27479.967711] cp              D ffff8936b6548000     0  4656 2914 
0x00000080
[27479.967726] Call Trace:
[27479.967749]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[27479.967762]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[27479.967769]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[27479.967774]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[27479.967786]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[27479.967792]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[27479.967821]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[27479.967828]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[27479.967836]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[27479.967841]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[27479.967846]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[27479.967852]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[27479.967857]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[27479.967862]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[27479.967866]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[27479.967870]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[27479.967880]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[27479.967884]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[27479.967892]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[27479.967910] INFO: task cp:7754 blocked for more than 120 seconds.
[27479.968147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[27479.968437] cp              D ffff89373fc1ac80     0  7754 7285 
0x00000080
[27479.968443] Call Trace:
[27479.968455]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[27479.968461]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[27479.968466]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[27479.968470]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[27479.968475]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[27479.968481]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[27479.968486]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[27479.968491]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[27479.968500]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[27479.968505]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[27479.968510]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[27479.968515]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[27479.968520]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[27479.968525]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[27479.968528]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[27479.968533]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[27479.968539]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[27479.968543]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[27479.968547]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[27479.968557] INFO: task cp:20086 blocked for more than 120 seconds.
[27479.968798] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[27479.969064] cp              D ffff89373fc9ac80     0 20086 19644 
0x00000080
[27479.969069] Call Trace:
[27479.969080]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[27479.969086]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[27479.969090]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[27479.969094]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[27479.969099]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[27479.969105]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[27479.969110]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[27479.969115]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[27479.969119]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[27479.969124]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[27479.969129]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[27479.969134]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[27479.969139]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[27479.969144]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[27479.969148]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[27479.969152]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[27479.969157]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[27479.969161]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[27479.969166]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.967071] INFO: task cp:4656 blocked for more than 120 seconds.
[30359.967120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.967345] cp              D ffff89373fd9ac80     0  4656 2914 
0x00000080
[30359.967369] Call Trace:
[30359.967391]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[30359.967403]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[30359.967410]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.967415]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.967421]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[30359.967427]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.967432]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[30359.967438]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[30359.967444]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[30359.967449]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[30359.967454]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[30359.967459]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[30359.967464]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[30359.967469]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[30359.967473]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[30359.967477]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[30359.967486]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[30359.967490]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[30359.967497]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.967502] INFO: task cp:6363 blocked for more than 120 seconds.
[30359.967708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.967933] cp              D ffff89373fd9ac80     0  6363 6002 
0x00000080
[30359.967938] Call Trace:
[30359.968064]  [<ffffffffc05bc091>] ? rpcauth_lookupcred+0x91/0xd0 [sunrpc]
[30359.968071]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.968076]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.968082]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.968088]  [<ffffffffaf458fc8>] do_last+0x298/0x1290
[30359.968093]  [<ffffffffaf45bdbd>] path_openat+0xcd/0x5a0
[30359.968100]  [<ffffffffaf44737a>] ? __check_object_size+0x1ca/0x250
[30359.968121]  [<ffffffffaf45d72d>] do_filp_open+0x4d/0xb0
[30359.968129]  [<ffffffffaf46b282>] ? __alloc_fd+0xc2/0x170
[30359.968135]  [<ffffffffaf449684>] do_sys_open+0x124/0x220
[30359.968140]  [<ffffffffaf44979e>] SyS_open+0x1e/0x20
[30359.968145]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.968151] INFO: task cp:9517 blocked for more than 120 seconds.
[30359.968414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.968649] cp              D ffff89373fd9ac80     0  9517 9100 
0x00000080
[30359.968656] Call Trace:
[30359.968670]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[30359.968677]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[30359.968682]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.968686]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.968691]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[30359.968696]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.968701]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[30359.968706]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[30359.968711]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[30359.968716]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[30359.968721]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[30359.968726]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[30359.968730]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[30359.968735]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[30359.968739]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[30359.968743]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[30359.968749]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[30359.968753]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[30359.968760]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.968767] INFO: task cp:15665 blocked for more than 120 seconds.
[30359.969037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.969287] cp              D ffff89373fd1ac80     0 15665 15170 
0x00000080
[30359.969308] Call Trace:
[30359.969319]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[30359.969325]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[30359.969330]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.969334]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.969339]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[30359.969344]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.969348]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[30359.969354]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[30359.969358]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[30359.969363]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[30359.969368]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[30359.969375]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[30359.969380]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[30359.969385]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[30359.969389]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[30359.969393]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[30359.969398]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[30359.969402]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[30359.969407]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.969413] INFO: task cp:21924 blocked for more than 120 seconds.
[30359.969657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.969902] cp              D ffff89373fc9ac80     0 21924 21419 
0x00000080
[30359.969907] Call Trace:
[30359.969917]  [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs]
[30359.969923]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[30359.969927]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.969931]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.969936]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[30359.969941]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.969945]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[30359.969950]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[30359.969969]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[30359.969974]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[30359.969979]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[30359.969984]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[30359.969988]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[30359.969994]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[30359.970000]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[30359.970004]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[30359.970009]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[30359.970013]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[30359.970018]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
[30359.970024] INFO: task cp:28139 blocked for more than 120 seconds.
[30359.970260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[30359.970497] cp              D ffff8937380a5780     0 28139 27597 
0x00000080
[30359.970504] Call Trace:
[30359.970510]  [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30
[30359.970514]  [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70
[30359.970518]  [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0
[30359.970523]  [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140
[30359.970529]  [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f
[30359.970533]  [<ffffffffaf977865>] lookup_slow+0x33/0xa7
[30359.970538]  [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0
[30359.970542]  [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0
[30359.970547]  [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0
[30359.970552]  [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0
[30359.970557]  [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0
[30359.970561]  [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0
[30359.970566]  [<ffffffffaf45d661>] user_path_at+0x11/0x20
[30359.970570]  [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0
[30359.970592]  [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60
[30359.970598]  [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280
[30359.970602]  [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10
[30359.970607]  [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a

Please advise!

Cheers,
Nick




More information about the CentOS mailing list