Hello,
MERRY CHRISTMAS to all in list!
After I upgraded to latest: CentOS Linux release 7.7.1908 (Core) I am facing nfs crashes which cause the system to hang frequently.
This is caused by cp to nfs-mounted shares.
Below is dmesg output; you will see call traces. These cause system to gradually overload:
[root@hesperia1 ~]# top top - 10:09:40 up 10:16, 1 user, load average: 53.66, 54.13, 52.98 Tasks: 475 total, 2 running, 436 sleeping, 0 stopped, 37 zombie %Cpu(s): 0.1 us, 0.6 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.1 st KiB Mem : 3879928 total, 813504 free, 1733216 used, 1333208 buff/cache KiB Swap: 4063228 total, 4062708 free, 520 used. 1797264 avail Mem
and finally hangs showing messages (which I have not recorded precisely) in the CLI login screen like "System out of memory". Then I have to reboot.
I tried to downgrade nfs-utils and rpcbind to earlier versions (in case there is a bug in latest ones), but I couldn't:
[root@hesperia1 ~]# yum downgrade rpcbind-0.2.0-47 nfs-utils-1.3.0-61 Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: ftp.ntua.gr * epel: mirrors.daticum.com * extras: ftp.ntua.gr * updates: ftp.ntua.gr No package rpcbind-0.2.0-47 available. No package nfs-utils-1.3.0-61 available. Error: Nothing to do
Do you know if this is a known bug? (I couldn't find something in my searches.)
Can you suggest a solution / workaround? I am facing this problem constantly and the system has become unstable.
[root@hesperia1 ~]# dmesg ... [ 26.525584] RPC: Registered named UNIX socket transport module. [ 26.525588] RPC: Registered udp transport module. [ 26.525590] RPC: Registered tcp transport module. [ 26.525592] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 27.103268] type=1305 audit(1577317991.465:3): audit_pid=836 old=0 auid=4294967295 ses=4294967295 res=1 [ 29.384755] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 29.629701] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 30.773604] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 31.977476] FS-Cache: Loaded [ 32.170682] FS-Cache: Netfs 'nfs' registered for caching [ 53.671997] random: crng init done [26399.967630] INFO: task cp:14560 blocked for more than 120 seconds. [26399.967984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [26399.968202] cp D ffff89373fc9ac80 0 14560 14053 0x00000080 [26399.968211] Call Trace: [26399.968272] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [26399.968321] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [26399.968345] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [26399.968375] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [26399.968395] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [26399.968404] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [26399.968410] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [26399.968421] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [26399.968445] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [26399.968475] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [26399.968481] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [26399.968487] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [26399.968492] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [26399.968497] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [26399.968501] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [26399.968505] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [26399.968530] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [26399.968534] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [26399.968544] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [27479.967352] INFO: task cp:4656 blocked for more than 120 seconds. [27479.967414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [27479.967711] cp D ffff8936b6548000 0 4656 2914 0x00000080 [27479.967726] Call Trace: [27479.967749] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [27479.967762] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [27479.967769] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [27479.967774] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [27479.967786] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [27479.967792] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [27479.967821] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [27479.967828] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [27479.967836] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [27479.967841] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [27479.967846] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [27479.967852] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [27479.967857] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [27479.967862] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [27479.967866] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [27479.967870] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [27479.967880] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [27479.967884] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [27479.967892] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [27479.967910] INFO: task cp:7754 blocked for more than 120 seconds. [27479.968147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [27479.968437] cp D ffff89373fc1ac80 0 7754 7285 0x00000080 [27479.968443] Call Trace: [27479.968455] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [27479.968461] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [27479.968466] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [27479.968470] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [27479.968475] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [27479.968481] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [27479.968486] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [27479.968491] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [27479.968500] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [27479.968505] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [27479.968510] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [27479.968515] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [27479.968520] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [27479.968525] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [27479.968528] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [27479.968533] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [27479.968539] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [27479.968543] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [27479.968547] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [27479.968557] INFO: task cp:20086 blocked for more than 120 seconds. [27479.968798] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [27479.969064] cp D ffff89373fc9ac80 0 20086 19644 0x00000080 [27479.969069] Call Trace: [27479.969080] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [27479.969086] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [27479.969090] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [27479.969094] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [27479.969099] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [27479.969105] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [27479.969110] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [27479.969115] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [27479.969119] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [27479.969124] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [27479.969129] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [27479.969134] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [27479.969139] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [27479.969144] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [27479.969148] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [27479.969152] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [27479.969157] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [27479.969161] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [27479.969166] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.967071] INFO: task cp:4656 blocked for more than 120 seconds. [30359.967120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.967345] cp D ffff89373fd9ac80 0 4656 2914 0x00000080 [30359.967369] Call Trace: [30359.967391] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [30359.967403] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [30359.967410] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.967415] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.967421] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [30359.967427] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.967432] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [30359.967438] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [30359.967444] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [30359.967449] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [30359.967454] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [30359.967459] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [30359.967464] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [30359.967469] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [30359.967473] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [30359.967477] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [30359.967486] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [30359.967490] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [30359.967497] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.967502] INFO: task cp:6363 blocked for more than 120 seconds. [30359.967708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.967933] cp D ffff89373fd9ac80 0 6363 6002 0x00000080 [30359.967938] Call Trace: [30359.968064] [<ffffffffc05bc091>] ? rpcauth_lookupcred+0x91/0xd0 [sunrpc] [30359.968071] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.968076] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.968082] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.968088] [<ffffffffaf458fc8>] do_last+0x298/0x1290 [30359.968093] [<ffffffffaf45bdbd>] path_openat+0xcd/0x5a0 [30359.968100] [<ffffffffaf44737a>] ? __check_object_size+0x1ca/0x250 [30359.968121] [<ffffffffaf45d72d>] do_filp_open+0x4d/0xb0 [30359.968129] [<ffffffffaf46b282>] ? __alloc_fd+0xc2/0x170 [30359.968135] [<ffffffffaf449684>] do_sys_open+0x124/0x220 [30359.968140] [<ffffffffaf44979e>] SyS_open+0x1e/0x20 [30359.968145] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.968151] INFO: task cp:9517 blocked for more than 120 seconds. [30359.968414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.968649] cp D ffff89373fd9ac80 0 9517 9100 0x00000080 [30359.968656] Call Trace: [30359.968670] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [30359.968677] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [30359.968682] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.968686] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.968691] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [30359.968696] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.968701] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [30359.968706] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [30359.968711] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [30359.968716] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [30359.968721] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [30359.968726] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [30359.968730] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [30359.968735] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [30359.968739] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [30359.968743] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [30359.968749] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [30359.968753] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [30359.968760] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.968767] INFO: task cp:15665 blocked for more than 120 seconds. [30359.969037] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.969287] cp D ffff89373fd1ac80 0 15665 15170 0x00000080 [30359.969308] Call Trace: [30359.969319] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [30359.969325] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [30359.969330] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.969334] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.969339] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [30359.969344] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.969348] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [30359.969354] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [30359.969358] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [30359.969363] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [30359.969368] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [30359.969375] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [30359.969380] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [30359.969385] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [30359.969389] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [30359.969393] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [30359.969398] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [30359.969402] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [30359.969407] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.969413] INFO: task cp:21924 blocked for more than 120 seconds. [30359.969657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.969902] cp D ffff89373fc9ac80 0 21924 21419 0x00000080 [30359.969907] Call Trace: [30359.969917] [<ffffffffc070ce05>] ? nfs_permission+0x105/0x220 [nfs] [30359.969923] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [30359.969927] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.969931] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.969936] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [30359.969941] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.969945] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [30359.969950] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [30359.969969] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [30359.969974] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [30359.969979] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [30359.969984] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [30359.969988] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [30359.969994] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [30359.970000] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [30359.970004] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [30359.970009] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [30359.970013] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [30359.970018] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a [30359.970024] INFO: task cp:28139 blocked for more than 120 seconds. [30359.970260] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30359.970497] cp D ffff8937380a5780 0 28139 27597 0x00000080 [30359.970504] Call Trace: [30359.970510] [<ffffffffaf504262>] ? security_inode_permission+0x22/0x30 [30359.970514] [<ffffffffaf981929>] schedule_preempt_disabled+0x29/0x70 [30359.970518] [<ffffffffaf97f8b7>] __mutex_lock_slowpath+0xc7/0x1d0 [30359.970523] [<ffffffffaf455ec3>] ? unlazy_walk+0x133/0x140 [30359.970529] [<ffffffffaf97ec8f>] mutex_lock+0x1f/0x2f [30359.970533] [<ffffffffaf977865>] lookup_slow+0x33/0xa7 [30359.970538] [<ffffffffaf45b608>] path_lookupat+0x838/0x8b0 [30359.970542] [<ffffffffaf424ef5>] ? kmem_cache_alloc+0x35/0x1f0 [30359.970547] [<ffffffffaf45c45f>] ? getname_flags+0x4f/0x1a0 [30359.970552] [<ffffffffaf45b6ab>] filename_lookup+0x2b/0xc0 [30359.970557] [<ffffffffaf45d5f7>] user_path_at_empty+0x67/0xc0 [30359.970561] [<ffffffffaf450695>] ? cp_new_stat+0x165/0x1a0 [30359.970566] [<ffffffffaf45d661>] user_path_at+0x11/0x20 [30359.970570] [<ffffffffaf450343>] vfs_fstatat+0x63/0xc0 [30359.970592] [<ffffffffaf4506fe>] SYSC_newstat+0x2e/0x60 [30359.970598] [<ffffffffaf33e3f6>] ? __audit_syscall_exit+0x1e6/0x280 [30359.970602] [<ffffffffaf450bbe>] SyS_newstat+0xe/0x10 [30359.970607] [<ffffffffaf98dede>] system_call_fastpath+0x25/0x2a
Please advise!
Cheers, Nick