[Arm-dev] Weird issue with NFSv4

Tue Mar 19 13:54:44 UTC 2019
Stephan Guilloux <stephan.guilloux at crisalid.com>


Le 19/03/2019 à 11:57, Pablo Sebastián Greco a écrit :
>
> El 19/3/19 a las 07:47, Stephan Guilloux escribió:
>> Hi there.
>>
>> Strange issue with NFSv4 this morning.
>> RPI3b+ with kernel 103 and fresh yum update, set up as NFSv4 client.
>>
>> Trying to copy a whole folder with a few GB (many .rpm & .tgz and a 
>> few .img), from NFS mounted source to local filesystem.
>>     (cd /shared.nfs ; tar c) | (cd /local ; tar x)
>>
>> After a while ...
>> - PING to the RPI works.
>> - SSH sessions got frozen and login is impossible, even with USB 
>> keyboard on the RPI.
>> - Sometimes, I do see some kernel backtrace on the VGA, but not always.
>> As I have to switch off the RPI, there is nothing left in 
>> /var/log/message when system is back.
>>
>> If I do this copy with full SSH, the copy works perfectly well.
>>     (ssh xxx "cd /xxx ; tar c") | (cd /local ; tar x)
>>
>> I have the feeling that the system hangs often on the same huge file, 
>> but when I do copy it manually, even in a loop ... nothin happens.
>>
>> Any idea ?
>>
>>
> Stephan, was 4.14.101 your last "good" kernel?
> We also have 4.14.103 and yesterday 4.14.106 was released.
In case of interest, I'm running full Ethernet.
No other USB device than a USB keyboard.


Tests on 4.14.106 gave the same... but I succeeded to have a backtrace 
(see below)
During this operation, the SSH sessions hanged for a few seconds, then, 
restarted again and so on, till this backtrace.
This is why/when/how I was able to get this backtrace (while [ true ] ; 
do scp root at myrpi:/var/log/messages . ; done )
:-)
Now, system is stuck.

Mar 19 14:24:41 myrpi-8ac500 systemd-logind: Removed session 9.
Mar 19 14:24:42 myrpi-8ac500 kernel: rpi_firmware_get_throttled: 5 
callbacks suppressed
Mar 19 14:24:42 myrpi-8ac500 kernel: Under-voltage detected! (0x00050005)
    << I noticed this 9s delay, here, but logs are not cut.
Mar 19 14:24:51 myrpi-8ac500 kernel: INFO: rcu_sched self-detected stall 
on CPU
Mar 19 14:24:51 myrpi-8ac500 kernel: #0112-...: (2099 ticks this GP) 
idle=6b6/140000000000001/0 softirq=7460/7460 fqs=1049
Mar 19 14:24:51 myrpi-8ac500 kernel: #011 (t=2100 jiffies g=3477 c=3476 
q=4847)
Mar 19 14:24:51 myrpi-8ac500 kernel: NMI backtrace for cpu 2
Mar 19 14:24:51 myrpi-8ac500 kernel: CPU: 2 PID: 1868 Comm: 
192.168.57.231- Not tainted 4.14.106-v7.1.el7 #1
Mar 19 14:24:51 myrpi-8ac500 kernel: Hardware name: BCM2835
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8010fec4>] (unwind_backtrace) 
from [<8010c120>] (show_stack+0x20/0x24)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8010c120>] (show_stack) from 
[<807c60b8>] (dump_stack+0xd4/0x118)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<807c60b8>] (dump_stack) from 
[<807cbecc>] (nmi_cpu_backtrace+0xc0/0xc4)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<807cbecc>] (nmi_cpu_backtrace) 
from [<807cbfb8>] (nmi_trigger_cpumask_backtrace+0xe8/0x124)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<807cbfb8>] 
(nmi_trigger_cpumask_backtrace) from [<8010e558>] 
(arch_trigger_cpumask_backtrace+0x20/0x24)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8010e558>] 
(arch_trigger_cpumask_backtrace) from [<80186240>] 
(rcu_dump_cpu_stacks+0xb4/0xe4)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<80186240>] (rcu_dump_cpu_stacks) 
from [<80185b18>] (rcu_check_callbacks+0x894/0xa3c)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<80185b18>] (rcu_check_callbacks) 
from [<8018bbf0>] (update_process_times+0x40/0x6c)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8018bbf0>] (update_process_times) 
from [<8019d628>] (tick_sched_handle+0x64/0x70)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8019d628>] (tick_sched_handle) 
from [<8019d85c>] (tick_sched_timer+0x50/0xa0)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8019d85c>] (tick_sched_timer) 
from [<8018c57c>] (__hrtimer_run_queues+0x160/0x30c)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8018c57c>] (__hrtimer_run_queues) 
from [<8018cf38>] (hrtimer_interrupt+0xc8/0x218)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8018cf38>] (hrtimer_interrupt) 
from [<806708bc>] (arch_timer_handler_phys+0x38/0x40)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<806708bc>] 
(arch_timer_handler_phys) from [<8017ac4c>] 
(handle_percpu_devid_irq+0x88/0x228)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8017ac4c>] 
(handle_percpu_devid_irq) from [<801755dc>] (generic_handle_irq+0x34/0x44)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<801755dc>] (generic_handle_irq) 
from [<80175c28>] (__handle_domain_irq+0x6c/0xc0)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<80175c28>] (__handle_domain_irq) 
from [<80101504>] (bcm2836_arm_irqchip_handle_irq+0xa8/0xac)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<80101504>] 
(bcm2836_arm_irqchip_handle_irq) from [<807e143c>] (__irq_svc+0x5c/0x7c)
Mar 19 14:24:51 myrpi-8ac500 kernel: Exception stack(0xadf69df8 to 
0xadf69e40)
Mar 19 14:24:51 myrpi-8ac500 kernel: 
9de0:                                                       b49addbc 
00000000
Mar 19 14:24:51 myrpi-8ac500 kernel: 9e00: 1c961c95 1c951c95 b49add80 
b2489d98 b2489c00 b4b1c8a8 b4b1c800 b4b1c8a8
Mar 19 14:24:51 myrpi-8ac500 kernel: 9e20: b4b1c800 adf69e54 adf69e58 
adf69e48 803e0f50 807e0c3c 20000013 ffffffff
Mar 19 14:24:51 myrpi-8ac500 kernel: [<807e143c>] (__irq_svc) from 
[<807e0c3c>] (_raw_spin_lock+0x24/0x54)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<807e0c3c>] (_raw_spin_lock) from 
[<803e0f50>] (nfs_delegation_grab_inode+0x20/0x4c)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803e0f50>] 
(nfs_delegation_grab_inode) from [<803e2000>] 
(nfs_delegation_reap_unclaimed+0x7c/0xd8)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803e2000>] 
(nfs_delegation_reap_unclaimed) from [<803dcfc4>] 
(nfs4_state_clear_reclaim_reboot+0x10c/0x114)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803dcfc4>] 
(nfs4_state_clear_reclaim_reboot) from [<803dd7c0>] 
(nfs4_recovery_handle_error+0x58/0x148)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803dd7c0>] 
(nfs4_recovery_handle_error) from [<803de9e4>] (nfs4_do_reclaim+0x738/0x7b4)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803de9e4>] (nfs4_do_reclaim) from 
[<803dee84>] (nfs4_run_state_manager+0x424/0x768)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<803dee84>] 
(nfs4_run_state_manager) from [<8013dccc>] (kthread+0x13c/0x16c)
Mar 19 14:24:51 myrpi-8ac500 kernel: [<8013dccc>] (kthread) from 
[<8010804c>] (ret_from_fork+0x14/0x28)
Mar 19 14:24:52 myrpi-8ac500 systemd-logind: New session 10 of user root.
Mar 19 14:24:52 myrpi-8ac500 systemd: Started Session 10 of user root.
Mar 19 14:24:52 myrpi-8ac500 kernel: Under-voltage detected! (0x00050005)
Mar 19 14:24:52 myrpi-8ac500 systemd-logind: Removed session 10.

After this, if I do the same again with "SSH tar", instead of NFS 
access, the copy works fine ...

Hmmm...



>
> Pablo
> _______________________________________________
> Arm-dev mailing list
> Arm-dev at centos.org
> https://lists.centos.org/mailman/listinfo/arm-dev