Re: [CentOS] CentOS7 and NFS

28 Aug 2020


Hello,
I'm back with these NFS problems.
Server and client have been updated, but the problem still occurs from time to time.
Server: Linux robin.legi.grenoble-inp.fr 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Client: Linux grivola.legi.grenoble-inp.fr 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Both run CentOS Linux release 7.8.2003 (Core).
It seems related to an scp session: the NFS client downloads a large
data set from a remote server and stores the files on its NFS file system.
On the client I get messages like these in /var/log/messages:
    Aug 28 10:03:08 grivola kernel: INFO: task scp:78495 blocked for more than 120 seconds.
    Aug 28 10:03:08 grivola kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Aug 28 10:03:08 grivola kernel: scp             D ffff97e37fa9acc0     0 78495 147369 0x00000084
    Aug 28 10:03:08 grivola kernel: Call Trace:
    Aug 28 10:03:08 grivola kernel: [<ffffffff92783ef0>] ? bit_wait+0x50/0x50
    Aug 28 10:03:08 grivola kernel: [<ffffffff92785da9>] schedule+0x29/0x70
    Aug 28 10:03:08 grivola kernel: [<ffffffff927838b1>] schedule_timeout+0x221/0x2d0
    Aug 28 10:03:08 grivola kernel: [<ffffffffc132e7e6>] ? rpc_run_task+0xf6/0x150 [sunrpc]
    Aug 28 10:03:08 grivola kernel: [<ffffffffc133d850>] ? rpc_put_task+0x10/0x20 [sunrpc]
    Aug 28 10:03:08 grivola kernel: [<ffffffff92783ef0>] ? bit_wait+0x50/0x50
    Aug 28 10:03:08 grivola kernel: [<ffffffff9278549d>] io_schedule_timeout+0xad/0x130
    Aug 28 10:03:08 grivola kernel: [<ffffffff92785538>] io_schedule+0x18/0x20
    Aug 28 10:03:08 grivola kernel: [<ffffffff92783f01>] bit_wait_io+0x11/0x50
    Aug 28 10:03:08 grivola kernel: [<ffffffff92783a27>] __wait_on_bit+0x67/0x90
    Aug 28 10:03:08 grivola kernel: [<ffffffff921bd741>] wait_on_page_bit+0x81/0xa0
    Aug 28 10:03:08 grivola kernel: [<ffffffff920c7840>] ? wake_bit_function+0x40/0x40
    Aug 28 10:03:08 grivola kernel: [<ffffffff921bd871>] __filemap_fdatawait_range+0x111/0x190
    Aug 28 10:03:08 grivola kernel: [<ffffffff921bd904>] filemap_fdatawait_range+0x14/0x30
    Aug 28 10:03:08 grivola kernel: [<ffffffff921bd947>] filemap_fdatawait+0x27/0x30
    Aug 28 10:03:08 grivola kernel: [<ffffffff921bfd1c>] filemap_write_and_wait+0x4c/0x80
    Aug 28 10:03:08 grivola kernel: [<ffffffffc097ddd0>] nfs_wb_all+0x20/0x100 [nfs]
    Aug 28 10:03:08 grivola kernel: [<ffffffffc09700e0>] nfs_setattr+0x1f0/0x210 [nfs]
    Aug 28 10:03:08 grivola kernel: [<ffffffff9226cecc>] notify_change+0x30c/0x4d0
    Aug 28 10:03:08 grivola kernel: [<ffffffff9224af05>] do_truncate+0x75/0xc0
    Aug 28 10:03:08 grivola kernel: [<ffffffff92250118>] ? __sb_start_write+0x58/0x120
    Aug 28 10:03:08 grivola kernel: [<ffffffff9224b329>] do_sys_ftruncate.constprop.14+0x139/0x1a0
    Aug 28 10:03:08 grivola kernel: [<ffffffff9224b3ce>] SyS_ftruncate+0xe/0x10
    Aug 28 10:03:08 grivola kernel: [<ffffffff92792ed2>] system_call_fastpath+0x25/0x2a
At that moment the NFS server freezes. Neither an ssh session nor the local
console (via iDRAC, or a screen and keyboard physically plugged into the server)
works.
There are no particular messages on the NFS server during the freeze. The freeze period ends with the following messages.
On the server:
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
    Aug 28 10:20:26 robin kernel: NFSD: client 194.254.66.26 testing state ID with incorrect client ID
And on the client:
    Aug 28 10:20:26 grivola kernel: nfs: server robin.legi.grenoble-inp.fr OK
    Aug 28 10:20:26 grivola kernel: nfs: server robin.legi.grenoble-inp.fr OK
    Aug 28 10:20:26 grivola kernel: nfs: server robin.legi.grenoble-inp.fr OK
    Aug 28 10:20:26 grivola kernel: nfs: server robin.legi.grenoble-inp.fr OK
    Aug 28 10:20:26 grivola kernel: nfs: server robin.legi.grenoble-inp.fr OK
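One possible first step is to line up the client and server kernel messages by timestamp, to see which side reports trouble first. The following is only a minimal sketch: the names `scan_kernel_log`, `summarize` and `PATTERNS` are hypothetical, the patterns only cover the message texts quoted in this thread, and it assumes classic syslog lines (`Mon DD HH:MM:SS host kernel: message`), one entry per line as in the raw /var/log/messages file.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical classification rules, built only from the kernel messages quoted in this thread.
PATTERNS = {
    "hung_task": re.compile(r"INFO: task ([^:]+):(\d+) blocked for more than (\d+) seconds"),
    "not_responding": re.compile(r"nfs: server (\S+) not responding"),
    "server_ok": re.compile(r"nfs: server (\S+) OK"),
    "bad_client_id": re.compile(r"NFSD: client (\S+) testing state ID with incorrect client ID"),
}

# Assumed classic syslog layout: "Mon DD HH:MM:SS host kernel: message".
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")

Event = Tuple[str, str, str]  # (timestamp, host, kind)


def scan_kernel_log(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, host, kind) for every kernel line matching one of PATTERNS."""
    events: List[Event] = []
    for line in lines:
        m = SYSLOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel syslog entry
        timestamp, host, message = m.groups()
        for kind, pattern in PATTERNS.items():
            if pattern.search(message):
                events.append((timestamp, host, kind))
                break
    return events


def summarize(events: List[Event]) -> Counter:
    """Count events per (host, kind) pair."""
    return Counter((host, kind) for _, host, kind in events)


if __name__ == "__main__":
    # Usage: python scan_nfs_log.py client-messages server-messages
    all_events: List[Event] = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            all_events.extend(scan_kernel_log(fh))
    # Sorting on the raw timestamp string is only a rough ordering:
    # it works within one month but month names do not sort chronologically.
    for event in sorted(all_events):
        print(*event)
    print(summarize(all_events))
```

Feeding it the client and server logs together gives a merged timeline, e.g. showing whether the `hung_task` entries on grivola precede the `bad_client_id` burst on robin. It does not explain the freeze by itself; it only helps narrow down the time window to look at.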
I do not know how to investigate this further.
Patrick
On 09/07/2020 at 12:11, Patrick Bégou wrote:
...
Hi Orion,
No, I still have this problem. I postponed working on it because the latest
updates had not yet been installed on the server and the client. I'll
get back to this problem as soon as possible.
Thanks, Charles, for your detailed information on how to track down this
problem. I'll check all these metrics.
I have several clients for this NFS server, and the problem seems to occur
only on the client using NFS 4.1 on CentOS Linux release 7.7.1908 (Core).
The default mount options used are:
rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=194.254.xx.xx,local_lock=none,addr=194.254.yy.yy
On older clients (Red Hat Enterprise Linux Server release 6.7
(Santiago)), the default options are:
rw,intr,hard,sloppy,vers=4,addr=194.254.xx.xx,clientaddr=194.254.yy.yy
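Since only the NFS 4.1 client seems affected, a quick side-by-side comparison of the two option strings can make the differences explicit. This is a minimal, hypothetical helper (`parse_mount_options` and `diff_mount_options` are illustrative names, not an existing tool). It only compares what `mount` printed: an option reported as `<absent>` on one side may still have a kernel default there, so an absent entry is not proof that the effective value differs.

```python
from typing import Dict, Optional, Tuple

ABSENT = "<absent>"  # marker for an option not present in one of the strings


def parse_mount_options(options: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "hard", "intr") map to None.
    """
    parsed: Dict[str, Optional[str]] = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def diff_mount_options(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    pa, pb = parse_mount_options(a), parse_mount_options(b)
    diff: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(set(pa) | set(pb)):
        va = pa[key] if key in pa else ABSENT
        vb = pb[key] if key in pb else ABSENT
        if va != vb:
            diff[key] = (va, vb)
    return diff


if __name__ == "__main__":
    # Option strings copied from the message above (addresses left out).
    el7 = ("rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,"
           "proto=tcp,timeo=600,retrans=2,sec=sys,local_lock=none")
    el6 = "rw,intr,hard,sloppy,vers=4"
    for option, (v7, v6) in diff_mount_options(el7, el6).items():
        print(f"{option:12} EL7={v7!s:10} EL6={v6!s}")
```

The most visible difference is `vers=4.1` versus `vers=4`; forcing the same version on both kinds of client would be one way to test whether the protocol minor version matters here.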
The server runs CentOS 7.6.1810.
I will see whether the latest updates help solve the problem.
Patrick
On 03/07/2020 at 00:05, Orion Poplawski wrote:
...
On 6/1/20 3:08 AM, Patrick Bégou wrote:
...
On 13/05/2020 at 02:13, Orion Poplawski wrote:
...
On 5/12/20 2:46 AM, Patrick Bégou wrote:
...
Hi,
I need some help with NFSv4 setup/tuning. I have a dedicated NFS server
(2 x E5-2620, 8 cores/16 threads each, 64 GB RAM, 1 x 10 Gb Ethernet and
16 x 8 TB HDD) used by two servers and a small cluster (400 cores). All the
servers run CentOS 7; the cluster runs CentOS 6.
From time to time, I get this on the server:
    kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with incorrect client ID
And the client xxx.xxx.xxx.xxx freezes with:
    kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
    kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
    kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
    kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
There is a discussion about this on the Red Hat 7 support site, but it is only open to
subscribers. Other Google searches did not turn up any useful
information.
FYI - you can get access to such info with a free RHEL developers
account.
Thanks for your suggestion. As the problem is back, I've subscribed to
read the full discussion.
The answer was "do not use antivirus" :-(. I don't run any antivirus, as I
use only CentOS.
Patrick
Just curious: have you had any luck resolving these issues?
I'm afraid NFS on EL7 has recently become much less stable for us
as well, with many more client access hangs.
Orion

CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
