[CentOS] CentOS7 and NFS

On May 16, 2020 12:41:09 PM GMT+03:00, "Patrick Bégou" <Patrick.Begou at legi.grenoble-inp.fr> wrote:
>Hi Barbara,
>
>Thanks for all these suggestions. Yes, jumbo frames are enabled, and
>there are only two 10Gb Ethernet switches between the server and the
>client, connected with single-mode fiber.
>Yesterday I saw that the client showing the problem did not have the
>right MTU (1500 instead of 9000). I don't know why. I changed the MTU
>to 9000 yesterday and I'm now watching the logs to see whether the
>problems occur again.
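
A rough way to confirm that jumbo frames really work end to end after an MTU change like this one. This is only a sketch: the function names are made up for illustration, and it assumes a Linux client with the iputils `ping` (for the `-M do` "do not fragment" option).

```shell
# Largest ICMP payload that fits in one frame of the given MTU:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print the MTU currently configured on an interface (Linux sysfs).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Send unfragmentable pings sized for the given MTU. They fail if any
# hop on the path to the host only accepts smaller frames.
# Arguments: host, MTU.
check_path_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, `iface_mtu eth0` followed by `check_path_mtu nfs-server 9000`, where `eth0` and `nfs-server` are placeholders for the real interface and server names.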
>
>I will try increasing the number of NFS daemons in a few days, so that
>I can check each setup change one at a time. Because of COVID-19 I'm
>working from home, so I have to be really careful when changing the
>setup of the servers.
>
>On a cluster node I tried setting
>"rsize=1048576,wsize=1048576,vers=4,tcp" (I cannot use a larger value
>for rsize/wsize), but a comparison with a mount using the default
>setup does not show significant improvement. I sent 20GB to the
>server, or 2x10GB (2 concurrent processes), with dd, so that the
>transfer is larger than the RAID controller cache but smaller than
>the server and client RAM. It was just a short test this morning.
>
>Patrick
>
>On 15/05/2020 at 15:32, Barbara Krašovec wrote:
>> The number of threads has nothing to do with the number of cores on
>> the machine. It depends on the I/O, network speed, type of workload,
>> etc.
>> We usually start with 32 threads and increase if necessary. 
>>
>> You can check the statistics with:
>> watch 'cat /proc/net/rpc/nfsd | grep th'
>>
>> Or you can check on the client
>> nfsstat -rc
>> Client rpc stats:
>> calls      retrans    authrefrsh
>> 1326777974   0          1326645701
>>
>> If you see a large number of retransmissions, you should increase the
>> number of threads.
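
To put a number on "a large number of retransmissions", the counters can be turned into a percentage. A minimal sketch, assuming the `nfsstat -rc` output layout shown above (a header line starting with `calls retrans`, followed by one line of values); the function name is made up for illustration.

```shell
# Read `nfsstat -rc` output on stdin and print retransmissions as a
# percentage of total RPC calls.
retrans_percent() {
    awk '
        # The header line announces that the values come next.
        $1 == "calls" && $2 == "retrans" { want = 1; next }
        want && NF >= 2 {
            if ($1 > 0) printf "%.4f\n", 100 * $2 / $1
            else        print "0.0000"
            exit
        }
    '
}
```

Usage on a client: `nfsstat -rc | retrans_percent`. With the counters quoted above (0 retransmissions) this prints 0.0000.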
>>
>> However, your problem could also be related to the filesystem or the
>> network.
>>
>> Do you have jumbo frames (if yes, you should have them on both the
>> clients and the server)? You might think about disabling flow control
>> on the switch and on the network card. Are there a lot of dropped
>> packets?
>>
>> For network tuning, check http://fasterdata.es.net/host-tuning/linux/
>>
>> Did you try to enable readahead (blockdev --setra) on the filesystem?
>>
>> On the client side, changing the mount options helps. The default
>> read/write block size is quite small; increase it (rsize, wsize) and
>> use noatime.
>>
>>
>> Cheers,
>> Barbara
>>
>>
>>
>>
>>
>>> On 15 May 2020, at 09:26, Patrick Bégou
>>> <Patrick.Begou at legi.grenoble-inp.fr> wrote:
>>>
>>> On 13/05/2020 at 15:36, Patrick Bégou wrote:
>>>> On 13/05/2020 at 07:32, Simon Matter via CentOS wrote:
>>>>>> On 12/05/2020 at 16:10, James Pearson wrote:
>>>>>>> Patrick Bégou wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I need some help with NFSv4 setup/tuning. I have a dedicated NFS
>>>>>>>> server (2 x E5-2620, 8 cores/16 threads each, 64GB RAM, 1x10Gb
>>>>>>>> Ethernet and 16x 8TB HDD) used by two servers and a small cluster
>>>>>>>> (400 cores). All the servers are running CentOS 7; the cluster is
>>>>>>>> running CentOS 6.
>>>>>>>>
>>>>>>>> From time to time, on the server I get:
>>>>>>>>
>>>>>>>>       kernel: NFSD: client xxx.xxx.xxx.xxx testing state ID with incorrect client ID
>>>>>>>>
>>>>>>>> And the client xxx.xxx.xxx.xxx freezes with:
>>>>>>>>
>>>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
>>>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr not responding, still trying
>>>>>>>>       kernel: nfs: server xxxxx.legi.grenoble-inp.fr OK
>>>>>>>>
>>>>>>>> There is a discussion about this on Red Hat 7 support, but it is
>>>>>>>> only open to subscribers. Other searches with Google do not
>>>>>>>> provide useful information.
>>>>>>>>
>>>>>>>> Do you have any idea how to solve these freezes?
>>>>>>>>
>>>>>>>> More generally, I would be really interested in some
>>>>>>>> advice/tutorials on improving NFS performance in this dedicated
>>>>>>>> context. There are so many [different] things about tuning NFS
>>>>>>>> available on the web that I'm a little bit lost (the opposite of
>>>>>>>> the previous question). So if someone has "the tutorial"... ;-)
>>>>>>> How many nfsd threads are you running on the server? The current
>>>>>>> count will be in /proc/fs/nfsd/threads
>>>>>>>
>>>>>>> James Pearson
>>>>>> Hi James,
>>>>>>
>>>>>> Thanks for your answer. I've configured 24 threads (for 16 hardware
>>>>>> cores/32 threads on the NFS server with these processors).
>>>>>>
>>>>>> But it seems that there are buffer settings to modify too when
>>>>>> increasing the number of threads... That has not been done.
>>>>>>
>>>>>> Load average on the server is below 1...
>>>>> I'd be very careful with thread counts higher than the number of
>>>>> physical cores. NFS threads and so-called CPU hyper/simultaneous
>>>>> threads are quite different things, and it can hurt performance if
>>>>> not configured correctly.
>>>>>
>>>> So you suggest limiting the setup to 16 daemons? I'll try this
>>>> evening.
>>>>
>>> Setting 16 daemons (the number of physical cores) does not solve this
>>> problem. Moreover, I saw an (old) document provided by Dell on
>>> optimizing NFS server performance in an HPC context, and they suggest
>>> using... 128 daemons on a dedicated PowerEdge server. :-\
>>>
>>> I saw that it is always the same client showing the problem (a large
>>> fat node); maybe I must investigate the client side more than the
>>> server side.
>>>
>>> Patrick
>>>
>>>
>>>
>>> _______________________________________________
>>> CentOS mailing list
>>> CentOS at centos.org <mailto:CentOS at centos.org>
>>> https://lists.centos.org/mailman/listinfo/centos
><https://lists.centos.org/mailman/listinfo/centos>

Hi,
Why don't you let the client negotiate the version itself?
pNFS requires at least NFSv4.1 and can bring extra performance.

P.S.: According to the man pages, 'vers':
'is an alternative to the nfsvers option. It is included for compatibility with other operating systems.'
I have always used 'nfsvers' :).
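
To illustrate both variants (letting the client negotiate versus pinning a version with 'nfsvers'), here is a small sketch. The function name is hypothetical, the rsize/wsize and noatime values are simply the ones mentioned earlier in this thread, and it assumes the client's nfs-utils accepts a minor version such as 4.1 in 'nfsvers' (older clients may need a separate minorversion option instead).

```shell
# Build an NFS mount option string. With an empty argument, nfsvers is
# left out so the client negotiates the protocol version with the server.
nfs_mount_opts() {
    version=$1
    opts="rw,noatime,rsize=1048576,wsize=1048576"
    if [ -n "$version" ]; then
        opts="$opts,nfsvers=$version"
    fi
    echo "$opts"
}
```

For example: `mount -t nfs -o "$(nfs_mount_opts 4.1)" server:/export /mnt/data`, where the server name, export and mount point are placeholders. Afterwards, `nfsstat -m` on the client shows which version and options are actually in effect.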

Best Regards,
Strahil Nikolov