[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
Matt Garman
matthew.garman at gmail.com
Wed Apr 29 16:32:26 UTC 2015
On Wed, Apr 29, 2015 at 10:36 AM, Devin Reade <gdr at gno.org> wrote:
> Have you looked at the client-side NFS cache? Perhaps the C6 cache
> is either disabled, has fewer resources, or is invalidating faster?
> (I don't think that would explain the C5 starvation, though, unless
> it's a secondary effect from retransmits, etc.)
Do you know where the NFS cache settings are specified? I've looked
at the various nfs mount options. Anything cache-related appears to
be the same between the two OSes, assuming I didn't miss anything. We
did experiment with the "noac" mount option, though that had no effect
in our tests.
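A minimal sketch of one way to double-check that comparison: diff the effective options the kernel reports in /proc/mounts on each client, rather than the options written in fstab. It assumes the usual /proc/mounts field layout (device, mount point, type, options, dump, pass). The function names `nfs_mount_options` and `diff_options` are made up for illustration.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to its set of effective mount options.

    mounts_text is the content of /proc/mounts, one mount per line:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only NFS client mounts; this skips e.g. the "nfsd" pseudo-filesystem.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result


def diff_options(opts_a, opts_b):
    """Return (only_in_a, only_in_b) as sorted lists of option strings."""
    return sorted(opts_a - opts_b), sorted(opts_b - opts_a)
```

To use it, save the output of `cat /proc/mounts` from a 5 client and a 6 client, pass each to `nfs_mount_options`, and feed the two option sets for the same mount point to `diff_options`. An empty result on both sides would confirm the options really are identical.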
FWIW, we've done a tcpdump on both OSes, performing the same tasks,
and it appears that 5 actually has more "chatter". Just looking at
packet counts, 5 has about 17% more packets than 6, for the same
workload. I haven't dug too deep into the tcpdump files, though: we
need a pretty big workload to trigger a measurable performance
discrepancy, so the resulting pcap files are on the order of 5 GB.
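For captures this large, packet counts can be taken without loading the file into memory. The following is only a sketch, assuming the captures are in the classic pcap format (not pcapng) and are read from a seekable file. `count_packets` and `percent_more` are hypothetical helper names.

```python
import struct

# First four bytes of a classic pcap file -> byte order of its headers.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def count_packets(stream):
    """Return (packet_count, captured_bytes) for a classic pcap stream.

    Only the 16-byte per-packet headers are read; packet payloads are
    skipped with seek(), so the stream must be seekable.
    """
    header = stream.read(24)  # global file header
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file")
    record = struct.Struct(_MAGICS[header[:4]] + "IIII")
    packets = 0
    captured_bytes = 0
    while True:
        rec = stream.read(record.size)
        if len(rec) < record.size:
            break  # end of file (or a truncated trailing record)
        _ts_sec, _ts_frac, incl_len, _orig_len = record.unpack(rec)
        stream.seek(incl_len, 1)  # skip the packet data itself
        packets += 1
        captured_bytes += incl_len
    return packets, captured_bytes


def percent_more(a, b):
    """How many percent larger a is than b."""
    return 100.0 * (a - b) / b
```

Opening each capture with `open(path, "rb")` and passing the two packet counts to `percent_more` would reproduce a figure like the 17% quoted above.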
> Regarding the cache, do you have multiple mount points on a client
> that resolve to the same server filesystem? If so, do they have
> different mount options? If so, that can result in multiple caches
> instead of a single disk cache. The client cache can also be bypassed
> if your application is doing direct I/O on the files. Perhaps there
> is a difference in the application between C5 and C6, including
> whether or not it was just recompiled? (If so, can you try a C5 version
> on the C6 machines?)
No multiple mount points to the same server.
No application differences. We're still compiling on 5, regardless of
target platform.
> If you determine that C6 is doing aggressive caching, does this match
> the needs of your application? That is, do you have the situation
> where the client NFS layer does an aggressive read-ahead that is never
> used by the application?
That was one of our early theories. On 6, you can adjust this via
/sys/class/bdi/X:Y/read_ahead_kb (use stat on the mountpoint to
determine X and Y). This file doesn't exist on 5. But we tried
increasing and decreasing it from the default (960), and didn't see
any changes.
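A small sketch of the lookup described above, assuming X and Y are the major and minor numbers of the device ID that stat reports for the mount point. The helper names `bdi_read_ahead_path` and `read_ahead_kb` are hypothetical.

```python
import os


def bdi_read_ahead_path(st_dev):
    """Build the sysfs path for the backing device with this device ID.

    Assumption: the X:Y directory name is major:minor of st_dev.
    """
    return "/sys/class/bdi/%d:%d/read_ahead_kb" % (
        os.major(st_dev),
        os.minor(st_dev),
    )


def read_ahead_kb(mountpoint):
    """Return the current read-ahead size in KB for a mounted filesystem."""
    path = bdi_read_ahead_path(os.stat(mountpoint).st_dev)
    with open(path) as f:
        return int(f.read().strip())
```

Calling `read_ahead_kb` on the NFS mount point on a 6 client should return the default mentioned above unless it has been changed. Writing a new value to the same file (as root) is how the setting would be adjusted.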
> Are C5 and C6 using the same NFS protocol version? How about TCP vs
> UDP? If UDP is in play, have a look at fragmentation stats under load.
Yup, both are using tcp, protocol version 3.
> Are both using the same authentication method (ie: maybe just
> UID-based)?
Yup, sec=sys.
> And, like always, is DNS sane for all your clients and servers? Everything
> (including clients) has proper PTR records, consistent with A records,
> et al? DNS is so fundamental to everything that if it is out of whack
> you can get far-reaching symptoms that don't seem to have anything to do
> with DNS.
I believe so. I wouldn't bet my life on it. But there were certainly
no changes to our DNS before, during or since the OS upgrade.
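The PTR/A consistency Devin describes can be spot-checked per address. This is a rough, IPv4-only sketch that goes through the system resolver, so results may also reflect /etc/hosts depending on the host's resolver configuration. `dns_consistent` is a made-up name, and the `reverse` and `forward` parameters exist only so the lookups can be swapped out.

```python
import socket


def dns_consistent(ip, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Check that ip -> name (PTR) -> addresses (A) leads back to ip."""
    try:
        name, _aliases, _addrs = reverse(ip)       # reverse lookup
        _canon, _aliases, addrs = forward(name)    # forward lookup
    except (socket.herror, socket.gaierror):
        return False  # either lookup failed
    return ip in addrs
```

Running this over the addresses of every client and server involved would flag any host whose reverse and forward records disagree.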
> You may want to look at NFSometer and see if it can help.
Haven't seen that, will definitely give it a try!
Thanks for your thoughts and suggestions!