[CentOS] NFS (or TCP or scheduler) changes between CentOS 5 and 6?

Wed Apr 29 13:35:29 UTC 2015
Matt Garman <matthew.garman at gmail.com>

We have a "compute cluster" of about 100 machines that do a read-only
NFS mount to a big NAS filer (a NetApp FAS6280).  The jobs running on
these boxes are analysis/simulation jobs that constantly read data off
the NAS.

We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
We did a "piecemeal" upgrade, usually upgrading five or so machines at
a time, every few days.  We noticed improved performance on the CentOS
6 boxes.  But as the number of CentOS 6 boxes increased, we actually
saw performance on the CentOS 5 boxes decrease.  By the time we had
only a few CentOS 5 boxes left, they were performing so badly as to be
effectively worthless.

In parallel with this upgrade process, we observed that the read
latency on our NetApp skyrocketed.  This in turn caused all compute
jobs to run slower, because it seemed to move the bottleneck from the
client servers' OS to the NetApp.  This is somewhat counter-intuitive:
CentOS 6 performs faster on its own, but actually results in a net
performance loss because it creates a bottleneck on our centralized
storage.

All indications are that CentOS 6 is much more "aggressive" in how it
does NFS reads.  Conversely, CentOS 5 was very "polite", to the point
that it basically got starved out by the introduction of the 6.5
boxes.

What I'm looking for is a "deep dive" list of changes to the NFS
implementation between CentOS 5 and CentOS 6.  Or maybe this is due to
a change in the TCP stack?  Or maybe the scheduler?  We've tried a lot
of sysctl TCP tunings, various NFS mount options, anything that's
obviously different between 5 and 6... But so far we've been unable to
find the "smoking gun" that causes the obvious behavior change between
the two OS versions.
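One way to make the "anything that's obviously different" comparison
systematic is to capture /proc/mounts on a CentOS 5 box and a CentOS 6
box and diff the NFS options they actually ended up with.  The
following is a minimal Python sketch of that idea; the function names
and the sample mount lines (including the rsize values) are
hypothetical illustrations, not output from the machines described
here.

```python
# Hypothetical helper: compare the NFS mount options actually in effect
# on two hosts, using text captured from each host's /proc/mounts.

def nfs_options(proc_mounts_text):
    """Map mountpoint -> {option: value} for every nfs/nfs4 entry."""
    result = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            key, _, value = item.partition("=")
            opts[key] = value  # bare flags such as "ro" map to ""
        result[fields[1]] = opts
    return result


def diff_options(old, new):
    """Return {option: (old_value, new_value)} for options that differ.

    An option missing on one side is reported as None on that side.
    """
    diffs = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            diffs[key] = (old.get(key), new.get(key))
    return diffs


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    c5 = "filer:/vol/data /data nfs ro,vers=3,rsize=32768,proto=tcp 0 0"
    c6 = "filer:/vol/data /data nfs ro,relatime,vers=3,rsize=1048576,proto=tcp 0 0"
    old = nfs_options(c5)["/data"]
    new = nfs_options(c6)["/data"]
    for opt, (a, b) in diff_options(old, new).items():
        print(f"{opt}: {a!r} -> {b!r}")
```

Comparing the effective options from /proc/mounts, rather than what
was written in /etc/fstab, catches defaults that the kernel or
mount.nfs filled in differently on each release.  The same diff
approach would apply to `sysctl -a` output captured from each host.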

Just hoping that maybe someone else out there has seen something like
this, or can point me to some detailed documentation that might clue
me in on what to look for next.

Thanks!