[CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?

Wed Apr 29 15:00:30 UTC 2015

Matt Garman wrote:
> We have a "compute cluster" of about 100 machines that do a read-only
> NFS mount to a big NAS filer (a NetApp FAS6280).  The jobs running on
> these boxes are analysis/simulation jobs that constantly read data off
> the NAS.
>
> We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
> We did a "piecemeal" upgrade, usually upgrading five or so machines at
> a time, every few days.  We noticed improved performance on the CentOS
> 6 boxes.  But as the number of CentOS 6 boxes increased, we actually
> saw performance on the CentOS 5 boxes decrease.  By the time we had
> only a few CentOS 5 boxes left, they were performing so badly as to be
> effectively worthless.
>
> What we observed in parallel to this upgrade process was that the read
> latency on our NetApp device skyrocketed.  This in turn caused all
> compute jobs to actually run slower, as it seemed to move the
> bottleneck from the client servers' OS to the NetApp.  This is
> somewhat counter-intuitive: CentOS 6 performs faster, but actually
> results in net performance loss because it creates a bottleneck on our
> centralized storage.
<snip>
*IF* I understand you, I've got one question: what options are you using to
mount the storage? We had *real* performance problems when we went from 5
to 6: unzipping a 26M file to 107M, while writing to an NFS-mounted drive,
went from 30 sec or so to a *timed* 7 min. The final answer was that once
we mounted the NFS filesystem with nobarrier in fstab instead of the
default options, the time dropped back to 35 or 40 sec.
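
To compare mount options between a CentOS 5 box and a CentOS 6 box, something like the following could be run on each. It is only a rough sketch: the function name nfs_mount_options is made up for illustration, and it assumes the usual /proc/mounts layout of "device mountpoint fstype options dump pass".

```python
def nfs_mount_options(mounts_text, fstypes=("nfs", "nfs4")):
    """Return {mount_point: [options]} for entries whose fstype matches.

    mounts_text is text in /proc/mounts format; the function name and
    the fstypes default are illustrative choices, not a standard API.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fstype, options, dump, pass
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype in fstypes:
            result[mount_point] = options.split(",")
    return result
```

Feed it the contents of /proc/mounts, e.g. nfs_mount_options(open("/proc/mounts").read()), and diff the output from the two releases. Passing fstypes=("ext3", "ext4") instead shows the options on local filesystems, which is where a barrier setting would normally appear.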

Write barriers are enabled in 6. They force journal writes to reach the
disk in order, which is meant to protect the filesystem in case of things
like power failure. Especially if you're on UPSes, nobarrier is the way
to go.

       mark