On Thu, Apr 30, 2015 at 02:24:27PM +0200, Peter van Hooft wrote:
> > Message: 4
> > Date: Wed, 29 Apr 2015 08:35:29 -0500
> > From: Matt Garman <matthew.garman at gmail.com>
> > To: CentOS mailing list <centos at centos.org>
> > Subject: [CentOS] nfs (or tcp or scheduler) changes between centos 5 and 6?
> > Message-ID: <CAJvUf-CyTg8ZiGq3OXRLKw7s1K2dGx1gqo_2XwOAXXQty=RHZQ at mail.gmail.com>
> > Content-Type: text/plain; charset=UTF-8
> >
> > We have a "compute cluster" of about 100 machines that do a read-only
> > NFS mount to a big NAS filer (a NetApp FAS6280). The jobs running on
> > these boxes are analysis/simulation jobs that constantly read data off
> > the NAS.
> >
> > We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
> > We did a "piecemeal" upgrade, usually upgrading five or so machines at
> > a time, every few days. We noticed improved performance on the CentOS
> > 6 boxes. But as the number of CentOS 6 boxes increased, we actually
> > saw performance on the CentOS 5 boxes decrease. By the time we had
> > only a few CentOS 5 boxes left, they were performing so badly as to be
> > effectively worthless.
> >
> > What we observed in parallel to this upgrade process was that the read
> > latency on our NetApp device skyrocketed. This in turn caused all
> > compute jobs to actually run slower, as it seemed to move the
> > bottleneck from the client servers' OS to the NetApp. This is
> > somewhat counter-intuitive: CentOS 6 performs faster, but actually
> > results in net performance loss because it creates a bottleneck on our
> > centralized storage.
> >
> > All indications are that CentOS 6 seems to be much more "aggressive"
> > in how it does NFS reads. And likewise, CentOS 5 was very "polite",
> > to the point that it basically got starved out by the introduction of
> > the 6.5 boxes.
> >
> > What I'm looking for is a "deep dive" list of changes to the NFS
> > implementation between CentOS 5 and CentOS 6. Or maybe this is due to
> > a change in the TCP stack? Or maybe the scheduler? We've tried a lot
> > of sysctl tcp tunings, various nfs mount options, anything that's
> > obviously different between 5 and 6... But so far we've been unable to
> > find the "smoking gun" that causes the obvious behavior change between
> > the two OS versions.
> >
> > Just hoping that maybe someone else out there has seen something like
> > this, or can point me to some detailed documentation that might clue
> > me in on what to look for next.
> >
> > Thanks!
> >
> >
> You may want to try reducing sunrpc.tcp_max_slot_table_entries.
> In CentOS 5 the number of slots is fixed: sunrpc.tcp_slot_table_entries = 16
> In CentOS 6, this number is dynamic with a maximum of
> sunrpc.tcp_max_slot_table_entries which by default has a value of 65536.
>
> We put that in /etc/sysconfig/modprobe.d/sunrpc.conf:
> options sunrpc tcp_max_slot_table_entries=128

Make that /etc/modprobe.d/sunrpc.conf, of course.

peter
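
To see what a given client is actually using before and after the change, something like the following may help. It is only a minimal sketch: it assumes the sunrpc module is loaded, so that the sysctls above show up as files under /proc/sys/sunrpc, and the function names read_slot_settings and modprobe_line are made up for illustration. The 128 is simply the value quoted above, not a recommendation for every site.

```python
#!/usr/bin/env python
"""Report the sunrpc TCP slot table settings on an NFS client."""
import os

# sysctl sunrpc.<name> corresponds to the file /proc/sys/sunrpc/<name>
SUNRPC_DIR = "/proc/sys/sunrpc"
KEYS = ("tcp_slot_table_entries", "tcp_max_slot_table_entries")


def read_slot_settings(base=SUNRPC_DIR):
    """Return {key: int or None}; None means the file is absent or unreadable."""
    settings = {}
    for key in KEYS:
        try:
            with open(os.path.join(base, key)) as f:
                settings[key] = int(f.read().strip())
        except (IOError, OSError, ValueError):
            # e.g. sunrpc not loaded, or a kernel without this sysctl
            settings[key] = None
    return settings


def modprobe_line(max_slots):
    """Build the options line intended for /etc/modprobe.d/sunrpc.conf."""
    return "options sunrpc tcp_max_slot_table_entries=%d" % int(max_slots)


if __name__ == "__main__":
    for key, value in sorted(read_slot_settings().items()):
        shown = "not available" if value is None else value
        print("sunrpc.%s = %s" % (key, shown))
    # 128 is the value used in this thread; adjust for your own filer.
    print(modprobe_line(128))
```

Module options are read when sunrpc is loaded, so a new value in /etc/modprobe.d/sunrpc.conf only applies the next time the module is loaded, which for most clients means after a reboot.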