> Message: 4
> Date: Wed, 29 Apr 2015 08:35:29 -0500
> From: Matt Garman <matthew.garman at gmail.com>
> To: CentOS mailing list <centos at centos.org>
> Subject: [CentOS] nfs (or tcp or scheduler) changes between centos 5
>         and 6?
> Message-ID:
>         <CAJvUf-CyTg8ZiGq3OXRLKw7s1K2dGx1gqo_2XwOAXXQty=RHZQ at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> We have a "compute cluster" of about 100 machines that do a read-only
> NFS mount to a big NAS filer (a NetApp FAS6280).  The jobs running on
> these boxes are analysis/simulation jobs that constantly read data off
> the NAS.
>
> We recently upgraded all these machines from CentOS 5.7 to CentOS 6.5.
> We did a "piecemeal" upgrade, usually upgrading five or so machines at
> a time, every few days.  We noticed improved performance on the CentOS
> 6 boxes.  But as the number of CentOS 6 boxes increased, we actually
> saw performance on the CentOS 5 boxes decrease.  By the time we had
> only a few CentOS 5 boxes left, they were performing so badly as to be
> effectively worthless.
>
> What we observed in parallel to this upgrade process was that the read
> latency on our NetApp device skyrocketed.  This in turn caused all
> compute jobs to actually run slower, as it seemed to move the
> bottleneck from the client servers' OS to the NetApp.  This is
> somewhat counter-intuitive: CentOS 6 performs faster, but actually
> results in net performance loss because it creates a bottleneck on our
> centralized storage.
>
> All indications are that CentOS 6 seems to be much more "aggressive"
> in how it does NFS reads.  And likewise, CentOS 5 was very "polite",
> to the point that it basically got starved out by the introduction of
> the 6.5 boxes.
>
> What I'm looking for is a "deep dive" list of changes to the NFS
> implementation between CentOS 5 and CentOS 6.  Or maybe this is due to
> a change in the TCP stack?  Or maybe the scheduler?
> We've tried a lot
> of sysctl tcp tunings, various nfs mount options, anything that's
> obviously different between 5 and 6... But so far we've been unable to
> find the "smoking gun" that causes the obvious behavior change between
> the two OS versions.
>
> Just hoping that maybe someone else out there has seen something like
> this, or can point me to some detailed documentation that might clue
> me in on what to look for next.
>
> Thanks!
>

You may want to try reducing sunrpc.tcp_max_slot_table_entries.

In CentOS 5 the number of slots is fixed:

    sunrpc.tcp_slot_table_entries = 16

In CentOS 6 this number is dynamic, with a maximum of
sunrpc.tcp_max_slot_table_entries, which by default has a value of
65536.

We put this in /etc/modprobe.d/sunrpc.conf:

    options sunrpc tcp_max_slot_table_entries=128

You can't put this in /etc/sysctl.conf, because sysctl -p runs at boot
before the sunrpc kernel module is loaded, so the setting is not
applied.

peter
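
To check whether the cap actually took effect on a client, something
like the following sketch may help.  It assumes the sunrpc sysctls are
exposed as files under /proc/sys/sunrpc (the usual mapping for
sunrpc.* keys); the function name read_slot_limits and its proc_dir
parameter are made up for this example.  It sticks to syntax that also
runs on the older Python shipped with CentOS 6.

```python
import os

# The two sunrpc tunables mentioned in this thread.
TUNABLES = ("tcp_slot_table_entries", "tcp_max_slot_table_entries")


def read_slot_limits(proc_dir="/proc/sys/sunrpc"):
    """Return a dict mapping each tunable to its integer value.

    A value of None means the file could not be read or parsed, for
    example because the sunrpc module is not loaded on this machine.
    proc_dir is a parameter only so the function can be pointed at a
    test directory; it is an assumption of this sketch.
    """
    limits = {}
    for name in TUNABLES:
        path = os.path.join(proc_dir, name)
        try:
            with open(path) as handle:
                limits[name] = int(handle.read().strip())
        except (IOError, OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in sorted(read_slot_limits().items()):
        shown = "not available" if value is None else value
        print("sunrpc.%s = %s" % (name, shown))
```

Run it on a client after a reboot (or after the sunrpc module has been
reloaded); if tcp_max_slot_table_entries still shows the default, the
modprobe option was probably not picked up.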