[CentOS] NFS Hanging Under Heavy Load

Sat Mar 17 02:01:49 UTC 2012
Ray Van Dolson <rayvd at bludgeon.org>

On Fri, Mar 16, 2012 at 01:33:54PM -0700, Aaron Blew wrote:
> Hello all,
> I'm currently experiencing an issue with an NFS server I've built (a Dell
> R710 with a Dell PERC H800/LSI 2108 and four external disk trays).  It's a
> backup target for Solaris 10, CentOS 5.5 and CentOS 6.2 servers that mount
> its data volume via NFS.  It has two 10gig NICs set up in a layer2+3 bond
> for one network, and two more 10gig NICs set up in the same way in another
> network.  The host has a 99T XFS filesystem for the backups.  RPCNFSDCOUNT
> is set to 256.
> During backups from clients the system exhibits odd hangs that interfere
> with some of our sensitive systems' backup windows.  On the NFS server side
> we see the following in dmesg.  Originally I thought it was related to
> dirty writeback cache, but I adjusted dirty_writeback_centisecs and am
> still seeing the issue.
> dmesg during the problem window:
> Mar 16 07:01:21 *****store01 kernel: __ratelimit: 11 callbacks suppressed
> Mar 16 07:01:21 *****store01 kernel: nfsd: page allocation failure.


> Has anyone else seen similar issues?  I can provide additional details
> about the server/configuration if anybody needs anything else.  The issue
> only seems to occur under high write load as we've restored some of these
> backups and didn't seem to have an issue reading the data.

The page allocation failure message made me wonder if your issue could
be related to the issue I've run into here[1] on RHEL 6.2.

My issue seems to be related to NFS mounting, but it's possible the
root cause could be the same?
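
To compare the two cases, it might help to see how often the failures
fire and which process reports them.  Below is a rough sketch (mine,
not from the bug report) that tallies "page allocation failure" lines
per minute and per process.  It assumes the classic syslog layout shown
in your dmesg excerpt; the function name tally_failures and the sample
hostname are made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpt quoted above):
#   Mar 16 07:01:21 <host> kernel: nfsd: page allocation failure.
FAILURE_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<proc>[^:]+):\s+page allocation failure"
)


def tally_failures(lines):
    """Count page allocation failures per (minute, process name)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            counts[(match.group("minute"), match.group("proc"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass an open log file
    # (for example /var/log/messages) to tally_failures() instead.
    sample = [
        "Mar 16 07:01:21 store01 kernel: __ratelimit: 11 callbacks suppressed",
        "Mar 16 07:01:21 store01 kernel: nfsd: page allocation failure.",
    ]
    # Sorted by the timestamp string, which is fine within a single day.
    for (minute, proc), n in sorted(tally_failures(sample).items()):
        print(f"{minute}  {proc:<16} {n}")
```

If the counts line up with the backup windows and nfsd is the only
process reporting, that would at least narrow things down before trying
a different kernel.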

A few other links:


Red Hat has provided me with a test kernel which purportedly will
resolve the issue.  I haven't had a chance to test it out yet.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=751992