[CentOS] NFS Hanging Under Heavy Load

Fri Mar 30 20:33:10 UTC 2012
Aaron Blew <aaronblew at gmail.com>

UPDATE

I rolled a new kernel that's identical to the stock CentOS 2.6.32-220.el6
kernel with the exception of the new idmapper being enabled.  Unfortunately
there's been no improvement.

Did you get a chance to try the RHEL kernel?

-Aaron


On Fri, Mar 16, 2012 at 7:01 PM, Ray Van Dolson <rayvd at bludgeon.org> wrote:

> On Fri, Mar 16, 2012 at 01:33:54PM -0700, Aaron Blew wrote:
> > Hello all,
> > I'm currently experiencing an issue with an NFS server I've built (a Dell
> > R710 with a Dell PERC H800/LSI 2108 and four external disk trays).  It's a
> > backup target for Solaris 10, CentOS 5.5 and CentOS 6.2 servers that mount
> > its data volume via NFS.  It has two 10gig NICs set up in a layer2+3 bond
> > for one network, and two more 10gig NICs set up in the same way in another
> > network.  The host has a 99T XFS filesystem for the backups.  RPCNFSDCOUNT
> > is set to 256.
> >
> > During backups from clients the system exhibits odd hangs that interfere
> > with some of our sensitive systems' backup windows.  On the NFS server side
> > we see the following in dmesg.  Originally I thought it was related to
> > dirty writeback cache, but I adjusted dirty_writeback_centisecs and am
> > still seeing the issue.
> >
> > dmesg during the problem window:
> > Mar 16 07:01:21 *****store01 kernel: __ratelimit: 11 callbacks suppressed
> > Mar 16 07:01:21 *****store01 kernel: nfsd: page allocation failure.
>
> <snip>
>
> >
> > Has anyone else seen similar issues?  I can provide additional details
> > about the server/configuration if anybody needs anything else.  The issue
> > only seems to occur under high write load as we've restored some of these
> > backups and didn't seem to have an issue reading the data.
>
> The page allocation failure message made me wonder if your issue could
> be related to the issue I've run into here[1] on RHEL 6.2.
>
> My issue seems to be related to NFS mounting, but it's possible the
> root cause could be the same?
>
> A few other links:
>
>  https://bugzilla.redhat.com/show_bug.cgi?id=593035
>  http://www.spinics.net/lists/linux-nfs/msg22248.html
>
> Red Hat has provided me with a test kernel which purportedly will
> resolve the issue.  I haven't had a chance to test it out yet.
>
> Ray
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=751992