[CentOS] Excessive NFS operations

Thu Sep 10 11:30:53 UTC 2009
lhecking at users.sourceforge.net <lhecking at users.sourceforge.net>

 Reading the "waiting IOs" thread reminded me that I have a similar problem
 that has persisted for months, and I have no solution yet.

 A single CentOS 5.2 x86_64 machine here is overloading our NetApp filer with
 excessive NFS getattr, lookup and access operations. The weird thing is that
 the number of these operations increases over time. I have an mrtg graph
 (which I didn't want to attach here) showing e.g. 200 NFS Ops on Monday,
 measured with filer-mrtg, going up to, e.g. 1200 in a straight line within
 days. nfsstat -l on the filer proves beyond doubt that the load is caused by
 this particular machine. dstat shows me which NFS operations are causing it.

  date/time   | null  gatr  satr  look  aces ...
10-09 12:22:52|   0     0     0     0     0
10-09 12:22:53|   0   525     0   602   602
10-09 12:22:54|   0  1275     0  1464  1438
10-09 12:22:55|   0     0     0     0     0
10-09 12:22:56|   0     0     0     0     0
10-09 12:22:57|   0     0     0     0     0
10-09 12:22:58|   0   238     0   270   270
10-09 12:22:59|   0  1461     0  1663  1660
10-09 12:23:00|   0   205     0   133   114
10-09 12:23:01|   0     0     0     0     0
10-09 12:23:02|   0     1     0     0     0
10-09 12:23:03|   0     0     0     0     0
10-09 12:23:04|   0  1411     0  1574  1574
10-09 12:23:05|   0   498     0   465   466
10-09 12:23:06|   0     0     0     0     0
10-09 12:23:07|   0     0     0     0     0
10-09 12:23:08|   0     0     0     0     0
10-09 12:23:09|   0  1082     0  1178  1192
10-09 12:23:10|   0   790     0   885   865
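
 A quick way to check whether the bursts are periodic is to reduce the sample
 above to burst start times. The sketch below is only an illustration: the
 function names and the threshold of 10 ops per second are arbitrary choices,
 the year 2009 is taken from the date of this post, and the column names are
 copied from the dstat header.

```python
from datetime import datetime

# The dstat sample from this post, copied verbatim.
SAMPLE = """\
10-09 12:22:52|   0     0     0     0     0
10-09 12:22:53|   0   525     0   602   602
10-09 12:22:54|   0  1275     0  1464  1438
10-09 12:22:55|   0     0     0     0     0
10-09 12:22:56|   0     0     0     0     0
10-09 12:22:57|   0     0     0     0     0
10-09 12:22:58|   0   238     0   270   270
10-09 12:22:59|   0  1461     0  1663  1660
10-09 12:23:00|   0   205     0   133   114
10-09 12:23:01|   0     0     0     0     0
10-09 12:23:02|   0     1     0     0     0
10-09 12:23:03|   0     0     0     0     0
10-09 12:23:04|   0  1411     0  1574  1574
10-09 12:23:05|   0   498     0   465   466
10-09 12:23:06|   0     0     0     0     0
10-09 12:23:07|   0     0     0     0     0
10-09 12:23:08|   0     0     0     0     0
10-09 12:23:09|   0  1082     0  1178  1192
10-09 12:23:10|   0   790     0   885   865
"""

COLUMNS = ("null", "gatr", "satr", "look", "aces")


def parse(text, year=2009):
    """Turn dstat lines into (timestamp, {column: count}) tuples."""
    rows = []
    for line in text.splitlines():
        stamp, sep, counts = line.partition("|")
        fields = counts.split()
        # Skip header lines and anything without numeric counters.
        if not sep or len(fields) < len(COLUMNS) or not fields[0].isdigit():
            continue
        # dstat prints day-month; the year is supplied by the caller.
        when = datetime.strptime(
            "%d-%s" % (year, stamp.strip()), "%Y-%d-%m %H:%M:%S"
        )
        values = [int(f) for f in fields[: len(COLUMNS)]]
        rows.append((when, dict(zip(COLUMNS, values))))
    return rows


def burst_starts(rows, threshold=10):
    """Return the timestamps where activity rises above the threshold."""
    starts = []
    was_active = False
    for when, ops in rows:
        active = sum(ops.values()) > threshold
        if active and not was_active:
            starts.append(when)
        was_active = active
    return starts


def gaps_seconds(starts):
    """Seconds between consecutive burst starts."""
    return [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]


starts = burst_starts(parse(SAMPLE))
print([s.strftime("%H:%M:%S") for s in starts])
# → ['12:22:53', '12:22:58', '12:23:04', '12:23:09']
print(gaps_seconds(starts))
# → [5, 6, 5]
```

 On this short sample the bursts start roughly every 5 to 6 seconds. That
 might point to something polling on a timer, but 19 seconds of data is far
 too little to be sure.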

 This behaviour is somehow tied to the GNOME desktop. I have other machines
 running CentOS 5.2 x86_64 (at runlevel 3) which don't show this behaviour.
 I also have CentOS 5.2 i386 machines which don't show it either. None of the
 other machines on the LAN show it - RHEL3 32-bit and 64-bit, Solaris.

 What I'd need is a monitoring tool that can tie the NFS ops to process IDs
 or applications. lsof isn't nearly as helpful here as I thought. I even copied
 this workstation user's files to another account, logged in and ran the same
 apps - and couldn't reproduce it.
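
 Short of a per-process tool, one partial step might be to break the counts
 down per mount point on the client. The sketch below assumes the client
 kernel exposes per-mount NFS counters in /proc/self/mountstats and that a
 Python 3 interpreter is available there. The function names are made up for
 illustration. It shows which mount is being hit, not which process.

```python
import time


def parse_mountstats(text):
    """Return {mount_point: {op_name: op_count}} for NFS mounts only."""
    stats = {}
    mount = None      # mount point currently being read, if it is NFS
    in_ops = False    # True once the "per-op statistics" header was seen
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "device":
            # device <src> mounted on <dir> with fstype <type> ...
            in_ops = False
            is_nfs = len(words) >= 8 and words[7].startswith("nfs")
            mount = words[4] if is_nfs else None
            if mount:
                stats[mount] = {}
        elif mount and line.strip() == "per-op statistics":
            in_ops = True
        elif mount and in_ops and words[0].endswith(":") and len(words) > 1:
            # First number after the op name is the operation count.
            stats[mount][words[0][:-1]] = int(words[1])
    return stats


def delta(before, after, ops=("GETATTR", "LOOKUP", "ACCESS")):
    """Per-mount increase of the selected ops between two snapshots."""
    result = {}
    for mount, counters in after.items():
        old = before.get(mount, {})
        result[mount] = {
            op: counters.get(op, 0) - old.get(op, 0) for op in ops
        }
    return result


def read_stats(path="/proc/self/mountstats"):
    """Read and parse the counters of the running client."""
    with open(path) as handle:
        return parse_mountstats(handle.read())


def sample(interval=10, path="/proc/self/mountstats"):
    """Take two snapshots `interval` seconds apart and return the delta."""
    first = read_stats(path)
    time.sleep(interval)
    return delta(first, read_stats(path))
```

 Calling sample(10) on the affected workstation should return a dictionary
 keyed by mount point, with the number of getattr, lookup and access calls
 issued against each mount during those ten seconds.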

 Ideas? Essentially, this makes 64-bit CentOS undeployable in our environment.