[CentOS] Centos-4.3: Filelocking problems under high [network related] load with kernel 2.6.9-42.0.3.ELsmp

Mon Nov 27 22:11:26 UTC 2006
Kevan Benson <kbenson at a-1networks.com>

On Monday 27 November 2006 10:54, Martin Knoblauch wrote:
> First of all, please CC me on any reply, as I am only subscribed to
> the digest.
>
> OK. Here is the problem. Said kernel (from 4.4) seems to have problems
> with file locking when the system is under high, likely
> network-related, load. The symptom is that things using file locking
> (rpm, the user-space automounter amd) fail to obtain locks, usually
> reporting timeouts.
>
> The system in question is an HP/DL380G4 with dual single-core EM64T
> CPUs and 8 GB of memory. The network interfaces are "tg3". It happens
> with both CentOS and RHEL4.
>
> The high load can be triggered by copying three 3 GB files in parallel
> from an NFS server (Solaris10, NFS, TCP, 1 GBit) to another NFS server
> (RHEL4, NFS, TCP, 100 MBit). The measured network performance is OK.
> During this operation the system load goes to around or above 10.
> Overall responsiveness feels good, but software doing file locking, or
> something like opening a new ssh connection, takes extremely long.
>
> So, if anyone has an idea or hint, it will be highly appreciated.

NFS has known problems with flock; the flock(2) man page specifically notes
this. Which file locking mechanism (flock or fcntl) does your system
predominantly use? That is, how do the applications that use NFS lock their
files?
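
If it helps narrow things down, here is a rough Python 3 sketch (my own
illustration, not something taken from rpm or amd) that times how long an
fcntl-style lock takes on a file. fcntl.lockf() goes through fcntl(2) record
locks, whereas fcntl.flock() would use flock(2) instead. The helper names
try_posix_lock and release, the timeout values, and the probe path are all
made up for the example; point the path at a file on the NFS mount to test
it there.

```python
import errno
import fcntl
import os
import tempfile
import time


def try_posix_lock(path, timeout=5.0, poll=0.1):
    """Try to take an exclusive fcntl()-style lock on path.

    Returns the open file descriptor on success, or None if the lock
    could not be obtained within `timeout` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # lockf() uses fcntl(2) record locks; LOCK_NB avoids blocking.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as exc:
            # EACCES/EAGAIN mean "held by someone else"; anything else is fatal.
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise
        if time.monotonic() >= deadline:
            os.close(fd)
            return None
        time.sleep(poll)


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    # Hypothetical probe file; replace with a path on the NFS mount.
    probe = os.path.join(tempfile.mkdtemp(), "probe.lock")
    start = time.monotonic()
    fd = try_posix_lock(probe, timeout=5.0)
    elapsed = time.monotonic() - start
    if fd is None:
        print("timed out after %.2fs" % elapsed)
    else:
        print("locked in %.2fs" % elapsed)
        release(fd)
```

Running it once while the box is idle and once during the parallel copy
should show whether lock acquisition itself is what slows down. Swapping
fcntl.lockf for fcntl.flock gives the flock(2) variant for comparison.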

NFS v4 makes some major strides towards better locking, but it's been long
enough since I dealt with this that I'm not sure whether it actually solves
anything (although it looks like it does). You might want to try NFS v4 if
possible.

-- 
- Kevan Benson
- A-1 Networks