On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this
> problem 2 times (about 2 weeks since the last occurrence).
> I use drbd on the nfs server for redundancy. Now to my problem:
>
> All my web sites stopped responding so I started by checking dmesg and
> there I found a bunch of this errors
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server lockd was running and I could access
> all the files from the webserver with ls, cd etc.

This is the exact problem we were having here. Once it happens, rebooting
is the only way to recover. As already mentioned further down the thread,
it was attributed to this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=453094

My solution was to extract the patch called

linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch

from the upstream kernel in

http://people.redhat.com/dzickus/el5/103.el5/src/

and reroll the latest centosplus kernel srpm with it. Servers have been
fine for 6 days running this kernel.

As much as I hate carrying custom kernel rpms, this is a showstopper for
us, and it looks like the fix won't make it in until 5.3. Personally,
given the limited scope of the patch and the apparent unwillingness of
Red Hat to include it in an update, I'd advocate CentOS carrying it as a
custom patch.

Here's my srpm if anyone wants it:

http://magoazul.com/tmp/kernel-2.6.18-92.1.10.1.el5.centos.plus.src.rpm

The only change is the patch for this issue. Everything builds cleanly
via mock.

-- 
Matthew Kent \ SA \ bravenet.com