On Wed, Aug 13, 2008 at 9:27 AM, Matthew Kent <matt@bravenet.com> wrote:
This is the exact problem we were having here. Rebooting is the only solution.
As already mentioned further down the thread, it was attributed to this bug: https://bugzilla.redhat.com/show_bug.cgi?id=453094
My solution was to extract the patch linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch from the upstream kernel at http://people.redhat.com/dzickus/el5/103.el5/src/
and reroll the latest centosplus kernel SRPM with it. Our servers have been fine for 6 days running this kernel.
As much as I hate carrying custom kernel RPMs, this is a showstopper for us, and it looks like the fix won't make it in until 5.3.
Personally, given the limited scope of the patch and Red Hat's apparent unwillingness to include it in an update, I'd advocate CentOS carrying it as a custom patch.
Here's my SRPM if anyone wants it: http://magoazul.com/tmp/kernel-2.6.18-92.1.10.1.el5.centos.plus.src.rpm. The only change is the patch for this issue, and everything builds cleanly via mock.
-- 
Matthew Kent \ SA \ bravenet.com
CentOS developer Tru has compiled a patched version of the regular kernel and is offering it at:
http://people.centos.org/tru/kernel+bz453094/
Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5 according to the bugzilla referred to above.
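If you want to script a check for whether a box is already running at least the stock release named in the bugzilla, something like the following may help. It is only a sketch: `rpmvercmp` here is a simplified Python port named after RPM's own segment-wise comparison (it ignores epochs and the newer `~`/`^` rules), and `is_at_least` and `FIXED_RELEASE` are hypothetical names of my own. Note that it compares version strings only, so the patched rebuilds mentioned above (e.g. 2.6.18-92.1.10.1.el5.centos.plus) carry the fix yet still compare as older.

```python
import platform
import re

# Alternating runs of digits or letters; everything else is a separator.
_SEG = re.compile(r"[0-9]+|[A-Za-z]+")

# Stock release that, per the bugzilla entry, should contain the lockd fix.
FIXED_RELEASE = "2.6.18-92.1.13.el5"


def rpmvercmp(a: str, b: str) -> int:
    """Simplified RPM-style comparison: 1 if a is newer, -1 if older, 0 if equal."""
    if a == b:
        return 0
    sa, sb = _SEG.findall(a), _SEG.findall(b)
    for x, y in zip(sa, sb):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x_num else -1
        if x_num:
            xi, yi = int(x), int(y)  # numeric compare ignores leading zeros
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments matched: whichever string has segments left is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def is_at_least(release: str, minimum: str = FIXED_RELEASE) -> bool:
    """Compare 'version-release' strings, version first, then release."""
    ver, _, rel = release.partition("-")
    mver, _, mrel = minimum.partition("-")
    result = rpmvercmp(ver, mver)
    if result == 0:
        result = rpmvercmp(rel, mrel)
    return result >= 0


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, ">=", FIXED_RELEASE, "->", is_at_least(running))
```

For example, 2.6.18-92.1.10.el5 compares as older (10 < 13 in the third release segment), while 2.6.18-92.1.13.el5xen compares as newer because its extra `xen` segment is left over once the shared segments match.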
Akemi