Is the NFS lockd bug fixed ?


Alain Terriault

17 Feb 2009

3:44 p.m.

Hi,

I noticed other users reporting problems with NFS lockd and was under the impression that the problem was solved in "kernel-2.6.18-92.1.13.el5".

I am running the x86_64 versions of "kernel-2.6.18-92.1.22.el5.centos.plus" (I need XFS) on my NFS server and "kernel 2.6.18-92.1.22.el5" on my clients.

I constantly get this error: "lockd: server 192.168.10.2 not responding, timed out"

Does the most recent "centos_x64" kernel have the lockd patch? Maybe my problem is elsewhere; I never had problems with NFS before.

thanks, alain
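To judge how often the lockd timeout quoted above really occurs, one could count the messages per server in a saved log. The following is only a minimal sketch: the regular expression is built from the message text quoted in this post, and the function name `count_lockd_timeouts` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the message text quoted in the post; any syslog
# prefix (timestamp, host name) in front of it is simply ignored.
LOCKD_TIMEOUT = re.compile(r"lockd: server (\S+) not responding, timed out")


def count_lockd_timeouts(lines):
    """Return a Counter mapping server address -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = LOCKD_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice the lines would come from a log file.
    sample = [
        "lockd: server 192.168.10.2 not responding, timed out",
        "some unrelated kernel message",
        "lockd: server 192.168.10.2 not responding, timed out",
    ]
    for server, hits in count_lockd_timeouts(sample).items():
        print(server, hits)
```

Running the same count before and after a kernel change gives a rough before/after comparison instead of an impression.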


Peter Kjellstrom

17 Feb

3:55 p.m.

On Tuesday 17 February 2009, Alain Terriault wrote:

> Hi,
>
> I noticed other users reporting problems with NFS lockd and was under the impression that the problem was solved in "kernel-2.6.18-92.1.13.el5".
>
> I am running the x86_64 versions of "kernel-2.6.18-92.1.22.el5.centos.plus" (I need XFS) on my NFS server and "kernel 2.6.18-92.1.22.el5" on my clients.

OT, but you have no reason to run the centos.plus kernel for XFS. Use the normal kernel and kmod-xfs.

/Peter

Filipe Brandenburger

3:57 p.m.

Hi,

On Tue, Feb 17, 2009 at 10:44, Alain Terriault alaint@music.mcgill.ca wrote:

> I am running the x86_64 versions of "kernel-2.6.18-92.1.22.el5.centos.plus" (I need XFS)

You no longer need the CentOS Plus kernel for XFS.

See: http://wiki.centos.org/AdditionalResources/Repositories/CentOSPlus#line-76

HTH, Filipe

Akemi Yagi

4:05 p.m.

On Tue, Feb 17, 2009 at 7:57 AM, Filipe Brandenburger filbranden@gmail.com wrote:

> Hi,
>
> On Tue, Feb 17, 2009 at 10:44, Alain Terriault alaint@music.mcgill.ca wrote:
>
>> I am running the x86_64 versions of "kernel-2.6.18-92.1.22.el5.centos.plus" (I need XFS)
>
> You no longer need the CentOS Plus kernel for XFS.
>
> See: http://wiki.centos.org/AdditionalResources/Repositories/CentOSPlus#line-76

And even this info is becoming obsolete. The current kmod-xfs package (in CentOS-5) is kABI-tracking, that is, it is independent of the kernel version.

Akemi

Alain Terriault

18 Feb

7:09 p.m.

New subject: Is the NFS lockd bug fixed ? (update)

Removing all my centos+ kernels and using kmod on top of redhat kernels fixed my NFS lock problem.

thanks, alain

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Filipe Brandenburger

19 Feb

12:20 p.m.

New subject: Is the NFS lockd bug fixed ? (update)

Hi,

On Wed, Feb 18, 2009 at 14:09, Alain Terriault alaint@music.mcgill.ca wrote:

> Removing all my centos+ kernels and using kmod on top of redhat kernels fixed my NFS lock problem.

Do you mean you replaced the CentOS-Plus kernel with a kernel from RedHat?

Or are you using a CentOS-Base (instead of Plus) kernel now?

When you replaced it, did you upgrade it as well? What was the version of the -Plus kernel you were using? And what is the version of the -Base or RHEL kernel you are using now?

If the issue happens with a version of the -Plus kernel but does not happen with the same version of the -Base kernel, I think it might deserve a little more investigation, so the details on what fixed your problem are important for us to know.

Thanks! Filipe

Alain Terriault

20 Feb

4:13 p.m.

New subject: Is the NFS lockd bug fixed ? (update)

Filipe Brandenburger wrote:

> Do you mean you replaced the CentOS-Plus kernel with a kernel from RedHat?
>
> Or are you using a CentOS-Base (instead of Plus) kernel now?

Yes, apologies for the confusion.

> When you replaced it, did you upgrade it as well? What was the version of the -Plus kernel you were using? And what is the version of the -Base or RHEL kernel you are using now?

problematic setup:
  NFS server was running 2.6.18-92.1.22.el5.centos.plus
  clients were running 2.6.18-92.1.22.el5.x86_64

healthy setup:
  NFS server is running 2.6.18-92.1.22.el5.x86_64 + kmod-xfs-0.4-2
  clients are still on 2.6.18-92.1.22.el5.x86_64

Akemi Yagi

5:28 p.m.

New subject: Is the NFS lockd bug fixed ? (update)

On Fri, Feb 20, 2009 at 8:13 AM, Alain Terriault alaint@music.mcgill.ca wrote:

> problematic setup:
>   NFS server was running 2.6.18-92.1.22.el5.centos.plus
>   clients were running 2.6.18-92.1.22.el5.x86_64
>
> healthy setup:
>   NFS server is running 2.6.18-92.1.22.el5.x86_64 + kmod-xfs-0.4-2
>   clients are still on 2.6.18-92.1.22.el5.x86_64

This setup would have worked as well:
  NFS server running 2.6.18-92.1.22.el5.centos.plus + kmod-xfs-0.4-2
  clients still on 2.6.18-92.1.22.el5.x86_64

In other words, xfs is provided as a module package (kmod-xfs) and is therefore not enabled in the centosplus kernel. For those who are running the cplus kernel, kmod-xfs is also available from the cplus repository.

Akemi
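Filipe's question above comes down to comparing a Plus kernel against a base kernel of the same version. As a rough sketch, the release strings used in this thread can be classified and compared as below. The helper names (`is_centosplus`, `base_version`, `same_base`) are hypothetical, and the parsing assumes only the string shapes that appear in the thread (an optional ".centos.plus" tag and an optional trailing architecture).

```python
# Architecture suffixes that appear on some release strings in the thread;
# ".i686" is added as an assumption for 32-bit systems.
ARCH_SUFFIXES = (".x86_64", ".i686")


def is_centosplus(release: str) -> bool:
    """True if the release string names a CentOS Plus kernel."""
    return ".centos.plus" in release


def base_version(release: str) -> str:
    """Drop a trailing architecture suffix and the '.centos.plus' tag."""
    for suffix in ARCH_SUFFIXES:
        if release.endswith(suffix):
            release = release[: -len(suffix)]
            break
    return release.replace(".centos.plus", "")


def same_base(a: str, b: str) -> bool:
    """True if two release strings differ only by Plus tag or architecture."""
    return base_version(a) == base_version(b)


if __name__ == "__main__":
    server = "2.6.18-92.1.22.el5.centos.plus"
    client = "2.6.18-92.1.22.el5.x86_64"
    print(is_centosplus(server), is_centosplus(client))
    print(same_base(server, client))
```

For the setups reported in the thread, server and clients share the same base version, so the only variable that changed was the Plus build itself (plus kmod-xfs on the healthy setup).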

Alain Terriault

24 Feb

2:08 p.m.

New subject: centos.plus kernels

hi,

I experienced 3 problems that I am 99% sure are related to kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm:
http://mirror.centos.org/centos/5.2/centosplus/x86_64/RPMS/kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm

#1 on an NFS server
problem: constantly appearing on all my machines: "lockd: server 192.168.10.2 not responding, timed out"

solution: move to kernel-2.6.18-92.1.22.el5; never saw that message again.

#2 on the same NFS server
problem: it is an important server for our department, so I rarely do a "shutdown". When I moved to kernel-2.6.18-92.1.22.el5, "/sbin/shutdown -h now" did not succeed and I had to press the power button to shut it down. The only thing on the screen, after 10 minutes, was USB messages; I could unplug my USB keyboard and the kernel would notify me of the change, so it seems it was not totally frozen. Everything booted up properly after the power cycle, so I assume all the files were closed properly. I only had this problem on this NFS server; other machines with the same kernel never had this shutdown problem.

solution: move to kernel-2.6.18-92.1.22.el5; did a power cycle test and all went just fine.

#3 on a VMware virtual machine
problem:
Feb 15 18:55:41 www kernel: nscd[5631]: segfault at 00002b89938405c0 rip 00002b891f30f7c5 rsp 00000000411256d0 error 4
Feb 16 14:39:40 www nscd: 5642 invalid persistent database file "/var/db/nscd/passwd": verification failed
Feb 16 17:11:53 www kernel: nscd[5653]: segfault at 0000000040f9b000 rip 00002af497e3e4d4 rsp 0000000040f96050 error 6
Feb 17 09:44:37 www nscd: 5639 invalid persistent database file "/var/db/nscd/hosts": verification failed
Feb 19 03:09:09 www kernel: nscd[5830]: segfault at 00002baab9b653cc rip 00002ba9afe3b7a6 rsp 0000000041ec96d0 error 4

For LDAP accounts, login was still possible; some of the directories became inaccessible while other directories were working just fine.

Restarting nscd temporarily fixed the problem for ~12-24 hours, as did a system reboot. I did not push my investigation further; after my problems with NFS locking I immediately switched to kernel-2.6.18-92.1.22.el5.

solution: move to kernel-2.6.18-92.1.22.el5; never saw that message again.

I did not take that many notes or investigate that much, so I cannot tell you much more, but I will do my best to reply to direct emails. I am also not in a position to reproduce those problems. I sent this post because I am concerned there may be serious bugs in the centos.plus kernels.

Other than that, I am an old CentOS fan, using it on terabytes of data, and this is the first time I have hit such a problem.

cheers, alain
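The nscd segfault lines in problem #3 carry more detail than they first appear to. The trailing "error" value is the x86 page-fault error code (printed in hexadecimal): bit 0 distinguishes a protection violation from a page that is not present, bit 1 a write from a read, and bit 2 user mode from kernel mode. Below is a minimal sketch that pulls the fields out of such a line and decodes those three bits. The regular expression is modelled only on the lines quoted above, and the function names are hypothetical.

```python
import re

# Modelled on the kernel lines quoted in the post, e.g.
# "kernel: nscd[5631]: segfault at <addr> rip <rip> rsp <rsp> error 4"
SEGFAULT = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code: int) -> str:
    """Describe the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user mode" if code & 4 else "kernel mode"
    return f"{mode} {access}, {cause}"


def parse_segfault(line: str):
    """Return a dict of fields for a segfault line, or None if it does not match."""
    match = SEGFAULT.search(line)
    if match is None:
        return None
    return {
        "prog": match.group("prog"),
        "pid": int(match.group("pid")),
        "addr": int(match.group("addr"), 16),
        "rip": int(match.group("rip"), 16),
        "error": decode_error(int(match.group("err"), 16)),
    }


if __name__ == "__main__":
    line = ("Feb 16 17:11:53 www kernel: nscd[5653]: segfault at "
            "0000000040f9b000 rip 00002af497e3e4d4 rsp 0000000040f96050 error 6")
    print(parse_segfault(line))
```

Read this way, "error 4" is a user-mode read of an unmapped page and "error 6" a user-mode write to one. That describes how nscd crashed; it does not by itself say whether the kernel was at fault.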

Rob Kampen

4:10 p.m.

New subject: centos.plus kernels

Alain Terriault wrote:

> hi,
>
> I experienced 3 problems that I am 99% sure are related to kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm:
> http://mirror.centos.org/centos/5.2/centosplus/x86_64/RPMS/kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm
>
> #1 on an NFS server
> problem: constantly appearing on all my machines: "lockd: server 192.168.10.2 not responding, timed out"
>
> solution: move to kernel-2.6.18-92.1.22.el5; never saw that message again.

I run a CentOS 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux kernel serving NFS with no issues. I have a number of CentOS 2.6.18-92.1.22.el5.centos.plus #1 SMP Wed Dec 17 10:49:19 EST 2008 x86_64 x86_64 x86_64 GNU/Linux workstations that have no issues as NFS clients. No experience of plus kernel used on servers.

> Other than that, I am an old CentOS fan, using it on terabytes of data, and this is the first time I have hit such a problem.

Yeah - me too, just love the stability

> cheers, alain


Akemi Yagi

7:43 p.m.

New subject: centos.plus kernels

2009/2/24 Rob Kampen rkampen@kampensonline.com:

> Alain Terriault wrote:
>
>> I experienced 3 problems that I am 99% sure are related to kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm

You cannot be 99% sure until you can reproduce the problems...

>> #1 on an NFS server
>> problem: constantly appearing on all my machines: "lockd: server 192.168.10.2 not responding, timed out"
>>
>> solution: move to kernel-2.6.18-92.1.22.el5; never saw that message again.
>
> I run a CentOS 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux kernel serving NFS with no issues. I have a number of CentOS 2.6.18-92.1.22.el5.centos.plus #1 SMP Wed Dec 17 10:49:19 EST 2008 x86_64 x86_64 x86_64 GNU/Linux workstations that have no issues as NFS clients. No experience of plus kernel used on servers.

I have been running 2.6.18-92.1.22.el5.centos.plus both on my own workstations and as VMware guests. The real (host) machine works as an nfs server as well. I have never seen any issue there.

>> #2 on the same NFS server
>> problem: it is an important server for our department, so I rarely do a "shutdown". When I moved to kernel-2.6.18-92.1.22.el5, "/sbin/shutdown -h now" did not succeed and I had to press the power button to shut it down.

I have not seen this behavior either on any of the machines running a centosplus kernel.

>> #3 on a VMware virtual machine
>> problem: Feb 15 18:55:41 www kernel: nscd[5631]: segfault at 00002b89938405c0 rip 00002b891f30f7c5 rsp 00000000411256d0 error 4

(snip)

>> I did not take that many notes or investigate that much, so I cannot tell you much more, but I will do my best to reply to direct emails. I am also not in a position to reproduce those problems. I sent this post because I am concerned there may be serious bugs in the centos.plus kernels.

Unless this can be reproduced by you or other users, it will be very difficult to even identify it as a bug. Troubleshooting is nearly impossible. The centosplus kernel has several drivers and other options enabled, but otherwise its behavior should be the same as the distro kernel. Yes, some of those options might be involved, but nothing is immediately obvious as to the cause of the issues you experienced.

Akemi
