[CentOS] centos.plus kernels

Tue Feb 24 14:08:22 UTC 2009
Alain Terriault <alaint at music.mcgill.ca>

hi,

I experienced 3 problems that I am 99% sure are related to kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm <http://mirror.centos.org/centos/5.2/centosplus/x86_64/RPMS/kernel-2.6.18-92.1.22.el5.centos.plus.x86_64.rpm>

#1 on an NFS server
problem : constantly appearing on all my machines: "lockd: server 192.168.10.2 not responding, timed out"

solution : moved to kernel-2.6.18-92.1.22.el5 and never saw that message again.

#2 on the same NFS server
problem : it is an important server for our department, so I rarely do a "shutdown".
when I moved to kernel-2.6.18-92.1.22.el5, "/sbin/shutdown -h now" did not succeed and I had to press the power button to shut it down.
the only thing on the screen, after 10 minutes, was USB messages; I could unplug my USB keyboard and the kernel would notify
me of the change, so it seems it was not totally frozen.
everything booted up properly after the power cycle, so I assume all the files were closed properly.
I only had this problem on this NFS server; other machines with the same kernel never had this shutdown problem.

solution : moved to kernel-2.6.18-92.1.22.el5, did a power cycle test and all went just fine.


#3 on a VMware virtual machine
problem :
Feb 15 18:55:41 www kernel: nscd[5631]: segfault at 00002b89938405c0 rip 00002b891f30f7c5 rsp  00000000411256d0 error 4
Feb 16 14:39:40 www nscd: 5642 invalid persistent database file "/var/db/nscd/passwd": verification failed
Feb 16 17:11:53 www kernel: nscd[5653]: segfault at 0000000040f9b000 rip 00002af497e3e4d4 rsp  0000000040f96050 error 6
Feb 17 09:44:37 www nscd: 5639 invalid persistent database file "/var/db/nscd/hosts": verification failed
Feb 19 03:09:09 www kernel: nscd[5830]: segfault at 00002baab9b653cc rip 00002ba9afe3b7a6 rsp  0000000041ec96d0 error 4
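
The trailing "error N" in the segfault lines above is the x86 page-fault error code, and its low three bits say what kind of access faulted. The following is only a minimal sketch for reading such lines, not something used on the affected machine: the function names `decode_error` and `parse_segfault` are made up for illustration, the pattern is written against the line format shown in this log, and it assumes the kernel prints the code in hex.

```python
import re

# Pattern written against the segfault line format shown in the log above.
# Assumption: the "error" field is printed in hex by this kernel.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp\s+(?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 2 else "read",
        # bit 2: 0 = fault in kernel mode, 1 = fault in user mode
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing one segfault log line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = decode_error(int(m.group("err"), 16))
    info["prog"] = m.group("prog")
    info["pid"] = int(m.group("pid"))
    info["addr"] = int(m.group("addr"), 16)
    return info


if __name__ == "__main__":
    sample = ("Feb 16 17:11:53 www kernel: nscd[5653]: segfault at "
              "0000000040f9b000 rip 00002af497e3e4d4 rsp  0000000040f96050 error 6")
    print(parse_segfault(sample))
```

Read this way, "error 4" is a user-mode read of an unmapped page and "error 6" is a user-mode write to an unmapped page. That describes how nscd crashed, not why, so it does not by itself point at the kernel.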

for LDAP accounts, login was still possible, but some of the directories became inaccessible while other directories were working just fine.

restarting nscd temporarily fixed the problem for ~12-24 hours, and so did a system reboot.
I did not push my investigation further; after my problems with NFS locking I immediately switched to kernel-2.6.18-92.1.22.el5


solution : moved to kernel-2.6.18-92.1.22.el5 and never saw those messages again.


I did not take that many notes or investigate that much, so I cannot tell you much more, but I will do my best to reply to direct emails.
I am also not in a position to reproduce those problems.
I sent this post because I am concerned there may be serious bugs in the centos.plus kernels.

other than that, I am an old CentOS fan, using it on terabytes of data, and this is the first time I have hit such a problem.

cheers,
alain