[CentOS] nscd segfaulting on centos 4.5

Wed Oct 10 15:19:54 UTC 2007
jlee <jlee at flambeau.com>


Craig White wrote:
> On Wed, 2007-10-10 at 08:16 -0400, Andy Harrison wrote:
>> On 10/9/07, jlee  wrote:
>>> output from /var/log/messages
>>> Oct  9 12:56:38 lyra kernel: nscd[11660]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
>>> Oct  9 13:16:38 lyra kernel: nscd[12540]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
>>
>> I'm starting to have this problem as well.  I have two mail servers
>> running courier and postfix.  They've been up for a couple weeks but I
>> just put them into production monday this week, two days ago.
>>
>> Oct  9 07:34:49 ash kernel: nscd[3455]: segfault at 0000000040201000 rip 0000555555563274 rsp 00000000401a1df0 error 6
>> Oct  9 07:35:20 ash nscd: 27206 invalid persistent database file "/var/db/nscd/passwd": verification failed
>>
>>
>> Oct 10 07:33:37 oak kernel: nscd[25051]: segfault at 0000000040201000 rip 0000555555563274 rsp 00000000401a73a0 error 6
>> Oct 10 07:33:48 oak nscd: 29526 invalid persistent database file "/var/db/nscd/passwd": verification failed
>>
>> The first time it had happened, I was using the stock /etc/nscd.conf
>> file.  The second time it happened on the other server, I had doubled
>> the max-db-size passwd value to 67108864.
>>
>> Both servers are running CentOS 5, firewall disabled and no SELinux .
>>
>> Linux ash 2.6.18-8.el5 #1 SMP Thu Mar 15 19:46:53 EDT 2007 x86_64
>> x86_64 x86_64 GNU/Linux
>>
>> (24)[11:58am] # yum list nscd
>> nscd.x86_64                              2.5-12                 installed
>>
>>
>>
>> # ls -la /etc/ldap*
>> lrwxrwxrwx 1 root root   18 Sep 27 15:14 /etc/ldap.conf -> openldap/ldap.conf
>> lrwxrwxrwx 1 root root   20 Sep 27 15:14 /etc/ldap.secret -> openldap/ldap.secret
>> # ls -la /etc/openldap/ldap.*
>> -rw-r--r-- 1 root root 8974 Sep 27 13:55 /etc/openldap/ldap.conf
>> -rw------- 1 root root   10 Sep 27 13:56 /etc/openldap/ldap.secret
>>
>>
>> My ldap.conf
>> # grep '^[^#]' /etc/ldap.conf
>> base dc=xxxxxxx,dc=xxx
>> uri ldap://ldap-1.xxxxxxx.xxx
>> binddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
>> bindpw xxxxxxxx
>> rootbinddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
>> scope sub
>> timelimit 30
>> bind_timelimit 30
>> bind_policy soft
>> idle_timelimit 3600
>> pam_check_host_attr yes
>> nss_base_passwd dc=xxxxxxx,dc=net?sub
>> nss_base_shadow dc=xxxxxxx,dc=net?sub
>> pam_password clear
>> nss_base_group          ou=Group,dc=xxxxxxx,dc=xxx?one
>> TLS_REQCERT request
>> TLS_CACERT /usr/local/etc/openldap/certs/cacert.pem
>>
>> The two previous servers did not have this particular problem.  They
>> were not identical hardware, but had an identical OS install and config.
>>
>> Any clues?
> ---
> I don't generally use nscd any longer but since it is a dynamic system,
> why not just stop nscd and delete the db and then restart nscd service
> since it is certain to recreate it? (or perhaps move it out of the way
> to be safe)...
> 
> /sbin/service nscd stop
> mv /var/db/nscd/* /tmp
> /sbin/service nscd start
> 
> Craig
> 

I tried deleting the db files on one of the boxes after seeing this suggested
on the web, but nscd segfaulted again less than half an hour later. The problem
seems to happen only on x86_64 boxes. Another box here is 32-bit x86 and has no
issues with nscd.

I would like to drop this service, but there are critical apps that require
it, since authentication goes through OpenLDAP. It does not seem to be
hardware-specific, since the two x86_64 boxes have different motherboards, one
Abit and one ASUS.

Logging is turned on for nscd, but nothing in the logs looks unusual, and it
has been difficult to find which pid precedes the segfault.
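
Since matching pids to segfaults by eye is tedious, here is a minimal sketch
that pulls the pid, faulting address and rip out of kernel segfault lines in
the format quoted above. The names SEGFAULT_RE, parse_segfaults and SAMPLE are
my own, made up for illustration, and the pattern is only an assumption based
on the few lines in this thread, so it may need adjusting for other kernels.

```python
import re
from collections import Counter

# Assumed layout, taken from the kernel lines quoted in this thread:
#   <date> <host> kernel: <prog>[<pid>]: segfault at <addr> rip <rip> rsp <rsp> error <n>
SEGFAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<prog>\w+)\[(?P<pid>\d+)\]:\s+segfault at\s+(?P<addr>[0-9a-f]+)\s+"
    r"rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)\s+error\s+(?P<err>\d+)"
)


def parse_segfaults(lines):
    """Return one dict per kernel segfault line; all other lines are skipped."""
    found = []
    for line in lines:
        m = SEGFAULT_RE.match(line)
        if m:
            found.append(m.groupdict())
    return found


# The two lines from lyra's /var/log/messages quoted earlier.
SAMPLE = [
    "Oct  9 12:56:38 lyra kernel: nscd[11660]: segfault at 0000002b401fee8b "
    "rip 000000552aab7966 rsp 00000000408029e0 error 4",
    "Oct  9 13:16:38 lyra kernel: nscd[12540]: segfault at 0000002b401fee8b "
    "rip 000000552aab7966 rsp 00000000408029e0 error 4",
]

if __name__ == "__main__":
    hits = parse_segfaults(SAMPLE)
    for h in hits:
        print(h["stamp"], h["prog"], "pid", h["pid"], "rip", h["rip"])
    # Count how often each instruction pointer shows up across the faults.
    print(Counter(h["rip"] for h in hits))
```

To run it against the real log, pass an open file instead of SAMPLE, e.g.
parse_segfaults(open("/var/log/messages")). On lyra both faults show the same
rip, which at least suggests the crash is in the same place each time.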

Can malformed addresses cause nscd to segfault?
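
On the "error 4" and "error 6" fields in the kernel lines quoted above: as far
as I know these are the x86 page-fault error code, where the low three bits
say whether the page was present, whether the access was a write, and whether
it came from user mode. A minimal sketch to decode them follows; the function
name decode_fault_error is my own, not anything from the kernel or glibc.

```python
def decode_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code.

    Assumption: the "error N" value in the kernel segfault line is the
    hardware page-fault error code (bit 0 = protection violation,
    bit 1 = write access, bit 2 = user mode).
    """
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


if __name__ == "__main__":
    # The two values that appear in this thread.
    for code in (4, 6):
        print(code, decode_fault_error(code))
```

Under that reading, error 4 is a user-mode read of an unmapped page and
error 6 is a user-mode write to one, so in both cases nscd is touching memory
that is not mapped. That does not say what put the bad pointer there.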