On Wed, 2007-10-10 at 10:19 -0500, jlee wrote:
Craig White wrote:
On Wed, 2007-10-10 at 08:16 -0400, Andy Harrison wrote:
On 10/9/07, jlee wrote:
Output from /var/log/messages:

Oct 9 12:56:38 lyra kernel: nscd[11660]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
Oct 9 13:16:38 lyra kernel: nscd[12540]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
I'm starting to have this problem as well. I have two mail servers running courier and postfix. They've been up for a couple of weeks, but I only put them into production on Monday of this week, two days ago.
Oct 9 07:34:49 ash kernel: nscd[3455]: segfault at 0000000040201000 rip 0000555555563274 rsp 00000000401a1df0 error 6
Oct 9 07:35:20 ash nscd: 27206 invalid persistent database file "/var/db/nscd/passwd": verification failed

Oct 10 07:33:37 oak kernel: nscd[25051]: segfault at 0000000040201000 rip 0000555555563274 rsp 00000000401a73a0 error 6
Oct 10 07:33:48 oak nscd: 29526 invalid persistent database file "/var/db/nscd/passwd": verification failed
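A note on reading the kernel lines quoted in this thread: on x86 and x86_64 the number after "error" is the page-fault error code, which is a bit mask. The sketch below decodes its three low bits. The function name decode_pf_error is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: decode the low three bits of an x86 page-fault
# error code as printed in "segfault at ... error N" kernel messages.
decode_pf_error() {
    code="$1"
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read access, 1 = write access
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = kernel mode, 1 = user mode
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}
```

With this, "decode_pf_error 4" prints "user-mode read, page not present" and "decode_pf_error 6" prints "user-mode write, page not present". That says how nscd touched the bad address, not why the address was bad.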
The first time it had happened, I was using the stock /etc/nscd.conf file. The second time it happened on the other server, I had doubled the max-db-size passwd value to 67108864.
Both servers are running CentOS 5, with the firewall disabled and no SELinux.
Linux ash 2.6.18-8.el5 #1 SMP Thu Mar 15 19:46:53 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
(24)[11:58am] # yum list nscd
nscd.x86_64 2.5-12 installed
# ls -la /etc/ldap*
lrwxrwxrwx 1 root root 18 Sep 27 15:14 /etc/ldap.conf -> openldap/ldap.conf
lrwxrwxrwx 1 root root 20 Sep 27 15:14 /etc/ldap.secret -> openldap/ldap.secret
# ls -la /etc/openldap/ldap.*
-rw-r--r-- 1 root root 8974 Sep 27 13:55 /etc/openldap/ldap.conf
-rw------- 1 root root 10 Sep 27 13:56 /etc/openldap/ldap.secret
My ldap.conf:

# grep '^[^#]' /etc/ldap.conf
base dc=xxxxxxx,dc=xxx
uri ldap://ldap-1.xxxxxxx.xxx
binddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
bindpw xxxxxxxx
rootbinddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
scope sub
timelimit 30
bind_timelimit 30
bind_policy soft
idle_timelimit 3600
pam_check_host_attr yes
nss_base_passwd dc=xxxxxxx,dc=net?sub
nss_base_shadow dc=xxxxxxx,dc=net?sub
pam_password clear
nss_base_group ou=Group,dc=xxxxxxx,dc=xxx?one
TLS_REQCERT request
TLS_CACERT /usr/local/etc/openldap/certs/cacert.pem
The two previous servers did not have this particular problem. The hardware was not identical, but the OS install and config were.
Any clues?
I don't generally use nscd any longer but since it is a dynamic system, why not just stop nscd and delete the db and then restart nscd service since it is certain to recreate it? (or perhaps move it out of the way to be safe)...
/sbin/service nscd stop
mv /var/db/nscd/* /tmp
/sbin/service nscd start
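The same stop, move, start sequence can be wrapped so the old cache files land in their own dated directory instead of loose in /tmp. This is only a sketch: the function name reset_nscd_cache, the SERVICE variable and the backup directory naming are assumptions for illustration, not anything nscd or CentOS provides.

```shell
# Path to the init-script wrapper; overridable for testing (assumption).
SERVICE="${SERVICE:-/sbin/service}"

# Hypothetical helper: stop nscd, move its persistent db files aside, restart.
# $1 = db directory (default /var/db/nscd), $2 = backup directory.
reset_nscd_cache() {
    db_dir="${1:-/var/db/nscd}"
    backup_dir="${2:-/tmp/nscd-db.$(date +%Y%m%d%H%M%S)}"

    # nscd may already be dead after a segfault, so a failed stop is not fatal.
    "$SERVICE" nscd stop || true

    mkdir -p "$backup_dir" || return 1
    for f in "$db_dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        if [ -e "$f" ]; then
            mv "$f" "$backup_dir"/ || return 1
        fi
    done

    # nscd recreates its db files when it starts.
    "$SERVICE" nscd start
}
```

Run as root with no arguments to use the defaults. Keeping the old files makes it possible to attach them to a bug report later.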
I tried deleting the db files on one of the boxes after seeing this on the web, but nscd segfaulted less than half an hour later. This problem seems to happen only with x86_64 boxes. Another box here is x86_32 and has no issues with nscd.
I would like to drop this service, but there are critical apps that require it since authentication comes through openldap. It does not seem to be hardware specific, since the two x86_64 boxes have different motherboards, one Abit and one ASUS.
Logging is turned on for nscd, but nothing in its logs looks unusual, and it has been difficult to find which pid precedes the segfault.
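One way to tie each segfault to a pid is to pull the pid out of the kernel lines in /var/log/messages and then search nscd's own log for that number. A rough sketch follows; segfault_pids is a hypothetical helper name, and the pattern assumes the "kernel: name[pid]: segfault at" layout seen in the log lines quoted in this thread.

```shell
# Hypothetical helper: print the time stamp and pid of every kernel
# segfault message for one program.
# Usage: segfault_pids nscd /var/log/messages
segfault_pids() {
    prog="$1"
    logfile="$2"
    # Matches lines such as:
    #   Oct 9 12:56:38 lyra kernel: nscd[11660]: segfault at ...
    # \1 = everything before the host name, \2 = the pid in brackets.
    sed -n "s/^\(.*\) [^ ]* kernel: ${prog}\[\([0-9]*\)]: segfault at .*/\1 pid=\2/p" "$logfile"
}
```

Each matching line is reduced to the time stamp followed by "pid=" and the number, which can then be searched for in the nscd log around the same time.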
Can malformed addresses cause nscd to segfault?
---- I don't know the answer to that, but it would seem that if that were the case, the problem would also exist with the i386 version.
I suppose you will have to attach an strace to the pid and then create a bugzilla entry with attached strace - probably on the upstream provider.
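Attaching strace to the running daemon could look roughly like this. The wrapper name trace_nscd and the default output path are assumptions for illustration; -f, -tt, -o and -p are standard strace options (follow children, microsecond time stamps, output file, attach to pid).

```shell
# Hypothetical wrapper: attach strace to a running process and log to a file.
# $1 = pid to attach to, $2 = output file (default /tmp/nscd.<pid>.strace).
trace_nscd() {
    pid="$1"
    out="${2:-/tmp/nscd.$pid.strace}"
    # Runs until the traced process exits (for example on the segfault)
    # or until strace is interrupted with Ctrl-C.
    strace -f -tt -o "$out" -p "$pid"
}
```

Run it as root with the pid of the running nscd and leave it attached until the next segfault; the tail of the output file is the part worth attaching to the bug report.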
As for 'critical apps that require' nscd... I don't personally know of any, and if we are talking about CentOS-5, which has the 2.3.27 version of openldap... the 2.3.x versions are very fast, and I'm not certain that nscd is of all that much benefit (but I don't know, because I have never tested it).