On Wed, 2007-10-10 at 10:19 -0500, jlee wrote:
>
> Craig White wrote:
> > On Wed, 2007-10-10 at 08:16 -0400, Andy Harrison wrote:
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> On 10/9/07, jlee wrote:
> >>> output from /var/log/messages
> >>> Oct 9 12:56:38 lyra kernel: nscd[11660]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
> >>> Oct 9 13:16:38 lyra kernel: nscd[12540]: segfault at 0000002b401fee8b rip 000000552aab7966 rsp 00000000408029e0 error 4
> >>
> >> I'm starting to have this problem as well. I have two mail servers
> >> running courier and postfix. They've been up for a couple weeks but I
> >> just put them into production monday this week, two days ago.
> >>
> >> Oct 9 07:34:49 ash kernel: nscd[3455]: segfault at 0000000040201000
> >> rip 0000555555563274 rsp 00000000401a1df0 error 6
> >> Oct 9 07:35:20 ash nscd: 27206 invalid persistent database file
> >> "/var/db/nscd/passwd": verification failed
> >>
> >> Oct 10 07:33:37 oak kernel: nscd[25051]: segfault at 0000000040201000
> >> rip 0000555555563274 rsp 00000000401a73a0 error 6
> >> Oct 10 07:33:48 oak nscd: 29526 invalid persistent database file
> >> "/var/db/nscd/passwd": verification failed
> >>
> >> The first time it had happened, I was using the stock /etc/nscd.conf
> >> file. The second time it happened on the other server, I had doubled
> >> the max-db-size passwd value to 67108864.
> >>
> >> Both servers are running CentOS 5, firewall disabled and no SELinux.
> >>
> >> Linux ash 2.6.18-8.el5 #1 SMP Thu Mar 15 19:46:53 EDT 2007 x86_64
> >> x86_64 x86_64 GNU/Linux
> >>
> >> (24)[11:58am] # yum list nscd
> >> nscd.x86_64 2.5-12 installed
> >>
> >> # ls -la /etc/ldap*
> >> lrwxrwxrwx 1 root root 18 Sep 27 15:14 /etc/ldap.conf -> openldap/ldap.conf
> >> lrwxrwxrwx 1 root root 20 Sep 27 15:14 /etc/ldap.secret ->
> >> openldap/ldap.secret
> >> # ls -la /etc/openldap/ldap.*
> >> - -rw-r--r-- 1 root root 8974 Sep 27 13:55 /etc/openldap/ldap.conf
> >> - -rw------- 1 root root 10 Sep 27 13:56 /etc/openldap/ldap.secret
> >>
> >> My ldap.conf
> >> # grep '^[^#]' /etc/ldap.conf
> >> base dc=xxxxxxx,dc=xxx
> >> uri ldap://ldap-1.xxxxxxx.xxx
> >> binddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
> >> bindpw xxxxxxxx
> >> rootbinddn cn=foo,ou=bar,dc=xxxxxxx,dc=xxx
> >> scope sub
> >> timelimit 30
> >> bind_timelimit 30
> >> bind_policy soft
> >> idle_timelimit 3600
> >> pam_check_host_attr yes
> >> nss_base_passwd dc=xxxxxxx,dc=net?sub
> >> nss_base_shadow dc=xxxxxxx,dc=net?sub
> >> pam_password clear
> >> nss_base_group ou=Group,dc=xxxxxxx,dc=xxx?one
> >> TLS_REQCERT request
> >> TLS_CACERT /usr/local/etc/openldap/certs/cacert.pem
> >>
> >> The two previous servers did not have this particular problem. They
> >> were not identical hardware, but identical os install and config.
> >>
> >> Any clues?
> > ---
> > I don't generally use nscd any longer but since it is a dynamic system,
> > why not just stop nscd and delete the db and then restart nscd service
> > since it is certain to recreate it? (or perhaps move it out of the way
> > to be safe)...
> >
> > /sbin/service nscd stop
> > mv /var/db/nscd/* /tmp
> > /sbin/service nscd start
> >
>
> I tried deleting the db files on one of the boxes after seeing this on the web, but nscd
> segfaulted less than half an hour later. This problem seems to happen only
> with x86_64 boxes. Another box here is x86_32 and has no issues with nscd.
>
> I would like to drop this service but there are critical apps that require
> it since authentication comes through openldap. It does not seem to be hardware specific
> since the two x86_64 boxes have different motherboards, one Abit and one Asus.
>
> Logging is turned on for nscd but nothing looks unusual in the logs, and it has been
> difficult finding which pid precedes the segfault.
>
> Can malformed addresses cause nscd to segfault?
----
I don't know the answer to that, but it would seem that if that were the
case, the problem would also exist with the i386 version.

I suppose you will have to attach strace to the pid and then create a
bugzilla entry with the strace output attached, probably with the
upstream provider.

As for 'critical apps that require' nscd... I don't personally know of
any, and if we are talking about CentOS-5, which has version 2.3.27 of
openldap... the 2.3.x versions are very fast and I'm not certain that
nscd is of all that much benefit (but I don't know because I have never
tested it out).

-- 
Craig White <craig at tobyhouse.com>
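
For what it's worth, here is a rough sketch of the strace suggestion above, plus a way to pull the pids out of the kernel segfault lines. This is untested against the boxes in this thread. The function names and the /tmp/nscd.strace output path are just placeholders I made up, and it assumes strace and pidof are installed and that you run it as root. Note that the pids in the log belong to processes that have already died, so strace has to be attached to the freshly started nscd and left running until the next crash.

```shell
#!/bin/sh
# Sketch only: function names and the output path are hypothetical.

# Read syslog lines on stdin and print the pid from every
# "kernel: nscd[PID]: segfault" line, one pid per line.
nscd_segfault_pids() {
    sed -n 's/.*kernel: nscd\[\([0-9][0-9]*\)\]: segfault.*/\1/p'
}

# Attach strace to the currently running nscd and log to a file.
#   -f   follow threads and child processes
#   -tt  timestamp each call with microseconds
#   -o   write the trace to the given file
#   -p   attach to an existing pid
trace_nscd() {
    pid=$(pidof -s nscd) || return 1
    strace -f -tt -o /tmp/nscd.strace -p "$pid"
}
```

Usage would be something like `nscd_segfault_pids < /var/log/messages` to list the pids that crashed, and `trace_nscd` right after `/sbin/service nscd start` to capture the next crash. The resulting /tmp/nscd.strace file is what would go on the bugzilla entry.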