Hi Friends,
We are running a number of CentOS 5 (32-bit) and CentOS 5.2 (64-bit) systems. These systems are LDAP clients, and the LDAP server is Windows Server 2003. Sometimes one or two services on these servers use 100% CPU and the load on the server becomes high.
Below is an example where one of the httpd processes was eating 100% CPU, and we took a core dump of that process:
gcore 17711
Core was generated by `/usr/sbin/httpd'.
#0 0x00002ad1849cd997 in ldap_chase_v3referrals () from /usr/lib64/libldap-2.3.so.0
(gdb) bt full
#0 0x00002ad1849cd997 in ldap_chase_v3referrals () from /usr/lib64/libldap-2.3.so.0
No symbol table info available.
#1 0x00002ad1849bc4dd in ldap_msgdelete () from /usr/lib64/libldap-2.3.so.0
No symbol table info available.
#2 0x00002ad1849bceb0 in ldap_result () from /usr/lib64/libldap-2.3.so.0
No symbol table info available.
/etc/ldap.conf file
host dc.example.com
base ou=users,dc=example,dc=com
binddn cn=ldap,ou=extra accounts,dc=example,dc=com
bindpw QrQcepFKHR6wGNXu4
scope sub
ssl no
nss_base_passwd dc=example,dc=com?sub
nss_base_shadow dc=example,dc=com?sub
nss_base_group dc=example,dc=com?sub
nss_map_objectclass posixAccount user
nss_map_objectclass shadowAccount user
nss_map_attribute uid sAMAccountName
nss_map_attribute uidNumber UidNumber
nss_map_attribute gidNumber GidNumber
nss_map_attribute loginShell LoginShell
nss_map_attribute gecos name
nss_map_attribute userPassword unixUserPassword
nss_map_attribute homeDirectory unixHomeDirectory
nss_map_objectclass posixGroup Group
nss_map_attribute uniqueMember msSFU30PosixMember
nss_map_attribute cn cn
pam_login_attribute sAMAccountName
pam_filter objectclass=user
pam_password md5
timelimit 0
sizelimit 0
tls_cacertdir /etc/openldap/cacerts
There are two bugs listed in the Red Hat Bugzilla, but no solution for this problem has been provided:
https://bugzilla.redhat.com/show_bug.cgi?id=222667
https://bugzilla.redhat.com/show_bug.cgi?id=474181
Thanks & Regards
Ankush
Hello,
On Thu, Jan 22, 2009 at 01:41, ankush grover ankushcentos@gmail.com wrote:
> We are running a number of CentOS 5 (32-bit) and CentOS 5.2 (64-bit) systems. These systems are LDAP clients, and the LDAP server is Windows Server 2003.
How exactly? Are you using nss_ldap to get user ids from AD? Are you authenticating to AD using LDAP? What are the lines that contain "ldap" in your /etc/nsswitch.conf? What are the lines that contain "pam_ldap.so" in your /etc/pam.d/system-auth and the other files in that directory?
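The two file checks asked about above can be done with a pair of greps. This is only a minimal sketch, assuming the stock CentOS 5 file locations; the helper name `show_ldap_usage` and its optional root-directory argument are made up for illustration and are not part of any package.

```shell
# Hypothetical helper: list where LDAP is wired into NSS and PAM.
# $1 is an optional directory to inspect (default: the real /etc).
show_ldap_usage() {
    etc_dir="${1:-/etc}"
    # NSS: which databases (passwd, group, shadow, ...) consult ldap
    grep -n 'ldap' "$etc_dir/nsswitch.conf"
    # PAM: which service files load pam_ldap.so
    grep -rn 'pam_ldap\.so' "$etc_dir/pam.d"
    # A missing match is an answer too, so do not report it as a failure
    return 0
}
```

Running `show_ldap_usage` with no argument prints the matching lines with their line numbers; posting that output to the list would answer both questions at once.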
> Sometimes one or two services on these servers use 100% CPU and the load on the server becomes high.
Only Apache or other daemons as well?
> Below is an example where one of the httpd processes was eating 100% CPU, and we took a core dump of that process
Do you have any LDAP authentication configured in Apache? Or any other kind of authentication (PAM? System?) that might end up being served by LDAP? Do you have an application, such as a PHP application that would run inside an Apache process, that might be using LDAP?
> #0 0x00002ad1849cd997 in ldap_chase_v3referrals () from /usr/lib64/libldap-2.3.so.0
It looks like the process is stuck in a loop of referrals, but it's hard to tell for sure from a single backtrace.
You could take several backtraces and check whether the process is always in that same function; that would suggest a loop.
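One way to take several backtraces without writing a core file each time is to attach gdb in batch mode a few times and tally the innermost frame. The sketch below assumes a gdb that accepts `-batch`, `-p` and `-ex`; the function names `sample_top_frame` and `count_top_frames` are invented for this example. Note that attaching gdb pauses the process briefly on each sample.

```shell
# Hypothetical helper: print the innermost frame (#0) of a running
# process several times, a few seconds apart.
# Usage: sample_top_frame PID [COUNT] [DELAY_SECONDS]
sample_top_frame() {
    pid="$1"; count="${2:-5}"; delay="${3:-2}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # -batch: run the given commands and exit; "bt 1" prints only frame #0
        gdb -batch -p "$pid" -ex 'bt 1' 2>/dev/null | grep '^#0'
        i=$((i + 1))
        sleep "$delay"
    done
}

# Count how often each function appears in "#0 ..." lines read from stdin.
count_top_frames() {
    awk '$1 == "#0" {
             # gdb prints "#0 0xADDR in func (...)" or "#0 func (...)"
             fn = ($3 == "in") ? $4 : $2
             seen[fn]++
         }
         END { for (fn in seen) print seen[fn], fn }' | sort -rn
}
```

For the process in the original report this would be run as root as `sample_top_frame 17711 10 | count_top_frames`. If nearly every sample lands in `ldap_chase_v3referrals`, that supports the loop theory; a spread across unrelated functions would point elsewhere.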
Can you get a log of queries that the LDAP server is receiving (if it is receiving LDAP queries at all while your process is in that loop)? Can you use tcpdump to determine if you get a lot of LDAP traffic and if the traffic stops when you kill the process?
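A rough sketch of the tcpdump check follows. It assumes plain LDAP on port 389, since the posted ldap.conf has "ssl no", and that tcpdump is run as root on the client; `ldap_filter` and `watch_ldap` are hypothetical helper names.

```shell
# Hypothetical helper: build a tcpdump filter for traffic to one LDAP server.
# Port 389 is an assumption based on "ssl no" in the posted ldap.conf.
ldap_filter() {
    printf 'host %s and tcp port %s\n' "$1" "${2:-389}"
}

# Hypothetical helper: show LDAP packets exchanged with the given server.
# -n: do not resolve names, -i any: listen on all interfaces (Linux)
watch_ldap() {
    tcpdump -n -i any "$(ldap_filter "$1" "$2")"
}
```

With `watch_ldap dc.example.com` running in one terminal, a steady stream of packets that stops the moment the spinning process is killed would suggest the loop involves the server. No traffic at all would suggest the process is spinning locally inside the library.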
Can you see what that Apache process was serving at that time, using /server-status or something like it? That might give you a clue of why the problem appeared.
> /etc/ldap.conf file
> [...]
> timelimit 0
> sizelimit 0
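Did you try setting those to non-zero values? As far as I know, 0 means no limit, so a lookup that gets stuck would never time out.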
> There are two bugs listed in the Red Hat Bugzilla, but no solution for this problem has been provided:
> https://bugzilla.redhat.com/show_bug.cgi?id=222667
> https://bugzilla.redhat.com/show_bug.cgi?id=474181
These do not seem related to your problem: they report processes that hang in a deadlock, which is not your case. If it were, the process would be using 0% CPU instead of 100% CPU.
HTH, Filipe