[CentOS] pam_ldap and nss_ldap failover

Fri Mar 20 00:19:20 UTC 2009
Paul Heinlein <heinlein at madboa.com>

I'm (finally) getting around to putting a backup LDAP authentication 
server on my network. The backup uses syncrepl to grab the database, 
and to my eyes both LDAP servers answer read queries identically.

I'm testing the client side of this configuration on a virtual CentOS 5 
i386 machine. /etc/ldap.conf reads:

----- %< -----
base dc=DOMAIN,dc=com
timelimit 30
bind_timelimit 30
idle_timelimit 300
nss_initgroups_ignoreusers root,ldap,named,[... trimmed ...]
uri ldap://ldap1.DOMAIN.com ldap://ldap2.DOMAIN.com
ssl start_tls
tls_cacertdir /etc/openldap/cacerts
pam_password md5
----- %< -----

The client binds to whichever server is listed first in the 'uri' 
directive. In the config snippet, that's 'ldap1' -- but with the order 
swapped it binds to 'ldap2' just as happily.

If the first-listed server goes away, the client never seems to try to 
find or bind to the second-listed server (where "never" == my 
patience limit of about an hour). From that point on, all password 
authentication fails, though getent passwd and getent group still 
work (presumably because nscd is serving cached results).
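
To rule out a plain connectivity problem, something like the following 
Python sketch could walk the 'uri' list in the same order the client is 
supposed to. It is only an illustration of the failover I'd expect, not 
what pam_ldap or nss_ldap actually do: the function names are made up, 
and it checks only that a TCP connection opens (no start_tls, no bind).

```python
import socket
from urllib.parse import urlparse

# Standard ports used when a URI does not name one.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def parse_uris(conf_text):
    """Return (host, port) pairs from the 'uri' line(s), in listed order."""
    servers = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and every directive other than 'uri'.
        if not fields or fields[0].lower() != "uri":
            continue
        for uri in fields[1:]:
            parsed = urlparse(uri)
            port = parsed.port or DEFAULT_PORTS.get(parsed.scheme, 389)
            # Note: urlparse lowercases the hostname.
            servers.append((parsed.hostname, port))
    return servers


def tcp_connect(host, port, timeout=5.0):
    """Return True if a TCP connection opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(servers, connect=tcp_connect):
    """Try each server in order; return the first that answers, or None."""
    for host, port in servers:
        if connect(host, port):
            return (host, port)
    return None
```

Fed the contents of /etc/ldap.conf, parse_uris() should return the two 
hosts in listed order. If first_reachable() reports ldap2 while ldap1 is 
down, the network path to the backup is fine and the problem lies in how 
the client handles the dead first server.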

Has anyone else experienced this or, more importantly, figured out a 
way to get failover to work in a reasonable timeframe?

