[CentOS] Please shed some light: when will sssd return from offline to online?

Tue Mar 5 07:59:28 UTC 2013
Gelen James <hahaha_30k at yahoo.com>

Hi all,

I'm new to sssd configuration and debugging. Recently we ran into a problem with sssd: on 6 of our 50 servers, 'getent passwd' no longer returns any of the user IDs from the LDAP backend, while the other servers are fine.

My sssd is at version 1.8.0-32. The related error messages are included below. sssd_nss seems to have been killed after a temporary network connection problem to the backend OpenLDAP servers. Why does that happen? And can we change the interval at which the backend is rechecked? (See the timestamps of the log entries in sssd_nss.log.)

[root@testbox sssd]# cat sssd_nss.log
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_dp_init] (0x0010): Failed to connect to monitor services.
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_process_init] (0x0010): fatal error setting up backend connector
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_dp_init] (0x0010): Failed to connect to monitor services.
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_process_init] (0x0010): fatal error setting up backend connector
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_dp_init] (0x0010): Failed to connect to monitor services.
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_process_init] (0x0010): fatal error setting up backend connector
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_dp_init] (0x0010): Failed to connect to monitor services.
(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_process_init] (0x0010): fatal error setting up backend connector

[root@testbox sssd]# cat sssd_pam.log
(Sat Mar  2 02:30:09 2013) [sssd[pam]] [pam_dp_reconnect_init] (0x0010): Could not reconnect to ldap provider.
(Sat Mar  2 02:30:39 2013) [sssd[pam]] [pam_dp_reconnect_init] (0x0010): Could not reconnect to ldap provider.

[root@testbox sssd]# cat sssd_ldap.log
(Sat Mar  2 02:30:53 2013) [sssd[be[ldap]]] [id_callback] (0x0010): The Monitor returned an error [org.freedesktop.DBus.Error.NoReply]

[root@testbox sssd]# cat sssd.log
(Sat Mar  2 02:30:41 2013) [sssd] [mt_svc_exit_handler] (0x0010): Process [nss], definitely stopped!
[root@testbox sssd]#
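
Since the four logs are easier to read as one timeline, here is a minimal sketch that merges the entries above and prints the gap between consecutive events. It assumes every line follows the "(timestamp) [process] [function] (level): message" layout seen in these logs; the names parse_line, timeline and SAMPLE are made up for illustration and are not part of sssd. SAMPLE holds one copy of each distinct entry from the logs above.

```python
import re
from datetime import datetime

# Layout assumed from the log lines above:
# (timestamp) [process] [function] (level): message
LINE_RE = re.compile(
    r"^\((?P<ts>[^)]+)\) \[(?P<proc>.+?)\] \[(?P<func>[^\]]+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, process, function, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Collapse the double space that pads single-digit days ("Mar  2").
    stamp = " ".join(m.group("ts").split())
    ts = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return ts, m.group("proc"), m.group("func"), m.group("msg")


def timeline(lines):
    """Parse all lines and sort them by timestamp (stable for equal times)."""
    events = [e for e in map(parse_line, lines) if e is not None]
    return sorted(events, key=lambda e: e[0])


SAMPLE = [
    "(Sat Mar  2 02:30:41 2013) [sssd[nss]] [sss_dp_init] (0x0010): "
    "Failed to connect to monitor services.",
    "(Sat Mar  2 02:30:09 2013) [sssd[pam]] [pam_dp_reconnect_init] (0x0010): "
    "Could not reconnect to ldap provider.",
    "(Sat Mar  2 02:30:39 2013) [sssd[pam]] [pam_dp_reconnect_init] (0x0010): "
    "Could not reconnect to ldap provider.",
    "(Sat Mar  2 02:30:53 2013) [sssd[be[ldap]]] [id_callback] (0x0010): "
    "The Monitor returned an error [org.freedesktop.DBus.Error.NoReply]",
    "(Sat Mar  2 02:30:41 2013) [sssd] [mt_svc_exit_handler] (0x0010): "
    "Process [nss], definitely stopped!",
]

if __name__ == "__main__":
    previous = None
    for ts, proc, func, msg in timeline(SAMPLE):
        # Seconds elapsed since the previous event in the merged timeline.
        gap = 0 if previous is None else int((ts - previous).total_seconds())
        print(f"{ts:%H:%M:%S} (+{gap}s) {proc} {func}: {msg}")
        previous = ts
```

To run it against the real files, the lines read from sssd_nss.log, sssd_pam.log, sssd_ldap.log and sssd.log can be passed to timeline() in place of SAMPLE.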

Please shed some light on this. Thanks a lot.

--Gelen