Hi Everyone,
I'm experiencing the following problem, for which I've not yet found a resolution. It's been discussed elsewhere, but unfortunately nothing actually solves it.
Here's my /etc/ldap.conf file:
#################
ldap_version 3
base ou=people,o=xxx
uri ldaps://server1.domain.be/ ldaps://server2.domain.be/
bind_policy soft
scope sub
timelimit 3
bind_timelimit 5
idle_timelimit 120
referrals no
ssl start_tls
ssl on
tls_checkpeer yes
tls_cacertdir /etc/openldap/cacerts
#################
And the relevant nsswitch.conf entries:
#################
passwd: files ldap
shadow: files ldap
group: files ldap
#################
So that's pretty straightforward. My LDAP systems are running fine, and I can authenticate to them.
However, here's the problem: when the client boots *without network connectivity*, the server hangs at "Start System Message Bus". I've tracked this down to the following known Red Hat bug, which dates back to early 2010: https://bugzilla.redhat.com/show_bug.cgi?id=182464#c46
The workaround in that bug works: if I change the "group" line in nsswitch.conf to use only "files" and not "ldap", the system boots. However, since most systems (ours included) use groups for management, that's not a viable option.
We're running the very latest CentOS 5.6, with all packages (only from the CentOS repos) up to date.
Has anyone else solved this while keeping the ldap entry for group in nsswitch.conf, without the server hanging on boot when there's no network?
Regards, Mattias
On Thu, 28 Apr 2011 16:21:58 +0200 "Mattias Geniar" mattias@nucleus.be wrote:
Here's my /etc/ldap.conf file:
Did you include nss_initgroups_ignoreusers in your /etc/ldap.conf?
nss_initgroups_ignoreusers root,ldap
Brgds
Hi Benjamin,
I tried that, but then it just hangs on the next service that tries to start (in our case, a zabbix monitoring daemon running as zabbix/zabbix).
It works if I list every "local" user/group that can be ignored. However, that's not feasible when mass-deploying to varied systems.
If there were a way to simply say "ignore all users with UIDs < 500", that would be a workaround I could live with, but there doesn't appear to be one.
Regards, Mattias
On Thu, 28 Apr 2011, Mattias Geniar wrote:
If there were a way to simply say "ignore all users with UIDs < 500", that would be a workaround I could live with, but there doesn't appear to be one.
I'd hope you'd see these problems almost entirely go away in future with a switch to sssd rather than nss_ldap, as it makes the whole process a lot more stateful and aware of what's going on.
Having an rc.local that does an nsswitch.conf twiddle is probably a viciously robust way of dealing with this problem...
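A minimal Python 3 sketch of what such an nsswitch.conf twiddle could look like. It assumes the file is kept files-only during early boot and that rc.local switches ldap back on once the LDAP server accepts a TCP connection on port 636 (ldaps). The function names, the placeholder host server1.domain.be (taken from the config above) and the TCP reachability test are all illustrative assumptions, not a tested recipe. Something would also have to switch the file back to files-only before the next boot, and this sketch does not cover that step.

```python
import re
import socket

# Databases whose "ldap" source gets toggled (assumption: same as the nsswitch.conf above).
LDAP_DATABASES = ("passwd", "shadow", "group")


def set_ldap_source(nsswitch_text, enabled, databases=LDAP_DATABASES):
    """Return nsswitch.conf text with 'ldap' appended to (or removed from) each database line."""
    out = []
    for line in nsswitch_text.splitlines():
        m = re.match(r"^(\w+):\s*(.*)$", line)
        if m and m.group(1) in databases:
            body, hash_mark, comment = m.group(2).partition("#")
            sources = [s for s in body.split() if s != "ldap"]
            if enabled:
                sources.append("ldap")  # assumes ldap belongs after "files", as above
            line = "%s: %s" % (m.group(1), " ".join(sources))
            if hash_mark:
                line += " #" + comment  # keep any trailing comment
        out.append(line)
    return "\n".join(out) + "\n"


def ldap_reachable(host, port=636, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, no route...
        return False


def twiddle_nsswitch(path="/etc/nsswitch.conf", host="server1.domain.be"):
    """Enable ldap in nsswitch.conf only if the (placeholder) LDAP host answers."""
    with open(path) as f:
        text = f.read()
    new_text = set_ldap_source(text, enabled=ldap_reachable(host))
    if new_text != text:
        with open(path, "w") as f:
            f.write(new_text)
```

This version rewrites the file in place, so the write is not atomic. A more careful version would write to a temporary file and then rename it over nsswitch.conf.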
jh
On Thu, Apr 28, 2011 at 03:52:44PM +0100, John Hodrien wrote:
On Thu, 28 Apr 2011, Mattias Geniar wrote:
would be a workaround I could live with, but there doesn't appear to be one.
I'd hope you'd see these problems almost entirely go away in future with a switch to sssd rather than nss_ldap, as it makes the whole process a lot more stateful and aware of what's going on.
Fear not, Fedora has managed to have that break things for many people too.
I see they just closed the bug as WONTFIX, though the fix is known and available.
Having an rc.local that does an nsswitch.conf twiddle is probably a viciously robust way of dealing with this problem...
Unnecessary too. :) See my earlier email.
I might as well give a link to my ldap page, so that anyone else who comes across this can see the issue described along with the fix.
http://home.roadrunner.com/~computertaijutsu/ldap.html
On Thu, 28 Apr 2011, Scott Robbins wrote:
On Thu, Apr 28, 2011 at 03:52:44PM +0100, John Hodrien wrote:
On Thu, 28 Apr 2011, Mattias Geniar wrote:
Having an rc.local that does an nsswitch.conf twiddle is probably a viciously robust way of dealing with this problem...
Unnecessary too. :) See my earlier email.
bind_policy soft isn't a panacea in my experience. I've had failures it doesn't fix (I've had udev go off into a world of its own, stopping the machine from booting).
nss_ldap's just a bit sucky by design. It lacks any caching, and nscd simply isn't in a position to provide it in a sane manner. Performance with large directories and nested groups is terrible unless you completely avoid enumerating groups, which breaks some tools.
jh
On Thu, 28 Apr 2011, Benjamin Hackl wrote:
nss_initgroups_ignoreusers root,ldap
This works:
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
-Steve
On Thu, 28 Apr 2011, Steve Thompson wrote:
This works:
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus
We use a slightly longer version:
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman
I suspect, however, that the extra users listed in our setup aren't the cause of the hangups...
On Thu, 2011-04-28 at 09:28 -0700, Paul Heinlein wrote:
I suspect, however, that the extra users listed in our setup aren't the cause of the hangups...
I use the following to prevent hanging at startup with LDAP:
nss_initgroups_ignoreusers root,ldap,bacula,named
timelimit 30
bind_timelimit 30
bind_policy soft
This is because some daemons start before the OpenLDAP service does.
Obviously, adding haldaemon, dbus, radvd, tomcat, etc., or other 'users' for daemons that launch before your LDAP server application, is useful, but those users would have to be listed in /etc/passwd and /etc/group to benefit significantly.
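A small Python 3 helper can serve as a quick sanity check for that last point. It reports which accounts named in an nss_initgroups_ignoreusers line have no entry in the local /etc/passwd, which presumably means those accounts still depend on LDAP being reachable. Both function names are hypothetical. The helper only looks at /etc/passwd-format text, so checking /etc/group the same way is left out.

```python
def parse_passwd(passwd_text):
    """Map user name -> UID from /etc/passwd-format text (name:passwd:uid:gid:gecos:home:shell)."""
    users = {}
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        # Skip blank lines, comments and NIS-style "+" entries that lack a numeric UID.
        if len(fields) >= 3 and fields[2].isdigit() and not fields[0].startswith("#"):
            users[fields[0]] = int(fields[2])
    return users


def missing_local_users(ignore_line, passwd_text):
    """Names from an 'nss_initgroups_ignoreusers a,b,c' line with no local passwd entry."""
    parts = ignore_line.split(None, 1)
    names = parts[1] if len(parts) == 2 else ""
    wanted = [n.strip() for n in names.split(",") if n.strip()]
    local = parse_passwd(passwd_text)
    return [n for n in wanted if n not in local]
```

For example, calling missing_local_users(line, open("/etc/passwd").read()) on Craig's line would list whichever of root, ldap, bacula and named are not defined locally.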
Craig
Hi Craig,
The problem I have with listing those ignoreusers is that you need to know in advance which services are on the system, and that's not always the case. And if a user installs a new daemon, he'll break the server's start-up should it ever be unable to connect to the LDAP systems.
Perhaps I'm asking too much, but could anyone try the following config (in a VM or similar, with networking disabled)? This is the one that causes boots to hang indefinitely, even though "bind_policy soft" is set.
/etc/ldap.conf
=======================================
ldap_version 3
base ou=people,o=company
uri ldaps://srv.domain.be/ ldaps://srv2.domain.be/
scope sub
timelimit 5
bind_timelimit 5
bind_policy soft
idle_timelimit 15
timeout 5
# If the LDAP server is unavailable during boot, don't retry too often
# or the system will hang on the System Message Bus service
bind_timeout 2
#nss_reconnect_tries 2
#nss_reconnect_sleeptime 1
#nss_reconnect_maxsleeptime 3
#nss_reconnect_maxconntries 2
referrals no
ssl start_tls
ssl on
tls_checkpeer yes
tls_cacertdir /etc/openldap/cacerts
pam_filter objectclass=posixAccount
pam_login_attribute uid
pam_min_uid 5000
pam_max_uid 6000
#pam_groupdn cn=company-shared,ou=groups,o=company
pam_groupdn cn=company-managed,ou=groups,o=company
pam_member_attribute memberUid
pam_password md5
nss_base_passwd ou=people,o=company
nss_base_shadow ou=people,o=company
nss_base_group ou=groups,o=company
#debug 255
#logdir /tmp/
=======================================
Or if anyone else can spot an obvious "Dude, why the f#!? did you put in those lines"-error, please inform me. :-)
Thanks everyone for your interest and comments!
Kind regards, Mattias
On Thu, Apr 28, 2011 at 04:21:58PM +0200, Mattias Geniar wrote:
However, here's the problem: when the client boots *without network connectivity*, the server hangs at "Start System Message Bus". I've tracked this down to the following known Red Hat bug, which dates back to early 2010: https://bugzilla.redhat.com/show_bug.cgi?id=182464#c46
Yes, the bug is actually older than that. I don't know if it's only RH-based systems (as so many things seem to work everywhere but RH and their offshoots) or ldap. You should be able to fix it by changing /etc/ldap.conf. There is a default commented-out line in there:
#bind_policy hard
Uncomment it and change it to soft (on the client). Note this is /etc/ldap.conf; in Fedora, if that's the client, I believe it's now /etc/pam_ldap.conf or possibly /etc/nss_ldap.conf.
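Scott's edit can be scripted. The Python 3 sketch below uses a hypothetical function name and works on a string, so it can be tested without touching any real file. It uncomments or replaces the first bind_policy line, drops any further bind_policy lines, and appends one if none exists. The caller decides which file to apply it to: /etc/ldap.conf on CentOS 5, or possibly a different file on Fedora, as noted above.

```python
import re

# Matches "bind_policy ..." and commented-out "#bind_policy ..." lines.
BIND_POLICY_RE = re.compile(r"^\s*#?\s*bind_policy\b")


def set_bind_policy(ldap_conf_text, policy="soft"):
    """Return ldap.conf text containing exactly one active 'bind_policy <policy>' line."""
    out, done = [], False
    for line in ldap_conf_text.splitlines():
        if BIND_POLICY_RE.match(line):
            if not done:
                out.append("bind_policy " + policy)  # replace the first occurrence in place
                done = True
            continue  # drop any later bind_policy lines so only one remains
        out.append(line)
    if not done:
        out.append("bind_policy " + policy)
    return "\n".join(out) + "\n"
```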
I can't find the earlier bug at first glance, but it's FAR older than 2010, and they never bothered to fix it.
Has anyone else ever solved this to still be able to keep the group ldap entry in nsswitch.conf without having a server hang on boot if there's no network?
See above. Darn, I wish I could find that older bug, so that I could go to the newer one you mention and point out that they've been unable to fix it for far longer than a year. :) (I might do it anyway)
Grouchily yours, (Not at you, at RH for being unable to get such a basic thing to work--actually, at one point, Fedora changed bind_policy to soft so that it would work, but now they're back to the broken way.)
-- Scott Robbins
Hi Scott,
In case you're wondering, this is about the oldest entry (2006): https://bugzilla.redhat.com/show_bug.cgi?id=186527
The bind_policy setting didn't seem to have the desired effect for me: it kept trying to connect to the LDAP server even after 10+ failed attempts, taking 1m50s on each and every attempt.
I've read quite a few threads saying that solves the issue, but that didn't seem to be the case in my environment. Are there other workarounds/tips if bind_policy doesn't work? The rc.local hack seems ... ugly ... and embarrassing if a client ever found out. :-)
Regards, Mattias
On Thu, Apr 28, 2011 at 05:03:55PM +0200, Mattias Geniar wrote:
I've read quite a few threads saying that solves the issue, but that didn't seem to be the case in my environment. Are there other workarounds/tips if bind_policy doesn't work? The rc.local hack seems ... ugly ... and embarrassing if a client ever found out. :-)
Agreed. I've never known that fix to not work though.
(Thanks for the input; I'll have to add to my page that it doesn't work in all cases.)
On Thu, 28 Apr 2011, Mattias Geniar wrote:
I've read quite a few threads saying that solves the issue, but that didn't seem to be the case in my environment. Are there other workarounds/tips if bind_policy doesn't work? The rc.local hack seems ... ugly ... and embarrassing if a client ever found out. :-)
Automatic generation of the nss_initgroups_ignoreusers line on boot? A creative patch to nss_ldap?
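Here is a Python 3 sketch of the first idea. It builds the nss_initgroups_ignoreusers line from every local account whose UID is below a threshold (500 here, the boundary Mattias mentioned), then swaps that line into the ldap.conf text. Both function names are hypothetical. When the script runs is also an assumption. To help the *current* boot, it would have to run before the messagebus init script. Running it later (from rc.local, cron, or at shutdown) would still keep the list current for the next boot.

```python
def ignoreusers_line(passwd_text, max_uid=500):
    """Build an nss_initgroups_ignoreusers line from local accounts with UID < max_uid."""
    names = []
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        # Only well-formed, uncommented entries with a numeric UID below the threshold.
        if (len(fields) >= 3 and not fields[0].startswith("#")
                and fields[2].isdigit() and int(fields[2]) < max_uid):
            if fields[0] not in names:
                names.append(fields[0])
    return "nss_initgroups_ignoreusers " + ",".join(names)


def replace_ignoreusers(ldap_conf_text, new_line):
    """Drop any active nss_initgroups_ignoreusers line and append new_line."""
    kept = [l for l in ldap_conf_text.splitlines()
            if not l.strip().startswith("nss_initgroups_ignoreusers")]
    kept.append(new_line)
    return "\n".join(kept) + "\n"
```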
Current versions of sssd look really promising to me (I tested against a candidate for RHEL 6.1), and offer workable performance compared to a heavily hacked nss_ldap against a large LDAP tree (much better than an unmodified nss_ldap).
I also seem to recall that bind_policy soft potentially opens you up to security issues. An "allow all, deny listed people" policy would let someone in if LDAP timed out. Variations on that would presumably leak if you throw nscd into the mix.
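This toy model in Python 3 illustrates that risk. All names are hypothetical, and real PAM and NSS behaviour is more involved. The assumption is that with bind_policy soft and the server unreachable, a group lookup simply appears to return nothing. Under that assumption, an "allow all, deny listed people" check fails open, while an "allow listed people only" check fails closed.

```python
def lookup_group_members(server_up, members):
    """Simulated NSS group lookup. With bind_policy soft and the server down,
    the caller sees an empty result rather than an error (simplifying assumption)."""
    return set(members) if server_up else set()


def deny_list_allows(user, denied_members):
    """'Allow all, deny listed people': fails OPEN if the deny list comes back empty."""
    return user not in denied_members


def allow_list_allows(user, allowed_members):
    """'Deny all, allow listed people': fails CLOSED if the allow list comes back empty."""
    return user in allowed_members
```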
Newer versions of nss_ldap support nss_initgroups_minimum_uid 500, so presumably that has a good chance of solving your problem.
jh
--On Thursday, April 28, 2011 10:53:52 AM -0400 Scott Robbins scottro@nyc.rr.com wrote:
On Thu, Apr 28, 2011 at 04:21:58PM +0200, Mattias Geniar wrote:
I've tracked this down to the following known Red Hat bug, which dates back to early 2010: https://bugzilla.redhat.com/show_bug.cgi?id=182464#c46
Yes, the bug is actually older than that
*sigh*
Yes, I've been tripping up on this one, on and off, since 2006 in FC5.
AFAIK, nobody ever looked into my strace comment at https://bugzilla.redhat.com/show_bug.cgi?id=182464#c10, although https://bugzilla.redhat.com/show_bug.cgi?id=182464#c46 (four years later) seems related. Probably moot now anyway, as nobody is interested in fixing it since sssd will cure all ills and bring world peace. (Insert sarcasm/skepticism as appropriate.)
Be aware that "bind_policy soft" may have some undesirable consequences, depending on your environment. For example, if you have a mail server that does user lookup based on ldap and your ldap server goes away (before or after the mail server boots), then while your ldap server is offline you can get mail bouncing permanently with "no such user" rather than temporarily with "system not available" -type messages.
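Devin's mail example can be reduced to a toy Python 3 model. The names are hypothetical, and actual MTAs each have their own lookup and error handling. The point it shows is that the mail server has to tell "no such user" apart from "directory unavailable" in order to choose between a permanent 5xx and a temporary 4xx SMTP reply. A soft bind policy, as assumed here, collapses the second case into the first.

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found"
    NOT_FOUND = "not found"
    UNAVAILABLE = "directory unavailable"


def smtp_reply_for(result):
    """Map a recipient lookup result to an SMTP reply (4xx = retry later, 5xx = bounce)."""
    if result is Lookup.FOUND:
        return "250 2.1.5 OK"
    if result is Lookup.UNAVAILABLE:
        return "451 4.3.0 Temporary lookup failure, try again later"
    return "550 5.1.1 No such user"


def as_seen_with_soft_policy(result):
    """Assumed effect of bind_policy soft: an unreachable server looks like 'not found'."""
    return Lookup.NOT_FOUND if result is Lookup.UNAVAILABLE else result
```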
Mitigation strategies I've used in the past include:
1. never using 'bind_policy soft'
2. having at least one replica LDAP server (which is a good idea anyway)
3. putting LDAP on machines which are not themselves LDAP clients, thus ensuring that although clients may get blocked on boot, the LDAP server itself does not
In recent CentOS 5 versions, I've had much better luck avoiding (3) as long as, using system-config-authentication, one enables "Local authorization is sufficient for local users" under the Options tab.
And for the record, despite this particularly annoying bug, I'm still a strong advocate of using LDAP for user and group provisioning.
Devin
On Fri, 29 Apr 2011, Devin Reade wrote:
Probably moot now anyway as nobody is interested in fixing it since sssd will cure all ills and bring world peace. (Insert sarcasm/skepticism as appropriate.)
I'd probably argue that nss_ldap is fundamentally unfixable. Why *not* get behind sssd? Have you given a recent version a try?
jh
Understandable, but since a lot of people are still going to stick with CentOS 4/5 for legacy reasons, I would argue that nss_ldap is still worth "fixing".
It's not as fancy as sssd of course, but it's what people are using right now. :-)
Regards, Mattias
On Tue, 3 May 2011, Mattias Geniar wrote:
Understandable, but since a lot of people are still going to stick with CentOS 4/5 for legacy reasons, I would argue that nss_ldap is still worth "fixing".
I'm not saying it's not worth fixing, I suspect it's fundamentally unfixable without a complete redesign.
It's not as fancy as sssd of course, but it's what people are using right now. :-)
Too much assumes that NSS information is quick and reliable. Lots of it seems to be designed around the assumption that random queries are expensive, and reading through the whole password file is cheap. nscd then perches on top of this and tries to paper over the fact this is all untrue.
Throw nss_ldap at a big tree (~85k users, and an equally large number of groups) and watch it suffer horribly. Watch it take minutes to decide whether or not I should be allowed to log in (even where that access control list is a local group). Throw nscd into the mix. Watch it do one query through nscd, then time out because it assumes nscd is broken, so it does the whole query again, bypassing nscd. Wait until nscd eventually crashes under the strain...
sssd answers a lot of these questions. It's definitely not a perfect replacement yet, but it's going in the right direction if you ask me.
jh
On May 3, 2011, at 4:52 AM, John Hodrien wrote:
sssd answers a lot of these questions. It's definitely not a perfect replacement yet, but it's going in the right direction if you ask me.
So what's the answer today for ~10K users?
The bug fixes suggested here work around the problems I have been encountering.
Can anyone comment on what people are using for larger deployments? I hope it's not a resounding M$ AD?!
- aurf
On Tue, 3 May 2011, aurfalien@gmail.com wrote:
So what's the answer today for ~10K users?
The bug fixes suggested here work around the problems I have been encountering.
Well that's good then.
Can anyone comment on what people are using for larger deployments? I hope it's not a resounding M$ AD?!
I use a lightly patched nss_ldap and it's far from terrible. I'm forced to either use nss_getgrent_skipmembers or limit the number of groups it can see by localising it to a specific OU, as the performance becomes unworkable otherwise. I've additionally patched it to improve performance against our tree by optimising some of the queries using site specific details.
nss_getgrent_skipmembers is not without downsides, but if it's tolerable in your situation it'll get you the best performance.
In my case, the server end is indeed AD.
It's been considerably faster and more stable than using winbind.
jh
On 05/03/2011 10:43 AM, aurfalien@gmail.com wrote:
Can any one comment on what ppl are using for larger deployments? I hope its not a resounding M$ AD?!
Use sssd. It's now included in CentOS 5.
On Thu, 5 May 2011, Gordon Messmer wrote:
On 05/03/2011 10:43 AM, aurfalien@gmail.com wrote:
Can anyone comment on what people are using for larger deployments? I hope it's not a resounding M$ AD?!
Use sssd. It's now included in CentOS 5.
Included doesn't necessarily mean usable though. I might be out of date on this, but I thought when I looked at it that it didn't handle nested groups. That made it pretty much pointless for me. It took upgrading to a pre-release of a 6.1 RPM to get something I could use.
jh