[CentOS] polkit helper timeout and defunct pkla-check-authorization processes on CentOS 7.3

Fri Mar 24 14:05:19 UTC 2017
Edgecombe, Jason <jwedgeco at uncc.edu>

Hi everyone,

I'm replying to myself to help anyone else who runs into these polkit
timeouts. Our CentOS 7 machines are joined to our Active Directory domain
and use AD for authentication and account lookups (using the SSSD AD
provider). We're NOT using FreeIPA. The polkit timeouts were caused by sssd
taking too long to respond to user information lookups for Active Directory
users.
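
For anyone checking whether a machine shows the same symptom as my original
report quoted below (piles of defunct pkla-check-authorization processes),
here is a rough Python sketch that counts zombie processes by reading /proc.
The function names are mine and purely illustrative. It assumes a Linux
/proc layout and matches on "pkla-check-auth" because the kernel truncates
the command name in /proc/<pid>/stat to 15 characters.

```python
import glob


def parse_stat(stat_line):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return comm, state


def count_defunct(prefix="pkla-check-auth", proc_glob="/proc/[0-9]*/stat"):
    """Count zombie (state 'Z') processes whose comm starts with prefix."""
    count = 0
    for path in glob.glob(proc_glob):
        try:
            with open(path) as handle:
                comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away, or the line was not parseable
        if state == "Z" and comm.startswith(prefix):
            count += 1
    return count
```

Calling count_defunct() with no arguments scans the live /proc tree. A count
that keeps growing between runs would match what we saw before the fix.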

The solution was to set "enumerate = False" in the domain section of
/etc/sssd/sssd.conf and then restart the sssd service or reboot the
machine. If "enumerate" is not present in sssd.conf, it defaults to False.
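
As a quick way to audit a machine, here is a minimal Python sketch that
reports the effective "enumerate" value for each [domain/...] section. The
function name is hypothetical, and it assumes the file is plain enough INI
for Python's configparser to read, which is true of the config shown further
down but may not hold for every sssd.conf.

```python
import configparser


def enumerate_settings(conf_text):
    """Return {section_name: bool} with the effective 'enumerate' value
    for every [domain/...] section in the given sssd.conf text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        if section.startswith("domain/"):
            # fallback=False mirrors the default when the option is absent
            result[section] = parser.getboolean(
                section, "enumerate", fallback=False
            )
    return result


# Example use (reading the real file normally requires root):
#   with open("/etc/sssd/sssd.conf") as handle:
#       print(enumerate_settings(handle.read()))
```

Any domain that comes back True is a candidate for the change described
above.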

In addition to the polkit hangs, we were also experiencing the following
problems, which went away or improved after the sssd.conf change was made:

   - Running "id $USERNAME" was taking many seconds when looking up users
   in Active Directory.
   - Logins were taking a while (5+ seconds) or would just hang.
   - Unlocking a machine from the screensaver would sometimes fail.
   - General system sluggishness.
   - High system CPU/load with no obvious culprits according to "top".
   - The sssd_be process was often taking 5% or more of a CPU.
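
To put a number on the slow lookups before and after the change, something
like the following Python sketch could be used. It times a single passwd
lookup through the system name service, which goes through sssd when sssd is
configured as a passwd source. The function name is illustrative, and this is
only a rough proxy for "id $USERNAME": it does not resolve group memberships,
and a lookup answered from the sssd cache will look fast.

```python
import pwd
import time


def time_lookup(username, lookup=pwd.getpwnam):
    """Return (seconds, found) for one name-service lookup of username."""
    start = time.perf_counter()
    try:
        lookup(username)
        found = True
    except KeyError:
        # pwd.getpwnam raises KeyError when the user does not exist
        found = False
    return time.perf_counter() - start, found


# Example use (the username is a placeholder):
#   seconds, found = time_lookup("someaduser")
#   print(f"lookup took {seconds:.3f}s, found={found}")
```

The lookup parameter exists only so the timing logic can be exercised without
a real directory behind it.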

The problems were more prevalent on our big time-sharing systems (64
cores/512GB RAM) that have multiple (15+) simultaneous users running
large-memory or CPU-intensive interactive jobs. The problems also hit some
of our single-user workstations, but the most-affected users were still our
compute-heavy research users.

For others' reference, I'm also running the sssd cache on a tmpfs
filesystem, and here is my sanitized sssd.conf file:

> [sssd]
> config_file_version = 2
> services = nss, pam
> domains = subdomain.example.com
>
> [domain/subdomain.example.com]
> ad_domain = subdomain.example.com
> krb5_realm = subdomain.example.com
> realmd_tags = manages-system joined-with-samba
> cache_credentials = True
> id_provider = ad
> auth_provider = ad
> chpass_provider = ad
>
> enumerate = False
> access_provider = ad
> krb5_store_password_if_offline = True
> ldap_id_mapping = False
> use_fully_qualified_names = False
> krb5_renewable_lifetime = 60d
> krb5_lifetime = 60d
> krb5_renew_interval = 600s
>

Sincerely,
Jason

---------------------------------------------------------------------------
Jason Edgecombe | Linux Administrator
UNC Charlotte | The William States Lee College of Engineering
9201 University City Blvd. | Charlotte, NC 28223-0001
Phone: 704-687-1943
jwedgeco at uncc.edu | http://engr.uncc.edu |  Facebook
---------------------------------------------------------------------------
If you are not the intended recipient of this transmission or a person
responsible for delivering it to the intended recipient, any disclosure,
copying, distribution, or other use of any of the information in this
transmission is strictly prohibited. If you have received this transmission
in error, please notify me immediately by reply e-mail or by telephone at
704-687-1943.  Thank you.

On Fri, Mar 10, 2017 at 1:01 PM, Edgecombe, Jason <jwedgeco at uncc.edu> wrote:

> Hi everyone,
>
> We seem to be having issues on multiple CentOS 7.3 machines. The problem
> seems to revolve around polkitd. At some random time, polkitd seems to stop
> responding on my systems. Along with this, there might be hundreds of
> defunct pkla-check-authorization processes. If I reboot, then things are
> fine for a while.
>
> I don't see any activity in the unabridged journal to suggest anything
> that might be triggering polkitd. The puppet run finished 5 minutes before
> polkitd lost its head.
>
> polkit version is polkit-0.112-11.el7_3.x86_64
>
> Any help is appreciated.
>
> Thanks,
> Jason
>
> Here is some condensed output from the "journalctl -u polkit" command:
> Mar 09 04:02:14 myhost systemd[1]: Starting Authorization Manager...
> Mar 09 04:02:14 myhost polkitd[1018]: Started polkitd version 0.112
> Mar 09 04:02:14 myhost polkitd[1018]: Loading rules from directory
> /etc/polkit-1/rules.d
> Mar 09 04:02:14 myhost polkitd[1018]: Loading rules from directory
> /usr/share/polkit-1/rules.d
> Mar 09 04:02:14 myhost polkitd[1018]: Finished loading, compiling and
> executing 7 rules
> Mar 09 04:02:14 myhost systemd[1]: Started Authorization Manager.
> Mar 09 04:02:14 myhost polkitd[1018]: Acquired the name
> org.freedesktop.PolicyKit1 on the system bus
> Mar 09 04:02:53 myhost polkitd[1018]: Registered Authentication Agent for
> unix-session:c1 (system bus name :1.41 [gnome-shell --mode=gdm], object
> path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
> Mar 09 04:08:25 myhost polkitd[1018]: Reloading rules
> Mar 09 04:08:25 myhost polkitd[1018]: Collecting garbage unconditionally...
> Mar 09 04:08:25 myhost polkitd[1018]: Loading rules from directory
> /etc/polkit-1/rules.d
> Mar 09 04:08:25 myhost polkitd[1018]: Loading rules from directory
> /usr/share/polkit-1/rules.d
> Mar 09 04:08:25 myhost polkitd[1018]: Finished loading, compiling and
> executing 8 rules
> Mar 09 04:08:25 myhost polkitd[1018]: Reloading rules
> Mar 09 04:08:25 myhost polkitd[1018]: Collecting garbage unconditionally...
> Mar 09 04:08:25 myhost polkitd[1018]: Loading rules from directory
> /etc/polkit-1/rules.d
> Mar 09 04:08:25 myhost polkitd[1018]: Loading rules from directory
> /usr/share/polkit-1/rules.d
> Mar 09 04:08:25 myhost polkitd[1018]: Finished loading, compiling and
> executing 8 rules
> Mar 09 04:08:53 myhost polkitd[1018]: Reloading rules
> ... (snipped more rules loading)
> Mar 09 04:10:39 myhost polkitd[1018]: Collecting garbage unconditionally...
> Mar 09 04:10:39 myhost polkitd[1018]: Loading rules from directory
> /etc/polkit-1/rules.d
> Mar 09 04:10:39 myhost polkitd[1018]: Loading rules from directory
> /usr/share/polkit-1/rules.d
> Mar 09 04:10:39 myhost polkitd[1018]: Finished loading, compiling and
> executing 7 rules
> Mar 09 16:59:42 myhost polkitd[1018]: /etc/polkit-1/rules.d/49-polkit-pkla-compat.rules:21:
> Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark,
> 24)
> Mar 09 16:59:42 myhost polkitd[1018]: Error evaluating authorization rules
> Mar 10 04:13:34 myhost polkitd[1018]: /etc/polkit-1/rules.d/49-polkit-pkla-compat.rules:21:
> Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark,
> 24)
> Mar 10 04:13:34 myhost polkitd[1018]: Error evaluating authorization rules
> Mar 10 04:14:32 myhost polkitd[1018]: /etc/polkit-1/rules.d/49-polkit-pkla-compat.rules:21:
> Error: Error spawning helper: Timed out after 10 seconds (g-io-error-quark,
> 24)
> ... (snipped more lines about error evaluating rules)...