[CentOS] dbus/systemd failure on startup (CentOS 7.7)

Thu Jan 23 12:09:04 UTC 2020
Simon Matter <simon.matter at invoca.ch>

> We are seeing a problem that occurs ~5% of the time when rebooting

I see issues like this on a fairly large multi-user system, but when they
happen after forced restarts for kernel updates, I usually don't have time
to analyze them and play doctor. My "solution" for now is simply to reboot
the server again in such a case, AKA the systemd way :-)

> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
> after the D-Bus service starts - from 'journalctl -x' :
>
> ...
> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
> Message Bus.
> -- Subject: Unit dbus.service has finished start-up
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit dbus.service has finished starting up.
> --
> -- The start-up result is done.
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
> for Disconnected message: Connection timed out
> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
> D-Bus connection: Connection timed out
> ...
>
> This then has a knock-on effect that causes other services to fail - e.g.
>
> -- Unit gdm.service has begun starting up.
> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating
> systemd to hand-off: service name='org.freedesktop.login1'
> unit='dbus-org.freedesktop.login1.service'
> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to
> activate service 'org.freedesktop.systemd1': timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to
> enable subscription: Failed to activate service
> 'org.freedesktop.systemd1': timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully
> start up daemon: Connection timed out
> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service:
> main process exited, code=exited, status=1/FAILURE
> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login
> Service.
> -- Subject: Unit systemd-logind.service has failed
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit systemd-logind.service has failed.
> --
> -- The result is failed.
>
> Whatever the issue is, it appears that polkit might be involved - if we
> restart the polkit service, things appear to return to normal (e.g. gdm
> starts up etc)
>
> We can't find any similar reports of this happening elsewhere with
> CentOS 7.7 - but we were wondering if anyone else had come across a
> problem like this?

I think the root of the problem is that some of the systemd unit files are
missing definitions. Things work in 95% or more of cases, but only by
chance, not because of well-defined process handling and system control.
Small delays somewhere, or uncommon system environments, then lead to
intermittent failures that are difficult to diagnose, at least for me.

The good news is that you can adjust the systemd unit files the same way
we used to fiddle with init scripts. That way you can use trial and error
until you find a solution. It doesn't feel like being in full control of
things, but it's better than not finding a solution at all.
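Since every trial-and-error attempt needs a reboot to test, a quick pass/fail check after each boot helps. Below is a minimal sketch in Python, assuming the failure always logs the same messages seen in the journal excerpt quoted above. The signature list is an assumption and not exhaustive, and `find_dbus_timeouts` and `check_current_boot` are hypothetical names. Reading the full journal usually requires root or membership in the `systemd-journal` group.

```python
import re
import subprocess

# Failure signatures taken from the journal excerpt quoted above.
# ASSUMPTION: other variants of this failure may log different text.
SIGNATURES = (
    re.compile(r"Failed to register match for Disconnected message"),
    re.compile(r"Failed to initialize D-Bus connection: Connection timed out"),
    re.compile(r"Failed to activate service 'org\.freedesktop\.systemd1': timed out"),
)


def find_dbus_timeouts(journal_text):
    """Return every journal line that matches one of the known signatures."""
    return [line for line in journal_text.splitlines()
            if any(sig.search(line) for sig in SIGNATURES)]


def check_current_boot():
    """Print matching lines from the current boot; return True if any matched."""
    # 'journalctl -b' limits output to the current boot; '--no-pager' stops it
    # from waiting for interactive input. stdout=PIPE/universal_newlines are
    # used instead of capture_output/text so this also runs on Python 3.6.
    result = subprocess.run(["journalctl", "-b", "--no-pager"],
                            stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    hits = find_dbus_timeouts(result.stdout)
    for line in hits:
        print(line)
    return bool(hits)
```

Calling `check_current_boot()` after each reboot tells you whether that boot hit the timeout. Because the failure only shows up in roughly 5% of boots, a single clean boot says little about whether a unit file change actually helped.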

Regards,
Simon