[CentOS] dbus/systemd failure on startup (CentOS 7.7)

Thu Jan 23 13:23:02 UTC 2020
James Pearson <james-p at moving-picture.com>

Simon Matter via CentOS wrote:
> 
>> We are seeing a problem that occurs ~5% of the time when rebooting
> 
> I see such issues on a quite large multi user system but when this
> happens, after forced restarts for kernel updates, I usually don't have
> the time to analyze and play doctor on it. My "solution" now is to simply
> reboot the server again in such a case, AKA the systemd way :-)
> 
>> CentOS 7.7 where systemd gets a 'Connection timed out' to D-Bus just
>> after the D-Bus service starts - from 'journalctl -x' :
>>
>> ...
>> Jan 21 16:09:59 linux7-7.mpc.local systemd[1]: Started D-Bus System
>> Message Bus.
>> -- Subject: Unit dbus.service has finished start-up
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit dbus.service has finished starting up.
>> --
>> -- The start-up result is done.
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to register match
>> for Disconnected message: Connection timed out
>> Jan 21 16:10:24 linux7-7.mpc.local systemd[1]: Failed to initialize
>> D-Bus connection: Connection timed out
>> ...
>>
>> This then has a knock-on effect that causes other services to fail - e.g.
>>
>> -- Unit gdm.service has begun starting up.
>> Jan 21 16:10:39 linux7-7.mpc.local dbus[817]: [system] Activating
>> systemd to hand-off: service name='org.freedesktop.login1'
>> unit='dbus-org.freedesktop.login1.service'
>> Jan 21 16:10:50 linux7-7.mpc.local dbus[817]: [system] Failed to
>> activate service 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to
>> enable subscription: Failed to activate service
>> 'org.freedesktop.systemd1': timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd-logind[1221]: Failed to fully
>> start up daemon: Connection timed out
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: systemd-logind.service:
>> main process exited, code=exited, status=1/FAILURE
>> Jan 21 16:10:50 linux7-7.mpc.local systemd[1]: Failed to start Login
>> Service.
>> -- Subject: Unit systemd-logind.service has failed
>> -- Defined-By: systemd
>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>> --
>> -- Unit systemd-logind.service has failed.
>> --
>> -- The result is failed.
>>
>> Whatever the issue is, it appears that polkit might be involved - if we
>> restart the polkit service, things appear to return to normal (e.g. gdm
>> starts up etc)
>>
>> We can't find any similar reports of this happening elsewhere with
>> CentOS 7.7 - but we were wondering if anyone else had come across a
>> problem like this?
> 
> I think the root of the problem is that there are missing definitions in
> some of the systemd scripts. They allow things to work in 95% or greater
> of the cases but this happens by chance, not because of perfect process
> handling and system control. Small delays somewhere or uncommon system
> environments then lead to intermittent failures which are difficult to
> diagnose - at least for me.
> 
> The good news is that you can just fiddle with the systemd scripts the
> same way we fiddled with init scripts in the past. That way you can use
> trial and error until you find a solution. It doesn't sound like being in
> full control of things, but it's better than not finding a solution at all.

Yeah, we found that introducing a small delay before the ExecStart in 
the dbus.service unit - even a delay of just 0.01 seconds (via 
'ExecStartPre=/usr/bin/sleep 0.01') - _seems_ to work around the issue ...
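
One way to apply that delay without editing the packaged unit file is a 
drop-in, roughly as sketched below. The file name 'delay.conf' is only an 
illustration (any *.conf name in that directory should work), and the delay 
only seems to mask a timing problem rather than fix it:

```ini
# /etc/systemd/system/dbus.service.d/delay.conf  (file name is arbitrary)
[Service]
# Added ahead of the unit's existing ExecStart; 0.01 s is the value that
# seemed sufficient here, not a tuned or guaranteed figure.
ExecStartPre=/usr/bin/sleep 0.01
```

After creating the file, 'systemctl daemon-reload' picks up the change, and 
the delay takes effect the next time dbus.service starts (i.e. at the next 
boot).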

However, we would still like to know what the issue is and get a 'real' 
fix - I guess we could try creating a bug report with Red Hat ...

Thanks

James Pearson