Hello all,
I've got an odd problem that doesn't seem to be mentioned anywhere.
I have several identical CentOS 7 servers (GCE instances). I recently ran `yum update` and rebooted all of them. All the servers came back fine except one. I opened a connection to the serial console of the broken server, and was greeted with this prompt:
… Cannot open access to console, the root account is locked. See sulogin(8) man page for more details.
Press Enter to continue.
I pressed Enter, and the boot process continued successfully! Just to test, I restarted the server again, and the same thing happened: I had to manually log in to the console and press Enter before it would complete the boot process. No other step was required.
I also tried creating a snapshot of the disk, and booted a new VM with a boot disk imaged from the snapshot. The same problem occurred, pressing Enter was all that was required.
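(For anyone who wants to reproduce the test, the snapshot-and-boot check was roughly this gcloud sequence; the instance, disk, snapshot, and zone names below are just placeholders:)

    # Snapshot the broken server's boot disk
    gcloud compute disks snapshot broken-server --snapshot-names=broken-snap --zone=us-central1-a
    # Create a new disk from the snapshot and boot a throwaway VM from it
    gcloud compute disks create broken-test-disk --source-snapshot=broken-snap --zone=us-central1-a
    gcloud compute instances create broken-test --disk=name=broken-test-disk,boot=yes --zone=us-central1-a
    # Watch it boot on the serial console (serial port access has to be enabled on the instance)
    gcloud compute connect-to-serial-port broken-test --zone=us-central1-a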
Earlier in the boot log it shows "Started Emergency Shell", which is why the "Press Enter" prompt appeared. The "root account is locked" message isn't the real issue; the root account is locked on all our servers.
So, in summary: something is causing the server to enter the emergency shell, but the boot continues successfully after pressing Enter. That's odd, because when the emergency shell is loaded it usually means something more serious has happened and more than a keypress is needed to get the server to boot.
Anybody have any idea what could be causing this?
I don't see any significant errors in the boot log, but I would appreciate if anyone has a moment to help me look for issues. Here's a copy of the serial console boot log – you can find the "Press Enter to continue" on line 536: https://write.as/dwuts24dcw6yh0kf.txt
Thanks!
Quinn
Hi Quinn,
On Thu, 10 Sept 2020 at 04:49, Quinn Comendant <quinn@strangecode.com> wrote:
[...] I don't see any significant errors in the boot log, but I would appreciate if anyone has a moment to help me look for issues. Here's a copy of the serial console boot log – you can find the "Press Enter to continue" on line 536: https://write.as/dwuts24dcw6yh0kf.txt [...]
If I'm not mistaken, problems after UTMP point to problems with the X/hardware configuration. So I guess you might find more information if you also have a look at the systemd log files.
Kind regards Thomas
Hi Thomas,
On 10 Sep 2020 10:06:01, Thomas Bendler wrote:
If I'm not mistaken, problems after UTMP point to problems with the X/hardware configuration. So I guess you might find more information if you also have a look at the systemd log files.
I don't see any hardware issues. Here's the output from `journalctl -p 5 -xb`: https://write.as/2vjgz6pfmopg7fnf.txt The last interruption during boot was at Sep 10 15:01:46.
Thanks, Quinn
I had a similar issue on 7.6 - the LVM timeouts were too short and it was timing out because we had a lot of multipath devices. Once those were up, you could just continue.
journalctl will show you what has happened.
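For example something like this (just a rough filter for the kind of messages I mean):

    # Scan the current boot's journal for timeout / LVM / multipath messages
    journalctl -b | grep -iE 'timed out|timeout|lvm|multipath'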
Best Regards, Strahil Nikolov
On Thursday, 10 September 2020 at 18:57:02 GMT+3, Quinn Comendant <quinn@strangecode.com> wrote:
[...] I don't see any hardware issues. Here's the output from `journalctl -p 5 -xb`: https://write.as/2vjgz6pfmopg7fnf.txt [...]
Hi Strahil,
On 10 Sep 2020 17:42:03, Strahil Nikolov via CentOS wrote:
I had a similar issue on 7.6 - the LVM timeouts were too short and it was timing out […]
I don't see any timeout errors in the boot log or in the output of `journalctl -xb`.
I tried increasing the timeout by adding `mount.timeout=300s` to the GRUB config, but it had no effect. Is that the correct way to increase LVM timeouts?
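Roughly how I applied it, which I believe is the usual way to change kernel arguments on CentOS 7 (this assumes BIOS boot; the grub.cfg path is different on UEFI):

    # Append the parameter to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.:
    #   GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet mount.timeout=300s"
    sudo vi /etc/default/grub
    # Regenerate the grub config so the new command line is used on the next boot
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
    sudo reboot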
Quinn
Update: I found a workaround that prevents the server from entering the emergency shell during boot for no apparent reason: I simply cleared the `OnFailure=` option for initrd-parse-etc.service (which was previously set to `OnFailure=emergency.target`).
Now the server boots successfully without dropping into an emergency shell.
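For the record, this is the general shape of the change rather than the exact commands I ran; the tricky part is that initrd-parse-etc.service runs from the initramfs, so the override has to end up in there too (here it's forced in with dracut's --include):

    # Stage a drop-in that clears OnFailure= (an empty assignment resets the option)
    mkdir -p /tmp/initrd-override/etc/systemd/system/initrd-parse-etc.service.d
    printf '[Unit]\nOnFailure=\n' > /tmp/initrd-override/etc/systemd/system/initrd-parse-etc.service.d/no-emergency.conf
    # Rebuild the initramfs for the running kernel, copying the staged tree into its root
    sudo dracut -f --include /tmp/initrd-override / /boot/initramfs-$(uname -r).img $(uname -r)
    sudo reboot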
This is a total hack, and I'm a little embarrassed that it's the only solution that I've found.
As I mentioned earlier, there are no errors printed in the boot or systemd logs, so I don't know what is actually failing. Well, at least now I know that it is `initrd-parse-etc.service` that is failing, but I don't know why. Does anyone know what initrd-parse-etc.service does? Or have suggestions how to troubleshoot that unit specifically?
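In case it helps, these are the kinds of commands I've been using to inspect the unit (I believe they're all available on a stock CentOS 7 install; the initramfs path is for the running kernel):

    # Show the unit file (plus any drop-ins) as systemd sees it on the installed system
    systemctl cat initrd-parse-etc.service
    # Show just the properties of interest
    systemctl show initrd-parse-etc.service -p ExecStart -p OnFailure
    # Look at the copy that is actually baked into the initramfs
    sudo lsinitrd -f usr/lib/systemd/system/initrd-parse-etc.service /boot/initramfs-$(uname -r).img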
Thanks, Quinn
On 9/11/20 4:51 PM, Quinn Comendant wrote:
Does anyone know what initrd-parse-etc.service does? Or have suggestions how to troubleshoot that unit specifically?
Run "systemctl daemon-reload && echo success" and verify that it reports success, and not errors.
Check the output of "systemctl status initrd-cleanup" too.
On 11 Sep 2020 17:23:00, Gordon Messmer wrote:
Run "systemctl daemon-reload && echo success" and verify that it reports success, and not errors.
Check the output of "systemctl status initrd-cleanup" too.
Those have always reported success (even before I removed the OnFailure option):
[~] sudo systemctl daemon-reload && echo success
success
[~] sudo systemctl status initrd-cleanup
● initrd-cleanup.service - Cleaning Up and Shutting Down Daemons
   Loaded: loaded (/usr/lib/systemd/system/initrd-cleanup.service; static; vendor preset: disabled)
   Active: inactive (dead)

Sep 11 23:34:01 durian systemd[1]: Starting Cleaning Up and Shutting Down Daemons...
Sep 11 23:34:01 durian systemd[1]: Stopped Cleaning Up and Shutting Down Daemons.
Hi,
I'm wondering what the proper solution is in this case. One thing I learned in the past, and which can also be seen in the list archives, is that a lot of issues exist with systemd, but one almost never finds a good solution that actually fixes the problem.
In most cases ugly hacks and workarounds are used, but no real fix is available. IMHO it's in no way better than the old days of SysVinit hacking :-)
Regards, Simon
On 9/11/20 5:29 PM, Quinn Comendant wrote:
Those have always reported success (even before I removed the OnFailure option):
In that case, I'd revert the change you made, unlock the root account so that you can use the emergency shell, let the system boot to an emergency shell, and collect the output of "systemctl status initrd-parse-etc.service" and "journalctl -b 0".
(You can still do that in the VM, right?)
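Something along these lines (on the test VM; the password is only so sulogin will let you into the emergency shell, and the output path is just an example):

    # Before rebooting: give root a password so the emergency shell is usable
    sudo passwd root
    sudo reboot
    # Then, at the emergency shell:
    systemctl status initrd-parse-etc.service
    journalctl -b 0 --no-pager > /root/emergency-boot.log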