Hi,
Such a simple problem, but I can't figure out the cause. Supermicro server with a Xeon E3-1200 cpu. 1U entry level item.
Using CentOS 7
As root: systemctl reboot
The server drops my ssh connection and never comes back up. When I go to the server, the power is on, but it is not reachable by ssh. When I connect a monitor and keyboard, it is unresponsive, as if it were in suspend mode.
I push and hold the power button until the server fully powers down. Push power again and everything boots, goes to prompt, and all is well.
When I run systemctl reboot directly on the server, I get the same problem: it never reaches the login prompt.
Manually powering down and up again works, and all is well.
Anyone have this problem before? I've checked all the BIOS options and I can't find anything misconfigured.
Thanks for your help.
Mike
Hi,
I have this problem!
Try:
# shutdown -r now
For a test, please...
On Sat, Oct 14, 2017 at 2:29 PM, Vitalino Victor vitalinobr@gmail.com wrote:
Try:
# shutdown -r now
I'll have to try this late one evening. It's a production Samba Active Directory Domain Controller, so it's difficult to do this without warning users.
On 15 October 2017 at 12:20, Mike 1100100@gmail.com wrote:
On Sat, Oct 14, 2017 at 2:29 PM, Vitalino Victor vitalinobr@gmail.com wrote:
Try:
# shutdown -r now
I'll have to try this late one evening. It's a production Samba Active Directory Domain Controller, so it's difficult to do this without warning users.
Don't bother ... it makes no difference to how the shutdown happens, this was nonsense "advice".
The shutdown 'command' is a symlink to systemctl which knows that it is being called that way and will act on it ... the same as if you did systemctl reboot
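A quick way to confirm the symlink claim on a given machine is sketched below. The helper name resolve_cmd is made up for illustration; the expectation that shutdown resolves to systemctl applies to a stock CentOS 7 install and may not hold on other distributions.

```shell
# resolve_cmd is a hypothetical helper: it prints the real file behind a
# command name by following any symlinks.
resolve_cmd() {
    path=$(command -v "$1") || return 1
    readlink -f "$path"
}

# On a systemd-based CentOS 7 box this is expected to print the path of
# systemctl. /usr/sbin may be missing from a non-root PATH, hence the fallback.
resolve_cmd shutdown || echo "shutdown not found in PATH"
```

If both shutdown and reboot resolve to the same systemctl binary, switching between them should not change how the reboot is carried out.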
The issue with remote syslog and gathering data on shutdown is that, depending on where the freeze occurs, there may not be any logs at all.
If it occurs before a sync to disk, any logs generated will be lost; if it occurs after the pivot-root, when /var/log is no longer mounted, any logs generated will similarly be lost.
Of course if it is a *kernel* freeze issue then it is also likely that whatever is occurring never gets to generate a log event ... as that's hard to do with a frozen kernel ;)
I assume you've checked for BIOS/firmware updates and applied any pending?
Can you add IPMI (remote/out-of-band access) to that server? You may get something through hardware event then ... this is why I prefer HP or Dell kit over picking cheaper options when dealing with corporate needs ... their iLO and iDRAC implementations are robust and can provide better diagnosis on things like this with the built in hardware testing etc ... and avoid a need to walk to a server and plug in a monitor ;)
If you can't set up remote syslog for some reason, or if no logs turn up that help, then I'd suggest removing rhgb and quiet from your kernel command line, attaching a monitor before you do the shutdown, and watching the console as you attempt the reboot.
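As a rough sketch of the rhgb/quiet suggestion: the helper below (strip_quiet_args, a name invented here) only previews what the kernel command line would look like without those two arguments. It does not change the bootloader. On CentOS 7 the persistent change is usually made as root with grubby, shown as a comment because it modifies boot configuration.

```shell
# strip_quiet_args is a hypothetical text-only helper: it removes the words
# "rhgb" and "quiet" from a kernel command line string and prints the rest.
strip_quiet_args() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != "rhgb" && $i != "quiet")
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}

# Preview the running kernel's command line without the two arguments.
if [ -r /proc/cmdline ]; then
    strip_quiet_args "$(cat /proc/cmdline)"
fi

# To make the change persistent (as root, assumes grubby is installed):
#   grubby --update-kernel=ALL --remove-args="rhgb quiet"
```

With the arguments removed, the shutdown and boot messages are printed to the console instead of being hidden behind the graphical boot screen.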
Am 14.10.2017 um 19:54 schrieb Mike:
Hi,
Such a simple problem, but I can't figure out the cause. Supermicro server with a Xeon E3-1200 cpu. 1U entry level item.
Using CentOS 7
The version is a bit imprecise. Are you fully updated? On 7.3 and 7.4 I haven't seen that issue.
https://bugzilla.redhat.com/show_bug.cgi?id=1047614
Does that fit?
Alexander
cat /etc/centos-release:
CentOS Linux release 7.4.1708 (Core)
The bugzilla report does sound similar --- in one of the comments, a user reports hang-up when trying remote reboot.
On 10/14/2017 10:54 AM, Mike wrote:
When I try systemctl reboot directly on the server. Same problem --- does not start to login prompt.
So where does it stop? Does it never finish the BIOS short self-test after the shutdown? Does it hang somewhere in the Linux loading sequence? If it gets to the graphical screen with the blue startup bar or whatever, I believe you can hit ESC to get the console messages.
On Oct 14, 2017, at 1:54 PM, Mike 1100100@gmail.com wrote:
Server disconnects my ssh connection and never comes back up. Go to the server and the power is on but the server is not accessible by ssh. When I connect a monitor and keyboard --- non-responsive. It's like it's in suspend mode.
I push and hold the power button until the server fully powers down. Push power again and everything boots, goes to prompt, and all is well.
When you say that the monitor is plugged in, and the server is unresponsive, does that mean that the monitor doesn’t even come active? That sounds like it might have crashed the kernel in a way that the display isn’t showing.
You could set up kdump to catch that. You could also set up a persistent journal (create /var/log/journal) and try again, then when you manually power it up, check to see if anything was logged in the journal.
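A minimal sketch of the persistent-journal check, assuming journald runs with its default Storage=auto setting (in which case it writes to /var/log/journal only if that directory exists). The function name journal_is_persistent is made up for illustration, and the commands in the messages need root.

```shell
# journal_is_persistent is a hypothetical helper: it succeeds when the
# journal directory exists. $1 overrides the default path (useful for testing).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

if journal_is_persistent; then
    echo "journal is persistent; list boots with: journalctl --list-boots"
    echo "show the end of the previous boot with: journalctl -b -1 -e"
else
    echo "journal is volatile; as root run:"
    echo "  mkdir -p /var/log/journal && systemctl restart systemd-journald"
fi
```

After the next forced power cycle, journalctl -b -1 should show whatever the previous boot managed to write before it hung, provided the messages reached disk.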
If the system’s keyboard is plugged in, you could try using the magic sysrq keys to get it to do something. (see https://en.wikipedia.org/wiki/Magic_SysRq_key ) Try ‘c’ to initiate a crashdump to force kdump to record a kernel dump, then you can examine the active processes. ‘k’ or ‘g’ might clean up the display if it’s bad.
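Before relying on the SysRq keys, it may be worth checking that they are enabled, since many distributions restrict them by default. The sketch below assumes the kernel's documented kernel.sysrq semantics (0 disables everything, 1 enables everything, other values are a bitmask) and that the crash trigger falls under the "debugging dumps" bit, value 8. The helper name sysrq_allows_crash is invented here.

```shell
# sysrq_allows_crash is a hypothetical helper: given a kernel.sysrq value,
# it succeeds if the keyboard crash trigger ('c') would be accepted.
sysrq_allows_crash() {
    [ "$1" -eq 1 ] || [ $(( $1 & 8 )) -ne 0 ]
}

if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if sysrq_allows_crash "$current"; then
        echo "kernel.sysrq=$current: Alt+SysRq+c is allowed from the keyboard"
    else
        echo "kernel.sysrq=$current: crash trigger disabled"
        echo "as root, before the test reboot: sysctl -w kernel.sysrq=1"
    fi
fi
```

A value set with sysctl -w lasts only until the kernel goes down, which should be enough for a single reboot test. A file under /etc/sysctl.d/ would make it permanent.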
Also, remote syslog is always helpful for these kinds of situations, although if the network is down when it crashes then it won’t be as helpful, which is why I suggest looking at the journal.
-- Jonathan Billings billings@negate.org
1. Monitor is on but screen is blank.
2. kdump logging: I'll follow up on that.
3. Remote syslog: I'll need to do some more rtfm.
I looked at /var/log/anaconda/syslog but I can't tell which boot-up I was looking at. Everything seemed normal: hardware/devices being identified, named, and located, systemd services starting and running.
Thank you for your thoughtful responses. Very much appreciated. Good points to follow up with. Kind regards, Mike
It turns out kdump.service is already enabled on the server and /etc/kdump.conf settings would report any kernel crash/error items to /var/crash. The /var/crash file/folder is empty. It leads me to think the kernel is not crashing; however, I could be wrong. I'll need to perform another test "systemctl reboot" from remote ssh session and check it one more time.
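An enabled kdump.service does not by itself prove that a crash kernel is loaded, so an empty /var/crash is only meaningful if kdump was actually armed. The sketch below checks both. The helper name has_crash_dumps is made up, and /var/crash is assumed to be the dump target as described above.

```shell
# has_crash_dumps is a hypothetical helper: it succeeds when the dump
# directory contains at least one entry. $1 overrides the default path.
has_crash_dumps() {
    [ -n "$(ls -A "${1:-/var/crash}" 2>/dev/null)" ]
}

# The kernel exposes whether a crash (kdump) kernel is currently loaded:
# 1 means loaded, 0 means kdump would not capture anything.
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    echo "crash kernel loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi

if has_crash_dumps; then
    echo "dumps found in /var/crash"
else
    echo "no dumps in /var/crash"
fi
```

If the first line reports 0, the crashkernel= reservation on the kernel command line and the output of systemctl status kdump are the next things to look at.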