[CentOS] system unresponsive

Wed May 22 13:59:53 UTC 2019
mark <m.roth at 5-cent.us>

Stephen John Smoogen wrote:
> On Wed, 22 May 2019 at 09:30, mark <m.roth at 5-cent.us> wrote:
>> Ok, we used to get this occasionally on cluster nodes, and we just got
>> it on a fileserver (very bad). The system is discovered to be
>> unresponsive: it doesn't ping, and plugging a console in, you can see
>> that it's not dead, but there's nothing at all on the screen, nor does it
>> respond to even <ctrl-alt-del>. The only answer is to power cycle it; it
>> comes up fine.
>> Nothing in /var/log/dmesg or /var/log/messages. No abrts I can find.
>> sar tells me it went unresponsive between 18:10 and 10:20 yesterday.
>> Note that there are no further entries in sar, either, for yesterday,
>> after the event, and nothing till I power cycled it.
> From the above description, I would normally say it sounds like hardware.
> However, why do you say the system is not dead when you plug in a
> console, if there is nothing on the screen and it doesn't respond to
> control-alt-delete? To me that sounds like 'dead'. Usually the CPU is
> hard-locked, or the hardware went into 'over-heat' and put everything in a
> deep sleep, hoping it would cool down, but never woke up.
Overheating seems unlikely. It's a 4U server with 36 disks (plus the dual
root disks) in a machine room, and ipmitool sel list shows nada, nor are
there any warnings, as I've seen occasionally on other systems, that the
CPU is overheating and being throttled.
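
For anyone wanting to pin down the window the same way I did from sar, here
is a minimal sketch of the idea: take the sample timestamps, and flag any
pair of consecutive samples that are further apart than the collection
interval allows. The function name find_sample_gaps, the 10-minute interval,
the 1.5x slack, and the sample timestamps are all assumptions for
illustration, not output from the affected fileserver, and it expects
24-hour HH:MM:SS stamps from a single day.

```python
from datetime import datetime, timedelta


def find_sample_gaps(stamps, expected=timedelta(minutes=10), slack=1.5):
    """Return (last_seen, next_seen) pairs where two consecutive samples
    are more than slack * expected apart.

    stamps: list of "HH:MM:SS" strings, in order, all from the same day.
    expected: assumed collection interval (hypothetical default: 10 min).
    """
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > expected * slack:
            gaps.append((prev.strftime("%H:%M:%S"), cur.strftime("%H:%M:%S")))
    return gaps


# Hypothetical timestamps, typed in by hand for illustration only.
sample = ["17:50:01", "18:00:01", "18:10:01", "18:50:01"]
for last_seen, next_seen in find_sample_gaps(sample):
    print("no samples between", last_seen, "and", next_seen)
```

The last sample before a gap is the latest point at which the box was known
to be alive, which is about as close as you can get when nothing was logged.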

>> Has anyone else seen this - I can't imagine it's only us - or have any
>> thoughts?
>> C 7, 7.6.1810
>> mark
>> _______________________________________________
>> CentOS mailing list
>> CentOS at centos.org
>> https://lists.centos.org/mailman/listinfo/centos
> --
> Stephen J Smoogen.