[CentOS] How to determine why a server is not responding

Thu Apr 11 15:42:56 UTC 2013
m.roth at 5-cent.us <m.roth at 5-cent.us>

nan del bosc wrote:
> Hi to all!
>
> We're using 64-bit CentOS 5.5 for our Plesk 11 server.
>
> This week we've had the following problem three times...
>
> Suddenly, the server stops responding on all services (SSH, Apache,
> Postfix, ...), but ping still works!
>
> Even after waiting a few minutes (sometimes as long as 2 hours), the
> server stays unresponsive until we reboot it. After rebooting, we searched
> /var/log/messages but could not find any useful information...
<snip>

A quick Google search shows me that the Postfix messages are just that, and
you might want to fix it so it's not asking for it.

HOWEVER, the important thing is that it appears to have just gone
completely unresponsive. I've seen that happen to some servers here, and
we've never found any clues. On the other hand, IIRC, they tended to be
boxes that we'd had other problems with, and a number of them have been
rebuilt under warranty (mostly Penguins; they all have Supermicro m/b's,
and the problems I've had with them have taught me to NEVER buy a
Supermicro m/b).

The only thing I can suggest is to use ipmitool (assuming you don't want
to take the server down and look in the BIOS) to read the SEL (System
Event Log) and look for hardware errors.
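A minimal sketch of that check in Python, assuming Python 3.7+ (not the
stock Python on CentOS 5), root access, and the local IPMI drivers loaded.
It runs `ipmitool sel elist` and flags entries whose sensor or event text
matches a keyword list. The pipe-separated line format parsed here and the
keyword list are both assumptions, so compare them against what your BMC
actually logs before trusting the filter.

```python
import subprocess

# Hypothetical keyword list: substrings that often appear in SEL entries for
# hardware faults. Not exhaustive; tune it for your own hardware.
SUSPECT_KEYWORDS = ("ecc", "machine check", "critical", "non-recoverable",
                    "power supply", "watchdog", "thermal", "fan")


def read_sel():
    """Return the raw text of `ipmitool sel elist` (run as root)."""
    result = subprocess.run(["ipmitool", "sel", "elist"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_sel(text):
    """Split assumed 'id | date | time | sensor | event...' lines into dicts."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        entries.append({
            "id": fields[0],
            "date": fields[1],
            "time": fields[2],
            "sensor": fields[3],
            # Keep the remaining columns (event description, assertion state).
            "event": " | ".join(fields[4:]),
        })
    return entries


def suspicious(entries, keywords=SUSPECT_KEYWORDS):
    """Return entries whose sensor or event text contains any keyword."""
    hits = []
    for e in entries:
        haystack = (e["sensor"] + " " + e["event"]).lower()
        if any(k in haystack for k in keywords):
            hits.append(e)
    return hits


def report():
    """Read the SEL and print only the entries that look like hardware trouble."""
    for e in suspicious(parse_sel(read_sel())):
        print(" | ".join([e["id"], e["date"], e["time"], e["sensor"], e["event"]]))
```

Calling `report()` after the next hang-and-reboot, and comparing the
timestamps of any hits against when the box stopped responding, is the
useful part; an empty result doesn't prove the hardware is fine, since not
every fault reaches the SEL.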

        mark