[CentOS] System hangs silently

Wed Jan 18 20:00:50 UTC 2006
Les Mikesell <lesmikesell at gmail.com>

On Wed, 2006-01-18 at 13:38, Fong Vang wrote:
> I have a total of 20 CentOS 4.1 systems running on fairly new
> hardware.  About 6 of them are experiencing strange hangs without any
> indication -- nothing in /var/log/messages nor on the console --
> sometime within 10-30 minutes after a reboot.  The systems still
> respond to ping but you can't ssh to them.  At the console, you can
> type "root" at the user prompt but it hangs immediately after hitting
> enter.
> 
> Memory scans of all systems show no errors.
> 
> Any idea how to troubleshoot this problem?  The systems aren't
> responsive enough to do any troubleshooting and nothing abnormal is
> in the log.
> 
> We're running this kernel: kernel-smp-2.6.9-11.EL.i686.rpm.

My first guess would be that something is consuming all available
memory and pushing everything else into swap.  The system may
not be completely hung, but it can't respond in a reasonable
amount of time.  If the logs for whatever services you run
don't show anything, I'd watch top over a period of time to
see whether a single program is doing it, and run "ps ax"
frequently to see whether a large number of small processes
are accumulating.  You can get a hint about how fast new
processes are being started by comparing the process ID of the
ps process itself each time you run it, since PIDs are normally
handed out sequentially.  I assume from the fact that you have
20 boxes that you are doing something that causes substantial
load; perhaps it needs to be distributed better.
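
Since the box becomes unusable before you can look at it, it may
help to leave a small logger running from boot so the last few
samples survive the hang.  What follows is only a rough sketch of
the checks described above, not a tested tool: it assumes a Linux
/proc filesystem and a reasonably current Python 3 (the stock
Python on CentOS 4 is much older, so it would need adapting there).
The function names, the log line format and the pid_max default of
32768 are my own choices for illustration.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: integer value} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def count_processes(proc_dir="/proc"):
    """Count running processes: each one has a numeric directory in /proc."""
    return sum(1 for entry in os.listdir(proc_dir) if entry.isdigit())


def sample_pid():
    """Fork a child that exits at once and return the PID it was given."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    return pid


def pid_delta(prev_pid, cur_pid, pid_max=32768):
    """Approximate number of PIDs handed out between two samples.

    The modulo tolerates one wraparound; it is only an estimate because
    the kernel does not restart numbering at zero after wrapping.
    """
    return (cur_pid - prev_pid) % pid_max


def log_samples(path, interval=5, count=720):
    """Append one line per interval with free memory, swap and process data."""
    prev_pid = sample_pid()
    with open(path, "a") as log:
        for _ in range(count):
            time.sleep(interval)
            with open("/proc/meminfo") as f:
                mem = parse_meminfo(f.read())
            cur_pid = sample_pid()
            # newpids includes the one fork made by sample_pid() itself.
            log.write("%s free=%dkB swapfree=%dkB procs=%d newpids=%d\n" % (
                time.strftime("%H:%M:%S"),
                mem.get("MemFree", 0),
                mem.get("SwapFree", 0),
                count_processes(),
                pid_delta(prev_pid, cur_pid)))
            # Push each line to disk so it is still there after a hang.
            log.flush()
            os.fsync(log.fileno())
            prev_pid = cur_pid
```

Calling something like log_samples("/var/tmp/hangwatch.log") right
after boot (the path is just an example) would cover about an hour
at the default settings.  After the next hang and reboot, a
SwapFree figure heading toward zero points at memory exhaustion,
while a climbing procs or newpids count points at runaway process
creation.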

-- 
  Les Mikesell
    lesmikesell at gmail.com