[CentOS] System hangs silently

Wed Jan 18 20:59:52 UTC 2006
Paul Heinlein <heinlein at madboa.com>

On Wed, 18 Jan 2006, Fong Vang wrote:

> I have a total of 20 CentOS 4.1 systems running on fairly new 
> hardware.  About 6 of them are experiencing strange hangs without 
> any indication -- nothing in /var/log/messages or on the console --
> sometime within 10-30 minutes after a reboot.  The systems still
> respond to ping, but you can't ssh to them.  At the console, you can
> type "root" at the login prompt, but it hangs immediately after
> hitting Enter.
>
> Memory scans of all systems show no errors.
>
> Any idea how to troubleshoot this problem?  The systems aren't
> responsive enough to do any troubleshooting, and nothing abnormal
> shows up in the logs.

Other folks have hit on the best starting points. For diagnosis, 
however, you might want to cobble up a cron script that can run every 
minute:

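#!/bin/sh
#
# season to taste...
(
   top -n 1 -b   # one batch-mode pass; also provides a timestamp
   vmstat 1 2    # 1st report is the average since boot, 2nd is current
   iostat 1 2    # same caveat; iostat comes from the sysstat package
   ps axf        # full process tree, including process states
) >> /var/log/troubleshooting.log 2>&1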
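To have cron run that script every minute, one option is a drop-in file under /etc/cron.d, whose entries carry a user field. This is only a sketch: it assumes the script was saved as /usr/local/sbin/troubleshooting.sh and made executable, and that path, the cron.d file name, and the install_cron_entry function are all hypothetical.

```sh
#!/bin/sh
# Minimal sketch: write an /etc/cron.d entry that runs the snapshot
# script once a minute as root. The default script path and the
# file name "troubleshooting" are hypothetical; adjust to taste.
install_cron_entry() {
    crondir=${1:-/etc/cron.d}
    script=${2:-/usr/local/sbin/troubleshooting.sh}
    # Fields: minute hour day-of-month month day-of-week user command
    echo "* * * * * root $script" > "$crondir/troubleshooting"
}
```

Run install_cron_entry as root. If you'd rather use root's personal crontab (crontab -e), drop the "root" user field from the line.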

The resulting log will be verbose and will grow quickly, but it'll 
likely contain strong hints of any process-related problems. What it 
won't do, of course, is provide indications of hardware faults.
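After the box is rebooted, the most interesting part of that log is usually the final snapshot written before the hang. The sketch below prints it, assuming each snapshot begins with the batch-mode header that procps top prints ("top - HH:MM:SS up ..."); the last_snapshot function name is hypothetical. If the machine had to be power-cycled, the last few entries may never have reached the disk, so the final snapshot can predate the hang by a minute or two.

```sh
#!/bin/sh
# Minimal sketch: print the last snapshot appended by the cron script.
# Assumes every snapshot starts with a line beginning "top - ".
last_snapshot() {
    log=${1:-/var/log/troubleshooting.log}
    # Line number of the final "top - " header in the file
    start=$(grep -n '^top - ' "$log" | tail -n 1 | cut -d: -f1)
    # No header found: nothing was logged, or the format differs
    [ -n "$start" ] || return 1
    # Print from that header through the end of the file
    tail -n +"$start" "$log"
}
```

In its output, a growing "b" column in vmstat (processes blocked in uninterruptible sleep) or processes shown in state "D" by ps may point toward an I/O problem rather than a runaway process.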

-- 
Paul Heinlein <> heinlein at madboa.com <> www.madboa.com