On Wed, 18 Jan 2006, Fong Vang wrote:
I have a total of 20 CentOS 4.1 systems running on fairly new hardware. About 6 of them are experiencing strange hangs without any indication -- nothing in /var/log/messages nor on the console -- sometime within 10-30 minutes after a reboot. The systems still respond to ping but you can't ssh to them. At the console, you can type "root" at the user prompt but it hangs immediately after hitting enter.
Memory scans of all systems show no errors.
Any idea how to troubleshoot this problem? The systems aren't responsive enough to do any troubleshooting, and nothing abnormal is in the logs.
Other folks have hit on the best starting points. For diagnosis, however, you might want to cobble up a cron script that can run every minute:
    #!/bin/sh
    #
    # season to taste...
    (
      top -n 1 -b   # also provides a timestamp
      vmstat
      iostat
      ps axf
    ) >> /var/log/troubleshooting.log 2>&1
The resulting log will be verbose and will grow quickly, but it'll likely contain strong hints of any process-related problems. What it won't do, of course, is provide indications of hardware faults.
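Since the log gets verbose, one possible variation is to label each command's output so a given snapshot is easier to pick apart later. The following is only a sketch under a few assumptions: the snapshot function name, the "###" marker, the overridable LOG variable, and the trailing sync are illustrative additions, not part of the script above. The sync is there on the guess that a machine which hangs hard may otherwise never get its last entries onto disk.

```sh
#!/bin/sh
# Sketch only. LOG, snapshot(), the "###" markers and sync are assumptions.
LOG="${LOG:-/var/log/troubleshooting.log}"

snapshot() {
    # Run each diagnostic in turn, labelling its output.
    # $cmd is left unquoted on purpose so "top -n 1 -b" splits into words.
    for cmd in "top -n 1 -b" vmstat iostat "ps axf"; do
        echo "### $cmd"
        $cmd || echo "(failed: $cmd)"
    done
}

# Append one labelled snapshot, then ask the kernel to flush it to disk.
snapshot >> "$LOG" 2>&1 || echo "cannot write $LOG" >&2
sync
```

To run it every minute, an /etc/cron.d entry along the lines of "* * * * * root /usr/local/sbin/snapshot.sh" should do; that script path is hypothetical. After a hang and reboot, searching the log for "### ps axf" jumps straight to the last process tree that was recorded.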