[CentOS] Server hangs on CentOS 5.5

Wed Mar 9 15:06:15 UTC 2011
Michael Eager <eager at eagerm.com>

Dr. Ed Morbius wrote:
> on 09:24 Tue 08 Mar, Michael Eager (eager at eagerm.com) wrote:
>> Hi --
>>
>> I'm running a server which is usually stable, but every
>> once in a while it hangs.  The server is used as a file
>> store using NFS and to run VMware machines.
>>
>> I don't see anything in /var/log/messages or elsewhere
>> to indicate any problem or offer any clue why the system
>> was hung.
>>
>> Any suggestions where I might look for a clue?
> 
> I'd very strongly recommend you configure netconsole.  Though it's not entirely
> clear from the name, it's actually an in-kernel network logging module,
> which is very useful for sending out kernel panics that otherwise
> aren't logged to disk and can't be seen on a (nonresponsive) monitor.

I'll take a look at netconsole.
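netconsole only sends messages over UDP; some other machine on the LAN has to be listening for them. Below is a minimal receiver sketch in Python, assuming the server's netconsole target is set to this machine's address and UDP port 6666. The port, the listen address and the log file name `netconsole.log` are assumptions and must match however netconsole is actually configured on the server. A syslog daemon configured to accept remote UDP messages is another common way to catch the same output.

```python
import socket
import time
from typing import Optional

LISTEN_ADDR = "0.0.0.0"      # assumption: listen on all interfaces
LISTEN_PORT = 6666           # assumption: must match netconsole's target port
LOG_PATH = "netconsole.log"  # hypothetical output file


def open_listener(addr: str = LISTEN_ADDR, port: int = LISTEN_PORT) -> socket.socket:
    """Bind a UDP socket that netconsole can send kernel messages to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((addr, port))
    return sock


def format_line(payload: bytes, sender: str, ts: float) -> str:
    """Decode one datagram and prefix it with a UTC timestamp and the sender."""
    text = payload.decode("utf-8", errors="replace").rstrip("\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts))
    return f"{stamp} {sender} {text}"


def run(sock: socket.socket, log_path: str = LOG_PATH,
        max_packets: Optional[int] = None) -> int:
    """Append each received datagram to log_path; return how many were logged."""
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_packets is None or count < max_packets:
            payload, (host, _port) = sock.recvfrom(4096)
            log.write(format_line(payload, host, time.time()) + "\n")
            log.flush()  # write each line out immediately rather than buffering
            count += 1
    return count


if __name__ == "__main__":
    run(open_listener())
```

Because the messages land on a different machine, whatever the kernel manages to send just before the hang survives even if the server's own disk never sees it.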

> Alternatively, a serial console which actually retains all output sent to
> it (some remote access systems support this, some don't) may help.
> 
> Barring that, I'd start looking at individual HW components, starting
> with RAM.

The problem with randomly replacing various components, other than
the downtime and nuisance, is that there's no way to know that the
change actually fixed any problem.  When the base rate is one
unknown system hang every few weeks, how many weeks should I wait
without a failure to conclude that the replaced component was the
cause?  A failure which happens infrequently isn't really amenable
to a random diagnostic approach.
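To put a rough number on that question: if hangs are assumed to arrive independently at a constant average rate (a Poisson process, which is an assumption; real faults may cluster), the chance of going t weeks without a hang under the old failure rate is exp(-t/m), where m is the mean number of weeks between hangs. The sketch below solves for the hang-free stretch needed before that chance falls to 1 - confidence. The function name and the example values are illustrative only.

```python
import math


def weeks_without_failure(mean_weeks_between_hangs: float, confidence: float) -> float:
    """Hang-free weeks needed before the old failure rate can be rejected
    at the given confidence, assuming Poisson-distributed hangs."""
    # Under the old rate, P(no hang in t weeks) = exp(-t / mean).
    # Set that equal to (1 - confidence) and solve for t.
    return -mean_weeks_between_hangs * math.log(1.0 - confidence)


if __name__ == "__main__":
    # Illustrative: an assumed average of one hang every 3 weeks.
    print(f"{weeks_without_failure(3, 0.95):.1f} weeks")
```

With an assumed mean of 3 weeks between hangs, `weeks_without_failure(3, 0.95)` comes out to roughly 9 weeks of clean running to vindicate a single component swap, and each further swap restarts that clock.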

-- 
Michael Eager	 eager at eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077