[CentOS] Server hangs on CentOS 5.5

Tue Mar 8 18:41:47 UTC 2011
Scott Silva <ssilva at sgvwater.com>

on 3/8/2011 10:20 AM Michael Eager spake the following:
> Brian Mathis wrote:
>> On Tue, Mar 8, 2011 at 12:24 PM, Michael Eager <eager at eagerm.com> wrote:
>>> Hi --
>>>
>>> I'm running a server which is usually stable, but every
>>> once in a while it hangs.  The server is used as a file
>>> store using NFS and to run VMware machines.
>>>
>>> I don't see anything in /var/log/messages or elsewhere
>>> to indicate any problem or offer any clue why the system
>>> was hung.
>>>
>>> Any suggestions where I might look for a clue?
>>
>> Please be more specific when you say it "hangs".  Does it just pause
>> for a minute and then continue working, or does it freeze completely
>> until you reboot it?  Does it respond to a "soft" reboot like
>> Ctrl-Alt-Del, or do you need to hard power it off?
> 
> System is unresponsive.  Monitor blank, no response to keyboard,
> no response to remote ssh.  Hit reset to reboot.
> 
> The only indication I had that there was a problem (other than
> attached systems not accessing files) was that the fan(s) on the
> server were louder than normal.
> 
>> Since this is an NFS server I'm going to guess there might be a lot of
>> IO.  Maybe there is some large IO load going on, like maybe all your
>> VMs are running anti-virus scan at the same time, or something like
>> that.
> 
> At the time, NFS load should have been very low.
> 
>> To troubleshoot, I recommend installing the 'sar' utilities (yum
>> install sysstat) and then reviewing the collected data using the
>> 'ksar' utility (http://sourceforge.net/projects/ksar/).  sar/ksar are
>> good for tracking down acute problems.
> 
> Thanks for the suggestion.  I'll look into sar.
> 
> 
Did you try the obvious stuff for older equipment? Remove and reseat ALL cards
and memory, several times, to clean off any oxidation from contacts.
Blow out any dust and collected lint.
Reseat drive cables.
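
Since nothing was logged at the time of the hang, it may also help to pin
down when the box actually stopped, so the sar data suggested above can be
checked for that window. A minimal sketch, assuming a syslog-style file whose
lines start with a "Mar  8 09:10:01" timestamp: it reports long silences
between consecutive entries. The function name, the 10-minute threshold and
the sample lines are made up for illustration, and a quiet log is only a
hint, not proof of a hang, so this works best on a log that normally sees
regular entries.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where consecutive syslog
    entries are more than `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        # Classic syslog timestamps have no year, so one is supplied.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


if __name__ == "__main__":
    # Hypothetical sample lines; in practice read them from a log file.
    sample = [
        "Mar  8 09:10:01 server example: entry one",
        "Mar  8 09:20:01 server example: entry two",
        "Mar  8 10:05:42 server example: first entry after reboot",
    ]
    for last_seen, next_seen in find_gaps(sample, 2011):
        print("silent from", last_seen, "to", next_seen)
```

The last timestamp before a gap is roughly when the system stopped logging,
which is the period to look at in the sar reports.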