on 09:24 Tue 08 Mar, Michael Eager (eager at eagerm.com) wrote:
> Hi --
>
> I'm running a server which is usually stable, but every
> once in a while it hangs.  The server is used as a file
> store using NFS and to run VMware machines.
>
> I don't see anything in /var/log/messages or elsewhere
> to indicate any problem or offer any clue why the system
> was hung.
>
> Any suggestions where I might look for a clue?

I'd very strongly recommend you configure netconsole.

Though it's not entirely clear from the name, it's actually an
in-kernel network logging module, which is very useful for capturing
kernel panics which otherwise aren't logged to disk and can't be seen
on a (nonresponsive) monitor.

Alternatively, a serial console which actually retains all output sent
to it (some remote access systems support this, some don't) may help.

Barring that, I'd start looking at individual HW components, starting
with RAM.

The trick with netconsole is in passing the appropriate parameters to
the module at load time.  I found it helpful to have an @reboot cron
job do this.

You'll need to pass the local port, local system IP, local network
device, remote syslog UDP port, remote syslog IP, and the /gateway/ MAC
address, where "gateway" is the syslog host itself (if it's on the same
ethernet segment as the server), or your network gateway host, if not.

Some parsing magic can determine these values for you.

Good article describing configuration:
http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html

If you're not already remote-logging all other activity, I'd do that as
well.  You might catch the start of the hang, if not all of it.

-- 
Dr. Ed Morbius, Chief Scientist /            |
  Robot Wrangler / Staff Psychologist        | When you seek unlimited power
Krell Power Systems Unlimited                |                  Go to Krell!
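P.S. To make the parameter list above concrete, here's a minimal sketch
of assembling the string the module expects.  The layout
(src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac) is the one the kernel's
netconsole module takes; the function name, addresses, ports, interface
and MAC below are placeholders I made up for illustration, so
substitute your own.

```python
import ipaddress
import re

# Six colon-separated hex octets, e.g. 00:11:22:33:44:55
MAC_RE = re.compile(r"[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}")


def netconsole_param(src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac):
    """Build the netconsole= module parameter (hypothetical helper).

    Layout: netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
    """
    for port in (src_port, tgt_port):
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"port out of range: {port}")
    # ip_address() raises ValueError on a malformed address.
    ipaddress.ip_address(src_ip)
    ipaddress.ip_address(tgt_ip)
    if not MAC_RE.fullmatch(tgt_mac):
        raise ValueError(f"malformed MAC address: {tgt_mac}")
    return (f"netconsole={int(src_port)}@{src_ip}/{dev},"
            f"{int(tgt_port)}@{tgt_ip}/{tgt_mac}")


if __name__ == "__main__":
    # Placeholder values only: local port 6665 on eth0, remote syslog on UDP 514.
    print(netconsole_param(6665, "192.0.2.10", "eth0",
                           514, "192.0.2.20", "00:11:22:33:44:55"))
```

The resulting string is what you'd hand to the module at load time,
along the lines of "modprobe netconsole <string>" from the @reboot job.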