[CentOS] diagnosing strange crash/hang

Gordon McLellan gordonthree at gmail.com
Mon Jun 18 16:11:26 UTC 2007

This morning I got a call that a server was down.  The server in
question is a VMware guest running Windows 2003 Advanced.  The host is
VMware Server 1.01, running on CentOS 4.4 x64 on a PowerEdge 2950.  The
host has 16 GB of RAM and a quad-core CPU; storage is provided by a
PERC 5/i, RAID 1 across two 146 GB SAS drives.

I was able to ssh into the host.  After trying to ping the guest and
trying to connect to VMware via the management console, I decided to
restart the vmware service, so I ran "service vmware restart".  It
hung on "shutting down virtual machines".  I was able to Ctrl-C out,
and decided to kill the vmware processes manually.  After killing all
the vmware processes, I ran "service vmware start" and got the error
"cannot touch /etc/vmware/locations: read only file system".

/etc is part of /, which mount reported as mounted read-write.
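
One possible explanation for the mismatch: on a system of this vintage,
mount with no arguments prints /etc/mtab, which is not updated if the
kernel remounts a filesystem read-only after an error, whereas
/proc/mounts reflects what the kernel currently has.  Below is a rough
sketch of a check against the /proc/mounts format.  The function name
and the sample data are made up for illustration; they are not taken
from this machine.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            read_only.append(mount_point)
    return read_only


# Hypothetical sample, not real output from the host in question.
SAMPLE = (
    "/dev/root / ext3 ro,data=ordered 0 0\n"
    "/dev/sda1 /boot ext3 rw 0 0\n"
    "proc /proc proc rw,nodiratime 0 0\n"
)

print(read_only_mounts(SAMPLE))
```

On the host itself, the same function could be given the contents of
/proc/mounts, e.g. read_only_mounts(open("/proc/mounts").read()).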

So I tried cat /var/log/messages and got no output.

So I told the machine to reboot (remotely).  Of course, it didn't
come back up on its own, so I drove to the location.  The machine was
running, but sitting at a black screen.  I couldn't tell what state it
was in, so I forced a power-off.  When I turned it back on, it
booted normally.  There was a slight pause while it ran fsck on /, but
other than that, no errors.

The VMs restarted normally and /var/log/messages is back, but it has no
entries between June 15 and when I rebooted it the 2nd time on June

Any ideas on where I should start looking?

Is there some way to read array status from a PERC controller under Linux?

Any suggestions will be appreciated!


