On Wed, Jun 29, 2011 at 5:57 PM, Emmanuel Noobadmin centos.admin@gmail.com wrote:
On 6/30/11, Giovanni Tirloni gtirloni@sysdroid.com wrote:
I would approach this issue from another perspective: who's locking up the server (as in eating all resources) and how to stop/constrain it. You can try to renice the sshd process and see what happens. I'm not entirely sure what 'locked up' means in this context.
The server is unresponsive to the external world. It isn't dead: on two occasions, when it happened at quiet times like Sunday and 1am, I could afford to wait it out, and it eventually recovered from whatever it was.
It's almost definitely related to disk I/O, due to the VM guests fighting over the disks where their virtual disk files are. However, the hard part is figuring out the exact factors. I know CPU isn't an issue, having set up scripts to log top output when load goes above 5.
Linux counts tasks blocked on I/O (uninterruptible sleep) as well as runnable tasks when it calculates the load average, so a load above 5 isn't measuring CPU alone.
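To see which side of the load average you are looking at, something along these lines could be added to the logging script. It is only a rough sketch: it reads /proc/loadavg and counts processes by the state letter in /proc/<pid>/stat (R is runnable, D is uninterruptible sleep, which is usually a task waiting on disk). The function names are made up for illustration, and it assumes a Linux /proc layout.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1min, 5min, 15min) floats."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])


def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter by scanning /proc/<pid>/stat."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = state_from_stat(f.read())
        except (OSError, IndexError):
            continue  # process exited (or was unreadable) mid-scan
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average:", parse_loadavg(f.read()))
    counts = count_states()
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible sleep (D):", counts.get("D", 0))
```

If the load is high while the R count stays low and the D count is high, the load is mostly coming from tasks stuck waiting on I/O rather than from CPU contention.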
What does top show? Any error messages in /var/log during the time the server is unresponsive? Is the network responsive? Is latency normal too?