On Thu, Sep 24, 2009 at 8:18 PM, Philip Gwyn liste@artware.qc.ca wrote:
Hello,
I have strange behaviour on a server that I can't get a handle on. I have a reasonably powerful server running VMware Server 1.0.4-56528. It has a RAID5 array built with mdadm on 5 SATA drives, masses of RAM and 2 Xeon CPUs. But it stutters.
Example: fire up vi, then press and hold i. After filling 2-3 lines, the display stops for 2-12 seconds, then the characters continue. This happens even on the host OS, at the console.
Host system running CentOS 5.2 x86-64:
CPU  : 2x Xeon E5430 @ 2.66GHz
RAM  : 24GB
Mobo : DSBV-DX
HD   : 5 x SATA ST3750330AS 750GB in RAID5
There are 5 VMs, detailed at http://www.awale.qc.ca/vmware/stj1.txt to make this mail shorter.
Seems to me this system should be more than adequate to handle the load.
This is what vmstat on the host looks like when the server is "unhappy": http://www.awale.qc.ca/vmware/vmstat.txt
It is spending a lot of time in 'wa', but 'bo' and 'bi' are minuscule.
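For anyone wanting to pick out this pattern from a long capture, here is a minimal sketch that scans saved vmstat output for samples where 'wa' is high while 'bi' and 'bo' stay small. The function name `stalled_samples` and the thresholds (30% iowait, 100 blocks per interval) are assumptions made for illustration, not values taken from the capture linked above. Columns are located by the header row, so the exact vmstat layout should not matter.

```python
def stalled_samples(vmstat_text, wa_min=30, io_max=100):
    """Return (line_number, wa, bi, bo) for samples with high iowait but little block I/O.

    wa_min and io_max are arbitrary illustrative thresholds, not tuned values.
    """
    columns = None
    hits = []
    for lineno, line in enumerate(vmstat_text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if "wa" in fields and "bi" in fields and "bo" in fields:
            columns = fields  # header row; vmstat repeats it periodically
            continue
        if columns is None or len(fields) != len(columns):
            continue  # banner line ("procs ---memory---") or malformed row
        try:
            row = dict(zip(columns, map(int, fields)))
        except ValueError:
            continue  # non-numeric row, skip it
        if row["wa"] >= wa_min and row["bi"] + row["bo"] <= io_max:
            hits.append((lineno, row["wa"], row["bi"], row["bo"]))
    return hits
```

With a capture saved by something like `vmstat 1 60 > vmstat.txt`, calling `stalled_samples(open("vmstat.txt").read())` lists the intervals worth lining up against the stalls seen at the console.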
The problem seems like a disk problem. I grow to suspect that SATA isn't ready for the big time. I also grow to dislike RAID5.
Questions :
- Does anyone have a clue on how to track down my bottleneck?
- SATA NCQ is limited to 15 queue depth. Is this per-SATA-port or per-SATA-chip? Or does this question make no sense?
- I realise there are more recent versions of CentOS out. Are there specific items in the changelogs that would affect my problem?
VMware Server 1.0.x was never supported on RHEL/CentOS 5.x, especially as early as 1.0.4. Not that it can't be made to work, but it just wasn't made for newer kernel versions. We run up to 10 guests in VMware Server 1.0.9 on a single Xeon quad core with the host running CentOS 4, SATA hardware RAID 1. Admittedly, our guests are pretty low CPU, low throughput, but it works just fine for us. If your guests are not really hammering the disk system, then you may be on a wild goose chase blaming RAID 5.
In my time on the VMware forums, it was always suggested to use single-CPU guests running non-SMP kernels for Server 1.0.x. It might help to convert the one SMP guest you have. If you can afford some downtime, reconfigure the host to use compatible CentOS/VMware versions (4.x/1.0.x or 5.x/2.x respectively). At the very least, get the latest VMware Server 1.0.9.
-- Jeff