We are struggling with a strange problem.
When we have some VMware clients running (mostly MS Windows clients),
the I/O write performance on the host becomes very bad.
The guest OSes do not do anything; just having them started and
sitting at the login prompt is enough to trigger the problem.
The host has 4 GB of RAM, which is plenty, and all clients fit easily
into that space.
The disk system is a RAID5 (OK, not the fastest choice).
Both hosts are running the latest CentOS 5.2 (the problem was already
present in 5.0 as well) and the latest production version of
VMware Server, 1.0.7 (the problem was there with 1.0.5 too).
One host has an Adaptec 2820 with 4 SATA drives and uses the aacraid
driver. The other host is a ProLiant with a Smart Array E200 (128 MB
battery-backed cache) using the cciss driver.
Filesystem is ext3 on both hosts.
When measuring the write performance, I try to keep the buffer cache
out of the measurement by writing a 4.2 GB file like this:
$ sync; time sh -c 'dd if=/dev/zero of=large bs=8k count=500k; sync'
(Ignore the throughput that dd itself reports, because it includes the
buffer cache; instead divide 4.2 GB by the number of seconds reported
by "time", e.g. 210 seconds would be about 20 MB/s.)
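A variation that may give a cleaner number is to take the page cache
out of the picture in dd itself (just a sketch; oflag=direct should be
available on CentOS 5, conv=fdatasync may need a newer coreutils):
$ dd if=/dev/zero of=large bs=8k count=500k oflag=direct     # O_DIRECT, bypasses the page cache
$ dd if=/dev/zero of=large bs=8k count=500k conv=fdatasync   # flushes before dd exits, so dd's own MB/s is usable
Note that with oflag=direct every 8k write hits the disk synchronously,
so a larger block size (e.g. bs=1M) gives a more realistic streaming figure.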
To verify, I also open another terminal and run:
$ iostat -kx /dev/sda 5 1000
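To see how much dirty data piles up for writeback while the test runs
(just a generic diagnostic sketch, nothing VMware-specific), these
should also work:
$ vmstat 5                                                  # the "wa" column shows iowait
$ watch -n 5 'grep -E "^(Dirty|Writeback):" /proc/meminfo'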
On my own workstation with a single SATA drive, I get about 54 MB/s.
On the hosts with the RAID5 disks, and without any VMware client
running, I get about 20 MB/s. As said, not really fast, but it should
be good enough for our purposes.
However, when I start one VM client and repeat the test, the write
speed drops by half, to about 10 MB/s. With 2 clients it drops further,
to about 5 MB/s; with 3 clients to only 2 MB/s; and with 4 clients
I get only 0.5-1 MB/s.
iostat shows the disk is 100% utilized, and the processor is spending
most of its time (99-100%!) in iowait.
Here is some iostat output from an otherwise "idle" machine running
3 MS Windows clients; all guest OSes are just sitting there doing
almost nothing, with not even an antivirus program installed:
Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   33.80  0.60  76.80   2.40  443.20    11.51   110.46 1269.85  12.93 100.04

avg-cpu:  %user   %nice %system %iowait  %steal  %idle
           0.10    0.00    1.50   75.98    0.00  22.42

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   75.00  0.20  86.40   0.80  630.40    14.58   105.74 1466.97  11.55 100.06

avg-cpu:  %user   %nice %system %iowait  %steal  %idle
           0.10    0.00    8.09   14.89    0.00  76.92

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   27.00  0.60  32.60   2.40  346.40    21.01    19.10 1079.57  10.57  35.10

avg-cpu:  %user   %nice %system %iowait  %steal  %idle
           0.10    0.00    2.50   16.12    0.00  81.28

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00   82.40  0.40  24.80   1.60  285.60    22.79    26.31  430.37  12.52  31.54

avg-cpu:  %user   %nice %system %iowait  %steal  %idle
           0.10    0.00    0.70    6.71    0.00  92.49

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00    7.40  0.40  24.20   1.60  271.20    22.18     7.12  918.03   8.59  21.14
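For reference, one way I can think of to see which processes actually
issue all those writes is the standard 2.6 vm.block_dump sysctl (an
untested sketch; it is noisy, needs root, and klogd is best stopped
first so it does not log its own writes):
$ echo 1 > /proc/sys/vm/block_dump
$ dmesg | grep WRITE | awk '{print $1}' | sort | uniq -c | sort -rn | head
$ echo 0 > /proc/sys/vm/block_dump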
Has anyone ever seen this before?
Any tips to find the cause?
--
Paul