We are struggling with a strange problem. When we have some VMware clients running (mostly MS Windows clients), the I/O write performance on the host becomes very bad. The guest OSes do not do anything; just having them started, sitting at the login prompt, is enough to trigger the problem.
The host has plenty of RAM (4 GB), and all clients fit easily into that space. The disk system is a RAID5 (OK, not the fastest choice).
Both hosts are running the latest CentOS 5.2 (but the problem was already present in 5.0 as well), with the latest production version of VMware Server, 1.0.7 (the problem was there with 1.0.5 too).
One host has an Adaptec 2820 with 4 SATA drives and uses the aacraid driver. The other host is a ProLiant with a Smart Array E200 (128 MB battery-backed cache) using the cciss driver. The filesystem is ext3 on both hosts.
When measuring write performance, I try to avoid measuring the buffer cache by writing a 4.2 GB file like this:
$ sync; time sh -c 'dd if=/dev/zero of=large bs=8k count=500k; sync'
(Ignore the times that dd reports, because they include the buffer cache; instead, divide 4.2 GB by the number of seconds reported by "time".)
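As a hedged aside (not part of the original recipe): with GNU coreutils dd, `conv=fdatasync` makes dd flush the file to disk before reporting, so its own MB/s figure already excludes the cache; the "divide by time" step can also be written out explicitly. The 210-second elapsed time below is only an illustrative assumption.

```shell
#!/bin/sh
# Variant of the measurement where dd itself flushes before reporting,
# so the MB/s figure it prints is honest (GNU dd's conv=fdatasync
# issues fdatasync() on the output file before exiting):
#
#   dd if=/dev/zero of=large bs=8k count=500k conv=fdatasync
#
# Doing the "divide by time" step explicitly (integer MB/s):
throughput_mbs() {
    bytes=$1; secs=$2
    echo $(( bytes / secs / 1000000 ))
}
# 8k * 500k = 4194304000 bytes; assume "time" reported 210 s:
throughput_mbs 4194304000 210   # prints 19
```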
To verify, I also open another terminal and run:
$ iostat -kx /dev/sda 5 1000
On my own workstation with a single SATA drive, I get about 54 MB/s.
On the hosts with the RAID5 disks, and without any VMware client running, I get about 20 MB/s. As said, not really fast, but it should be good enough for our purposes.
However, when I start one VM client and repeat the test, the write speed drops by half, to about 10 MB/s. With 2 clients it drops further, to about 5 MB/s; with 3 clients, only 2 MB/s; and with 4 clients I get only 0.5-1 MB/s.
iostat shows the disk is 100% utilized, and the processor is spending most of its time (99-100%!) in iowait.
Here is some output of iostat on an effectively "idle" machine running 3 MS Windows clients; all guest OSes are just sitting there doing almost nothing, not even an antivirus program installed:
Device:         rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   33.80  0.60  76.80    2.40  443.20    11.51   110.46 1269.85  12.93 100.04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    1.50   75.98    0.00   22.42

Device:         rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   75.00  0.20  86.40    0.80  630.40    14.58   105.74 1466.97  11.55 100.06

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    8.09   14.89    0.00   76.92

Device:         rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   27.00  0.60  32.60    2.40  346.40    21.01    19.10 1079.57  10.57  35.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    2.50   16.12    0.00   81.28

Device:         rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   82.40  0.40  24.80    1.60  285.60    22.79    26.31  430.37  12.52  31.54

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.70    6.71    0.00   92.49

Device:         rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    7.40  0.40  24.20    1.60  271.20    22.18     7.12  918.03   8.59  21.14
Has anyone ever seen this before? Any tips to find the cause?
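One hedged way to dig further (an editorial sketch, not from the thread): sample the kernel's dirty-page counters while the guests sit "idle". If Dirty/Writeback stay large and keep refilling, some process is continuously rewriting pages; with VMware Server the guests' `.vmem` memory-backing files are a plausible suspect.

```shell
#!/bin/sh
# Diagnostic sketch: sample the kernel's dirty/writeback counters a few
# times.  Persistently large, refilling values mean some process keeps
# dirtying pages even though the guests look idle.
for i in 1 2 3; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```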
-- Paul
On Sun, Oct 05, 2008 at 11:28:45AM +0200, Paul Bijnens wrote:
> We are struggling with a strange problem. When we have some VMware clients running (mostly MS Windows clients), the I/O write performance on the host becomes very bad. The guest OSes do not do anything; just having them started, sitting at the login prompt, is enough to trigger the problem.
> The host has plenty of RAM (4 GB), and all clients fit easily into that space. The disk system is a RAID5 (OK, not the fastest choice).
> Both hosts are running the latest CentOS 5.2 (but the problem was already present in 5.0 as well), with the latest production version of VMware Server, 1.0.7 (the problem was there with 1.0.5 too).
Hello.
I assume you know VMware Server v1.x is not supported on RHEL5/CentOS5.
Have you tried VMware server v2.0?
> One host has an Adaptec 2820 with 4 SATA drives and uses the aacraid driver. The other host is a ProLiant with a Smart Array E200 (128 MB battery-backed cache) using the cciss driver. The filesystem is ext3 on both hosts.
> When measuring write performance, I try to avoid measuring the buffer cache by writing a 4.2 GB file like this:
> $ sync; time sh -c 'dd if=/dev/zero of=large bs=8k count=500k; sync'
> (Ignore the times that dd reports, because they include the buffer cache; instead, divide 4.2 GB by the number of seconds reported by "time".)
> To verify, I also open another terminal and run:
> $ iostat -kx /dev/sda 5 1000
> On my own workstation with a single SATA drive, I get about 54 MB/s.
> On the hosts with the RAID5 disks, and without any VMware client running, I get about 20 MB/s. As said, not really fast, but it should be good enough for our purposes.
20 MB/sec sounds really bad. There has to be something wrong with your raid/disk setup.
Any errors/warnings in dmesg? or syslog/messages?
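A quick sketch of that check (the driver names are the ones from this thread; `/var/log/messages` is the CentOS 5 default syslog file):

```shell
#!/bin/sh
# Scan the kernel ring buffer and syslog for RAID-controller trouble;
# aacraid and cciss are the drivers mentioned in this thread.
dmesg | grep -iE 'aacraid|cciss|error|timeout|reset' | tail -n 20
grep -iE 'aacraid|cciss' /var/log/messages 2>/dev/null | tail -n 20
```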
> However, when I start one VM client and repeat the test, the write speed drops by half, to about 10 MB/s. With 2 clients it drops further, to about 5 MB/s; with 3 clients, only 2 MB/s; and with 4 clients I get only 0.5-1 MB/s.
Even without any disk activity on VMware guests?
> iostat shows the disk is 100% utilized, and the processor is spending most of its time (99-100%!) in iowait.
This is the problem. For some reason the RAID is performing really poorly, causing a lot of iowait.
> Here is some output of iostat on an effectively "idle" machine running 3 MS Windows clients; all guest OSes are just sitting there doing almost nothing, not even an antivirus program installed:
> Has anyone ever seen this before? Any tips to find the cause?
I haven't really seen this myself. I'd start by fixing the host OS disk performance before installing VMware Server at all.
-- Pasi
On 2008-10-06 10:39, Pasi Kärkkäinen wrote:
> On Sun, Oct 05, 2008 at 11:28:45AM +0200, Paul Bijnens wrote:
>> We are struggling with a strange problem. When we have some VMware clients running (mostly MS Windows clients), the I/O write performance on the host becomes very bad. The guest OSes do not do anything; just having them started, sitting at the login prompt, is enough to trigger the problem.
>> The host has plenty of RAM (4 GB), and all clients fit easily into that space. The disk system is a RAID5 (OK, not the fastest choice).
>> Both hosts are running the latest CentOS 5.2 (but the problem was already present in 5.0 as well), with the latest production version of VMware Server, 1.0.7 (the problem was there with 1.0.5 too).
> Hello.
> I assume you know VMware Server v1.x is not supported on RHEL5/CentOS5.
> Have you tried VMware server v2.0?
You wouldn't believe how many times I've checked to see whether 2.0 was in production. I did not verify before sending this mail.
Also, because it was rather difficult to test beta software on these servers (only limited time on weekends was free to test variations in the setup), I did not try 2.0 beta.
To my surprise, 2.0 is indeed now out of beta. I installed it, and indeed it works! No performance loss, even with 4 VM guests running.
Case closed.
> 20 MB/sec sounds really bad. There has to be something wrong with your raid/disk setup.
I don't believe 20 MB/s sustained write performance is too bad for RAID5 with 4 SATA disks in an inexpensive machine. Do you do much better?
On Tue, Oct 07, 2008 at 10:26:47AM +0200, Paul Bijnens wrote:
> On 2008-10-06 10:39, Pasi Kärkkäinen wrote:
>> On Sun, Oct 05, 2008 at 11:28:45AM +0200, Paul Bijnens wrote:
>>> We are struggling with a strange problem. When we have some VMware clients running (mostly MS Windows clients), the I/O write performance on the host becomes very bad. The guest OSes do not do anything; just having them started, sitting at the login prompt, is enough to trigger the problem.
>>> The host has plenty of RAM (4 GB), and all clients fit easily into that space. The disk system is a RAID5 (OK, not the fastest choice).
>>> Both hosts are running the latest CentOS 5.2 (but the problem was already present in 5.0 as well), with the latest production version of VMware Server, 1.0.7 (the problem was there with 1.0.5 too).
>> Hello.
>> I assume you know VMware Server v1.x is not supported on RHEL5/CentOS5.
>> Have you tried VMware server v2.0?
> You wouldn't believe how many times I've checked to see whether 2.0 was in production. I did not verify before sending this mail.
> Also, because it was rather difficult to test beta software on these servers (only limited time on weekends was free to test variations in the setup), I did not try 2.0 beta.
> To my surprise, 2.0 is indeed now out of beta. I installed it, and indeed it works! No performance loss, even with 4 VM guests running.
> Case closed.
Nice.
>> 20 MB/sec sounds really bad. There has to be something wrong with your raid/disk setup.
> I don't believe 20 MB/s sustained write performance is too bad for RAID5 with 4 SATA disks in an inexpensive machine. Do you do much better?
Not me, but: http://home.comcast.net/~jpiszcz/20080607/raid5-benchmarks-3to10-velicirapto...
4-disk RAID5: sequential writes 297 MB/s, sequential reads 407 MB/s.
I remember seeing >100 MB/sec sequential reads and writes with at least 4 years old hardware and 4x sata-disk (7200 rpm) software md-raid5.
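Back-of-envelope support for that (an editorial sketch; the 54 MB/s single-disk figure is the one reported earlier in the thread): full-stripe sequential writes on an N-disk RAID5 can approach (N-1) times the per-disk streaming rate, so a 4-disk array should sit far above 20 MB/s.

```shell
#!/bin/sh
# Ideal-case RAID5 sequential-write estimate: each full stripe stores
# one disk's worth of parity, so data streams to N-1 disks in parallel
# and throughput approaches (N-1) x the per-disk streaming rate.
raid5_seq_write_mbs() {
    disks=$1; per_disk_mbs=$2
    echo $(( (disks - 1) * per_disk_mbs ))
}
raid5_seq_write_mbs 4 54   # prints 162 -- vs. the 20 MB/s observed
```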
-- Pasi