Hi, I'm having trouble benchmarking disk I/O in my VMs. The data I get seems bogus. I have two CentOS 6 guests which each use a raw image as their volume. Each volume is stored on its own physical disk, and both disks are the same model. The host system is Fedora 15 with the virt-preview repo enabled. The disks for the guests use virtio and caching is set to none.
My problem is that I get very different results when I benchmark I/O in these guests, even though I shouldn't. Running seekmark I get:

guest 1: 120 seeks/s
guest 2: 220 seeks/s
"hdparm -t" shows: guest 1: 100 MB/s guest 1: 160 MB/s
What's worse is that when I restart the guests the results change, and suddenly guest 1 is a lot faster and guest 2 is a lot slower. However, as long as the guests are running, repeating the benchmarks gives consistent results.
When I test the disks on the host directly I get:

seekmark:
/dev/sdb: 75 seeks/s
/dev/sdc: 75 seeks/s

hdparm -t:
/dev/sdb: 140 MB/s
/dev/sdc: 140 MB/s
What bugs me is not so much the absolute numbers (for now) but the fact that these guests give such wildly inconsistent results. Even if the jump from 75 seeks/s on the host to 120 seeks/s in the guest is explainable by the way block I/O is handled in the virtualization layer, I would still expect both guests to return similar results, and I would also expect to see similar results across restarts of a single guest.
I've attached the definitions for both guests, even though they are virtually identical.
Does anyone have an idea what's happening here?
Regards, Dennis
On Saturday, October 8, 2011 at 12:45 PM, Dennis Jacobfeuerborn wrote:
Hi, I'm having trouble benchmarking disk I/O in my VMs. The data I get seems bogus.
Have you tried perf top? You can see other performance statistics there. There is more going on during a disk write than just writing what your guests want. I think there is also hardware passthrough for disks.
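For example (assuming perf is installed on the host; the qemu process name can differ between distros):

    # system-wide view of where the host spends its time while a benchmark runs
    sudo perf top

    # or narrowed to a single guest's qemu process
    sudo perf top -p $(pgrep -f qemu-kvm | head -n1)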
On 10/09/2011 04:32 PM, Nehemiah wrote:
On Saturday, October 8, 2011 at 12:45 PM, Dennis Jacobfeuerborn wrote:
Hi, I'm having trouble benchmarking disk I/O in my VMs. The data I get seems bogus.
Have you tried perf top? You can see other performance statistics there. There is more going on during a disk write than just writing what your guests want. I think there is also hardware passthrough for disks.
At the moment I'm not interested in the benchmarks themselves but in the reason why they show such odd behavior. The fact that there might be "more going on" doesn't account for the results I'm seeing.
Regards, Dennis
But that's what I'm asking you to investigate. All you see is a symptom; more data may clarify the situation. Weren't you taking data both from inside the guests and on the host's disks? It was kind of unclear.
On 10/10/2011 08:27 PM, Nehemiah wrote:
But that's what I'm asking you to investigate. All you see is a symptom; more data may clarify the situation. Weren't you taking data both from inside the guests and on the host's disks? It was kind of unclear.
The guest numbers are the ones that are not consistent with each other or across restarts of the guests:
seekmark:
guest 1: 120 seeks/s
guest 2: 220 seeks/s

hdparm -t:
guest 1: 100 MB/s
guest 2: 160 MB/s
The host numbers are the ones of the physical drives:

seekmark:
/dev/sdb: 75 seeks/s
/dev/sdc: 75 seeks/s

hdparm -t:
/dev/sdb: 140 MB/s
/dev/sdc: 140 MB/s
The physical numbers are consistent and what I would expect to see from the SATA drives. The guests are minimal CentOS 6 installations, so after booting they have virtually no processes running that could influence the benchmarks in any significant way. The host system is installed on /dev/sda, so /dev/sd(b|c) are not influenced by the host system either.

The entire setup is arranged to make benchmarking mostly reliable. If there were minor temporary fluctuations I would blame some external process, but differences of almost 100% that are consistent for the lifetime of the virtual machine do not fit such a scenario.
Regards, Dennis
OK, I think I managed to fix it. Please check and let me know.
Stoyan
On Oct 10, 2011, at 9:43 PM, Dennis Jacobfeuerborn wrote:
<snip />
Sorry, reply to wrong email :)
On Oct 10, 2011, at 10:08 PM, Stoyan Marinov wrote:
OK, I think I managed to fix it. Please check and let me know.
Stoyan
<snip />
On Mon, Oct 10, 2011, Dennis Jacobfeuerborn wrote: <snip />
The physical numbers are consistent and what I would expect to see from the SATA drives. The guests are minimal CentOS 6 installations, so after booting they have virtually no processes running that could influence the benchmarks in any significant way. The host system is installed on /dev/sda, so /dev/sd(b|c) are not influenced by the host system either. The entire setup is arranged to make benchmarking mostly reliable. If there were minor temporary fluctuations I would blame some external process, but differences of almost 100% that are consistent for the lifetime of the virtual machine do not fit such a scenario.
Hi Dennis,
Did you figure this out?
I'd suggest running something like smartctl -t long to rule out underlying issues with a drive.
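Something along these lines; the long test runs inside the drive's own firmware, so check back once it's done:

    smartctl -t long /dev/sdb      # start the extended self-test
    smartctl -l selftest /dev/sdb  # read the result afterwards
    smartctl -A /dev/sdb           # look for attributes like Reallocated_Sector_Ct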
My favorite tool for testing the data path was Robin Miller's dt, which now appears to be absent from the web. I'll get you a source tarball if you contact me off list.
Moving up the data chain to the kernel level, I wonder what iostat would show while the benchmarks are running. Something like -
/path/to/startbenchmark &
for i in `seq 1 10`; do
    sleep 30
    iostat -x >> /path/to/iostat.log
done
While that's running, /usr/bin/top in another terminal might show unexpected values for iowait/w_chan/process-status.
Perhaps the beginning block offsets of the .img files are not identical. If the alignment is unfortunate, the OS may need to do an extra seek for every track it reads, or some such.
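One way to eyeball that would be filefrag from e2fsprogs (the image paths below are placeholders, since I don't know where yours live):

    # -v prints the physical extent offsets of each backing image
    filefrag -v /path/to/guest1.img
    filefrag -v /path/to/guest2.img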
Best regards,
On 10/20/2011 08:13 AM, Charles Polisher wrote:
On Mon, Oct 10, 2011, Dennis Jacobfeuerborn wrote:
<snip />
Hi Dennis,
Did you figure this out?
I'd suggest running something like smartctl -t long to rule out underlying issues with a drive.
But that's what puzzles me. On the host side both drives behave identically and as expected. It's only in the guests that things get strange. I have now created a second disk /dev/vdb in my first guest, with its image on the second physical drive, so that the VM has disks backed by both drives and I can test them in the same VM. These are the results:
seekmark:
/dev/vda: 130 seeks/s
/dev/vdb: 9615 seeks/s

hdparm -t:
/dev/vda: 95 MB/s
/dev/vdb: 1691 MB/s
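(For the record, the second image was created along these lines and then attached through virt-manager as a virtio disk with cache=none; the 20G size is only illustrative:)

    qemu-img create -f raw /mnt/backup02/libvirt/images/gw1-data.img 20G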
Running iostat shows no I/O activity when running the tests for /dev/vdb, which explains the insane numbers. The question is why I get such different results when both devices are defined in exactly the same way.
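To rule out a cache somewhere in the stack, one more thing to try is reading with O_DIRECT inside the guest and watching whether iostat on the host shows activity then:

    # bypass the guest page cache entirely
    dd if=/dev/vda of=/dev/null bs=1M count=1024 iflag=direct
    dd if=/dev/vdb of=/dev/null bs=1M count=1024 iflag=direct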
In the guest the drives are running using virtio with type=raw and cache=none. On the host the backing filesystems look exactly the same:
[dennis@nexus ~]$ cat /proc/mounts | grep backup
/dev/sdb3 /mnt/backup01 ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
/dev/sdc3 /mnt/backup02 ext4 rw,seclabel,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
The drives are identical in every possible way:
[dennis@nexus ~]$ sudo hdparm -i /dev/sdb
/dev/sdb:
Model=SAMSUNG HD103SJ, FwRev=1AJ100E5, SerialNo=S246J9JB801028
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-0,1,2,3,4,5,6,7
* signifies the current active mode
[dennis@nexus ~]$ sudo hdparm -i /dev/sdc
/dev/sdc:
Model=SAMSUNG HD103SJ, FwRev=1AJ100E5, SerialNo=S246J9JB801029
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-0,1,2,3,4,5,6,7
* signifies the current active mode
Regards, Dennis
On 10/22/2011 06:00 PM, Dennis Jacobfeuerborn wrote:
Running iostat shows no I/O activity when running the tests for /dev/vdb, which explains the insane numbers. The question is why I get such different results when both devices are defined in exactly the same way.
I'm not very familiar with KVM yet (got my first real lesson today), but I notice you said: "In the guest the drives are running using virtio with type=raw and cache=none." Are these KVM settings, or did you use kernel parameters on the guest machine?
Also, what about the elevator (I/O scheduler) in the guest? On a VMware Server 2 host (on CentOS, so I'm not far OT) it's best to use the elevator=noop parameter. I wouldn't expect the elevator to skew results quite as much as you're seeing, but what do I know? ;)
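E.g. checking and switching it on the fly should look something like this (vda being your virtio disk):

    cat /sys/block/vda/queue/scheduler         # the active scheduler is shown in brackets
    echo noop > /sys/block/vda/queue/scheduler # as root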
On 10/23/2011 04:14 AM, Eric Shubert wrote:
On 10/22/2011 06:00 PM, Dennis Jacobfeuerborn wrote:
Running iostat shows no I/O activity when running the tests for /dev/vdb, which explains the insane numbers. The question is why I get such different results when both devices are defined in exactly the same way.
I'm not very familiar with KVM yet (got my first real lesson today), but I notice you said: "In the guest the drives are running using virtio with type=raw and cache=none." Are these KVM settings, or did you use kernel parameters on the guest machine?
I'm using the settings in virt-manager. This is what my disk definition looks like in XML:
...
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/mnt/backup01/libvirt/images/gw1.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/mnt/backup02/libvirt/images/gw1-data.img'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
...
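One extra sanity check is to look on the host at what qemu was actually handed for each drive; the cache mode shows up in the -drive options on its command line:

    # the [q] keeps grep from matching itself; check for cache=none on both -drive entries
    ps -ef | grep [q]emu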
Also, what about the elevator (I/O scheduler) in the guest? On a VMware Server 2 host (on CentOS, so I'm not far OT) it's best to use the elevator=noop parameter. I wouldn't expect the elevator to skew results quite as much as you're seeing, but what do I know? ;)
At the moment I'm not really interested in optimizing performance. The plan was to toy around with KVM to see how it behaves compared to Xen, but right now things don't look too good. Given that all parameters are the same, I have no idea what could cause such completely different behavior. It's almost as if the guest treats /dev/vdb as /dev/null, but I don't know why.
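If it really behaved like /dev/null a written pattern would never come back, so a destructive check on the scratch disk should settle it (careful: this overwrites the start of /dev/vdb):

    dd if=/dev/urandom of=/tmp/pattern bs=1M count=16
    dd if=/tmp/pattern of=/dev/vdb bs=1M count=16 oflag=direct
    dd if=/dev/vdb of=/tmp/readback bs=1M count=16 iflag=direct
    cmp /tmp/pattern /tmp/readback && echo "writes really reach the disk"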
Regards, Dennis