On 04/04/2011 09:00 AM, compdoc wrote:
It's possible to set up guests to use a block device that will get you the same disk I/O as the underlying storage.
Is that what you're seeing? What speed does the host see when benchmarking the RAID volumes, and what speeds do the guests see?
Yes, I have been going on the assumption that I get close to native block device performance, but the test results tell me otherwise. I see array rebuild data rates which seem reasonable, on the order of 60 to 80 MBytes/sec. I'm using 256k chunks, with the filesystem stride and stripe-width set to match the chunk size and the number of data drives.
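To illustrate the arithmetic (assuming, purely as an example, an 8-drive RAID-6, i.e. 6 data drives, and 4 KiB ext4 blocks; the md device name is a placeholder):

    # stride       = chunk size / filesystem block size = 256 KiB / 4 KiB = 64
    # stripe-width = stride * number of data drives     = 64 * 6          = 384
    mkfs.ext4 -E stride=64,stripe-width=384 /dev/md0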
Using bonnie++, I mounted one of the Guest RAID-6 filesystems on the Host, ran the default tests, unmounted, then booted the Guest and ran the same default tests. The amount of RAM assigned was the same for both, to level the playing field a bit.
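The runs were along these lines (mount points and sizes here are placeholders; -s was kept well above the RAM given to each side so the page cache wouldn't dominate the results):

    # On the host, with the guest's RAID-6 filesystem mounted temporarily:
    bonnie++ -d /mnt/guest_fs/bench -s 8192 -r 2048 -u nobody

    # Inside the guest, against the same filesystem mounted natively:
    bonnie++ -d /srv/bench -s 8192 -r 2048 -u nobody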
Direct comparisons between the two were difficult to judge, but the general result was that the Host was between 2:1 and 3:1 better than the Guest, which seems to be a rather large performance gap. Latency differences were all over the map, which I find puzzling. The Host is 64-bit and the Guest 32-bit, if that makes any difference. Perhaps caching between Host and Guest accounts for some of the differences.
At the moment my questions tend to be a bit academic. I'm primarily wondering if RAID-10 is paranoid enough given the current quality of WD Caviar Black drives (better than dirt-cheap consumer drives, but not enterprise grade). My second question is whether the added overhead of something like qcow2 would be offset by the advantages of better space efficiency and the copy-on-write feature.
I'd love to hear what other software RAID users think, especially regarding large-capacity drives. It's rare for a modern drive to hand out bad data without an accompanying error condition (which the md driver should handle), but I have read that uncaught bad data is possible and would not be flagged in RAID arrays which don't use parity calculations.
Chuck
On 04/04/11 11:32 AM, Chuck Munro wrote:
I'd love to hear what other software RAID users think, especially regarding large-capacity drives. It's rare for a modern drive to hand out bad data without an accompanying error condition (which the md driver should handle), but I have read that uncaught bad data is possible and would not be flagged in RAID arrays which don't use parity calculations.
AFAIK, no standard RAID modes verify parity on reads, as this would require reading the whole stripe for every random read. Only RAID systems like ZFS that use block checksumming can verify data on reads. Parity (or mirrors) is verified by running 'scrubs'.
Further, even if a RAID did verify parity/mirroring on reads, a mismatch would at best show up as a non-recoverable error: bad data on one of the N drives in the stripe, with no way of knowing which one is the bad one.
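For reference, an md scrub can be kicked off and inspected by hand (md0 is a placeholder; many distros also ship a cron job that does this periodically):

    # Start a check pass over the whole array (reads everything, verifies parity/mirrors):
    echo check > /sys/block/md0/md/sync_action

    # Watch progress:
    cat /proc/mdstat

    # A non-zero count after the pass means mismatches were found:
    cat /sys/block/md0/md/mismatch_cnt

    # 'repair' instead of 'check' will also rewrite mismatched parity:
    # echo repair > /sys/block/md0/md/sync_action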
Direct comparisons between the two were difficult to judge, but the general result was that the Host was between 2:1 and 3:1 better than the Guest, which seems to be a rather large performance gap. Latency differences were all over the map, which I find puzzling. The Host is 64-bit and the Guest 32-bit, if that makes any difference. Perhaps caching between Host and Guest accounts for some of the differences.
It does sound as if the guests are relying on the host rather than accessing the block device directly.
Drives should not add much CPU overhead, thanks to DMA and improvements in drivers and hardware. When it's done correctly the host has little work to do. That doesn't sound like what's happening with your setup.
Basically, you have to think about the guests as independent systems which are competing for disk access with the other guests, and with the host. If you have just one drive or array that's used by all, that's a large bottleneck.
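For what it's worth, handing a guest the block device directly, and telling qemu not to cache on the host side, looks something like this in the libvirt config (the device path and guest name are just placeholders):

    # Hypothetical example: give the guest an LV as a raw virtio disk and
    # bypass the host page cache (cache='none'), so the guest talks to the
    # block device more or less directly.
    cat > guest1-disk.xml <<'EOF'
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/vg_guests/lv_guest1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    EOF

    # Attach it to the (placeholder) domain, or paste the <disk> element into
    # 'virsh edit guest1' by hand:
    virsh attach-device guest1 guest1-disk.xml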
I've been working with VMs for a while now and have tried various ways to set up guests. Block devices can be done with or without LVM, although I've stopped using LVM on my systems these days.
For reasons of speed and ease of maintenance and backups, what I've settled on is: a small separate drive for the host to boot from, a small separate drive for the guest OSes (I like using qcow2 on WD Raptors), and then a large array on a raid controller for storage which the guests and host can share access to.
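The guest OS images themselves I just create with qemu-img, something along these lines (path and size are made up; preallocating metadata takes some of the sting out of the qcow2 overhead while keeping the image thin-provisioned):

    qemu-img create -f qcow2 -o preallocation=metadata /var/lib/libvirt/images/guest1.qcow2 50G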
On Tue, Apr 5, 2011 at 11:49 AM, compdoc <compdoc@hotrodpc.com> wrote:
I've been working with VMs for a while now and have tried various ways to set up guests. Block devices can be done with or without LVM, although I've stopped using LVM on my systems these days.
Just curious, why have you stopped using LVM? I've found it to be useful for allocating disk space to KVM virtual machines. I usually set up logical volumes on a separate volume group as "block devices" for the virtual machine to use. If there's an issue with this, I'd like to know about it.
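For example, something like this (the volume group, LV name, and size below are just placeholders):

    # One-time setup: a volume group on the array, kept separate from the host's own VG.
    # (The underlying device name is a placeholder.)
    vgcreate vg_guests /dev/md0

    # Carve out a logical volume per guest; the guest formats and uses it as its own disk.
    lvcreate -L 40G -n lv_web vg_guests
    lvs vg_guests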
-Iain
Just curious, why have you stopped using LVM?
Simply for ease of maintenance: some recovery and backup utilities, like Clonezilla, can't work with LVM. Also, the same volume group names are used for each CentOS install, so attaching a drive or volume to another system for rescue causes conflicts unless you take steps to use unique names from the start. (Although I hear that newer versions of CentOS/RH will create unique names for you.)
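If you do hit a clash, it can usually be untangled from the rescue system by renaming the foreign volume group by its UUID; roughly like this (the UUID and new name below are made up):

    # Both the running system and the attached disk may expose a VG with the
    # same name; list them with their UUIDs to tell them apart:
    vgs -o vg_name,vg_uuid

    # Rename the foreign VG by UUID to something unique, then activate it:
    vgrename Zvlifi-Ep3t-e0Ng-U42h-o0ye-KHu1-nl7Ns4 vg_rescue
    vgchange -ay vg_rescue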
As I said, LVM works fine for VMs and can be used to slice up a volume for guests to use as a true block device.
By the way, a true block device means a raw partition or volume on the disk is given to the guest to format and use as its own - so no existing file system is present. It's almost like giving a guest its own drive to work from, and it should operate at the same native speed as the host.
On Tue, Apr 05, 2011 at 08:22:08PM -0600, compdoc wrote:
Just curious, why have you stopped using LVM?
Simply for ease of maintenance: some recovery and backup utilities, like Clonezilla, can't work with LVM. Also, the same volume group names are used for each CentOS install, so attaching a drive or volume to another system for rescue causes conflicts unless you take steps to use unique names from the start. (Although I hear that newer versions of CentOS/RH will create unique names for you.)
Not all that unique, but a bit better--I think it's VolumeGroup00/lvm_root, VolumeGroup00/lvm_swap, and things like that.
(Keeping both LVs in the same VG by default.)
On Wed, 6 Apr 2011, Scott Robbins wrote:
Not all that unique, but a bit better--I think it's VolumeGroup00/lvm_root, VolumeGroup00/lvm_swap, and things like that.
(Keeping both LVs in the same VG by default.)
As far as I know it's much better than that:
The volume group by default with EL6 is vg_$HOSTNAME (with some characters stripped).
Even EL5 only created one VG.
jh
On 5.4.2011 21.49, compdoc wrote:
For reasons of speed and ease of maintenance and backups, what I've settled on is: a small separate drive for the host to boot from, a small separate drive for the guest OSes (I like using qcow2 on WD Raptors), and then a large array on a raid controller for storage which the guests and host can share access to.
Aren't the guests and host then competing for that large array? How is this arrangement better than other setups, for example having the host and each guest (with their associated data) on their own disk or partition?
- Jussi
On Wed, 6 Apr 2011, Jussi Hirvi wrote:
On 5.4.2011 21.49, compdoc wrote:
For reasons of speed and ease of maintenance and backups, what I've settled on is: a small separate drive for the host to boot from, a small separate drive for the guest OSes (I like using qcow2 on WD Raptors), and then a large array on a raid controller for storage which the guests and host can share access to.
Aren't the guests and host then competing for that large array? How is this arrangement better than other setups, for example having the host and each guest (with their associated data) on their own disk or partition?
Doesn't it feel a bit wrong to be tying physical infrastructure to VMs in that way? I'd have thought that a large iSCSI device providing storage (not all of it necessarily identical) to a virtualisation server, which then slices and dices it appropriately for the VMs, would be a lot saner.
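Something along these lines, say (the host name, IQN and pool name are made up; the LUNs the pool exposes can then be handed out to guests as raw disks):

    # Hypothetical libvirt iSCSI storage pool definition:
    cat > iscsi-pool.xml <<'EOF'
    <pool type='iscsi'>
      <name>vm_storage</name>
      <source>
        <host name='san.example.com'/>
        <device path='iqn.2011-04.com.example:storage.lun1'/>
      </source>
      <target>
        <path>/dev/disk/by-path</path>
      </target>
    </pool>
    EOF

    virsh pool-define iscsi-pool.xml
    virsh pool-start vm_storage
    virsh vol-list vm_storage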
jh