Direct comparisons between the two runs were difficult to judge, but the general result was that the Host was between 2:1 and 3:1 faster than the Guest, which seems like a rather large performance gap. Latency differences were all over the map, which I find puzzling. The Host is 64-bit and the Guest 32-bit, if that makes any difference. Perhaps caching between Host and Guest accounts for some of the differences.
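A direct comparison can be as simple as timing the same sequential write on the Host and inside the Guest. Below is a minimal sketch of that idea; the path and size are arbitrary placeholders, and the fsync is there so the page cache doesn't flatter the numbers:

    import os, time

    # Rough sequential-write timing, meant to be run unchanged on the host
    # and inside a guest. TEST_FILE and SIZE_MB are placeholders; point the
    # path at the filesystem you actually want to measure.
    TEST_FILE = "/tmp/io_test.bin"
    SIZE_MB = 512
    CHUNK = b"\0" * (1024 * 1024)   # 1 MiB per write

    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())        # force the data out to the device
    elapsed = time.time() - start

    print("wrote %d MiB in %.2f s (%.1f MiB/s)" % (SIZE_MB, elapsed, SIZE_MB / elapsed))
    os.remove(TEST_FILE)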
It does sound as if the guests are going through the host's filesystem and cache rather than accessing the block device directly.
Drive access shouldn't incur much CPU overhead these days, thanks to DMA and improvements in drivers and hardware. When it's set up correctly the host has little work to do. That doesn't sound like what's happening with your setup.
Basically, you have to think of each guest as an independent system competing for disk access with the other guests and with the host. If there's just one drive or array that everything uses, that's a big bottleneck.
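You can see that effect without even involving the VMs. A toy like the following just stands several writer processes in for guests hitting the same drive; run it with WORKERS=1 and then WORKERS=4 and compare the per-worker throughput (paths and sizes are arbitrary):

    import os, time
    from multiprocessing import Process

    # Crude illustration of several "guests" competing for one drive:
    # each worker does the same sequential write to the same disk.
    WORKERS = 4
    SIZE_MB = 256
    CHUNK = b"\0" * (1024 * 1024)

    def writer(idx):
        path = "/tmp/contention_%d.bin" % idx
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(SIZE_MB):
                f.write(CHUNK)
            os.fsync(f.fileno())    # make sure we actually hit the device
        elapsed = time.time() - start
        print("worker %d: %.1f MiB/s" % (idx, SIZE_MB / elapsed))
        os.remove(path)

    if __name__ == "__main__":
        procs = [Process(target=writer, args=(i,)) for i in range(WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()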
I've been working with VMs for a while now and have tried various ways of setting up guests. Guest block devices can be set up with or without LVM, although I've stopped using LVM on my systems these days.
For speed and for ease of maintenance and backups, what I've settled on is: a small separate drive for the host to boot from, a small separate drive for the guest OS images (I like qcow2 on WD Raptors), and then a large array on a RAID controller for storage that the guests and host can share.