----- "Dennis J." <dennisml@conversis.de> wrote:
Hi, what is the best way to deal with I/O load when running several VMs on one physical machine with local or remote storage?
Fill up on controller cache, add more spindles until you saturate the bus, then start adding more controllers.
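As a rough back-of-the-envelope check on "add spindles until you saturate the bus", the sketch below divides a bus or controller bandwidth by per-disk throughput. The bandwidth and throughput figures are hypothetical placeholders, not measurements, and spindles_to_saturate is an illustrative helper; plug in numbers for your own hardware.

```python
import math


def spindles_to_saturate(bus_mb_s: float, per_disk_mb_s: float) -> int:
    """Smallest number of disks whose combined throughput meets the bus limit."""
    if per_disk_mb_s <= 0:
        raise ValueError("per-disk throughput must be positive")
    return math.ceil(bus_mb_s / per_disk_mb_s)


if __name__ == "__main__":
    # Hypothetical figures: a ~600 MB/s controller link, disks streaming
    # ~150 MB/s sequentially but only ~1.5 MB/s under small random I/O.
    print(spindles_to_saturate(600, 150))  # sequential: 4
    print(spindles_to_saturate(600, 1.5))  # random: 400
```

The gap between the two results is the point: under small random I/O a disk moves far less data than when streaming, so a random-heavy VM host usually runs out of per-disk IOPS well before it saturates the bus.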
What I'm primarily worried about is the case when several VMs cause disk I/O at the same time. One example would be the "updatedb" cron job of the mlocate package. If you have, say, 5 VMs running on a physical system with a local software RAID-1 as storage and they all run updatedb at the same time, all of them run really slowly because they starve each other fighting over the disk.
Is this updatedb issue your only concern? Does it affect anything important? If it's just annoying, try to ignore it. Like any time-sharing system, virtualization is a bet that things like this typically don't happen.
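To put a number on what "starving each other" means, here is a minimal sketch of the updatedb scenario. It assumes reads are balanced perfectly across the mirror members and split evenly between VMs, which is optimistic; the ~100 random IOPS per 7200 rpm disk figure is only a rough rule of thumb, and per_vm_iops is a hypothetical helper.

```python
def per_vm_iops(disk_iops: float, mirrors: int, busy_vms: int) -> float:
    """Best-case random-read IOPS per VM when busy_vms VMs hit a RAID-1 at once.

    Assumes reads are spread perfectly across all mirror members and the VMs
    share the array evenly; real workloads will usually do worse. Writes get
    no such boost, because every write has to land on each mirror member.
    """
    if busy_vms < 1:
        raise ValueError("need at least one busy VM")
    return disk_iops * mirrors / busy_vms


if __name__ == "__main__":
    # Hypothetical: ~100 random IOPS per 7200 rpm disk, two-disk RAID-1.
    print(per_vm_iops(100, 2, 1))  # one VM running updatedb alone: 200.0
    print(per_vm_iops(100, 2, 5))  # all five VMs at once: 40.0
```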
What is the best way to soften the impact of such a situation? Does it make sense to use hardware RAID instead?
Switching from software to hardware RAID won't matter that much; arguably, the benefits of software RAID outweigh the negatives. The switch by itself won't help you one bit in this case. A controller with a battery-backed cache will, though.
How would the RAID level affect the performance in this case? Would the fact that the I/O load gets distributed across multiple spindles in, say, a 4-disk hardware RAID-5 have a big impact on this?
It can affect it an awful lot. RAID 5 sucks. Avoid it. A little research will give you all of the reasons why. If you have a random access pattern like you'd typically see in a busy virtualized setup, RAID 10 is your best bet. If you're not heavy on writes, RAID 6 will save you some disk space... but disk is cheap enough to err on the side of unused performance if you don't want to spend the time to benchmark. Proper partitioning can give you back some performance in many cases, too.
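The usual way to quantify why RAID 5 and RAID 6 suffer under random writes is the rule-of-thumb write penalty: each small random write costs about 2 back-end I/Os on RAID 10, 4 on RAID 5 (read data, read parity, write data, write parity) and 6 on RAID 6. The sketch below applies that rule. It ignores controller caching and full-stripe writes, both of which can change the picture a lot, and the disk count, per-disk IOPS and write mix are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidLevel:
    name: str
    write_penalty: int  # back-end I/Os per small random write (rule of thumb)
    parity_disks: int   # capacity lost to parity; 0 for mirrored levels
    mirrored: bool = False

    def usable_disks(self, disks: int) -> int:
        # Mirroring halves capacity; parity levels lose parity_disks worth.
        return disks // 2 if self.mirrored else disks - self.parity_disks

    def random_iops(self, disks: int, disk_iops: float, write_pct: int) -> float:
        # Front-end IOPS when write_pct percent of requests are writes:
        # reads cost 1 back-end I/O, writes cost write_penalty back-end I/Os.
        backend_cost = (100 - write_pct) + write_pct * self.write_penalty
        return disks * disk_iops * 100 / backend_cost


LEVELS = (
    RaidLevel("RAID 10", write_penalty=2, parity_disks=0, mirrored=True),
    RaidLevel("RAID 5", write_penalty=4, parity_disks=1),
    RaidLevel("RAID 6", write_penalty=6, parity_disks=2),
)

if __name__ == "__main__":
    # Hypothetical: eight disks at ~100 random IOPS each, 30% writes.
    for level in LEVELS:
        print(f"{level.name}: {level.usable_disks(8)} disks usable, "
              f"~{level.random_iops(8, 100, 30):.0f} IOPS")
```

With these inputs it prints roughly 615, 421 and 320 IOPS for RAID 10, 5 and 6, alongside 4, 7 and 6 usable disks, which matches the trade-off described above: RAID 10 for random-write performance, RAID 6 for capacity.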
I'm currently facing the problem where I fear that random disk I/O by too many VMs on a physical system could cripple their performance even though I have plenty of CPU cores/RAM left to run them.
In most virtualized situations, I've seen it go down like this. In approximately this order, you will run out of...
1. RAM
2. disk I/O
3. CPU
4. network
3 and 4 will be swapped in a lot of cases. Regardless, this is the general order of priorities to address when spec'ing new hardware, or to expect when repurposing existing hardware. If you have a two-disk RAID 1 and 128GB of RAM and you run write-heavy databases in all of the VMs (on block-device backing stores, for anyone aiming to nitpick about the OS filesystem cache), performance will not be that great.