[CentOS-virt] Thoughts on storage infrastructure for small scale HA virtual machine deployments

Wed Mar 3 09:41:53 UTC 2010
Pasi Kärkkäinen <pasik at iki.fi>

On Mon, Mar 01, 2010 at 09:34:45PM -0600, Christopher G. Stach II wrote:
> ----- "Dennis J." <dennisml at conversis.de> wrote:
> > What I'm aiming for as a starting point is a 3-4 host cluster with
> > about 10 VMs on each host and a 2 system DRBD based cluster as a
> > redundant storage backend.
> That's a good idea.
> > The question that bugs me is how I can get enough bandwidth between the 
> > hosts and the storage to provide the VMs with reasonable I/O
> > performance.
> You may also want to investigate whether a criss-cross replication setup (1A->2a, 2B->1b) is worth the complexity to you. That will spread the load across two DRBD hosts and give you approximately the same fault tolerance at a slightly higher risk. (This assumes the risk-performance tradeoff is important enough to your project.)
> > If all the 40 VMs start copying files at the same time that would mean
> > that the bandwidth share for each VM would be tiny.
> Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business load] tasks. These are far cheaper to fix by manually adjusting schedules than any other way, unless you are rolling in dough.
> > Would I maybe get away with 4 bonded gbit ethernet ports? Would I
> > require fiber channel or 10gbit infrastructure?
> Fuck FC, unless you want to get some out of date, used, gently broken, or no-name stuff, or at least until FCoE comes out. (You're probably better off getting unmanaged IB switches and using iSER.)
> Can't say if 10GbE would even be enough, but it's probably overkill. 

10 Gbit Ethernet makes sense if you need more than about 110 MB/s of
throughput for sequential reads/writes with large block sizes. That is
roughly what 1 Gbit Ethernet can deliver.
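To put the 110 MB/s figure next to the "40 VMs copying at once" worry, here is a rough back-of-the-envelope sketch. The 0.88 efficiency factor (raw line rate minus Ethernet/IP/TCP/iSCSI overhead) is an assumption chosen to land near 110 MB/s, not a measurement. The bonded figure also assumes traffic from many VMs is spread evenly across the links. With hash-based bonding modes, a single flow usually does not go faster than one link.

```python
# Back-of-the-envelope storage network throughput.
# ASSUMPTION: ~88% of the raw line rate is usable payload after
# Ethernet/IP/TCP/iSCSI overhead; illustrative, not measured.

GBIT = 1_000_000_000  # bits per second in 1 Gbit/s


def usable_mb_per_s(link_gbit: float, efficiency: float = 0.88) -> float:
    """Approximate usable payload throughput in MB/s (10**6 bytes/s)."""
    raw_bytes_per_s = link_gbit * GBIT / 8
    return raw_bytes_per_s * efficiency / 1_000_000


def per_vm_share(total_mb_per_s: float, busy_vms: int) -> float:
    """Fair share per VM if every busy VM streams at the same time."""
    return total_mb_per_s / busy_vms


one_gbe = usable_mb_per_s(1)      # about 110 MB/s
four_bonded = 4 * one_gbe         # only if many flows spread over all 4 links
ten_gbe = usable_mb_per_s(10)     # about 1100 MB/s

for name, bw in [("1 GbE", one_gbe),
                 ("4x1 GbE bond", four_bonded),
                 ("10 GbE", ten_gbe)]:
    print(f"{name}: {bw:.0f} MB/s total, "
          f"{per_vm_share(bw, 40):.1f} MB/s per VM with 40 busy VMs")
```

This worst case (all 40 VMs streaming sequentially at once) is exactly the scenario questioned above. Under random IO, the link is rarely what limits you.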

If we're talking about random IO, then 1 Gbit Ethernet is good enough
for many environments.

Disks are the bottleneck with random IO.
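A quick sketch of why random IO saturates the disks long before the link: random throughput is roughly IOPS multiplied by IO size. The figure of ~100 random IOPS per 7200 rpm SATA disk is a rule-of-thumb assumption, and the sketch ignores RAID write penalties and caching.

```python
# Random-IO throughput = IOPS * IO size.
# ASSUMPTION: ~100 random IOPS per 7200 rpm SATA disk (rule of thumb);
# RAID write penalty, controller cache and DRBD overhead are ignored.

GBE_USABLE_MB_S = 110  # approximate usable 1 Gbit Ethernet throughput


def random_io_mb_per_s(disks: int, iops_per_disk: float,
                       io_size_kib: float) -> float:
    """Aggregate random-IO throughput in MB/s (10**6 bytes/s)."""
    total_iops = disks * iops_per_disk
    return total_iops * io_size_kib * 1024 / 1_000_000


for disks in (4, 8, 16):
    mb_s = random_io_mb_per_s(disks, iops_per_disk=100, io_size_kib=4)
    print(f"{disks} disks: {mb_s:.1f} MB/s of 4 KiB random IO "
          f"({mb_s / GBE_USABLE_MB_S:.0%} of one 1 GbE link)")
```

Even 16 spindles under this assumption only move a few MB/s of 4 KiB random IO. That small fraction of a single gigabit link is the point being made about disks being the bottleneck.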

-- Pasi