Hi, up until now I've always deployed VMs with their storage located directly on the host system, but as the number of VMs grows and the hardware becomes powerful enough to handle more virtual machines, I'm getting concerned about a failure of a single host taking down too many VMs in one go. As a result I'm now looking at moving to an infrastructure that uses shared storage instead, so I can live-migrate VMs or restart them quickly on another host if the one they are running on dies. The problem is that I'm not sure how to go about this bandwidth-wise.

What I'm aiming for as a starting point is a 3-4 host cluster with about 10 VMs on each host and a 2-system DRBD-based cluster as a redundant storage backend. The question that bugs me is how I can get enough bandwidth between the hosts and the storage to provide the VMs with reasonable I/O performance. If all 40 VMs start copying files at the same time, the bandwidth share for each VM would be tiny. Granted, this is a worst-case scenario, which is why I want to ask whether anyone here has experience with such a setup and can give recommendations or comment on alternative setups. Would I maybe get away with 4 bonded Gbit Ethernet ports? Would I require Fibre Channel or 10 Gbit infrastructure?
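To put rough numbers on that worst case (my own assumptions: ~110 MB/s usable per 1 Gbit link, 4 bonded links, ideal load balancing):

# Back-of-envelope for the worst case where all VMs copy at once.
# Assumptions, not measurements: ~110 MB/s usable per 1 GbE link,
# perfect bonding with no overhead, 40 VMs sharing the storage links.
usable_per_link_mb = 110   # MB/s realistically usable on 1 GbE
links = 4                  # 4 bonded Gbit ports
vms = 40                   # 4 hosts x 10 VMs

total_mb = usable_per_link_mb * links
per_vm_mb = total_mb / vms
print(f"aggregate: {total_mb} MB/s, per VM worst case: {per_vm_mb:.1f} MB/s")
# -> aggregate: 440 MB/s, per VM worst case: 11.0 MB/s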
Regards, Dennis
PS: The sheepdog project (http://www.osrg.net/sheepdog/) looks interesting in that regard, but apparently it is still far from production-ready.
----- "Dennis J." dennisml@conversis.de wrote:
What I'm aiming for as a starting point is a 3-4 host cluster with about 10 VMs on each host and a 2-system DRBD-based cluster as a redundant storage backend.
That's a good idea.
The question that bugs me is how I can get enough bandwidth between the hosts and the storage to provide the VMs with reasonable I/O performance.
You may also want to investigate whether or not a criss-cross replication setup (1A->2a, 2B->1b, i.e. each storage node is primary for one DRBD resource and secondary for the other) is worth the complexity to you. That will spread the load across two DRBD hosts and give you approximately the same fault tolerance at a slightly higher risk. (This is assuming that the risk-performance tradeoff is important enough to your project.)
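If it helps, here's a toy sketch of the idea (hypothetical host and resource names, not DRBD configuration syntax):

# Toy model of the criss-cross layout: each storage box is primary for one
# resource and secondary for the other, so normal load is split across both.
resources = {
    "r0": {"primary": "storage1", "secondary": "storage2"},  # 1A -> 2a
    "r1": {"primary": "storage2", "secondary": "storage1"},  # 2B -> 1b
}

def failover(dead_node):
    # If a storage box dies, promote its peer for every resource it owned.
    for name, r in resources.items():
        if r["primary"] == dead_node:
            r["primary"], r["secondary"] = r["secondary"], None
            print(f"{name}: promoted {r['primary']}; running without a peer")

failover("storage1")
# -> r0: promoted storage2; running without a peer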
If all 40 VMs start copying files at the same time, the bandwidth share for each VM would be tiny.
Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business load] tasks. These are far cheaper to fix by manually adjusting schedules than by any other means, unless you are rolling in dough.
Would I maybe get away with 4 bonded Gbit Ethernet ports? Would I require Fibre Channel or 10 Gbit infrastructure?
Fuck FC, unless you want to get some out of date, used, gently broken, or no-name stuff, or at least until FCoE comes out. (You're probably better off getting unmanaged IB switches and using iSER.)
Can't say if 10GbE would even be enough, but it's probably overkill. Just add up the PCI(-whatever) bus speeds of your hosts, benchmark your current load or realistically estimate what sort of 95th-percentile loads you would have across the board, multiply by that percentage, and fudge the result for SLAs and whatnot. Maybe go ahead and do some FMEA and see if losing a host or two is going to push the others over that bandwidth. If you find that 10GbE may be necessary, a lot of motherboards and SuperMicro gear have a better price per port for DDR IB (maybe QDR by now), and that may save you some money. Again, probably overkill. Check your math. :)
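A minimal sketch of that arithmetic, with placeholder numbers you would swap for your own benchmarks:

# All figures are placeholders; plug in your measured 95th percentile load.
hosts = 4
p95_per_host_mb = 80      # 95th percentile storage MB/s per host (measured)
sla_fudge = 1.5           # headroom for SLAs, DRBD resync, bad days
hosts_lost = 1            # FMEA: assume one host dies and its VMs move

surviving = hosts - hosts_lost
# the dead host's load gets spread over the survivors
per_host_after_failure = p95_per_host_mb * hosts / surviving
required_mb = per_host_after_failure * sla_fudge

gbe_usable_mb = 110       # usable MB/s per 1 GbE link
print(f"per-host storage bandwidth to plan for: {required_mb:.0f} MB/s "
      f"(~{required_mb / gbe_usable_mb:.1f} bonded GbE links)")
# -> per-host storage bandwidth to plan for: 160 MB/s (~1.5 bonded GbE links)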
Definitely use bonding. Definitely make sure you aren't going to saturate the bus that card (or cards, if you are worried about losing an entire adapter) is plugged into. If you're paranoid, get switches that can do bonding across supervisors or across physically separate fixed-configuration switches. If you can't afford those, you may want to opt for 2Nx2N bonding-bridging. That would limit you to probably two quad-port 1GbE cards per host, just for your SAN, but that's probably plenty. Don't waste your money on iSCSI adapters. Just get ones with TOEs.
Don't waste your money on iSCSI adapters. Just get ones with TOEs.
Just a point of note: if your hypervisor is derived from Linux (excluding some vendors who may have hacked in support), TOE (TCP Offload Engine) functions are *not* supported in Linux.
On Mon, Mar 01, 2010 at 09:34:45PM -0600, Christopher G. Stach II wrote:
----- "Dennis J." dennisml@conversis.de wrote:
What I'm aiming for as a starting point is a 3-4 host cluster with about 10 VMs on each host and a 2 system DRBD based cluster as a redundant storage backend.
That's a good idea.
The question that bugs me is how I can get enough bandwidth between the hosts and the storage to provide the VMs with reasonable I/O performance.
You may also want to investigate whether or not a criss-cross replication setup (1A->2a, 2B->1b) is worth the complexity to you. That will spread the load across two drbd hosts and give you approximately the same fault tolerance at a slightly higher risk. (This is assuming that risk-performance tradeoff is important enough to your project.)
If all the 40 VMs start copying files at the same time that would mean that the bandwidth share for each VM would be tiny.
Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business load] tasks. These are far cheaper to fix with manually adjusting schedules than any other way, unless you are rolling in dough.
Would I maybe get away with 4 bonded gbit ethernet ports? Would I require fiber channel or 10gbit infrastructure?
Fuck FC, unless you want to get some out of date, used, gently broken, or no-name stuff, or at least until FCoE comes out. (You're probably better off getting unmanaged IB switches and using iSER.)
Can't say if 10GbE would even be enough, but it's probably overkill.
10 Gbit Ethernet makes sense if you need more than about 110 MB/sec of throughput with sequential reads/writes at large block sizes; that's roughly what 1 Gbit Ethernet can give you.
If we're talking about random IO, then 1 Gbit Ethernet is good enough for many environments.
Disks are the bottleneck with random IO.
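To put rough numbers on it (the per-spindle figure below is a ballpark assumption, not a measurement):

# Sequential: the wire is the ceiling.  Random: the spindles are.
gbe_mb = 110                # ~usable MB/s on 1 GbE
spindles = 8                # e.g. an 8-disk array behind DRBD (assumed)
iops_per_spindle = 150      # ballpark for a 7.2k SATA disk
block_kb = 8                # typical small random IO

random_mb = spindles * iops_per_spindle * block_kb / 1024
print(f"sequential ceiling on 1 GbE: {gbe_mb} MB/s")
print(f"random IO ceiling of the array: ~{random_mb:.0f} MB/s")
# -> ~9 MB/s of random IO, far below what the wire could carry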
-- Pasi
Pasi Kärkkäinen wrote:
10 Gbit Ethernet makes sense if you need more than about 110 MB/sec of throughput with sequential reads/writes at large block sizes; that's roughly what 1 Gbit Ethernet can give you.
You can also bond 1GbE ports to get higher throughput. Buying an Ethernet switch that supports bonded ports is quite a bit cheaper than going to 10GbE and still gives a substantial performance boost if you don't need to go all the way to 10G.
If all 40 VMs start copying files at the same time, the bandwidth share for each VM would be tiny.
Would they? It's a possibility, and fun to think about, but what are the chances? You will usually run into this with backups, cron, and other scheduled [non-business load] tasks. These are far cheaper to fix by manually adjusting schedules than by any other means, unless you are rolling in dough.
I have a classroom environment where every VM is always doing the same thing in lockstep, i.e. formatting partitions, installing software, etc. We hit the disks like a bunch of crazy people. I'm replacing my setup with three Intel SSDs in a RAID 0 with either iSCSI or ATAoE. The RAID 0 will be synced to disk-based storage as a backup. We'll see pretty soon how many concurrent disk operations this setup can handle.
I'll be bonding 3 or 4 of the iSCSI box's Ethernet ports and then going from there to see what each of the servers in the cloud needs for its connection.
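The rough math I'm going on, with assumed per-SSD figures until I actually benchmark the box:

# Will the bonded links feed the SSD stripe?  Per-SSD numbers are assumptions.
ssds = 3
seq_mb_per_ssd = 250        # assumed sequential MB/s per SSD
rand_iops_per_ssd = 30000   # assumed 4k random read IOPS per SSD
links = 4
gbe_mb = 110                # usable MB/s per bonded 1 GbE link

print(f"RAID 0 sequential ceiling: ~{ssds * seq_mb_per_ssd} MB/s")
print(f"RAID 0 4k random ceiling:  ~{ssds * rand_iops_per_ssd * 4 / 1024:.0f} MB/s")
print(f"bonded network ceiling:    ~{links * gbe_mb} MB/s")
# -> sequential IO would be network-bound; 4k random IO lands in roughly the
#    same ballpark as the network, so bonding 3-4 ports looks about right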
Grant McWilliams
Some people, when confronted with a problem, think "I know, I'll use Windows." Now they have two problems.
----- "Grant McWilliams" grantmasterflash@gmail.com wrote:
I'm replacing my setup with three Intel SSDs in a RAID 0 with either iSCSI or ATAoE. The RAID 0 will be synced to disk-based storage as a backup. We'll see pretty soon how many concurrent disk operations this setup can handle.
I haven't benchmarked anything like that in a while. I'm not saying that RAID 0 with 3 targets is going to be non-performant, but I would expect a parallel array to be better for random ops unless a classroom workload turns out to be sequential for some reason (software installation). Do you have any numbers testing this, or better, real-world stats?
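If you do get to test it, even a crude random-vs-sequential read loop would be interesting. This is only a sketch; fio or iozone will give far better numbers, and /mnt/ssdstripe/testfile is a hypothetical path to a pre-created file much larger than RAM:

# Crude random-vs-sequential read micro-benchmark.  Page cache will skew the
# results unless the test file is much larger than RAM; proper tools handle
# direct IO and queue depth correctly.
import os, random, time

PATH = "/mnt/ssdstripe/testfile"    # hypothetical pre-created test file
BLOCK = 4096
COUNT = 20000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)

def reads_per_second(offsets):
    start = time.time()
    for off in offsets:
        os.lseek(fd, off, os.SEEK_SET)
        os.read(fd, BLOCK)
    return COUNT / (time.time() - start)

sequential = reads_per_second(range(0, COUNT * BLOCK, BLOCK))
random_ops = reads_per_second(
    [random.randrange(0, size - BLOCK) // BLOCK * BLOCK for _ in range(COUNT)])
os.close(fd)
print(f"sequential: {sequential:.0f} reads/s, random: {random_ops:.0f} reads/s")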