[CentOS] KVM HA

Digimer lists at alteeve.ca
Wed Jun 22 06:21:29 UTC 2016


On 22/06/16 02:10 AM, Tom Robinson wrote:
> Hi Digimer,
> 
> Thanks for your reply.
> 
> On 22/06/16 15:20, Digimer wrote:
>> On 22/06/16 01:01 AM, Tom Robinson wrote:
>>> Hi,
>>>
>>> I have two KVM hosts (CentOS 7) and would like them to operate as High Availability servers,
>>> automatically migrating guests when one of the hosts goes down.
>>>
>>> My question is: Is this even possible? All the documentation for HA that I've found appears to not
>>> do this. Am I missing something?
>>
>> Very possible. It's all I've done for years now.
>>
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2
>>
>> That's for EL 6, but the basic concepts port perfectly. In EL7, just
>> change out cman + rgmanager for pacemaker. The commands change, but the
>> concepts don't. Also, we use DRBD but you can conceptually swap that for
>> "SAN" and the logic is the same (though I would argue that a SAN is less
>> reliable).
> 
> In what way is the SAN method less reliable? Am I going to get into a world of trouble going that way?

In the HA space, there should be no single point of failure. A SAN, for
all of its redundancies, is a single thing. Google for tales of bad SAN
firmware upgrades to get an idea of what I am talking about.

We've found that using DRBD and building clusters in pairs is a far more
resilient design. First, you don't put all your eggs in one basket, as it
were. So if you have multiple failures and lose a cluster, you lose one
pair and the servers it was hosting. Very bad, but less so than losing
the storage for all your systems.

Consider this case that happened a couple of years ago:

We had a client, through no malicious intent, just a misunderstanding of
how "hot swap" worked, walk up to a machine and start ejecting drives.
We got in touch and stopped him in very short order, but the damage was
done and the node's array was hosed. Certainly not a failure scenario we
had ever considered.

DRBD (which is sort of like "RAID 1 over a network") simply marked the
local storage as Diskless and routed all reads and writes to the good
peer. The hosted VM servers (and the software underpinning them) kept
working just fine. We lost the ability to live migrate because we
couldn't read from the local disk anymore, but the client continued to
operate for about an hour until we could schedule a controlled reboot to
move the servers without interrupting production.
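If you've not worked with DRBD before, a resource is just a small config
file shared by both nodes. A minimal sketch (the resource name, devices,
hostnames and IPs below are made-up examples, not our actual config):

    # /etc/drbd.d/r0.res - replicate a local LV between the two nodes
    resource r0 {
        protocol C;                  # synchronous replication
        device    /dev/drbd0;        # replicated device the VMs sit on
        meta-disk internal;
        on node1 {
            disk    /dev/vg0/vm_storage;
            address 10.10.0.1:7788;  # dedicated replication link
        }
        on node2 {
            disk    /dev/vg0/vm_storage;
            address 10.10.0.2:7788;
        }
    }

In the story above, when the drives were pulled, DRBD flipped that node
to Diskless and carried on serving all I/O from the peer's copy over
that replication link.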

In short, to do HA right, you have to be able to look at every piece of
your stack, ask "what happens if this goes away?", and design it so
that the answer is "we'll recover".

For clients who need the performance of SANs (go big enough and the
caching and whatnot of a SAN is superior), we recommend two SANs,
connect each one to a node, and from there up treat them as DAS.
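In that layout, each node's DRBD backing disk is simply the multipath
device presented by its own SAN, and everything above DRBD stays the
same. Roughly (device names again are just examples):

    # Same resource as before, but backed by each node's own SAN LUN
    on node1 {
        disk    /dev/mapper/san01_lun0;
        address 10.10.0.1:7788;
    }
    on node2 {
        disk    /dev/mapper/san02_lun0;
        address 10.10.0.2:7788;
    }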

>> There is an active mailing list for HA clustering, too:
>>
>> http://clusterlabs.org/mailman/listinfo/users
> I've had a brief look at the web-site. Lots of good info there. Thanks!

Clusterlabs is now the umbrella for a collection of different open
source HA projects, so it will continue to grow as time goes on.

>>> My configuration so far includes:
>>>
>>>  * SAN Storage Volumes for raw device mappings for guest vms (single volume per guest).
>>>  * multipathing of iSCSI and Infiniband paths to raw devices
>>>  * live migration of guests works
>>>  * a cluster configuration (pcs, corosync, pacemaker)
>>>
>>> Currently when I migrate a guest, I can all too easily start it up on both hosts! There must be some
>>> way to fence these off but I'm just not sure how to do this.
>>
>> Fencing, exactly.
>>
>> What we do is create a small /shared/definitions (on gfs2) to host the
>> VM XML definitions and then undefine the VMs from the nodes. This makes
>> the servers disappear on non-cluster aware tools, like
>> virsh/virt-manager. Pacemaker can still start the servers just fine and
>> pacemaker, with fencing, makes sure that the server is only ever running
>> on one node at a time.
> 
> That sounds simple enough :-P. Although, I wanted to be able to easily open VM Consoles which I do
> currently through virt-manager. I also use virsh for all kinds of ad-hoc management. Is there an
> easy way to still have my cake and eat it? We also have a number of Windows VMs. Remote desktop is
> great but sometimes you just have to have a console.

We use virt-manager, too. It's just fine. Virsh also works just fine.
The only real difference is that once the server shuts off, it
"vanishes" from those tools. I would say 75% or more of our clients run
some flavour of Windows on our systems, and both access and performance
are just fine.
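
For what it's worth, the pacemaker side of what I described above is
only a few commands. A rough sketch (node names, IPMI details and the
XML path are placeholders; check the exact parameters of the fence agent
for your hardware):

    # Fencing first - nothing else matters until the cluster can fence.
    pcs stonith create fence_n01 fence_ipmilan pcmk_host_list="node1" \
        ipaddr="10.20.0.1" login="admin" passwd="secret" lanplus="1"
    pcs stonith create fence_n02 fence_ipmilan pcmk_host_list="node2" \
        ipaddr="10.20.0.2" login="admin" passwd="secret" lanplus="1"

    # The server itself; its XML lives on the shared gfs2 mount and the
    # guest is undefined from libvirt on both nodes.
    pcs resource create srv01 ocf:heartbeat:VirtualDomain \
        hypervisor="qemu:///system" \
        config="/shared/definitions/srv01.xml" \
        migration_transport="ssh" \
        meta allow-migrate="true" \
        op monitor interval="30s"

With working fencing, pacemaker won't start srv01 on the surviving node
until it has confirmed the other node is dead, which is exactly what
prevents the double-start problem you hit with virsh alone.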

>> We also have an active freenode IRC channel; #clusterlabs. Stop on by
>> and say hello. :)
> 
> Will do. I have a bit of reading now to catch up but I'm sure I'll have a few more questions before
> long.
> 
> Kind regards,
> Tom

Happy to help. If you stop by and don't get a reply, please idle. Folks
there span all timezones but are generally good about reading
scroll-back and replying.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


