[CentOS] KVM HA

Wed Jun 22 18:44:09 UTC 2016
Digimer <lists at alteeve.ca>

On 22/06/16 02:34 PM, m.roth at 5-cent.us wrote:
> Digimer wrote:
>> On 22/06/16 02:01 PM, Chris Adams wrote:
>>> Once upon a time, John R Pierce <pierce at hogranch.com> said:
>>>> On 6/22/2016 10:47 AM, Digimer wrote:
>>>>> This is called "fabric fencing" and was originally the only supported
>>>>> option in the very early days of HA. It has fallen out of favour for
>>>>> several reasons, but it does still work fine. The main issue is that
>>>>> it leaves the node in an unclean state. If an admin (out of ignorance or
>>>>> panic) reconnects the node, all hell can break loose. So generally
>>>>> power cycling is much safer.
> <snip>
>>> If the node is just disconnected and left running, and later
>>> reconnected, it can try to write out (now old/incorrect) data to the
>>> storage, corrupting things.
>>>
>>> Speaking of shared storage, another fencing option is SCSI reservations.
>>> It can be terribly finicky, but it can be useful.
>>
>> Close.
>>
>> The cluster software and any hosted services aren't running. It's not
>> that they think they're wrong, they just have no existing state so they
>> won't try to touch anything without first ensuring it is safe to do so.
> <snip>
> Question: when y'all are saying "reconnect", is this different from
> stopping the h/a services, reconnecting to the network, and then starting
> the services (which would let you avoid a reboot)?
> 
>           mark

Expecting a lost node to behave in any predictable manner is not allowed
in HA. In theory, with fabric fencing, that is exactly how you could
recover (stop all HA software, reconnect, start), but even then a reboot
is highly recommended before reconnecting.
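
To make the ordering concrete, here is a toy Python model of that
recovery path. It is only a sketch: the class, state, and method names
are all made up for illustration, and it does not call any real cluster
tooling. It assumes the one hard rule is that a fenced node must never be
reconnected while its HA software might still be running, and it records
whether the recommended reboot actually happened.

```python
from enum import Enum, auto


class NodeState(Enum):
    FENCED_DIRTY = auto()  # cut off from the fabric, HA stack state unknown
    HA_STOPPED = auto()    # HA software stopped by hand, no reboot yet
    CLEAN = auto()         # freshly rebooted, HA software not running
    RECONNECTED = auto()   # back on the network/storage fabric
    IN_CLUSTER = auto()    # HA software started, node has rejoined


class FencedNode:
    """Toy model of a fabric-fenced node. Every name here is hypothetical."""

    def __init__(self):
        self.state = NodeState.FENCED_DIRTY
        self.rebooted = False

    def stop_ha_software(self):
        # Manual stop of all HA software on the isolated node.
        if self.state is NodeState.FENCED_DIRTY:
            self.state = NodeState.HA_STOPPED

    def reboot(self):
        # Assumption: the HA software is not configured to start on boot,
        # so a reboot leaves the node with no running cluster services.
        if self.state in (NodeState.FENCED_DIRTY, NodeState.HA_STOPPED):
            self.state = NodeState.CLEAN
            self.rebooted = True

    def reconnect(self):
        # The one hard rule: never reconnect a node in an unknown state.
        if self.state is NodeState.FENCED_DIRTY:
            raise RuntimeError("refusing to reconnect: HA software may still be running")
        if self.state in (NodeState.HA_STOPPED, NodeState.CLEAN):
            self.state = NodeState.RECONNECTED

    def start_ha_software(self):
        if self.state is not NodeState.RECONNECTED:
            raise RuntimeError("reconnect the node before starting HA software")
        self.state = NodeState.IN_CLUSTER


# Recommended path: stop, reboot, reconnect, start.
node = FencedNode()
node.stop_ha_software()
node.reboot()
node.reconnect()
node.start_ha_software()
```

Skipping reboot() still reaches IN_CLUSTER in this model, which matches
the "in theory" path, but node.rebooted stays False so you can see the
recommended step was skipped.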

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?