Wed Jun 22 18:06:20 UTC 2016
Digimer <lists at alteeve.ca>

On 22/06/16 02:01 PM, Chris Adams wrote:
> Once upon a time, John R Pierce <pierce at hogranch.com> said:
>> On 6/22/2016 10:47 AM, Digimer wrote:
>>> This is called "fabric fencing" and was originally the only supported
>>> option in the very early days of HA. It has fallen out of favour for
>>> several reasons, but it does still work fine. The main issue is that it
>>> leaves the node in an unclean state. If an admin (out of ignorance or
>>> panic) reconnects the node, all hell can break loose. So generally power
>>> cycling is much safer.
>> how is that any different than said ignorant admin powering up the
>> shutdown node ?
> On boot, the cluster software assumes it is "wrong" and doesn't connect
> to any resources until it can verify state.
> If the node is just disconnected and left running, and later
> reconnected, it can try to write out (now old/incorrect) data to the
> storage, corrupting things.
> Speaking of shared storage, another fencing option is SCSI reservations.
> It can be terribly finicky, but it can be useful.


After a power cycle, the cluster software and any hosted services aren't
running. It's not that they think they're wrong; they just have no
existing state, so they won't try to touch anything without first
ensuring it is safe to do so.
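
To make the difference concrete, here is a toy Python sketch. It is not
real cluster code: the SharedStorage and Node classes and the simulate()
function are made up for illustration, and "pending_write" is only a
stand-in for whatever unflushed state a running node might hold. It
models a node that is fenced, a peer that takes over, and the fenced
node coming back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedStorage:
    """Toy stand-in for shared storage: one value plus a write counter."""
    value: str = "initial"
    writes: int = 0

    def write(self, value: str) -> None:
        self.value = value
        self.writes += 1


@dataclass
class Node:
    """Hypothetical cluster node with unflushed in-memory state."""
    name: str
    connected: bool = True
    pending_write: Optional[str] = None

    def flush(self, storage: SharedStorage) -> bool:
        """Write cached data, but only if the node can reach the storage."""
        if self.connected and self.pending_write is not None:
            storage.write(self.pending_write)
            self.pending_write = None
            return True
        return False

    def fabric_fence(self) -> None:
        # Access is cut, but the node keeps running with its old state.
        self.connected = False

    def power_fence(self) -> None:
        # Power cycle: in-memory state is gone, the node comes back clean.
        self.pending_write = None
        self.connected = True


def simulate(method: str) -> str:
    """Return what ends up on storage after fencing with `method`."""
    storage = SharedStorage()
    lost = Node("node1", pending_write="stale data from node1")

    if method == "fabric":
        lost.fabric_fence()
    else:
        lost.power_fence()

    # The surviving peer recovers the service and writes current data.
    survivor = Node("node2", pending_write="current data from node2")
    survivor.flush(storage)

    # An admin reconnects the node (fabric), or it finishes booting (power).
    lost.connected = True
    lost.flush(storage)
    return storage.value


if __name__ == "__main__":
    for method in ("fabric", "power"):
        print(method, "->", simulate(method))
```

In this model the fabric-fenced node overwrites the peer's data the
moment it is reconnected, while the power-cycled node has nothing left
to write.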

SCSI reservations, and anything else that blocks access, are technically
OK. However, I stand by the recommendation to power cycle lost nodes.
It's by far the safest (and easiest) approach. I know it goes against
the grain for sysadmins to yank power, but in an HA setup, nodes should
be disposable and replaceable. The nodes are not important; the hosted
services are.

Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?