Wed Jun 22 18:12:07 UTC 2016
Chris Adams <linux at cmadams.net>

Once upon a time, Digimer <lists at alteeve.ca> said:
> The cluster software and any hosted services aren't running. It's not
> that they think they're wrong, they just have no existing state so they
> won't try to touch anything without first ensuring it is safe to do so.

Well, I was being short; what I meant was, in HA, if you aren't known to
be right, you are wrong, and you do nothing.

> SCSI reservations, and anything that blocks access is technically OK.
> However, I stand by the recommendation to power cycle lost nodes. It's
> by far the safest (and easiest) approach. I know this goes against the
> grain of sysadmins to yank power, but in an HA setup, nodes should be
> disposable and replaceable. The nodes are not important, the hosted
> services are.

One advantage SCSI reservations have is that if you can access the
storage, you can lock out everybody else.  It doesn't require access to
a switch, management card, etc. (each of which may have its own
problems).  If you can access the storage, you own it; if you can't,
you don't.  Putting a lock directly on the actual shared resource can
be the safest path (if you can't access it, you can't screw it up).
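
To make the idea concrete, here is a toy sketch in Python.  It is only
a model of the behaviour, not real SCSI commands: the SharedDisk class,
its methods, and the node names are all made up for illustration, and
it assumes a "registrants only" style of reservation where the device
refuses writes from any key that is not currently registered.

```python
class SharedDisk:
    """Toy model of a shared LUN (hypothetical, assumed semantics)."""

    def __init__(self):
        self.keys = set()   # registration keys the device currently accepts
        self.data = []      # blocks successfully written

    def register(self, key):
        self.keys.add(key)

    def preempt(self, my_key, victim_key):
        # Only a node that is itself registered can remove another key.
        if my_key not in self.keys:
            raise PermissionError("not registered; cannot fence anyone")
        self.keys.discard(victim_key)

    def write(self, key, block):
        # The device itself rejects writes from unregistered keys.
        if key not in self.keys:
            raise PermissionError("reservation conflict")
        self.data.append(block)


def demo():
    disk = SharedDisk()
    disk.register("node-a")
    disk.register("node-b")
    disk.preempt("node-a", "node-b")   # node-a fences node-b at the storage
    disk.write("node-a", "ok")
    try:
        disk.write("node-b", "stale")  # fenced node can no longer write
        blocked = False
    except PermissionError:
        blocked = True
    return disk.data, blocked
```

The point of the sketch is that the check lives in write(), i.e. on the
shared resource itself, so the fenced node is stopped no matter what
state its own software is in.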

I agree that rebooting a failed node is also good; I'm just pointing
out that putting the lock directly on the shared resource is good too.

Chris Adams <linux at cmadams.net>