[CentOS] Question about clustering

Mon Jun 16 18:55:02 UTC 2014
m.roth at 5-cent.us <m.roth at 5-cent.us>

Digimer wrote:
> On 16/06/14 02:19 PM, John R Pierce wrote:
>> On 6/16/2014 10:55 AM, Digimer wrote:
>>> The main downside to fabric fencing is that the failed node will have no
>>> chance of recovering without human intervention. Further, it places the
>>> onus on the admin to not simply unfence the node without first doing
>>> proper cleanup/recovery. For these reasons, I always recommend power
>>> fencing (IPMI, PDUs, etc).
>>
>> How does power fencing change your first two statements in any fashion?
>> As I see it, it would make manual recovery even harder, as you couldn't
>> even power up the failed system without first disconnecting it from the
>> network.
>>
>> When I have used network fencing, I've left the admin ports live, that
>> way, the operator can access the system console to find out WHY it is
>> fubar, and put it in a proper state for recovery. Of course, this
>> implies you have several LAN connections, which is always a good idea
>> for a clustered system anyways.
>
> Most power fencing methods are set to "reboot", which is "off -> verify
> -> try to boot", with the "try to boot" part not affecting success of
> the overall fence call. In my experience (dozens of clusters going back
> to 2009), this has always left the nodes booted, save for cases where
> the node itself had totally failed. I also do not start the cluster on
> boot in most cases, so the node is there and waiting for an admin to
> log in, in a clean state (no concept of cluster state in memory, thanks
> to the reboot).
>
> If you're curious, this is how I build my clusters. It also goes into
> detail on the fencing topology and rationale:
>
> https://alteeve.ca/w/AN!Cluster_Tutorial_2
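
To make the "off -> verify -> try to boot" sequence quoted above concrete,
here is a rough Python sketch. It is not taken from any real fence agent:
FakePowerDevice and fence_reboot are hypothetical names standing in for an
IPMI or PDU interface. The only point it illustrates is that the fence call
succeeds once the node is confirmed off, whatever the power-on attempt does.

```python
from dataclasses import dataclass


@dataclass
class FakePowerDevice:
    """Hypothetical stand-in for an IPMI/PDU interface (not a real API)."""
    powered: bool = True
    off_works: bool = True   # set False to simulate a device that ignores "off"
    on_works: bool = True    # set False to simulate a node that will not boot

    def power_off(self) -> None:
        if self.off_works:
            self.powered = False

    def power_on(self) -> None:
        if self.on_works:
            self.powered = True

    def is_on(self) -> bool:
        return self.powered


def fence_reboot(dev: FakePowerDevice) -> bool:
    """Return True once the node is confirmed off.

    The power-on step is best effort and does not change the result.
    """
    dev.power_off()
    if dev.is_on():
        # Could not verify that the node is off, so the fence has failed.
        return False
    dev.power_on()  # "try to boot"; outcome deliberately ignored
    return True
```

With FakePowerDevice(on_works=False), fence_reboot still returns True and
the node simply stays off, which matches the "node itself had totally
failed" case described in the quote.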

One can also set the cluster nodes to fail over and, when the failed node
comes back up, to *not* try to take back the services, which leaves it in
a state where you can fix it.
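
As a minimal sketch of that policy (plain Python, not the configuration
syntax of any particular cluster manager; place_service and its parameters
are made-up names for illustration), the placement decision might look
like this:

```python
def place_service(current_owner, online_nodes, preferred, failback=False):
    """Pick the node that should run a service.

    All names here are hypothetical. With failback=False, a recovered
    preferred node does not take the service back.
    """
    if current_owner in online_nodes:
        # The owner is healthy: only move if failback is explicitly enabled.
        if failback and preferred in online_nodes:
            return preferred
        return current_owner
    # The owner is down: fail over, favouring the preferred node if it is up.
    if preferred in online_nodes:
        return preferred
    # Otherwise pick any surviving node (sorted only to be deterministic).
    return sorted(online_nodes)[0] if online_nodes else None


# node1 fails, so the service moves to node2.
owner = place_service("node1", {"node2"}, preferred="node1")
# node1 comes back, but without failback the service stays where it is.
owner = place_service(owner, {"node1", "node2"}, preferred="node1")
```

After the second call the owner is still node2, so node1 sits idle until
someone has looked it over and moves the services back by hand.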

        mark, first work on h/a clusters 1997-2001