[CentOS] Question about clustering

Mon Jun 16 18:40:51 UTC 2014
Digimer <lists at alteeve.ca>

On 16/06/14 02:19 PM, John R Pierce wrote:
> On 6/16/2014 10:55 AM, Digimer wrote:
>> The main downside to fabric fencing is that the failed node will have no
>> chance of recovering without human intervention. Further, it places the
>> onus on the admin to not simply unfence the node without first doing
>> proper cleanup/recovery. For these reasons, I always recommend power
>> fencing (IPMI, PDUs, etc).
>
> how does power fencing change your first 2 statements in any fashion ?
> as I see it, it would make manual recovery even harder, as you couldn't
> even power up the failed system without first disconnecting it from the
> network
>
> When I have used network fencing, I've left the admin ports live, that
> way, the operator can access the system console to find out WHY it is
> fubar, and put it in a proper state for recovery.   of course, this
> implies you have several LAN connections, which is always a good idea
> for a clustered system anyways.

Most power fencing methods are set to "reboot", which means "off -> verify
-> try to boot", with the "try to boot" step not affecting the success of
the overall fence call. In my experience (dozens of clusters going back
to 2009), this has always left the nodes booted, save for cases where
the node itself had totally failed. I also do not start the cluster on
boot in most cases, so the node is up and waiting for an admin to
log in, in a clean state (no concept of cluster state in memory, thanks
to the reboot).
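
To make the "reboot" semantics concrete, here is a minimal sketch in
Python. It is not the code of any real fence agent; fence_reboot and the
three callables passed to it (power_off, is_off, power_on) are
hypothetical names that stand in for whatever the IPMI or PDU interface
actually provides. It only shows the ordering: the fence succeeds once
the node is confirmed off, and a failed power-on does not change that.

```python
from typing import Callable


def fence_reboot(power_off: Callable[[], None],
                 is_off: Callable[[], bool],
                 power_on: Callable[[], None],
                 verify_attempts: int = 3) -> bool:
    """Hypothetical "reboot" fence action: off -> verify -> try to boot.

    Returns True once the node is confirmed off. The power-on step is
    best effort and never changes the result.
    """
    # Step 1: cut power to the failed node.
    power_off()

    # Step 2: verify the node is really off. Without this confirmation
    # the fence must be reported as failed.
    confirmed = False
    for _ in range(verify_attempts):
        if is_off():
            confirmed = True
            break
    if not confirmed:
        return False

    # Step 3: try to boot the node. Any failure here is ignored, because
    # the node is already known to be off, which is all the cluster needs.
    try:
        power_on()
    except Exception:
        pass
    return True
```

With this ordering, a node that is dead and cannot boot still counts as
successfully fenced, while a node whose "off" state cannot be confirmed
does not.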

If you're curious, this is how I build my clusters. It also goes into
detail on the fencing topology and rationale:

https://alteeve.ca/w/AN!Cluster_Tutorial_2

Cheers


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?