On 16/06/14 02:19 PM, John R Pierce wrote:
> On 6/16/2014 10:55 AM, Digimer wrote:
>> The main downside to fabric fencing is that the failed node will have no
>> chance of recovering without human intervention. Further, it places the
>> onus on the admin to not simply unfence the node without first doing
>> proper cleanup/recovery. For these reasons, I always recommend power
>> fencing (IPMI, PDUs, etc).
>
> how does power fencing change your first 2 statements in any fashion ?
> as I see it, it would make manual recovery even harder, as you couldn't
> even power up the failed system without first disconnecting it from the
> network
>
> When I have used network fencing, I've left the admin ports live, that
> way, the operator can access the system console to find out WHY it is
> fubar, and put it in a proper state for recovery. of course, this
> implies you have several LAN connections, which is always a good idea
> for a clustered system anyways.

Most power fencing methods are set to "reboot", which is "off -> verify
-> try to boot", with the "try to boot" part not affecting the success of
the overall fence call. In my experience (dozens of clusters going back
to 2009), this has always left the nodes booted, save for cases where the
node itself had totally failed.

I also do not start the cluster on boot in most cases, so the node is
there and waiting for an admin to log in, in a clean state (no concept of
cluster state in memory, thanks to the reboot).

If you're curious, this is how I build my clusters. It also goes into
detail on the fencing topology and rationale:

https://alteeve.ca/w/AN!Cluster_Tutorial_2

Cheers

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
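
The "reboot" fence sequence described above (off, verify, try to boot) can be sketched roughly as follows. This is only an illustration of the control flow, not the code of any real fence agent: `reboot_fence`, `FakeNodePower` and the three callbacks are hypothetical names made up for this example, and a real agent would talk to an IPMI BMC or a switched PDU instead of a fake object. The point it shows is that the fence result depends only on the verified "off" state, while a failed boot attempt is ignored.

```python
from typing import Callable


def reboot_fence(
    power_off: Callable[[], None],
    is_off: Callable[[], bool],
    power_on: Callable[[], None],
) -> bool:
    """Hypothetical 'reboot' fence action: off -> verify -> try to boot.

    Returns True once the node is confirmed off, whether or not it
    comes back up afterwards.
    """
    power_off()
    if not is_off():
        # The node could not be confirmed off, so the fence call fails.
        return False
    try:
        power_on()
    except Exception:
        # A failed boot attempt does not change the fence result.
        pass
    return True


class FakeNodePower:
    """Stand-in for a power device (assumption: not a real IPMI/PDU API)."""

    def __init__(self, boots: bool = True) -> None:
        self.on = True
        self.boots = boots

    def power_off(self) -> None:
        self.on = False

    def is_off(self) -> bool:
        return not self.on

    def power_on(self) -> None:
        if not self.boots:
            raise RuntimeError("node failed to boot")
        self.on = True
```

With a healthy fake node, `reboot_fence(node.power_off, node.is_off, node.power_on)` returns `True` and leaves the node powered on. With `FakeNodePower(boots=False)` it still returns `True`, but the node stays off, which matches the "node itself had totally failed" case.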