Digimer wrote:
> On 16/06/14 02:19 PM, John R Pierce wrote:
>> On 6/16/2014 10:55 AM, Digimer wrote:
>>> The main downside to fabric fencing is that the failed node will have
>>> no chance of recovering without human intervention. Further, it places
>>> the onus on the admin to not simply unfence the node without first
>>> doing proper cleanup/recovery. For these reasons, I always recommend
>>> power fencing (IPMI, PDUs, etc).
>>
>> how does power fencing change your first 2 statements in any fashion ?
>> as I see it, it would make manual recovery even harder, as you couldn't
>> even power up the failed system without first disconnecting it from the
>> network
>>
>> When I have used network fencing, I've left the admin ports live, that
>> way, the operator can access the system console to find out WHY it is
>> fubar, and put it in a proper state for recovery. of course, this
>> implies you have several LAN connections, which is always a good idea
>> for a clustered system anyways.
>
> Most power fencing methods are set to "reboot", which is "off -> verify
> -> try to boot", with the "try to boot" part not affecting success of
> the overall fence call. In my experience (dozens of clusters going back
> to 2009), this has always left the nodes booted, save for cases where
> the node itself had totally failed. I also do not start the cluster on
> boot in most cases, so the node is there and waiting for an admin to
> login, in a clean state (no concept of cluster state in memory, thanks
> to the reboot).
>
> If you're curious, this is how I build my clusters. It also goes into
> length on the fencing topology and rationale:
>
> https://alteeve.ca/w/AN!Cluster_Tutorial_2

One can also set the cluster nodes to failover, and when the failed node
comes up, to *not* try to take back the services, leaving it in a state
for you to fix it.

      mark, first work on h/a clusters 1997-2001
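
For anyone following along, here is a rough sketch of the "reboot" fence action Digimer describes above ("off -> verify -> try to boot"), where only the verified power-off decides whether the fence call succeeded. This is an illustration under assumptions, not how any particular fence agent is written: `PowerDevice`, its three callables, and `fence_reboot` are hypothetical names standing in for whatever your IPMI or PDU interface actually exposes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PowerDevice:
    """Hypothetical stand-in for an IPMI/PDU interface (not a real fence-agent API)."""
    power_off: Callable[[], None]   # ask the device to cut power
    is_off: Callable[[], bool]      # confirm the node is really off
    power_on: Callable[[], object]  # ask the device to restore power


def fence_reboot(dev: PowerDevice) -> bool:
    """Return True once the node is confirmed off, whether or not it boots again."""
    dev.power_off()
    if not dev.is_off():
        # Power-off could not be verified, so the fence call fails.
        return False
    try:
        # Best effort only: a node that will not boot is still safely fenced.
        dev.power_on()
    except Exception:
        pass
    return True
```

The point of the design is that the cluster only needs to know the node is off before recovering its services; whether the node comes back is a convenience for the admin, who then finds it booted and outside the cluster.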