> -----Original Message-----
> From: Digimer [mailto:lists at alteeve.ca]
> Sent: Monday, June 16, 2014 3:20 PM
> To: CentOS mailing list
> Subject: Re: [CentOS] Question about clustering
>
> On 16/06/14 02:55 PM, m.roth at 5-cent.us wrote:
<SNIP>
> > One can also set the cluster nodes to failover, and when the failed node
> > comes up, to *not* try to take back the services, leaving it in a state
> > for you to fix it.
> >
> > mark, first work on h/a clusters 1997-2001
>
> Failover and recovery are secondary to fencing. The surviving node(s)
> can't begin recovery until the lost node is in a known state. To make an
> assumption about the node's state (by, for example, assuming that no
> access to the node is sufficient to determine it is off) is to risk a
> split-brain. Even something as relatively "minor" as a floating IP can
> potentially cause problems with ARP, for example.
>
> Cheers

Having operated a file serving cluster for a few years (~2001-2006)
without ANY fencing device, I can tell you that it causes split-brain in
the admins too, i.e., I AGREE.

Earlier, Alessandro Baggi wrote:
> there is a chance to make fencing without hardware, but only software?

To which Digimer answered:
No.
<SNIP info about fence device independence>

However, there is an *almost* software-only fence. Unfortunately for me,
I learned about (or at least understood) the stonith devices late in the
above system's life. I expect even meatware stonith[1] could have saved
me considerable pain five or six times.

Understand that I am not recommending meatware stonith as a good
operational stonith device; see [2] for how much subtle understanding
the meat has to have. But it would be much better than NO operational
stonith device.

[1] http://clusterlabs.org/doc/crm_fencing.html#_meatware
[2] http://oss.clusterlabs.org/pipermail/pacemaker/2011-June/010693.html

Even when this disclaimer is not here:
I am not a contracting officer.
I do not have authority to make or modify the terms of any contract.
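
P.S. For anyone who prefers code to prose, here is a rough Python sketch
of the ordering Digimer describes: nothing is recovered until the lost
node is confirmed to be in a known state. This is not how Pacemaker or
any real fence agent is implemented. The names recover_services,
meatware_fence and FencingError are made up for illustration, and the
"operator types the node name" check is only my stand-in for a human
confirming that the box really is powered off.

```python
from typing import Callable, List


class FencingError(RuntimeError):
    """The lost node could not be put into a known state."""


def recover_services(lost_node: str,
                     fence: Callable[[str], bool],
                     start_services: Callable[[], None]) -> None:
    """Start services on the survivor only after the lost node is fenced."""
    # "I can't reach it" is not the same as "it is off".
    # Only an explicit, confirmed fence result allows recovery to begin.
    if not fence(lost_node):
        raise FencingError(f"{lost_node} not confirmed off; refusing recovery")
    start_services()


def meatware_fence(ask: Callable[[str], str] = input) -> Callable[[str], bool]:
    """Hypothetical meatware-style fence: a human confirms the power-off."""
    def fence(node: str) -> bool:
        answer = ask(f"Power off {node} by hand, then type its name to confirm: ")
        # Anything other than the exact node name counts as "not fenced".
        return answer.strip() == node
    return fence


if __name__ == "__main__":
    started: List[str] = []

    # Scripted "operator" answers so the demo runs without a terminal.
    confirmed = meatware_fence(lambda prompt: "node2")
    recover_services("node2", confirmed, lambda: started.append("nfs"))
    print("services started:", started)

    unsure = meatware_fence(lambda prompt: "")
    try:
        recover_services("node2", unsure, lambda: started.append("nfs-again"))
    except FencingError as err:
        print("recovery blocked:", err)
```

The point of passing the fence in as a function is that the recovery
path does not care whether the confirmation comes from a power switch or
from a person; it only cares that somebody positively confirmed it. With
meatware, the weak link is whether the person answering actually checked.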