On 17/06/2014 16:32, Digimer wrote:
On 17/06/14 10:23 AM, Denniston, Todd A CIV NAVSURFWARCENDIV Crane wrote:
-----Original Message-----
From: Digimer [mailto:lists@alteeve.ca]
Sent: Monday, June 16, 2014 3:20 PM
To: CentOS mailing list
Subject: Re: [CentOS] Question about clustering
On 16/06/14 02:55 PM, m.roth@5-cent.us wrote:
<SNIP>
>> One can also set the cluster nodes to failover, and when the failed node
>> comes up, to *not* try to take back the services, leaving it in a state
>> for you to fix it.
>>
>> mark, first work on h/a clusters 1997-2001
>
> Failover and recovery are secondary to fencing. The surviving node(s)
> can't begin recovery until the lost node is in a known state. To make an
> assumption about the node's state (by, for example, assuming that no
> access to the node is sufficient to determine it is off) is to risk a
> split-brain. Even something as relatively "minor" as a floating IP can
> potentially cause problems with ARP, for example.
>
> Cheers
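The ordering described above (fence first, recover second) can be sketched roughly as follows. This is only an illustration, not how Pacemaker or rgmanager implement it: `NodeState`, `recover_services`, and the `fence` callback are hypothetical names made up for this example.

```python
from enum import Enum
from typing import Callable


class NodeState(Enum):
    ONLINE = "online"
    UNKNOWN = "unknown"  # unreachable, but possibly still running


def recover_services(lost_node: str,
                     state: NodeState,
                     fence: Callable[[str], bool]) -> str:
    """Decide what a surviving node may do about a peer (hypothetical helper)."""
    if state is NodeState.ONLINE:
        return "nothing to recover"
    # Silence is not proof that the node is off. Ask the fence device
    # (assumed here to return True only once the node is confirmed down).
    if not fence(lost_node):
        return "blocked: fence failed, recovery must wait"
    # Only now is the lost node in a known state, so recovery is safe.
    return "recovering services from " + lost_node
```

For example, `recover_services("node2", NodeState.UNKNOWN, lambda n: False)` stays blocked rather than assuming the silent node is off, which is the point being made about split-brain.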
Having operated a file serving cluster for a few years (~2001-2006) without ANY fencing device, I can tell you that it causes split-brain in the admins too, i.e., I AGREE.
To which I can use the analogy that in the 18 years I've driven a car, I've never needed my seat belt or airbags. I still put my seatbelt on every time I go anywhere though, and I won't buy a car without airbags. ;)
Earlier, Alessandro Baggi wrote:
Is there a way to do fencing without hardware, using only software?
To which Digimer answered: No. <SNIP info about fence device independence>
However, there is an *almost* software-only fence.
If your goal is high availability, there is a strong argument that "almost" isn't enough.
Unfortunately for me I learned about (or at least understood) the stonith devices late in the above system's life. I expect even meatware stonith[1] could have saved me considerable pain five or six times.
Manual fencing was dropped as a supported fence method in RHEL 6 because it was too prone to human mistakes. When an HA cluster is hung and an admin who might not have touched the cluster in months has users and managers yelling at them, mistakes with potentially massive consequences happen.
Manual fencing is just not safe.
Understand that I am not recommending meatware stonith as a good operational stonith device (see [2] for how much subtle understanding the meat has to have), but it would be much better than NO operational stonith device.
Bingo on the meat, disagree on "no stonith" at all. A cluster must have fencing.
[1] http://clusterlabs.org/doc/crm_fencing.html#_meatware
[2] http://oss.clusterlabs.org/pipermail/pacemaker/2011-June/010693.html
Even when this disclaimer is not here: I am not a contracting officer. I do not have authority to make or modify the terms of any contract.
Cheers
OK, so fencing is a requirement for a cluster in the case of hardware failure. I have another question on this topic, but about software failure. Suppose I have a cluster of httpd installations on 6 virtualized hosts, each one on a different server. Suppose also that one guest (named host6) has a problem and can't start Apache. In this scenario, IPMI and a UPS seem unnecessary. How does fencing work in this case? How do I fence the node?
Thanks in advance.
Alessandro.