[CentOS] Question about clustering

Wed Jun 18 16:32:27 UTC 2014
Alessandro Baggi <alessandro.baggi at gmail.com>

On 17/06/2014 16:32, Digimer wrote:
> On 17/06/14 10:23 AM, Denniston, Todd A CIV NAVSURFWARCENDIV Crane wrote:
>>> -----Original Message-----
>>> From: Digimer [mailto:lists at alteeve.ca]
>>> Sent: Monday, June 16, 2014 3:20 PM
>>> To: CentOS mailing list
>>> Subject: Re: [CentOS] Question about clustering
>>>
>>> On 16/06/14 02:55 PM, m.roth at 5-cent.us wrote:
>> <SNIP>
>>>> One can also set the cluster nodes to failover, and when the failed node
>>>> comes up, to *not* try to take back the services, leaving it in a state
>>>> for you to fix it.
>>>>
>>>>            mark, first work on h/a clusters 1997-2001
>>>
>>> Failover and recovery are secondary to fencing. The surviving node(s)
>>> can't begin recovery until the lost node is in a known state. To make an
>>> assumption about the node's state (by, for example, assuming that no
>>> access to the node is sufficient to determine it is off) is to risk a
>>> split-brain. Even something as relatively "minor" as a floating IP can
>>> potentially cause problems with ARP, for example.
>>>
>>> Cheers
>>
>> Having operated a file serving cluster for a few years (~2001-2006) without ANY fencing device, I can tell you that it causes split-brain in the admins too, i.e., I AGREE.
>
> To which I can use the analogy that in the 18 years I've driven a car,
> I've never needed my seat belt or airbags. I still put my seatbelt on
> every time I go anywhere though, and I won't buy a car without airbags. ;)
>
>> Earlier, Alessandro Baggi wrote:
>>> there is a chance to make fencing without hardware, but only software?
>> To which Digimer answered: No. <SNIP info about fence device independence>
>>
>> However, there is an *Almost* software only fence.
>
> If your goal is high-availability, there is a strong argument that
> "almost" isn't enough.
>
>> Unfortunately for me, I learned about (or at least understood) the stonith devices late in the above system's life. I expect even meatware stonith[1] could have saved me considerable pain five or six times.
>
> Manual fencing was dropped as a supported fence method in RHEL 6 because
> it was too prone to human mistakes. When an HA cluster is hung and an
> admin who might not have touched the cluster in months has users and
> managers yelling at them, mistakes with potentially massive consequences
> happen.
>
> Manual fencing is just not safe.
>
>> Understand that I am not recommending meatware stonith as a good operational stonith device, see [2] for how much subtle understanding the meat has to have, but it would be much better than NO operational stonith device.
>
> Bingo on the meat, disagree on "no stonith" at all. A cluster must have
> fencing.
>
>> [1] http://clusterlabs.org/doc/crm_fencing.html#_meatware
>> [2] http://oss.clusterlabs.org/pipermail/pacemaker/2011-June/010693.html
>>
>> Even when this disclaimer is not here:
>> I am not a contracting officer. I do not have authority to make or modify the terms of any contract.
>
> Cheers
>

OK, so fencing is a requirement for a cluster to handle hardware failure.
I have another question on this topic, but about software failure.
Suppose I have a cluster of httpd installations on 6 virtualized
hosts, each one on a different server. Suppose also that one guest (named
host6) has a problem and can't start Apache. In this scenario, IPMI and
a UPS are unnecessary. How does fencing work in this case? How do I
fence the node?

Thanks in advance.

Alessandro.