[CentOS] CentOS 6.5 RHCS fence loops

Digimer lists at alteeve.ca
Wed Oct 29 14:30:10 UTC 2014


In 2-node clusters, never allow cman or rgmanager to start on boot. A 
node will reboot for one of two reasons: it was fenced, or it is down 
for scheduled maintenance. In the former case, you want to review it 
before restoring it to the cluster. In the latter case, a human is 
already there to start it. This is good advice for 3+ node clusters as well.
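
For example, on CentOS 6 that is just (a sketch; adjust for whichever of 
these services you actually run):

   chkconfig cman off
   chkconfig clvmd off
   chkconfig rgmanager off

Then, once a fenced node has been reviewed, start the stack by hand:

   service cman start
   service clvmd start
   service rgmanager start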

As an aside, the default time fenced waits for the peer on start 
(post_join_delay) is 6 seconds, which I find to be too short. I raise it 
to 30 seconds with:

<fence_daemon post_join_delay="30" />
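
After editing cluster.conf, something like this validates and pushes the 
change (a sketch; 'cman_tool version -r' assumes ricci is running on both 
nodes, otherwise copy the file over and restart cman yourself):

   # bump config_version in cluster.conf first, then:
   ccs_config_validate
   cman_tool version -r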

As for the fence-on-start, it could be a network issue. Have you tried 
unicast instead of multicast? Try this:

<cman transport="udpu" expected_votes="1" two_node="1" />
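
Merged into the cluster.conf you posted, that would look roughly like 
this (a sketch; note that a transport change only takes effect after cman 
has been restarted on both nodes):

   <cluster config_version="8" name="web-cluster">
           <fence_daemon post_join_delay="30"/>
           <cman expected_votes="1" transport="udpu" two_node="1"/>
           <!-- clusternodes and fencedevices sections unchanged -->
   </cluster>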

Slight comment:

 > When the cluster becomes quorate,

With two_node="1" set, the nodes in a 2-node cluster are always quorate.

digimer

On 29/10/14 04:44 AM, aditya hilman wrote:
> Hi Guys,
>
> I'm using CentOS 6.5 as a guest on RHEV, with RHCS for a clustered web environment.
> The environment:
> web1.example.com
> web2.example.com
>
> When the cluster becomes quorate, web1 is rebooted by web2. When web2 comes
> back up, web2 is rebooted by web1.
> Does anybody know how to solve this "fence loop"?
> master_wins="1" does not work properly, and neither does qdisk.
> Below is the cluster.conf. I re-created a "fresh" cluster, but the fence loop
> still exists.
>
> <?xml version="1.0"?>
> <cluster config_version="7" name="web-cluster">
>          <clusternodes>
>                  <clusternode name="web2.cluster" nodeid="1">
>                          <fence>
>                                  <method name="fence-web2">
>                                          <device name="fence-rhevm" port="web2.cluster"/>
>                                  </method>
>                          </fence>
>                  </clusternode>
>                  <clusternode name="web3.cluster" nodeid="2">
>                          <fence>
>                                  <method name="fence-web3">
>                                          <device name="fence-rhevm" port="web3.cluster"/>
>                                  </method>
>                          </fence>
>                  </clusternode>
>          </clusternodes>
>          <cman expected_votes="1" two_node="1"/>
>          <fencedevices>
>                  <fencedevice agent="fence_rhevm" ipaddr="192.168.1.1" login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
>          </fencedevices>
> </cluster>
>
>
> Log : /var/log/messages
> Oct 29 07:34:04 web2 corosync[1182]:   [QUORUM] Members[1]: 1
> Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster dev 0.0 agent fence_rhevm result: error from agent
> Oct 29 07:34:08 web2 fenced[1242]: fence web3.cluster failed
> Oct 29 07:34:12 web2 fenced[1242]: fence web3.cluster success
> Oct 29 07:34:12 web2 clvmd: Cluster LVM daemon started - connected to CMAN
> Oct 29 07:34:12 web2 rgmanager[1790]: I am node #1
> Oct 29 07:34:12 web2 rgmanager[1790]: Resource Group Manager Starting
>
>
> Thanks
>


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?


