[CentOS] CentOS 6.5 RHCS fence loops

Wed Oct 29 14:49:42 UTC 2014
aditya hilman <aditya.hilman at gmail.com>

Hello Digimer,

I have already configured cluster.conf as you advised, but when I start cman
manually on web3 (where cman was stopped), web2 gets fenced by web3.
Here is the log from web3:
Oct 29 14:38:42 web3 ricci[2557]: Executing '/usr/bin/virsh nodeinfo'
Oct 29 14:38:42 web3 ricci[2557]: Executing '/usr/bin/virsh nodeinfo'
Oct 29 14:38:42 web3 ricci[2559]: Executing
'/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1604501608'
Oct 29 14:38:42 web3 ricci[2559]: Executing
'/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1604501608'
Oct 29 14:38:42 web3 modcluster: Updating cluster.conf
Oct 29 14:38:42 web3 modcluster: Updating cluster.conf
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Successfully parsed cman
config
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Successfully parsed cman
config
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] Initializing transport
(UDP/IP Unicast).
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] Initializing transport
(UDP/IP Unicast).
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] The network interface
[10.32.6.194] is now up.
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] The network interface
[10.32.6.194] is now up.
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 14:39:05 web3 corosync[2651]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] adding new UDPU member
{10.32.6.153}
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] adding new UDPU member
{10.32.6.153}
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] adding new UDPU member
{10.32.6.194}
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] adding new UDPU member
{10.32.6.194}
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 14:39:05 web3 corosync[2651]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 14:39:05 web3 corosync[2651]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 14:39:05 web3 corosync[2651]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]:   [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 14:39:05 web3 corosync[2651]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 14:39:05 web3 corosync[2651]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started
Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started
Oct 29 14:39:09 web3 dlm_controld[2734]: dlm_controld 3.0.12.1 started
Oct 29 14:39:09 web3 dlm_controld[2734]: dlm_controld 3.0.12.1 started
Oct 29 14:39:09 web3 gfs_controld[2781]: gfs_controld 3.0.12.1 started
Oct 29 14:39:09 web3 gfs_controld[2781]: gfs_controld 3.0.12.1 started
Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster
Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster
Oct 29 14:40:29 web3 fenced[2708]: fence web2.cluster success
Oct 29 14:40:29 web3 fenced[2708]: fence web2.cluster success
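
To make the timing in a log like this easier to read, here is a minimal
Python sketch that pulls out only the fenced lines, collapses repeated
entries, and measures how long fenced waited between starting and fencing
the peer. The function names and the hard-coded year are assumptions made
for illustration (syslog timestamps carry no year), and month parsing
assumes an English locale. It is not part of any cluster tooling.

```python
import re
from datetime import datetime

# Matches syslog lines written by fenced, for example:
# "Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started"
FENCED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"fenced\[\d+\]:\s+(?P<msg>.*)$"
)


def fenced_events(lines, year=2014):
    """Return unique (timestamp, host, message) tuples for fenced lines, in order."""
    seen = set()
    events = []
    for line in lines:
        m = FENCED_RE.match(line.strip())
        if not m:
            continue  # not a fenced line
        # The year is an assumption; syslog does not record it.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        key = (ts, m.group("host"), m.group("msg"))
        if key not in seen:  # drop repeated copies of the same entry
            seen.add(key)
            events.append(key)
    return events


def seconds_until_first_fence(events):
    """Seconds between 'fenced ... started' and the first 'fencing node' line."""
    start = next(ts for ts, _, msg in events if msg.endswith("started"))
    fence = next(ts for ts, _, msg in events if msg.startswith("fencing node"))
    return (fence - start).total_seconds()


SAMPLE = [
    "Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started",
    "Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started",
    "Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster",
    "Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster",
    "Oct 29 14:40:29 web3 fenced[2708]: fence web2.cluster success",
]

if __name__ == "__main__":
    events = fenced_events(SAMPLE)
    print(seconds_until_first_fence(events))  # 75.0
```

With the fenced lines from the log above, web3 fenced web2 75 seconds after
fenced started.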


I have not configured corosync.conf.
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="8" name="web-cluster">
        <clusternodes>
                <clusternode name="web2.cluster" nodeid="1">
                        <fence>
                                <method name="fence-web2">
                                        <device name="fence-rhevm" port="web2.cluster"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="web3.cluster" nodeid="2">
                        <fence>
                                <method name="fence-web3">
                                        <device name="fence-rhevm" port="web3.cluster"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" transport="udpu" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_rhevm" ipaddr="192.168.1.1" login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
        </fencedevices>
        <fence_daemon post_join_delay="30"/>
</cluster>
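
For comparing the two nodes' copies of this file, here is a minimal Python
sketch that reads a cluster.conf and lists the settings relevant to a
two-node setup: the cman attributes, post_join_delay, and which fence
device each node references. The function name and the trimmed sample are
assumptions for illustration; it only reads attributes that appear in the
file above and does not validate the file against the cluster schema.

```python
import xml.etree.ElementTree as ET


def summarize_cluster_conf(xml_text):
    """Collect the two-node and fencing settings from cluster.conf text."""
    root = ET.fromstring(xml_text.strip())
    cman = root.find("cman")
    fence_daemon = root.find("fence_daemon")

    # Names of the fence devices declared under <fencedevices>.
    defined = {d.get("name") for d in root.iter("fencedevice")}

    # Fence devices referenced by each <clusternode>.
    nodes = {
        node.get("name"): [d.get("name") for d in node.iter("device")]
        for node in root.iter("clusternode")
    }

    # Devices a node refers to that are not declared anywhere.
    undefined = sorted(
        {name for used in nodes.values() for name in used} - defined
    )

    return {
        "two_node": cman.get("two_node") if cman is not None else None,
        "expected_votes": cman.get("expected_votes") if cman is not None else None,
        "transport": cman.get("transport") if cman is not None else None,
        "post_join_delay": (
            fence_daemon.get("post_join_delay") if fence_daemon is not None else None
        ),
        "nodes": nodes,
        "undefined_fence_devices": undefined,
    }


# Trimmed copy of the configuration above, for illustration only.
SAMPLE_CONF = """
<cluster config_version="8" name="web-cluster">
  <clusternodes>
    <clusternode name="web2.cluster" nodeid="1">
      <fence><method name="fence-web2">
        <device name="fence-rhevm" port="web2.cluster"/>
      </method></fence>
    </clusternode>
    <clusternode name="web3.cluster" nodeid="2">
      <fence><method name="fence-web3">
        <device name="fence-rhevm" port="web3.cluster"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_rhevm" name="fence-rhevm"/>
  </fencedevices>
  <fence_daemon post_join_delay="30"/>
</cluster>
"""

if __name__ == "__main__":
    for key, value in summarize_cluster_conf(SAMPLE_CONF).items():
        print(f"{key}: {value}")
```

Running it against /etc/cluster/cluster.conf on each node and comparing the
output is a quick way to confirm that both nodes see the same settings.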


Thanks

On Wed, Oct 29, 2014 at 9:33 PM, Digimer <lists at alteeve.ca> wrote:

> On 29/10/14 09:33 AM, aditya hilman wrote:
>
>> Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
>> Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
>> Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
>> Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
>> Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
>> Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
>> Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent
>> fence_rhevm result: error from agent
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent
>> fence_rhevm result: error from agent
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
>> Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success
>> Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success
>>
>
> It didn't see the other node on boot, gave up and fenced the peer, it
> seems. The fence call failed before it succeeded, another sign of a general
> network issue.
>
> As an aside, did you configure corosync.conf? If so, don't. Let cman
> handle everything.
>
> Are you starting cman on both nodes at (close to) exactly the same time?
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>



-- 
Regards,
Adit
http://adityahilman.com <http://simplyaddo.web.id>
http://id.linkedin.com/in/adityahilman
ym : science2rule