[CentOS] CentOS 6.5 RHCS fence loops
aditya hilman
aditya.hilman at gmail.com
Wed Oct 29 14:49:42 UTC 2014
Hello Digimer,
I have already configured cluster.conf per your advice, but when I start cman
manually on web3 (cman was already stopped there), web2 is fenced by web3.
Here is the log on web3:
Oct 29 14:38:42 web3 ricci[2557]: Executing '/usr/bin/virsh nodeinfo'
Oct 29 14:38:42 web3 ricci[2559]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1604501608'
Oct 29 14:38:42 web3 modcluster: Updating cluster.conf
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Successfully parsed cman config
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] The network interface [10.32.6.194] is now up.
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Using quorum provider quorum_cman
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]: [CMAN ] CMAN 3.0.12.1 (built Sep 25 2014 15:07:47) started
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync configuration service
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync profile loading service
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Using quorum provider quorum_cman
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] adding new UDPU member {10.32.6.153}
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] adding new UDPU member {10.32.6.194}
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 14:39:05 web3 corosync[2651]: [CMAN ] quorum regained, resuming activity
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] This node is within the primary component and will provide service.
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]: [CPG ] chosen downlist: sender r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started
Oct 29 14:39:09 web3 dlm_controld[2734]: dlm_controld 3.0.12.1 started
Oct 29 14:39:09 web3 gfs_controld[2781]: gfs_controld 3.0.12.1 started
Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster
Oct 29 14:40:29 web3 fenced[2708]: fence web2.cluster success
I did not configure corosync.conf.
cluster.conf:
<?xml version="1.0"?>
<cluster config_version="8" name="web-cluster">
    <clusternodes>
        <clusternode name="web2.cluster" nodeid="1">
            <fence>
                <method name="fence-web2">
                    <device name="fence-rhevm" port="web2.cluster"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="web3.cluster" nodeid="2">
            <fence>
                <method name="fence-web3">
                    <device name="fence-rhevm" port="web3.cluster"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" transport="udpu" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_rhevm" ipaddr="192.168.1.1" login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
    </fencedevices>
    <fence_daemon post_join_delay="30"/>
</cluster>
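As a side note: with two_node="1" and expected_votes="1", either node is quorate on its own, so a node that starts cman without seeing its peer will fence that peer once post_join_delay expires, which is the fence loop seen above. Starting cman on both nodes within the same window avoids that. A rough, runnable sketch of doing that in parallel (the real ssh/service call is commented out as a placeholder; the echo stand-in just makes the sketch safe to run anywhere):

```shell
#!/bin/sh
# Sketch: dispatch 'service cman start' to both nodes at (close to) the
# same time, so each node sees its peer within post_join_delay.
start_cman() {
    # ssh "$1" 'service cman start'   # real action on a live cluster
    echo "starting cman on $1"        # harmless stand-in for this sketch
}

start_cman web2.cluster &
start_cman web3.cluster &
wait   # both background starts have been dispatched
```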
Thanks
On Wed, Oct 29, 2014 at 9:33 PM, Digimer <lists at alteeve.ca> wrote:
> On 29/10/14 09:33 AM, aditya hilman wrote:
>
>> Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
>> Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
>> Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
>> Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent fence_rhevm result: error from agent
>> Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
>> Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
>> Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success
>>
>
> It didn't see the other node on boot, gave up and fenced the peer, it
> seems. The fence call failed before it succeeded, another sign of a general
> network issue.
>
> As an aside, did you configure corosync.conf? If so, don't. Let cman
> handle everything.
>
> Are you starting cman on both nodes at (close to) exactly the same time?
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>
--
Regards,
Adit
http://adityahilman.com <http://simplyaddo.web.id>
http://id.linkedin.com/in/adityahilman
ym : science2rule