Hello Digimer,
I have already configured cluster.conf as you advised, but when I start cman manually on web3 (cman was already stopped), web2 gets fenced by web3. Here is the log from web3:

Oct 29 14:38:42 web3 ricci[2557]: Executing '/usr/bin/virsh nodeinfo'
Oct 29 14:38:42 web3 ricci[2559]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1604501608'
Oct 29 14:38:42 web3 modcluster: Updating cluster.conf
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Corosync built-in features: nss dbus rdma snmp
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Successfully parsed cman config
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] The network interface [10.32.6.194] is now up.
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Using quorum provider quorum_cman
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]: [CMAN ] CMAN 3.0.12.1 (built Sep 25 2014 15:07:47) started
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync configuration service
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync profile loading service
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Using quorum provider quorum_cman
Oct 29 14:39:05 web3 corosync[2651]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] adding new UDPU member {10.32.6.153}
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] adding new UDPU member {10.32.6.194}
Oct 29 14:39:05 web3 corosync[2651]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 14:39:05 web3 corosync[2651]: [CMAN ] quorum regained, resuming activity
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] This node is within the primary component and will provide service.
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]: [QUORUM] Members[1]: 2
Oct 29 14:39:05 web3 corosync[2651]: [CPG ] chosen downlist: sender r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 14:39:05 web3 corosync[2651]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 29 14:39:09 web3 fenced[2708]: fenced 3.0.12.1 started
Oct 29 14:39:09 web3 dlm_controld[2734]: dlm_controld 3.0.12.1 started
Oct 29 14:39:09 web3 gfs_controld[2781]: gfs_controld 3.0.12.1 started
Oct 29 14:40:24 web3 fenced[2708]: fencing node web2.cluster
Oct 29 14:40:29 web3 fenced[2708]: fence web2.cluster success
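(For reference, membership and the fence domain state can be checked on each node with the standard cman tools; just a quick sanity check, assuming the stock RHEL 6 cluster packages:)

    cman_tool status    # quorum and vote information
    cman_tool nodes     # member list and join state; both nodes should show 'M'
    fence_tool ls       # fence domain members and any fencing in progress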
I did not configure corosync.conf. Here is my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="8" name="web-cluster">
  <clusternodes>
    <clusternode name="web2.cluster" nodeid="1">
      <fence>
        <method name="fence-web2">
          <device name="fence-rhevm" port="web2.cluster"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="web3.cluster" nodeid="2">
      <fence>
        <method name="fence-web3">
          <device name="fence-rhevm" port="web3.cluster"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_rhevm" ipaddr="192.168.1.1" login="admin@internal" name="fence-rhevm" passwd="secret" ssl="on"/>
  </fencedevices>
  <fence_daemon post_join_delay="30"/>
</cluster>
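(As a sanity check, the fence device can also be tested by hand, outside the cluster. A minimal sketch, assuming this build of fence_rhevm takes the standard fence-agent options matching the attributes above; run fence_rhevm -h to confirm the exact option names:)

    # from web3, ask the RHEV manager at 192.168.1.1 for web2's power state
    fence_rhevm --ip=192.168.1.1 --username=admin@internal \
        --password=secret --ssl --action=status --plug=web2.cluster

If that returns the correct status from both nodes, the agent and credentials are known-good, which narrows the problem down to membership.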
Thanks
On Wed, Oct 29, 2014 at 9:33 PM, Digimer lists@alteeve.ca wrote:
On 29/10/14 09:33 AM, aditya hilman wrote:
Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent fence_rhevm result: error from agent
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success
It didn't see the other node on boot, gave up and fenced the peer, it seems. The fence call failed before it succeeded, another sign of a general network issue.
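A quick way to rule out basic connectivity and name-resolution problems on the totem network, assuming the 10.32.6.x addresses in your log are the cluster interfaces:

    # from web3: can it reach web2's cluster IP?
    ping -c 3 10.32.6.153
    # do both node names resolve to those same totem addresses?
    getent hosts web2.cluster web3.cluster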
As an aside, did you configure corosync.conf? If so, don't. Let cman handle everything.
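On a cman cluster the whole configuration lives in /etc/cluster/cluster.conf; a leftover /etc/corosync/corosync.conf only bites if corosync gets started on its own. A rough check, assuming the stock RHEL 6 tooling:

    ccs_config_validate        # validate cluster.conf against the schema
    service corosync status    # should NOT be running standalone
    service cman status        # cman's init script starts corosync itself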
Are you starting cman on both nodes at (close to) exactly the same time?
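If not, one way to get both inside the post_join_delay="30" window, assuming passwordless root ssh between the nodes (only an illustration, not the only way):

    # run from web3: start cman on web2 in the background, then locally
    ssh root@web2.cluster 'service cman start' &
    service cman start
    wait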
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos