[CentOS] CentOS 6.5 RHCS fence loops

Wed Oct 29 13:33:49 UTC 2014
aditya hilman <aditya.hilman at gmail.com>

Hello Dominic,

Thanks for the response.

When I start cman manually, web3 gets fenced by web2. Here are the logs:
web2 : /var/log/messages

Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Successfully parsed cman
config
Oct 29 13:15:25 web2 corosync[1493]:   [MAIN  ] Successfully parsed cman
config
Oct 29 13:15:25 web2 corosync[1493]:   [TOTEM ] Initializing transport
(UDP/IP Multicast).
Oct 29 13:15:25 web2 corosync[1493]:   [TOTEM ] Initializing transport
(UDP/IP Multicast).
Oct 29 13:15:25 web2 corosync[1493]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 13:15:25 web2 corosync[1493]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 13:15:26 web2 corosync[1493]:   [TOTEM ] The network interface
[10.32.6.153] is now up.
Oct 29 13:15:26 web2 corosync[1493]:   [TOTEM ] The network interface
[10.32.6.153] is now up.
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:26 web2 corosync[1493]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 13:15:26 web2 corosync[1493]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:26 web2 corosync[1493]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:26 web2 corosync[1493]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 13:15:26 web2 corosync[1493]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 13:15:26 web2 corosync[1493]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 13:15:26 web2 corosync[1493]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 13:15:26 web2 corosync[1493]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 13:15:26 web2 corosync[1493]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Members[1]: 1
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Members[1]: 1
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Members[1]: 1
Oct 29 13:15:26 web2 corosync[1493]:   [QUORUM] Members[1]: 1
Oct 29 13:15:26 web2 corosync[1493]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.153) ; members(old:0 left:0)
Oct 29 13:15:26 web2 corosync[1493]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.153) ; members(old:0 left:0)
Oct 29 13:15:26 web2 corosync[1493]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 13:15:26 web2 corosync[1493]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
Oct 29 13:15:30 web2 fenced[1548]: fenced 3.0.12.1 started
Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
Oct 29 13:15:30 web2 dlm_controld[1568]: dlm_controld 3.0.12.1 started
Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
Oct 29 13:15:30 web2 gfs_controld[1621]: gfs_controld 3.0.12.1 started
Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:21 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent
fence_rhevm result: error from agent
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster dev 0.0 agent
fence_rhevm result: error from agent
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
Oct 29 13:16:24 web2 fenced[1548]: fence web3.cluster failed
Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:27 web2 fenced[1548]: fencing node web3.cluster
Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success
Oct 29 13:16:29 web2 fenced[1548]: fence web3.cluster success

---
web3 : /var/log/messages
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Corosync Cluster Engine
('1.4.1'): started and ready to provide service.
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Corosync built-in features:
nss dbus rdma snmp
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Successfully read config
from /etc/cluster/cluster.conf
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Successfully parsed cman
config
Oct 29 13:15:26 web3 corosync[1526]:   [MAIN  ] Successfully parsed cman
config
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] Initializing transport
(UDP/IP Multicast).
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] Initializing transport
(UDP/IP Multicast).
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] Initializing
transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] The network interface
[10.32.6.194] is now up.
Oct 29 13:15:26 web3 corosync[1526]:   [TOTEM ] The network interface
[10.32.6.194] is now up.
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:27 web3 corosync[1526]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 13:15:27 web3 corosync[1526]:   [CMAN  ] CMAN 3.0.12.1 (built Sep 25
2014 15:07:47) started
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync CMAN membership service 2.90
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
openais checkpoint service B.01.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync extended virtual synchrony service
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync configuration service
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster closed process group service v1.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster config database access v1.01
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync profile loading service
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Using quorum provider
quorum_cman
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:27 web3 corosync[1526]:   [SERV  ] Service engine loaded:
corosync cluster quorum service v0.1
Oct 29 13:15:27 web3 corosync[1526]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 13:15:27 web3 corosync[1526]:   [MAIN  ] Compatibility mode set to
whitetank.  Using V1 and V2 of the synchronization engine.
Oct 29 13:15:27 web3 corosync[1526]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 13:15:27 web3 corosync[1526]:   [TOTEM ] A processor joined or left
the membership and a new membership was formed.
Oct 29 13:15:27 web3 corosync[1526]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 13:15:27 web3 corosync[1526]:   [CMAN  ] quorum regained, resuming
activity
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] This node is within the
primary component and will provide service.
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Members[1]: 2
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Members[1]: 2
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Members[1]: 2
Oct 29 13:15:27 web3 corosync[1526]:   [QUORUM] Members[1]: 2
Oct 29 13:15:27 web3 corosync[1526]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 13:15:27 web3 corosync[1526]:   [CPG   ] chosen downlist: sender
r(0) ip(10.32.6.194) ; members(old:0 left:0)
Oct 29 13:15:27 web3 corosync[1526]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 13:15:27 web3 corosync[1526]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Oct 29 13:15:30 web3 fenced[1582]: fenced 3.0.12.1 started
Oct 29 13:15:30 web3 fenced[1582]: fenced 3.0.12.1 started
Oct 29 13:15:31 web3 dlm_controld[1608]: dlm_controld 3.0.12.1 started
Oct 29 13:15:31 web3 dlm_controld[1608]: dlm_controld 3.0.12.1 started
Oct 29 13:15:31 web3 gfs_controld[1655]: gfs_controld 3.0.12.1 started
Oct 29 13:15:31 web3 gfs_controld[1655]: gfs_controld 3.0.12.1 started
Oct 29 13:14:54 web3 kernel: : Events: unsupported p6 CPU model 44 no PMU
driver, software events only.
Oct 29 13:14:54 web3 kernel: : Events: unsupported p6 CPU model 44 no PMU
driver, software events only.
Oct 29 13:14:54 web3 kernel: : NMI watchdog disabled (cpu0): hardware
events not enabled
Oct 29 13:14:54 web3 kernel: : NMI watchdog disabled (cpu0): hardware
events not enabled
Oct 29 13:14:54 web3 kernel: : Brought up 1 CPUs
Oct 29 13:14:54 web3 kernel: : Brought up 1 CPUs
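
Since the output above is long and every entry shows up twice, here is a rough sketch of how the fenced lines could be pulled out of /var/log/messages for comparison between the two nodes. This is only an illustration, not part of the cluster tooling: the function names join_wrapped and fence_events are made up for this example, and it assumes the usual "Mon DD HH:MM:SS host" syslog prefix seen in the logs above. Wrapped pieces are re-joined with a single space, so a word that was split mid-way by the mail client will not be restored exactly.

```python
import re

# Start of a syslog entry, e.g. "Oct 29 13:16:21 web2 fenced[1548]: ..."
# (assumes the default "Mon DD HH:MM:SS " timestamp prefix).
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def join_wrapped(lines):
    """Re-join log entries that were wrapped onto several lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation of the previous entry.
            entries[-1] += " " + line.strip()
    return entries


def fence_events(lines):
    """Return the fenced[...] entries, dropping consecutive duplicates."""
    events = []
    for entry in join_wrapped(lines):
        if " fenced[" not in entry:
            continue
        if events and events[-1] == entry:
            continue  # same entry logged twice in a row
        events.append(entry)
    return events
```

It could be used as fence_events(open("/var/log/messages")) on each node, printing the returned list one entry per line.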



Thanks.

On Wed, Oct 29, 2014 at 5:14 PM, Dominic Geevarghese <share2dom at gmail.com>
wrote:

> Hi,
>
> > Does anybody know how to solve this "fence loop"?
> > master_wins="1" is not working properly, qdisk also.
> >
>
> The logs shared are not sufficient to identify the cause of the fence loop.
> I would suggest that you:
>
> 1. Disable cman with "chkconfig cman off" (and rgmanager also, if you wish)
>    on both nodes.
> 2. Reboot both nodes.
> 3. Once the machines are up, open two terminals.
> 4. Start cman manually on both nodes.
> 5. Share the behaviour and the logs generated.
>
> Cheers,
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
>



-- 
Regards,
Adit
http://adityahilman.com
http://id.linkedin.com/in/adityahilman
ym : science2rule