Hi,
For testing purposes, I'm trying to create a two-node HA environment for running some services (openvpn and haproxy). I installed two CentOS 6.4 KVM guests.
I was able to create a cluster and some resources. I followed the document https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/...
But my cluster does not behave as expected. After starting the cluster software on both nodes, they can see each other.
----------------------------------------
[root@lb1 ~]# pcs status
Cluster name: LB.STK
Last updated: Mon Jan 13 15:34:21 2014
Last change: Mon Jan 13 15:24:47 2014 via cibadmin on lb1.asol.local
Stack: cman
Current DC: lb1.asol.local - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured

Online: [ lb1.asol.local lb2.asol.local ]

Full list of resources:

 Resource Group: LB
     LAN.VIP (ocf::heartbeat:IPaddr2): Started lb2.asol.local
     WAN.VIP (ocf::heartbeat:IPaddr2): Started lb2.asol.local
----------------------------------------
After a manual shutdown of node 2 (pcs cluster stop), node 1 doesn't get this information and still believes node 2 is up and running. In the corosync log on lb2 these lines keep repeating:
Jan 13 15:38:43 [1712] lb2.asol.local cib: info: crm_client_new: Connecting 0x25a3810 for uid=0 gid=0 pid=10763 id=2b06a195-11f6-452d-992b-5ea0c69be21a
Jan 13 15:38:43 [1712] lb2.asol.local cib: info: cib_process_request: Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_resource/2, version=0.7.4)
Jan 13 15:38:43 [1712] lb2.asol.local cib: info: crm_client_destroy: Destroying 0 events
Jan 13 17:24:24 corosync [TOTEM ] Retransmit List: 9a 9b 9c
The firewall on both nodes is open for incoming traffic from these nodes, and stonith-enabled is set to false. I created keys for the root user, so I can ssh back and forth without a password. The pacemaker version is 1.1.10-14.
Do you have any idea where the problem might be?
thanks
martin
On 13-01-14 14:52, Martin Moravcik wrote:
Hi,
For testing purposes, I'm trying to create a two-node HA environment for running some services (openvpn and haproxy). I installed two CentOS 6.4 KVM guests.
IIRC, CentOS 6.5 came with several updates to cluster-related packages, so you may want to investigate and update to 6.5.
Regards, Patrick
I'm sorry, I gave the wrong version. My systems are fully updated CentOS 6.5, and I'm using only the standard CentOS repositories.
martin
On 13/01/14 15:17, Patrick Lists wrote:
Iirc CentOS 6.5 came with several updates to cluster related packages so you may want to investigate and update to 6.5.
Regards, Patrick
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
2014/1/13 Martin Moravcik centos@datalock.sk:
I'm sorry. My systems are fully updated CentOS 6.5. I'm using only standard centos repositories.
martin
Hi Martin, I haven't looked carefully at your problem and don't know how skilled in HA you are, but I heartily suggest that you read and study Digimer's tutorial, if you haven't done so already: https://alteeve.ca/w/AN!Cluster_Tutorial_2
I think it's unbeatable!
Best regards, Giorgio
Hi Martin.
Could you provide us with your config? Please post the output of one of the commands below:
pcs config show
or
crm configure show
Then maybe we could get a better idea of your setup.
On 01/14/2014 06:34 PM, Giorgio Bersano wrote:
Hy Martin, I've not looked carefully at what your problem is and don't know how skilled in HA you are but I heartily suggest you - if you haven't done before - to read/study Digimer's tutorial https://alteeve.ca/w/AN!Cluster_Tutorial_2
I think it's unbeatable!
Best regards, Giorgio
On 14/01/14 19:37, marlon guao wrote:
Hi Martin.
Could you provide us with your config? Please post the output of one of the commands below:
pcs config show
or
crm configure show
Then maybe we could get a better idea of your setup.
Thanks for your interest and for your help. Here is the output from command (pcs config show)
[root@lb1 ~]# pcs config show
Cluster Name: LB.STK
Corosync Nodes:

Pacemaker Nodes:
lb1.asol.local lb2.asol.local

Resources:
Group: LB
Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
  Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
  Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
Resource: OPENVPN (class=lsb type=openvpn)
  Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
              start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
              stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.10-14.el6_5.1-368c726
stonith-enabled: false
When I start the cluster after a reboot of both nodes, everything looks fine. But when I issue the command "pcs resource delete OPENVPN" from node lb1, these lines start to pop up in the log:
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209
I also noticed that these retransmit entries start to appear some time (7 minutes) after a fresh cluster start, even without making any change to the cluster.
Thanks
martin
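To see at a glance how the retransmit backlog grows, log lines like the ones quoted above can be condensed to one line per entry. This is only a sketch: the function name summarize_retransmits is made up for illustration, and the log path in the usage note is an assumption about a default cman setup.

```shell
#!/bin/sh
# Condense corosync "Retransmit List" log entries read from stdin.
# For each matching line, print the timestamp (first three fields) and
# the number of sequence numbers listed after "List:".
# summarize_retransmits is a hypothetical helper, not part of corosync.
summarize_retransmits() {
    awk '/\[TOTEM *\] Retransmit List:/ {
        n = 0; seen = 0
        for (i = 1; i <= NF; i++) {
            if (seen) n++                 # count fields after "List:"
            if ($i == "List:") seen = 1
        }
        print $1, $2, $3, n
    }'
}
```

It could be run as "summarize_retransmits < /var/log/cluster/corosync.log" (path assumed). A count that keeps growing, as in the excerpt above, suggests that the same messages are still not getting through.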
Hi Martin.
For how long did you turn off the other node? I suspect that you need to configure timeouts for the cluster. Additional cluster parameters can be found here:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/...
On 15.01.2014 at 11:56, Martin Moravcik centos@datalock.sk wrote:
When I start the cluster after a reboot of both nodes, everything looks fine. But when I issue the command "pcs resource delete OPENVPN" from node lb1, these lines start to pop up in the log:
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209
I also noticed that these retransmit entries start to appear some time (7 minutes) after a fresh cluster start, even without making any change to the cluster.
There are multicast issues on virtual nodes, so your bridged network will for sure not operate reliably out of the box for HA setups.
Try:
echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier
-- LF
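A likely background for this suggestion (an assumption, not confirmed in the thread) is that the Linux bridge on the KVM host does IGMP snooping and, with no querier on the network, stops forwarding multicast to the guests after a few minutes. The sketch below reads the two relevant bridge settings from sysfs and turns the querier on. The function names and the SYSFS_NET override are made up for illustration, and the sysfs files may be missing on older kernels.

```shell
#!/bin/sh
# Inspect and enable the IGMP querier on a Linux bridge via sysfs.
# SYSFS_NET can be overridden to try this against a scratch directory
# (hypothetical knob, for dry runs only).
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the snooping and querier settings of bridge $1.
bridge_mcast_status() {
    dir="$SYSFS_NET/$1/bridge"
    if [ ! -d "$dir" ]; then
        echo "$1: not a bridge" >&2
        return 1
    fi
    for f in multicast_snooping multicast_querier; do
        if [ -r "$dir/$f" ]; then
            echo "$f=$(cat "$dir/$f")"
        else
            echo "$f=unavailable"
        fi
    done
}

# Same effect as: echo 1 > /sys/class/net/<bridge>/bridge/multicast_querier
enable_mcast_querier() {
    f="$SYSFS_NET/$1/bridge/multicast_querier"
    [ -w "$f" ] || { echo "$1: cannot write $f" >&2; return 1; }
    echo 1 > "$f"
}
```

On the KVM host this would be run as root, for example "bridge_mcast_status br0" followed by "enable_mcast_querier br0", where br0 stands in for YOURDEVICE. The setting does not survive a reboot unless it is reapplied.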
On 16.01.2014 00:29, Leon Fauster wrote:
There are multicast issues on virtual nodes, so your bridged network will for sure not operate reliably out of the box for HA setups.
Try:
echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier
For a two-node cluster, using unicast is probably the easier and less error-prone way.
Regards, Dennis
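On a cman-based cluster like this one, UDP unicast is selected with the transport="udpu" attribute on the <cman> element in cluster.conf, for example <cman transport="udpu" two_node="1" expected_votes="1"/>. The two_node and expected_votes values shown here are assumptions for a two-node setup and do not come from the thread. The sketch below only reports which transport a given cluster.conf asks for. The function name cman_transport is made up for illustration, and the sed pattern assumes the <cman> element sits on a single line.

```shell
#!/bin/sh
# Report the transport requested in a cman cluster.conf.
# Prints the value of the transport attribute of the <cman> element,
# or "multicast (default)" when the attribute is absent.
# cman_transport is a hypothetical helper, not a cman tool.
cman_transport() {
    conf="${1:-/etc/cluster/cluster.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    t=$(sed -n 's/.*<cman[^>]*[[:space:]]transport="\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    echo "${t:-multicast (default)}"
}
```

Running "cman_transport" with no argument on each node reads /etc/cluster/cluster.conf and is a quick way to confirm that both nodes carry the same setting.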