[CentOS] Fwd: HA cluster - strange communication between nodes

Wed Jan 15 10:56:55 UTC 2014
Martin Moravcik <centos at datalock.sk>

On 14/01/14 19:37, marlon guao wrote:
> Hi Martin.
>
> if you could provide us your config like, put the output of the command
> below.
>
> pcs configure show
>
> or
>
> crm configure show
>
> maybe we could get a better idea of your setup.
>
>
> On 01/14/2014 06:34 PM, Giorgio Bersano wrote:
>> 2014/1/13 Martin Moravcik <centos at datalock.sk>:
>>> I'm sorry.
>>> My systems are fully updated CentOS 6.5.
>>> I'm using only standard centos repositories.
>>>
>>> martin
>>>
>>> On 13/01/14 15:17, Patrick Lists wrote:
>>>> On 13-01-14 14:52, Martin Moravcik wrote:
>>>>> Hi,
>>>>>
>>>>> For a testing purposes I'm trying to create two node HA environment for
>>>>> running some service (openvpn and haproxy). I installed two CentOS 6.4
>>>>> KVM guests.
>>>> Iirc CentOS 6.5 came with several updates to cluster related packages so
>>>> you may want to investigate and update to 6.5.
>>>>
>>>> Regards,
>>>> Patrick
>>>>
>> Hy Martin,
>> I've not looked carefully at what your problem is and don't know how
>> skilled in HA you are but I heartily suggest you - if you haven't done
>> before - to read/study Digimer's tutorial
>> https://alteeve.ca/w/AN!Cluster_Tutorial_2
>>
>> I think it's unbeatable!
>>
>> Best regards,
>> Giorgio
>> _______________________________________________
>> CentOS mailing list
>> CentOS at centos.org
>> http://lists.centos.org/mailman/listinfo/centos
>

Thanks for your interest and for your help.
Here is the output of the command (pcs config show):

[root@lb1 ~]# pcs config show
Cluster Name: LB.STK
Corosync Nodes:

Pacemaker Nodes:
  lb1.asol.local lb2.asol.local

Resources:
  Group: LB
   Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
    Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
   Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
    Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
   Resource: OPENVPN (class=lsb type=openvpn)
    Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
                start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
                stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
  cluster-infrastructure: cman
  dc-version: 1.1.10-14.el6_5.1-368c726
  stonith-enabled: false


When I start the cluster after rebooting both nodes, everything looks fine.
But when I run the command "pcs resource delete OPENVPN" from node lb1,
these lines start to appear in the log:
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209
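
To make it easier to see how the retransmit list grows from one entry to
the next, log lines like the ones above could be summarised with a small
script. This is only a rough sketch: the names parse_retransmits and
newly_missing are made up for illustration, the sample input is copied from
the log above, and the regular expression assumes the syslog-style format
shown here (it may need adjusting for other logging setups).

```python
import re

# Assumed line format (taken from the log excerpt above):
#   Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202 203
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+corosync\s+\[TOTEM\s*\]\s+"
    r"Retransmit List:\s*(?P<seqs>[0-9a-fA-F ]*)$"
)


def parse_retransmits(lines):
    """Return (timestamp, [sequence ids]) for every retransmit line found."""
    entries = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            # Sequence ids are kept as strings, exactly as logged.
            entries.append((match.group("stamp"), match.group("seqs").split()))
    return entries


def newly_missing(entries):
    """For each entry, list the sequence ids absent from the previous entry."""
    previous = set()
    result = []
    for stamp, seqs in entries:
        current = set(seqs)
        result.append((stamp, sorted(current - previous)))
        previous = current
    return result


# Hypothetical sample input: the first three lines of the log above.
SAMPLE = """\
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
""".splitlines()

if __name__ == "__main__":
    for stamp, added in newly_missing(parse_retransmits(SAMPLE)):
        print(stamp, "new:", " ".join(added))
```

In the log above, roughly one new sequence id is added every 30 seconds
and none of the earlier ones ever disappear from the list.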

I also noticed that these retransmit entries start to appear on their own
some time (7 minutes) after a fresh cluster start, without any change to
or manipulation of the cluster.

Thanks

martin