[CentOS] HA cluster - strange communication between nodes

Thu Jan 16 06:25:33 UTC 2014
Dennis Jacobfeuerborn <dennisml at conversis.de>

On 16.01.2014 00:29, Leon Fauster wrote:
> On 15.01.2014 at 11:56, Martin Moravcik <centos at datalock.sk> wrote:
>>
>> Thanks for your interest and for your help.
>> Here is the output from command (pcs config show)
>>
>> [root@lb1 ~]# pcs config show
>> Cluster Name: LB.STK
>> Corosync Nodes:
>>
>> Pacemaker Nodes:
>>   lb1.asol.local lb2.asol.local
>>
>> Resources:
>>   Group: LB
>>    Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
>>     Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
>>     Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
>>    Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
>>     Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
>>     Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
>>    Resource: OPENVPN (class=lsb type=openvpn)
>>     Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
>>                 start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
>>                 stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)
>>
>> Stonith Devices:
>> Fencing Levels:
>>
>> Location Constraints:
>> Ordering Constraints:
>> Colocation Constraints:
>>
>> Cluster Properties:
>>   cluster-infrastructure: cman
>>   dc-version: 1.1.10-14.el6_5.1-368c726
>>   stonith-enabled: false
>>
>>
>> When I start the cluster after rebooting both nodes, everything looks fine.
>> But when I run the command "pcs resource delete OPENVPN" from node lb1,
>> these lines start to show up in the log:
>> Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
>> Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
>> Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
>> Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
>> Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
>> Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209
>>
>> I also noticed that these retransmit entries start to appear some time
>> (7 minutes) after a fresh cluster start, even without making any change
>> to the cluster.
>
>
> There are multicast issues on virtual nodes, so your bridged network
> will certainly not operate reliably out of the box for HA setups.
>
> try
>
> echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier

For a two-node cluster, using unicast is probably the easier and less
error-prone way.
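
As a rough sketch of what that could look like on the cman stack shown
above (cluster-infrastructure: cman), the transport is selected in
/etc/cluster/cluster.conf. The cluster and node names are taken from the
quoted output; config_version, the nodeid values, and the two_node and
expected_votes settings are assumptions for illustration, and the
fragment leaves out fencing and any other sections your real file has:

```xml
<!-- Illustrative fragment only; merge into the existing cluster.conf -->
<cluster name="LB.STK" config_version="2">
  <!-- transport="udpu" selects UDP unicast instead of multicast.
       two_node/expected_votes are the usual two-node quorum settings
       (assumed here, not shown in the quoted output). -->
  <cman transport="udpu" two_node="1" expected_votes="1"/>
  <clusternodes>
    <!-- nodeid values are assumed -->
    <clusternode name="lb1.asol.local" nodeid="1"/>
    <clusternode name="lb2.asol.local" nodeid="2"/>
  </clusternodes>
</cluster>
```

The same file has to be present on both nodes, and as far as I know a
transport change only takes effect after the cluster stack is restarted
on both of them.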

Regards,
   Dennis