[CentOS] Fwd: HA cluster - strange communication between nodes

Mon Jan 13 13:52:43 UTC 2014
Martin Moravcik <centos at datalock.sk>

Hi,

For testing purposes, I'm trying to create a two-node HA environment for
running some services (openvpn and haproxy). I installed two CentOS 6.4
KVM guests.

I was able to create a cluster and some resources by following this document:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

However, my cluster does not behave as expected.
After starting the cluster software on both nodes, they can see each other:
----------------------------------------
[root@lb1 ~]# pcs status
Cluster name: LB.STK
Last updated: Mon Jan 13 15:34:21 2014
Last change: Mon Jan 13 15:24:47 2014 via cibadmin on lb1.asol.local
Stack: cman
Current DC: lb1.asol.local - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured

Online: [ lb1.asol.local lb2.asol.local ]

Full list of resources:

  Resource Group: LB
      LAN.VIP	(ocf::heartbeat:IPaddr2):	Started lb2.asol.local
      WAN.VIP	(ocf::heartbeat:IPaddr2):	Started lb2.asol.local
----------------------------------------
After I manually shut down node 2 (pcs cluster stop), node 1 does not
notice and still believes node 2 is up and running. The following lines
keep repeating in the corosync log on lb2:

Jan 13 15:38:43 [1712] lb2.asol.local        cib:     info: crm_client_new: 	Connecting 0x25a3810 for uid=0 gid=0 pid=10763 id=2b06a195-11f6-452d-992b-5ea0c69be21a
Jan 13 15:38:43 [1712] lb2.asol.local        cib:     info: cib_process_request: 	Completed cib_query operation for section 'all': OK (rc=0, origin=local/crm_resource/2, version=0.7.4)
Jan 13 15:38:43 [1712] lb2.asol.local        cib:     info: crm_client_destroy: 	Destroying 0 events
Jan 13 17:24:24 corosync [TOTEM ] Retransmit List: 9a 9b 9c

The firewall on both nodes allows incoming traffic from both cluster
nodes, and stonith-enabled is set to false. I created SSH keys for the
root user, so I can ssh back and forth between the nodes without a
password. The Pacemaker version is 1.1.10-14.

Do you have any idea where the problem might be?

thanks

martin