Corosync on a home network

10 Sep 2017


I've been trying to build a model cluster using three virtual machines
on my home server.  Each VM boots off its own dedicated partition
(CentOS 7.3).  One partition is designated as the common /home
partition for the VMs (on the real machine it will mount as /cluster).
I intend to run GFS2 on the shared partition, so I need to
configure DLM and corosync.  That's where I'm getting bogged down.

The VMs and the real machine are bridged onto one Ethernet segment.
The main machine has a second Ethernet interface on a different
network, but that is not used for clustering.  The bridged Ethernet
port is connected to a switch, which in turn connects to a BT Home
Hub 6.  All four addresses (the host and the three VMs) are static,
Network Manager is off, ssh works across the nodes without a password,
and ping gives sensible round-trip times.
--------------%<-------------------
# brctl show
bridge name	bridge id	STP enabled	interfaces
br3		XXXXXXXXX	no		enp3s0
    					vnet0
    					vnet1
    					vnet2
virbr0		XXXXXXXXX	yes		virbr0-nic
--------------%<-------------------
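ping uses ICMP, whereas corosync's cluster traffic travels as UDP datagrams, so good ping times do not by themselves show that UDP gets between the nodes. A minimal UDP round-trip check is sketched here, assuming Python 3 is available on the nodes; `serve_once` and `udp_round_trip` are hypothetical helpers, and the demo runs both ends on loopback. On real nodes you would bind the server to one node's address and call `udp_round_trip` from another node, using a port corosync is not already bound to.

```python
import socket
import threading


def serve_once(sock: socket.socket) -> None:
    """Hypothetical one-shot UDP echo server: answer a single datagram."""
    data, peer = sock.recvfrom(1024)  # block until one datagram arrives
    sock.sendto(data, peer)           # echo it straight back to the sender


def udp_round_trip(host: str, port: int, payload: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram and wait for the echo; a dropped packet raises a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(payload, (host, port))
        reply, _ = client.recvfrom(1024)
        return reply


if __name__ == "__main__":
    # Loopback demo: on a cluster, bind to the node's ring0 address instead.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    print(udp_round_trip("127.0.0.1", port, b"totem?"))  # prints b'totem?'
    worker.join()
    server.close()
```

If the echo times out between two nodes while ping succeeds, that would point at UDP being dropped somewhere on the path rather than at corosync itself.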
When I start corosync each node starts up but does not see the others.
For instance, on node 3 I see:
--------------%<----------------------
# corosync-quorumtool
Quorum information
------------------
Date:             Sun Sep 10 12:56:56 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          3
Ring ID:          3/28648
Quorate:          No
Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      1
Quorum:           3 Activity blocked
Flags:
Membership information
----------------------
    Nodeid      Votes Name
         3          1 192.168.1.52 (local)
----------------%<-------------------
All four nodes show similar output, differing only in node ID, IP
address and ring ID.
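The numbers in that output fit a simple-majority rule: with four expected votes, quorum is three, so a node that counts only its own vote stays inquorate and activity is blocked. A minimal sketch of that arithmetic, assuming one vote per node and none of the special votequorum options such as two_node or wait_for_all; `quorum_threshold` and `is_quorate` are illustrative names, not corosync functions.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Strict majority of the expected votes: floor(n / 2) + 1."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """A partition is quorate once the votes it can see reach the threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 4  # four nodes in the nodelist, one vote each (assumed)
    seen = 1      # each node currently sees only itself
    print(quorum_threshold(expected))   # prints 3, matching "Quorum: 3"
    print(is_quorate(seen, expected))   # prints False, i.e. "Activity blocked"
```

On this reading, each node has to see at least two of the others before it becomes quorate.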
The documentation warns that not all routers handle multicast
datagrams correctly.  I therefore tried to force unicast
communication by making the following changes to the distributed
corosync.conf (the lines marked # are ones I commented out):
--------------%<-------------------
    transport: updu
    cluster_name: <set to the same as the domain>
#	crypto_cipher: none
#	crypto_hash: none
#		mcastaddr: 239.255.1.1
#		mcastport: 5405
#		ttl: 1
--------------%<-------------------
The following are unchanged:
--------------%<-------------------
    version: 2
    secauth: off
        ringnumber: 0
        bindnetaddr: 192.168.1.0
--------------%<-------------------
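A misspelt transport value is easy to miss in a plain-text file. The corosync.conf(5) page for the 2.x series (the series packaged for CentOS 7) documents udp (multicast, the default), udpu (unicast) and iba as transport values. A minimal spelling check is sketched here, assuming that 2.x list; `parse_settings` and `check_transport` are hypothetical helpers, and the flat parser deliberately ignores section nesting, which is enough for a single totem block.

```python
from typing import Dict

# Transport values listed in corosync.conf(5) for corosync 2.x (assumed version).
KNOWN_TRANSPORTS = {"udp", "udpu", "iba"}


def parse_settings(text: str) -> Dict[str, str]:
    """Collect 'key: value' lines, skipping blanks, comments and section braces."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_transport(settings: Dict[str, str]) -> str:
    """Report whether the transport value is one of the documented ones."""
    transport = settings.get("transport", "udp")  # udp is the documented default
    if transport not in KNOWN_TRANSPORTS:
        return f"unknown transport {transport!r}; expected one of {sorted(KNOWN_TRANSPORTS)}"
    return f"transport {transport!r} looks valid"


if __name__ == "__main__":
    excerpt = """
    version: 2
    secauth: off
    transport: updu
    #   crypto_cipher: none
    """
    print(check_transport(parse_settings(excerpt)))
    # prints: unknown transport 'updu'; expected one of ['iba', 'udp', 'udpu']
```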
The nodelist is:
---------%<----------------
nodelist {
    node {
    	ring0_addr: 192.168.1.2
    	nodeid: 1
    }
    node {
    	ring0_addr: 192.168.1.51
    	nodeid: 2
    }
    node {
    	ring0_addr: 192.168.1.52
    	nodeid: 3
    }
    node {
    	ring0_addr: 192.168.1.53
    	nodeid: 4
    }
}
--------%<------------------
The logging and quorum sections are as supplied.
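A quick consistency check on the nodelist can rule out duplicate node IDs or addresses, or an address outside the bindnetaddr network. A minimal sketch, assuming each node block lists ring0_addr before nodeid (as it does here) and assuming a /24 netmask for bindnetaddr, since the post gives the network address but not the mask; `parse_nodes` and `check_nodes` are hypothetical helpers.

```python
import ipaddress
import re
from typing import List, Tuple

# Pairs each ring0_addr with the nodeid that follows it (order assumed as above).
NODE_RE = re.compile(r"ring0_addr:\s*(\S+)\s+nodeid:\s*(\d+)")


def parse_nodes(text: str) -> List[Tuple[str, int]]:
    """Extract (address, nodeid) pairs from a nodelist block."""
    return [(addr, int(nid)) for addr, nid in NODE_RE.findall(text)]


def check_nodes(nodes: List[Tuple[str, int]], bindnetaddr: str, prefixlen: int = 24) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    net = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}")  # /24 is an assumption
    ids = [nid for _, nid in nodes]
    addrs = [addr for addr, _ in nodes]
    problems = []
    if len(set(ids)) != len(ids):
        problems.append("duplicate nodeid")
    if len(set(addrs)) != len(addrs):
        problems.append("duplicate ring0_addr")
    for addr in addrs:
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{addr} is outside {net}")
    return problems


if __name__ == "__main__":
    # Condensed from the nodelist above.
    nodelist = """
    node { ring0_addr: 192.168.1.2   nodeid: 1 }
    node { ring0_addr: 192.168.1.51  nodeid: 2 }
    node { ring0_addr: 192.168.1.52  nodeid: 3 }
    node { ring0_addr: 192.168.1.53  nodeid: 4 }
    """
    nodes = parse_nodes(nodelist)
    print(len(nodes), check_nodes(nodes, "192.168.1.0"))  # prints 4 []
```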
Any help will be gratefully received.
Regards,
Martin