[CentOS] Corosync on a home network

Mon Sep 11 07:31:31 UTC 2017
Digimer <lists at alteeve.ca>

On 2017-09-10 08:33 AM, J Martin Rushton wrote:
> I've been trying to build a model cluster using three virtual machines
> on my home server.  Each VM boots off its own dedicated partition
> (CentOS 7.3).  One partition is designated to be the common /home
> partition for the VMs (on the real machine it will mount as /cluster).
> I'm intending to run GFS2 on the shared partition, so I need to
> configure DLM and corosync.  That's where I'm getting bogged down.
> 
> The VMs and the real machine are bridged onto one ethernet.  There is
> another ethernet in the main machine on a different network, but that is
> not used for clustering.  The ethernet port is connected to a switch
> which in turn connects to a BT Home Hub 6.  All four addresses are
> static, Network Manager is off, ssh works across the nodes without a
> password and ping gives sensible times.
> 
> --------------%<-------------------
> # brctl show
> bridge name	bridge id	STP enabled	interfaces
> br3		XXXXXXXXX	no		enp3s0
> 						vnet0
> 						vnet1
> 						vnet2
> virbr0		XXXXXXXXX	yes		virbr0-nic
> --------------%<-------------------
> 
> When I start corosync each node starts up but does not see the others.
> For instance I see:
> 
> --------------%<----------------------
> # corosync-quorumtool
> Quorum information
> ------------------
> Date:             Sun Sep 10 12:56:56 2017
> Quorum provider:  corosync_votequorum
> Nodes:            1
> Node ID:          3
> Ring ID:          3/28648
> Quorate:          No
> 
> Votequorum information
> ----------------------
> Expected votes:   4
> Highest expected: 4
> Total votes:      1
> Quorum:           3 Activity blocked
> Flags:
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
>          3          1 192.168.1.52 (local)
> ----------------%<-------------------
> 
> All four nodes are similar, but with different node IDs, IP addresses
> and Ring IDs.
> 
> The documentation warns that not all routers will handle multicast
> datagrams correctly.  I therefore attempted to force unicast
> communication by making the following changes from the distributed
> corosync.conf:
> 
> 	transport: updu
> 	cluster_name: <set to the same as the domain>
> #	crypto_cipher: none
> #	crypto_hash: none
> #		mcastaddr: 239.255.1.1
> #		mcastport: 5405
> #		ttl: 1
> 
> The following are unchanged:
> 
> 	version: 2
> 	secauth: off
> 		ringnumber: 0
> 		bindnetaddr: 192.168.1.0
> 
> The nodelist is:
> 
> ---------%<----------------
> nodelist {
> 	node {
> 		ring0_addr: 192.168.1.2
> 		nodeid: 1
> 	}
> 	node {
> 		ring0_addr: 192.168.1.51
> 		nodeid: 2
> 	}
> 	node {
> 		ring0_addr: 192.168.1.52
> 		nodeid: 3
> 	}
> 	node {
> 		ring0_addr: 192.168.1.53
> 		nodeid: 4
> 	}
> }
> --------%<------------------
> 
> logging and quorum are as supplied.
> 
> Any help will be gratefully received.
> 
> Regards,
> Martin

You should repost on the Clusterlabs - Users list; it's the most active
HA list and many/most of the devs are there.

http://lists.clusterlabs.org/mailman/listinfo/users
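
One thing that may be worth checking first: the transport line quoted
above reads "updu", whereas the unicast transport in corosync 2.x is
spelled "udpu".  A misspelled value may not be recognised, in which case
corosync would likely stay on its default multicast transport.  As a
minimal sketch only (assuming corosync 2.x as shipped with CentOS 7 and
reusing the values from the quoted configuration; this is untested), the
totem section would look something like:

--------------%<-------------------
totem {
	version: 2
	cluster_name: <set to the same as the domain>
	secauth: off
	# "udpu" selects UDP unicast; peers come from the nodelist section
	transport: udpu
	interface {
		ringnumber: 0
		bindnetaddr: 192.168.1.0
	}
}
--------------%<-------------------

The nodelist can stay as quoted.  After making the change on every node
and restarting corosync, corosync-quorumtool should list all four
members, assuming UDP ports 5404 and 5405 (the defaults) are open
between the nodes.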

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould