I've been trying to build a model cluster using three virtual machines on my home server. Each VM boots off its own dedicated partition (CentOS 7.3). One partition is designated as the common /home partition for the VMs (on the real machine it will mount as /cluster). I intend to run GFS2 on the shared partition, so I need to configure DLM and corosync. That's where I'm getting bogged down.
The VMs and the real machine are bridged onto one Ethernet. There is another Ethernet interface in the main machine on a different network, but that is not used for clustering. The Ethernet port is connected to a switch, which in turn connects to a BT Home Hub 6. All four addresses are static, NetworkManager is off, ssh works across the nodes without a password, and ping gives sensible times.
--------------%<-------------------
# brctl show
bridge name     bridge id       STP enabled     interfaces
br3             XXXXXXXXX       no              enp3s0
                                                vnet0
                                                vnet1
                                                vnet2
virbr0          XXXXXXXXX       yes             virbr0-nic
--------------%<-------------------
When I start corosync each node starts up but does not see the others. For instance I see:
--------------%<----------------------
# corosync-quorumtool
Quorum information
------------------
Date:             Sun Sep 10 12:56:56 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          3
Ring ID:          3/28648
Quorate:          No

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      1
Quorum:           3 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         3          1 192.168.1.52 (local)
----------------%<-------------------
All four nodes are similar, but with different node IDs, IP addresses and Ring IDs.
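To make the "Activity blocked" line concrete, here is a minimal sketch of the arithmetic, assuming votequorum's default rule of a strict majority of the expected votes (no two_node or similar options set). The function name quorum_needed is made up for illustration; the numbers are taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper: votes needed for quorum, assuming the default
# strict-majority rule (integer division, then add one).
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

expected=4   # "Expected votes" in the quorumtool output
total=1      # "Total votes" in the quorumtool output

needed=$(quorum_needed "$expected")
if [ "$total" -ge "$needed" ]; then
    echo "quorate ($total of $needed votes needed)"
else
    echo "not quorate ($total of $needed votes needed)"
fi
```

With four expected votes the threshold is 3, which matches the "Quorum: 3" line, so a node that only sees itself stays blocked.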
The documentation warns that not all routers will handle multicast datagrams correctly. I therefore attempted to force unicast communication by making the following changes from the distributed corosync.conf:
    transport: updu
    cluster_name: <set to the same as the domain>
    # crypto_cipher: none
    # crypto_hash: none
    # mcastaddr: 239.255.1.1
    # mcastport: 5405
    # ttl: 1
The following are unchanged:
    version: 2
    secauth: off
    ringnumber: 0
    bindnetaddr: 192.168.1.0
The nodelist is:
---------%<----------------
nodelist {
    node {
        ring0_addr: 192.168.1.2
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.51
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.1.52
        nodeid: 3
    }
    node {
        ring0_addr: 192.168.1.53
        nodeid: 4
    }
}
--------%<------------------
The logging and quorum sections are as supplied.
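One thing that may be worth ruling out: the corosync.conf(5) man page for corosync 2.x lists udp, udpu and iba as transport values, so the "updu" quoted above is either a typo in the post or a typo in the file. The sketch below is one way to check the file itself; check_transport is a hypothetical helper, not part of corosync, and it only looks at the first uncommented transport: line.

```shell
#!/bin/sh
# Hypothetical helper: report the transport set in a corosync.conf.
# Assumes the corosync 2.x values (udp, udpu, iba); commented-out lines
# are skipped because their first field is "#".
check_transport() {
    conf="$1"
    value=$(awk '$1 == "transport:" { print $2; exit }' "$conf")
    case "$value" in
        udp|udpu|iba) echo "transport ok: $value" ;;
        "")           echo "transport not set (corosync 2.x defaults to udp)" ;;
        *)            echo "unrecognised transport: $value" ;;
    esac
}
```

On a node this would be run as: check_transport /etc/corosync/corosync.conf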
Any help will be gratefully received.
Regards, Martin
On Sep 10, 2017, at 11:33 AM, J Martin Rushton martinrushton56@btinternet.com wrote:
# mcastport: 5405
Does tcpdump see this traffic leaving each VM? If yes, the app is working.
Does tcpdump see this traffic arriving at each VM? If yes, the switching is working.
Is the port open in firewall/iptables? If yes, then shrug.
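The three checks above could be run roughly as follows. This is a sketch under assumptions: the helper names are made up, the default port 5405 comes from the commented-out mcastport line, and corosync is documented to use mcastport for receives and mcastport - 1 for sends, hence the two-port range. The capture and firewall commands need root.

```shell
#!/bin/sh
# Hypothetical helper: the two UDP ports corosync uses for a given
# mcastport (mcastport - 1 for sends, mcastport for receives).
corosync_ports() {
    echo "$(( $1 - 1 )) $1"
}

# Watch corosync traffic on an interface (run as root, Ctrl-C to stop).
watch_corosync_traffic() {
    iface="$1"
    port="${2:-5405}"
    low=$(( port - 1 ))
    tcpdump -n -i "$iface" udp portrange "$low-$port"
}

# Show the active firewall rules: firewalld if it is running,
# otherwise fall back to the raw iptables listing.
show_firewall() {
    firewall-cmd --list-all 2>/dev/null || iptables -L -n
}
```

For example, watch_corosync_traffic br3 on the host and watch_corosync_traffic eth0 inside a VM (eth0 is a guess at the guest interface name). Packets visible on the sender but not on the receiver point at the bridge or switch; packets visible on the receiver's interface while corosync still sees nobody point at the firewall.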
On 2017-09-10 08:33 AM, J Martin Rushton wrote:
[snip: original message quoted in full]
You should repost on the ClusterLabs Users list; it's the most active HA list and many/most of the devs are there.
http://lists.clusterlabs.org/mailman/listinfo/users
On 10.09.2017 at 17:33, J Martin Rushton martinrushton56@btinternet.com wrote:
When I start corosync each node starts up but does not see the others.
For multicast mode: did you try setting [1] on the main host (not the VMs)?
[1] echo 1 > /sys/class/net/${yourbridgeinterface}/bridge/multicast_querier
-- LF
On 11/09/17 11:56, Leon Fauster wrote:
[snip]
I hadn't, so I've just tried but with no success. Thanks for the suggestion though. Martin
On 11/09/17 22:48, J Martin Rushton wrote:
[snip]
A big thank you. Over the last week I've had the firewall switched off for testing, but tonight I forgot. Once the firewall was off, setting multicast_querier seems to have done the trick.
Thanks, Martin
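For anyone landing here later, a rough sketch of the same fix with the firewall left running. It assumes firewalld (the CentOS 7 default) and its stock high-availability service definition, which as far as I know covers the corosync UDP ports and the DLM port; the function names are made up, and br3 is the bridge from the brctl output earlier in the thread. Both functions need root.

```shell
#!/bin/sh
BRIDGE=br3   # host bridge carrying the cluster traffic (from brctl show)

# Hypothetical helper: sysfs path of the querier switch for a bridge.
querier_path() {
    echo "/sys/class/net/$1/bridge/multicast_querier"
}

# Run on every node: open the cluster ports instead of stopping the
# firewall. Assumes firewalld with its "high-availability" service.
open_cluster_ports() {
    firewall-cmd --permanent --add-service=high-availability
    firewall-cmd --reload
}

# Run on the host only: enable the multicast querier on the bridge.
# A sysfs write like this does not survive a reboot.
enable_querier() {
    echo 1 > "$(querier_path "$BRIDGE")"
}
```

Reading the file back with cat "$(querier_path br3)" should print 1 once the querier is enabled.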