Hi,
I am running a two-node active/passive cluster on CentOS 3 Update 8 (64-bit) on HP hardware, with external HP storage connected via SCSI. The cluster ran fine for the last three years, but all of a sudden the cluster service keeps shifting from one node to the other, at least once a day.
After analysing the syslog, I found that the service was being shifted because of some network fluctuation. Both nodes have two NICs bonded together and configured with the IPs below.
My network details:
192.168.1.2 -- node 1 physical IP, class C subnet (bond0)
192.168.1.3 -- node 2 physical IP, class C subnet (bond0)
192.168.1.4 -- floating IP (cluster)
Since it is a very critical and busy server, heavy network load may be causing some heartbeat signals to be missed, which results in the service shifting from one node to the other.
So I plan to connect a crossover cable for the heartbeat messages. Can anyone guide me, or point me to a link that best explains how to do this and what changes I have to make in the cluster configuration file after connecting the crossover cable?
Regards,
Lingu
lingu wrote:
Since it is a very critical and busy server, heavy network load may be causing some heartbeat signals to be missed, which results in the service shifting from one node to the other.
For automated takeover systems, especially critical ones (though you can argue that any system set up with automatic takeover is critical by definition), you should have multiple heartbeat paths: Ethernet, serial cable, shared disk, fibre or whatnot.
Heartbeats that get missed on one set of Ethernet cards, causing false takeovers, could just as likely be missed on another set of cards, even with a crossover cable.
Maybe you should investigate alternate paths?
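To make the point about multiple paths concrete, here is a minimal Python sketch of the decision logic only. It is not how any particular cluster package implements it; the class name, the path names (bond0, eth2) and the 30-second deadtime are all assumptions for illustration. The idea is that the peer is declared dead only when every heartbeat path has been silent for longer than the deadtime.

```python
class PeerMonitor:
    """Hypothetical tracker for heartbeats arriving over several paths."""

    def __init__(self, paths, deadtime, start=0.0):
        self.deadtime = deadtime
        # Timestamp of the most recent heartbeat seen on each path.
        self.last_seen = {path: start for path in paths}

    def heartbeat(self, path, now):
        self.last_seen[path] = now

    def path_alive(self, path, now):
        return (now - self.last_seen[path]) <= self.deadtime

    def peer_dead(self, now):
        # Take over only when *every* path has been silent too long.
        return not any(self.path_alive(p, now) for p in self.last_seen)


monitor = PeerMonitor(["bond0", "eth2"], deadtime=30)
monitor.heartbeat("bond0", now=10)
monitor.heartbeat("eth2", now=10)
monitor.heartbeat("eth2", now=40)   # bond0 drops heartbeats under load
print(monitor.peer_dead(now=45))    # False: eth2 is still fresh
```

With a single path, the same missed heartbeats on bond0 would have triggered a takeover at this point.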
Morten Torstensen wrote:
Maybe you should investigate alternate paths?
Indeed, commercial cluster software like Veritas REQUIRES dual-path dedicated heartbeat networks, and highly recommends implementing storage 'fencing' so that it is physically impossible for both systems to mount the storage simultaneously.
Fencing is fairly easy with SAN storage: you instruct the SAN switch to allow only the currently active server to access the storage, and when the standby server takes over, it instructs the switch to disable access by the old active server before enabling access for itself. Fencing with shared SCSI is much harder and requires special hardware.
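The ordering is the important part, so here is a small Python sketch of it. The SanSwitch class is a hypothetical in-memory stand-in, not a real switch API, and the node names are made up. It only illustrates "disable the old active server first, then enable yourself".

```python
class SanSwitch:
    """Hypothetical stand-in for a SAN switch's access control list."""

    def __init__(self):
        self.allowed = set()
        self.log = []

    def disable(self, host):
        self.allowed.discard(host)
        self.log.append(("disable", host))

    def enable(self, host):
        self.allowed.add(host)
        self.log.append(("enable", host))


def take_over(switch, old_active, new_active):
    # Fence first: the old active node must lose access before we gain it.
    switch.disable(old_active)
    if old_active in switch.allowed:
        raise RuntimeError("fencing failed; refusing to mount storage")
    switch.enable(new_active)


switch = SanSwitch()
switch.enable("node1")
take_over(switch, old_active="node1", new_active="node2")
print(sorted(switch.allowed))  # ['node2']
```

Because the disable step always comes first, there is never a moment when both nodes are allowed to reach the storage.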
Hi Lingu,
I realize you're just trying to get this fixed, but what happened on your network to make an HA pair that has been stable for three years start getting flaky?
Everything I know about heartbeat comes from http://www.linux-ha.org/ha.cf.
I'm fairly sure all you need to add to your config is a bcast line. In a simple setup, "bcast eth0 eth1" will send heartbeats over eth0 and eth1; you will probably want "bcast bond0 eth3", depending on your interface names. Make sure you are also pinging a third host (the subnet's gateway is a good choice). There was a heartbeat security issue a couple of years ago, so you should consider planning an upgrade to a patched version.
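As a rough sketch of what that might look like in ha.cf, assuming the cluster really is running heartbeat: the crossover interface name (eth2), the gateway address (192.168.1.1) and the timing values are assumptions for illustration, not taken from your setup, so check them against your own interfaces and the ha.cf documentation before using any of it.

```
# Seconds between heartbeats (illustrative value)
keepalive 2
# Seconds of silence before the peer is declared dead (illustrative value)
deadtime 30
# Send heartbeats over the bonded LAN interface and the crossover link;
# eth2 is a hypothetical name for the crossover NIC
bcast bond0 eth2
# Third host to ping; 192.168.1.1 is an assumed gateway address
ping 192.168.1.1
```

The crossover NIC also needs to be up with its own IP address on each node, on a subnet separate from 192.168.1.x.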
Patrick