On 22/04/16 04:40 PM, Paul Heinlein wrote:
> On Fri, 22 Apr 2016, Digimer wrote:
>> Then you would use pacemaker to manage the floating IP, fence (stonith) a lost node, and promote drbd -> mount FS -> start nfsd -> start floating IP.
> My favorite acronym: stonith -- shoot the other node in the head.
It's brutal, but it solves a very important problem. True HA is based on never making assumptions, because if you do, you will eventually be wrong. All the time, I see people who built HA clusters come along complaining (usually in a panic) that their clusters have totally failed.
Inevitably, they have not set up stonith, and they retort with "ya, but it ran fine for $long_time!!". Sure, and like a long flight, you don't realize you've lost your landing gear until you try to land.
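
To make the "never assume" point concrete, here is a rough Python sketch of the recovery order quoted above (promote drbd, mount FS, start nfsd, start floating IP), gated on a successful fence. This is only an illustration of the logic, not how pacemaker is implemented or configured; `take_over`, `fence_peer`, `run_step` and the step names are hypothetical names made up for the example.

```python
from typing import Callable, List

# Start order taken from the thread; the step names themselves are hypothetical labels.
START_ORDER = ["promote_drbd", "mount_fs", "start_nfsd", "start_floating_ip"]


def take_over(fence_peer: Callable[[], bool],
              run_step: Callable[[str], None]) -> List[str]:
    """Fence the lost peer, then start services in order. Returns the steps run."""
    done: List[str] = []
    # Never assume a silent peer is dead: recovery waits for a confirmed fence.
    if not fence_peer():
        return done  # stay blocked rather than risk both nodes writing at once
    for step in START_ORDER:
        run_step(step)
        done.append(step)
    return done


if __name__ == "__main__":
    # Hypothetical usage: pretend the fence succeeded and just print each step.
    take_over(lambda: True, print)
```

The design choice that matters is the early return: if the fence cannot be confirmed, nothing is promoted or mounted, which is exactly the case that bites clusters running without stonith.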