Once upon a time, Digimer <lists at alteeve.ca> said:
> The cluster software and any hosted services aren't running. It's not
> that they think they're wrong, they just have no existing state so they
> won't try to touch anything without first ensuring it is safe to do so.

Well, I was being short; what I meant was, in HA, if you aren't known to
be right, you are wrong, and you do nothing.

> SCSI reservations, and anything that blocks access is technically OK.
> However, I stand by the recommendation to power cycle lost nodes. It's
> by far the safest (and easiest) approach. I know this goes against the
> grain of sysadmins to yank power, but in an HA setup, nodes should be
> disposable and replaceable. The nodes are not important, the hosted
> services are.

One advantage SCSI reservations have is that if you can access the
storage, you can lock out everybody else. It doesn't require access to a
switch, management card, etc. (that may have its own problems). If you
can access the storage, you own it, if you can't, you don't. Putting a
lock directly on the actual shared resource can be the safest path (if
you can't access it, you can't screw it up).

I agree that rebooting a failed node is also good, just pointing out
that putting the lock directly on the shared resource is also good.

-- 
Chris Adams <linux at cmadams.net>