[CentOS-virt] Sanlock disk leases on drbd/gfs2 volume

Sat Mar 9 00:34:29 UTC 2013
Russell Jones <russell at jonesmail.me>

Hi all,

I have a two-node cluster that consists of the following:

* 1 drbd/gfs2 partition that holds VM images and XML
* Sanlock configured with the disk lease directory on the same drbd/gfs2 
partition


Everything is working well, aside from one small issue I ran into. During 
one particular fencing test, GFS2 began replaying the journal on the 
remaining node, and in the middle of the replay rgmanager attempted to 
recover the VM. Normally this wouldn't be an issue, as libvirt would 
just pause until GFS2 was ready. However, since libvirt talks to sanlock 
first, sanlock attempted to acquire the lease while GFS2 was not ready, 
and failed. This caused the recovery itself to fail.

I'm attempting to keep the lease directory on the shared storage so that 
I do not have to introduce another single point of failure in the cluster 
by having an outside NFS mount. It seems like I could get around this 
particular scenario by changing the recovery policy to "restart" (it's 
on "relocate" right now) and having it try restarting several times before 
giving up, but I wanted to see first if you guys had any advice for this 
issue as well. Perhaps I'm missing a setting that would correct this?
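
A minimal sketch of what that "restart" policy might look like in 
cluster.conf, assuming rgmanager's max_restarts and restart_expire_time 
attributes on the vm resource. The VM name and the numeric values are 
placeholders, not taken from the actual cluster:

```xml
<rm>
  <!-- "vm01" is a hypothetical name; the values are illustrative only. -->
  <!-- recovery="restart": try to restart the VM on the same node first. -->
  <!-- max_restarts: restarts tolerated before rgmanager gives up on the
       local node and relocates the VM. -->
  <!-- restart_expire_time: seconds after which an earlier restart is no
       longer counted against max_restarts. -->
  <vm name="vm01" recovery="restart" max_restarts="3"
      restart_expire_time="300"/>
</rm>
```

Whether a few restart attempts give GFS2 enough time to finish journal 
replay would depend on how long the replay takes, so the counts above 
would need tuning.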

Thanks!