[CentOS-virt] GFS2 hangs after one node going down

Thu Mar 21 18:09:20 UTC 2013
Maurizio Giungato <m.giungato at pixnamic.com>

On 21/03/2013 18:48, Maurizio Giungato wrote:
> On 21/03/2013 18:14, Digimer wrote:
>> On 03/21/2013 01:11 PM, Maurizio Giungato wrote:
>>> Hi guys,
>>>
>>> My goal is to create a reliable virtualization environment using CentOS
>>> 6.4 and KVM. I have three nodes and a clustered GFS2 filesystem.
>>>
>>> The environment is up and working, but I'm worried about its reliability.
>>> If I take the network interface down on one node to simulate a crash (for
>>> example on the node "node6.blade"):
>>>
>>> 1) GFS2 hangs (processes go into D state) until node6.blade is fenced
>>> 2) not only is node6.blade fenced, but node5.blade is too!
>>>
>>> Help me to save my last neurons!
>>>
>>> Thanks
>>> Maurizio
>>
>> DLM, the distributed lock manager provided by the cluster, is 
>> designed to block when a node goes into an unknown state. It does
>> not unblock until that node is confirmed to be fenced. This is by 
>> design. GFS2, rgmanager and clustered LVM all use DLM, so they will 
>> all block as well.
>>
>> As for why two nodes get fenced, you will need to share more about 
>> your configuration.
>>
> My configuration is very simple; I attached the cluster.conf and hosts files.
> This is the line I added to /etc/fstab:
> /dev/mapper/KVM_IMAGES-VL_KVM_IMAGES /var/lib/libvirt/images gfs2 defaults,noatime,nodiratime 0 0
>
> I also set fallback_to_local_locking = 0 in lvm.conf (but nothing changed).
>
> PS: I had two virtualization environments working like a charm on
> OCFS2, but since CentOS 6.x I haven't been able to install it. Is there
> some way to achieve the same results with GFS2? With GFS2 I sometimes
> get a crash after nothing more than a "service network restart" (I have
> many interfaces, so this operation takes more than 10 seconds); with
> OCFS2 I never had this problem.
>
> Thanks 
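The blocking behaviour Digimer describes can be illustrated with a toy Python model. It is a simplification under stated assumptions: `ToyLockManager` and its methods are hypothetical names that do not correspond to any DLM or libdlm API, the real DLM performs per-lockspace recovery rather than checking one global flag, and `node7.blade` is a placeholder for the third node, whose name the thread does not give.

```python
import threading


class ToyLockManager:
    """Toy model of the "block until fenced" rule (not the real DLM)."""

    def __init__(self, members):
        self._cond = threading.Condition()
        self._state = {name: "member" for name in members}

    def node_lost(self, name):
        # The node stopped answering: its state is unknown, so the locks
        # it held cannot be safely recovered yet.
        with self._cond:
            self._state[name] = "unknown"

    def fence_confirmed(self, name):
        # The fence agent reported success: the node is known to be down,
        # so waiting requests are woken up and re-check the condition.
        with self._cond:
            self._state[name] = "fenced"
            self._cond.notify_all()

    def request_lock(self, timeout=None):
        # Grants nothing real; it only models when a request may proceed.
        # Waits while any node is unknown (like a process stuck in D state)
        # and returns True when allowed, False if the timeout expired first.
        with self._cond:
            return self._cond.wait_for(
                lambda: "unknown" not in self._state.values(), timeout=timeout
            )


if __name__ == "__main__":
    dlm = ToyLockManager(["node5.blade", "node6.blade", "node7.blade"])
    dlm.node_lost("node6.blade")
    print(dlm.request_lock(timeout=0.1))  # False: blocked, node6 is unknown
    # Simulate the fence agent reporting success 0.2 s later.
    threading.Timer(0.2, dlm.fence_confirmed, args=("node6.blade",)).start()
    print(dlm.request_lock(timeout=5))  # True: unblocked once node6 is fenced
```

In this model a request proceeds only after *every* unknown node is confirmed fenced, so a failed or slow fence attempt directly lengthens the time requests stay blocked.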
I attached my logs from /var/log/cluster/*



-------------- next part --------------

Mar 21 19:00:10 fenced fencing node lama6.blade
Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
Mar 21 19:00:14 fenced fence lama6.blade failed
Mar 21 19:00:17 fenced fencing node lama6.blade
Mar 21 19:00:39 fenced fence lama6.blade success
Mar 21 19:00:45 fenced fencing node lama5.blade
Mar 21 19:00:57 fenced fence lama5.blade success
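To read the fenced log above more easily, a short Python sketch can group the entries per node and compute how long each fence took. The regular expression is an assumption fitted only to the lines shown here, `summarize_fencing` is a hypothetical helper, and the year (2013, taken from the post date) has to be supplied because syslog timestamps omit it.

```python
import re
from datetime import datetime

# fenced lines copied from the attachment.
FENCED_LOG = """\
Mar 21 19:00:10 fenced fencing node lama6.blade
Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
Mar 21 19:00:14 fenced fence lama6.blade failed
Mar 21 19:00:17 fenced fencing node lama6.blade
Mar 21 19:00:39 fenced fence lama6.blade success
Mar 21 19:00:45 fenced fencing node lama5.blade
Mar 21 19:00:57 fenced fence lama5.blade success
"""

# Assumed line shapes: a fence request ("fencing node X") or a final
# result ("fence X success|failed"). Agent detail lines do not match.
FENCE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) fenced "
    r"(?:fencing node (?P<start>\S+)|fence (?P<node>\S+) (?P<result>success|failed))$"
)


def summarize_fencing(lines, year=2013):
    """Per node: first fence request time, failed attempts, success time."""
    summary = {}
    for line in lines:
        m = FENCE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        node = m["start"] or m["node"]
        # setdefault keeps the *first* request time when fencing is retried.
        entry = summary.setdefault(node, {"started": ts, "failures": 0, "done": None})
        if m["result"] == "failed":
            entry["failures"] += 1
        elif m["result"] == "success":
            entry["done"] = ts
    return summary


if __name__ == "__main__":
    for node, e in summarize_fencing(FENCED_LOG.splitlines()).items():
        if e["done"] is None:
            print(f"{node}: {e['failures']} failed attempt(s), not fenced yet")
        else:
            secs = (e["done"] - e["started"]).total_seconds()
            print(f"{node}: {e['failures']} failed attempt(s), fenced after {secs:.0f} s")
```

For this log it reports one failed `fence_bladecenter` attempt and 29 seconds for lama6.blade, and 12 seconds for lama5.blade.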


Mar 21 18:59:00 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:00 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:2 left:0)
Mar 21 18:59:00 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Mar 21 18:59:41 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:10 corosync [QUORUM] Members[2]: 1 2
Mar 21 19:00:10 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 19:00:10 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:3 left:1)
Mar 21 19:00:10 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Mar 21 19:00:33 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:45 corosync [QUORUM] Members[1]: 1
Mar 21 19:00:45 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 21 19:00:45 corosync [CPG   ] chosen downlist: sender r(0) ip(20.11.11.104) ; members(old:2 left:1)
Mar 21 19:00:45 corosync [MAIN  ] Completed service synchronization, ready to provide service.
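The corosync log is easier to follow as a list of membership transitions. The sketch below (hypothetical helper `membership_changes`, with a regex fitted only to the lines above) keeps the `[QUORUM] Members[...]` lines, skips a repeated membership, and reports which node IDs left. Mapping IDs to host names requires the `nodeid` attributes in cluster.conf, which is not reproduced in this archive.

```python
import re

# QUORUM lines plus two TOTEM lines from the attachment, to show filtering.
COROSYNC_LOG = """\
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:00 corosync [QUORUM] Members[3]: 1 2 3
Mar 21 18:59:41 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:10 corosync [QUORUM] Members[2]: 1 2
Mar 21 19:00:33 corosync [TOTEM ] A processor failed, forming new configuration.
Mar 21 19:00:45 corosync [QUORUM] Members[1]: 1
"""

MEMBERS_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) corosync \[QUORUM\] "
    r"Members\[\d+\]:(?P<ids>(?: \d+)*)$"
)


def membership_changes(lines):
    """Return (timestamp, members, node IDs that left) per new membership."""
    changes = []
    previous = None
    for line in lines:
        m = MEMBERS_RE.match(line.strip())
        if not m:
            continue  # TOTEM, CPG and MAIN lines are ignored
        members = frozenset(int(x) for x in m["ids"].split())
        if members == previous:
            continue  # e.g. the same membership is logged twice at 18:59:00
        left = sorted(previous - members) if previous is not None else []
        changes.append((m["ts"], sorted(members), left))
        previous = members
    return changes


if __name__ == "__main__":
    for ts, members, left in membership_changes(COROSYNC_LOG.splitlines()):
        print(ts, "members:", members, "left:", left)
```

For the attached log this yields three memberships: nodes 1, 2 and 3 at 18:59:00, node 3 leaving at 19:00:10 and node 2 leaving at 19:00:45.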


Mar 21 19:00:10 rgmanager State change: lama6.blade DOWN
Mar 21 19:00:45 rgmanager State change: lama5.blade DOWN


Mar 21 19:00:27 qdiskd Writing eviction notice for node 3
Mar 21 19:00:28 qdiskd Writing eviction notice for node 2
Mar 21 19:00:28 qdiskd Node 3 evicted
Mar 21 19:00:29 qdiskd Node 2 evicted
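Because each daemon logs separately, it can help to merge the attachments into a single timeline; the qdiskd evictions of nodes 3 and 2 then show up between the first fence request for lama6.blade and the fence of lama5.blade. The helper below (`merge_timeline`, hypothetical) assumes every line starts with a 15-character syslog timestamp, supplies the year 2013 itself, and drops verbatim duplicate lines, which is useful if the same log is attached twice.

```python
from datetime import datetime

FENCED_LOG = """\
Mar 21 19:00:10 fenced fencing node lama6.blade
Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
Mar 21 19:00:14 fenced fence lama6.blade failed
Mar 21 19:00:17 fenced fencing node lama6.blade
Mar 21 19:00:39 fenced fence lama6.blade success
Mar 21 19:00:45 fenced fencing node lama5.blade
Mar 21 19:00:57 fenced fence lama5.blade success
"""

RGMANAGER_LOG = """\
Mar 21 19:00:10 rgmanager State change: lama6.blade DOWN
Mar 21 19:00:45 rgmanager State change: lama5.blade DOWN
"""

QDISKD_LOG = """\
Mar 21 19:00:27 qdiskd Writing eviction notice for node 3
Mar 21 19:00:28 qdiskd Writing eviction notice for node 2
Mar 21 19:00:28 qdiskd Node 3 evicted
Mar 21 19:00:29 qdiskd Node 2 evicted
"""


def merge_timeline(*logs, year=2013):
    """Merge syslog-style logs into a list of (datetime, message), oldest first."""
    seen = {}  # dict keeps first-seen order and drops verbatim duplicates
    for log in logs:
        for line in log.splitlines():
            line = line.strip()
            if not line:
                continue
            # Assumes a 15-character "Mon DD HH:MM:SS" prefix on every line.
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            seen.setdefault((ts, line[16:]), None)
    # sorted() is stable: events in the same second keep first-seen order.
    return sorted(seen, key=lambda event: event[0])


if __name__ == "__main__":
    for ts, msg in merge_timeline(FENCED_LOG, RGMANAGER_LOG, QDISKD_LOG):
        print(ts.strftime("%H:%M:%S"), msg)
```

Events that share the same second keep the order in which the logs were passed, so their relative order within that second is not meaningful.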