[CentOS-virt] GFS2 hangs after one node going down

On 22/03/2013 16:27, Digimer wrote:
> On 03/22/2013 11:21 AM, Maurizio Giungato wrote:
>> On 22/03/2013 00:34, Digimer wrote:
>>> On 03/21/2013 02:09 PM, Maurizio Giungato wrote:
>>>> On 21/03/2013 18:48, Maurizio Giungato wrote:
>>>>> On 21/03/2013 18:14, Digimer wrote:
>>>>>> On 03/21/2013 01:11 PM, Maurizio Giungato wrote:
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> my goal is to create a reliable virtualization environment using
>>>>>>> CentOS 6.4 and KVM. I have three nodes and a clustered GFS2.
>>>>>>>
>>>>>>> The environment is up and working, but I'm worried about its
>>>>>>> reliability. If I bring the network interface down on one node to
>>>>>>> simulate a crash (for example on the node "node6.blade"):
>>>>>>>
>>>>>>> 1) GFS2 hangs (processes go into D state) until node6.blade gets
>>>>>>> fenced
>>>>>>> 2) not only node6.blade gets fenced, but also node5.blade!
>>>>>>>
>>>>>>> Help me to save my last neurons!
>>>>>>>
>>>>>>> Thanks
>>>>>>> Maurizio
>>>>>>
>>>>>> DLM, the distributed lock manager provided by the cluster, is
>>>>>> designed to block when a node goes into an unknown state. It does
>>>>>> not unblock until that node is confirmed to be fenced. This is by
>>>>>> design. GFS2, rgmanager and clustered LVM all use DLM, so they will
>>>>>> all block as well.
>>>>>>
>>>>>> As for why two nodes get fenced, you will need to share more about
>>>>>> your configuration.
>>>>>>
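
The DLM behaviour described above (block while a node's state is unknown, release only once it is confirmed fenced) can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions: `ToyLockManager` and its methods are hypothetical names, not the real DLM or libdlm API, and the third node name is made up. It also shows why a failed fence, like the "error from agent" in the logs, keeps everything blocked.

```python
import threading


class ToyLockManager:
    """Hypothetical toy model, not the real DLM/libdlm API.

    Lock requests are held while any cluster member is in an unknown
    state and are released only once that member is confirmed fenced.
    """

    def __init__(self, members):
        self._cond = threading.Condition()
        self._state = {m: "alive" for m in members}

    def node_lost(self, node):
        # The node stopped answering: the cluster cannot know whether it
        # is still writing to shared storage.
        with self._cond:
            self._state[node] = "unknown"

    def fence_failed(self, node):
        # A failed fence changes nothing: the node stays unknown, so
        # every waiter keeps blocking.
        with self._cond:
            self._state[node] = "unknown"

    def fence_confirmed(self, node):
        # The fence agent reported success: the node is known to be off,
        # so blocked lock requests can be woken up.
        with self._cond:
            self._state[node] = "fenced"
            self._cond.notify_all()

    def _all_known(self):
        return all(s != "unknown" for s in self._state.values())

    def acquire(self, timeout=None):
        # Waits (like a process stuck in D state) until no member is in
        # an unknown state. Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(self._all_known, timeout)


if __name__ == "__main__":
    # "node4.blade" is a hypothetical name for the third node.
    lm = ToyLockManager(["node4.blade", "node5.blade", "node6.blade"])
    lm.node_lost("node6.blade")
    print(lm.acquire(timeout=0.1))  # still blocked: node6 not fenced yet
    threading.Timer(0.2, lm.fence_confirmed, args=("node6.blade",)).start()
    print(lm.acquire(timeout=2))    # released once the fence succeeds
```

The waiters are only woken by a successful fence, never by a timeout on the cluster side; that is the design choice that trades availability for protection against two nodes writing to shared storage at once.
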
>>>>> My configuration is very simple; I attached the cluster.conf and
>>>>> hosts files.
>>>>> This is the line I added to /etc/fstab:
>>>>> /dev/mapper/KVM_IMAGES-VL_KVM_IMAGES /var/lib/libvirt/images gfs2 defaults,noatime,nodiratime 0 0
>>>>>
>>>>> I also set fallback_to_local_locking = 0 in lvm.conf (but nothing
>>>>> changed).
>>>>>
>>>>> PS: I had two virtualization environments working like a charm on
>>>>> OCFS2, but since CentOS 6.x I'm not able to install it. Is there some
>>>>> way to achieve the same results with GFS2? With GFS2 I sometimes get a
>>>>> crash after just a "service network restart" (I have many interfaces,
>>>>> so this operation takes more than 10 seconds); with OCFS2 I never had
>>>>> this problem.
>>>>>
>>>>> Thanks
>>>> I attached my logs from /var/log/cluster/*
>>>
>>> The configuration itself seems ok, though I think you can safely take
>>> qdisk out to simplify things. That's neither here nor there though.
>>>
>>> This concerns me:
>>>
>>> Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent fence_bladecenter result: error from agent
>>> Mar 21 19:00:14 fenced fence lama6.blade failed
>>>
>>> How are you triggering the failure(s)? The failed fence would
>>> certainly help explain the delays. As I mentioned earlier, DLM is
>>> designed to block when a node is in an unknown state (failed but not
>>> yet successfully fenced).
>>>
>>> As an aside, I run my HA VMs using clustered LVM LVs as the backing
>>> storage. GFS2 is an excellent file system, but it is expensive.
>>> Putting your VMs directly on the LVs takes GFS2 out of the equation.
>>
>> I used 'service network stop' to simulate the failure; the node gets
>> fenced through fence_bladecenter (BladeCenter hardware).
>>
>> Anyway, I took qdisk out, put GFS2 aside and now have my VMs on LVM
>> LVs. I've been trying for many hours to reproduce the issue:
>>
>> - only the node where I run 'service network stop' gets fenced
>> - with fallback_to_local_locking = 0 in lvm.conf, the LVM LVs remain
>> writable even while fencing takes place
>>
>> All seems to work like a charm now.
>>
>> I'd like to understand what was happening. I'll keep testing for some
>> days before trusting it.
>>
>> Thank you so much.
>> Maurizio
>>
>
> Testing testing testing. It's good that you plan to test before 
> trusting. I wish everyone had that philosophy!
>
> The clustered locking for LVM comes into play for
> activating/deactivating, creating, deleting, resizing and so on. It
> does not affect I/O inside an LV. That's why an LV remains
> writable while a fence is pending. However, I feel this is safe
> because rgmanager won't recover a VM on another node until the lost
> node is fenced.
>
> Cheers
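
The distinction drawn above, between LV metadata operations (which take the cluster lock) and plain I/O inside an active LV (which does not), can be sketched as a small Python model. This is a minimal illustration based only on the description in this thread: `LvOp`, `blocks_during_pending_fence` and `may_start_vm_elsewhere` are hypothetical names, not part of LVM, clvmd or rgmanager.

```python
from enum import Enum


class LvOp(Enum):
    ACTIVATE = "activate"
    DEACTIVATE = "deactivate"
    CREATE = "create"
    REMOVE = "remove"
    RESIZE = "resize"
    WRITE = "write"  # guest I/O inside an LV that is already active


# Metadata-changing operations need the cluster-wide lock (via clvmd and
# DLM); ordinary I/O on an already active LV does not touch that lock.
CLUSTER_LOCKED_OPS = {
    LvOp.ACTIVATE, LvOp.DEACTIVATE, LvOp.CREATE, LvOp.REMOVE, LvOp.RESIZE,
}


def blocks_during_pending_fence(op: LvOp, fence_pending: bool) -> bool:
    """True if the operation would wait for the lost node to be fenced."""
    return fence_pending and op in CLUSTER_LOCKED_OPS


def may_start_vm_elsewhere(lost_node_fenced: bool) -> bool:
    # rgmanager-style safety rule: a VM from the lost node is started on
    # a survivor only after the fence succeeded, so two copies never
    # write to the same LV at once.
    return lost_node_fenced


if __name__ == "__main__":
    for op in LvOp:
        print(op.value, blocks_during_pending_fence(op, fence_pending=True))
```

In this model only `write` stays unblocked during a pending fence, which matches the observation that LVs remained writable. The safety then rests entirely on the second rule, so a custom failover script used in place of rgmanager would presumably need to enforce the same check.
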

Thank you very much! The cluster continues working like a charm, failure
after failure I mean :)

We are not using rgmanager's fault management because it doesn't check
memory availability on the destination node, so we prefer to handle this
situation with a custom script we wrote.

Last questions:
- do you have any advice to improve tolerance against network failures?
- to avoid having a GFS2 just for the VMs' XML definitions, I've thought of
keeping them on each node, synced with rsync. Any alternatives?
- if I want only clustered LVM and no other functions, can you advise a
minimal configuration? (For example, I think rgmanager is not necessary.)

Thank you in advance