[CentOS-virt] GFS2 hangs after one node going down

Mon Mar 25 17:09:41 UTC 2013
Maurizio Giungato <m.giungato at pixnamic.com>

Il 25/03/2013 17:49, Digimer ha scritto:
> On 03/25/2013 08:44 AM, Maurizio Giungato wrote:
>> Il 22/03/2013 16:27, Digimer ha scritto:
>>> On 03/22/2013 11:21 AM, Maurizio Giungato wrote:
>>>> Il 22/03/2013 00:34, Digimer ha scritto:
>>>>> On 03/21/2013 02:09 PM, Maurizio Giungato wrote:
>>>>>> Il 21/03/2013 18:48, Maurizio Giungato ha scritto:
>>>>>>> Il 21/03/2013 18:14, Digimer ha scritto:
>>>>>>>> On 03/21/2013 01:11 PM, Maurizio Giungato wrote:
>>>>>>>>> Hi guys,
>>>>>>>>> my goal is to create a reliable virtualization environment using
>>>>>>>>> CentOS
>>>>>>>>> 6.4 and KVM, I've three nodes and a clustered GFS2.
>>>>>>>>> The environment is up and working, but I'm worried about the
>>>>>>>>> reliability: if I turn the network interface down on one node to
>>>>>>>>> simulate a crash (for example on the node "node6.blade"):
>>>>>>>>> 1) GFS2 hangs (processes go into D state) until node6.blade gets
>>>>>>>>> fenced
>>>>>>>>> 2) not only node6.blade gets fenced, but also node5.blade!
>>>>>>>>> Help me to save my last neurons!
>>>>>>>>> Thanks
>>>>>>>>> Maurizio
>>>>>>>> DLM, the distributed lock manager provided by the cluster, is
>>>>>>>> designed to block when a node goes into an unknown state. It does
>>>>>>>> not unblock until that node is confirmed to be fenced. This is by
>>>>>>>> design. GFS2, rgmanager and clustered LVM all use DLM, so they
>>>>>>>> will all block as well.
>>>>>>>> As for why two nodes get fenced, you will need to share more about
>>>>>>>> your configuration.
>>>>>>> My configuration is very simple; I attached the cluster.conf and
>>>>>>> hosts files.
>>>>>>> This is the row I added in /etc/fstab:
>>>>>>> /dev/mapper/KVM_IMAGES-VL_KVM_IMAGES /var/lib/libvirt/images gfs2
>>>>>>> defaults,noatime,nodiratime 0 0
>>>>>>> I also set fallback_to_local_locking = 0 in lvm.conf (but nothing
>>>>>>> changed).
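(For reference, the relevant lvm.conf fragment looks like the excerpt below. Only fallback_to_local_locking is mentioned above; locking_type = 3 is shown as an assumption, since it is the setting that routes LVM locking through clvmd on a cman cluster.)

```
# /etc/lvm/lvm.conf (excerpt) -- clustered LVM settings, sketch only
# locking_type = 3: use cluster-wide locking via clvmd (assumed here)
locking_type = 3
# do not silently fall back to local locking when clvmd is unreachable
fallback_to_local_locking = 0
```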
>>>>>>> PS: I had two virtualization environments working like a charm on
>>>>>>> OCFS2, but since CentOS 6.x I haven't been able to install it. Is
>>>>>>> there some way to achieve the same results with GFS2? With GFS2 I
>>>>>>> sometimes get a crash after only a "service network restart" (I have
>>>>>>> many interfaces, so this operation takes more than 10 seconds); with
>>>>>>> OCFS2 I never had this problem.
>>>>>>> Thanks
>>>>>>> Thanks
>>>>>> I attached my logs from /var/log/cluster/*
>>>>> The configuration itself seems ok, though I think you can safely take
>>>>> qdisk out to simplify things. That's neither here nor there though.
>>>>> This concerns me:
>>>>> Mar 21 19:00:14 fenced fence lama6.blade dev 0.0 agent
>>>>> fence_bladecenter result: error from agent
>>>>> Mar 21 19:00:14 fenced fence lama6.blade failed
>>>>> How are you triggering the failure(s)? The failed fence would
>>>>> certainly help explain the delays. As I mentioned earlier, DLM is
>>>>> designed to block when a node is in an unknown state (failed but
>>>>> not yet successfully fenced).
>>>>> As an aside, I do my HA VMs using clustered LVM LVs as the backing
>>>>> storage behind the VMs. GFS2 is an excellent file system, but it is
>>>>> expensive. Putting your VMs directly on the LV takes them out of the
>>>>> equation.
>>>> I used 'service network stop' to simulate the failure; the node gets
>>>> fenced through fence_bladecenter (BladeCenter HW).
>>>> Anyway, I took qdisk out and put GFS2 aside, and now I have my VMs on
>>>> LVM LVs. I've been trying for many hours to reproduce the issue:
>>>> - only the node where I execute 'service network stop' gets fenced
>>>> - with fallback_to_local_locking = 0 in lvm.conf, LVM LVs remain
>>>> writable even while fencing takes place
>>>> All seems to work like a charm now.
>>>> I'd like to understand what was happening. I'll test for some days
>>>> before trusting it.
>>>> Thank you so much.
>>>> Maurizio
>>> Testing testing testing. It's good that you plan to test before
>>> trusting. I wish everyone had that philosophy!
>>> The clustered locking for LVM comes into play for activating and
>>> deactivating, creating, deleting, resizing and so on. It does not
>>> affect what happens inside an LV. That's why an LV remains writable
>>> while a fence is pending. However, I feel this is safe because
>>> rgmanager won't recover a VM on another node until the lost node is
>>> fenced.
>>> Cheers
>> Thank you very much! The cluster continues working like a charm.
>> Failure after failure, I mean :)
>> We are not using rgmanager fault management because it doesn't check
>> memory availability on the destination node, so we prefer to manage
>> this situation with a custom script we wrote.
>> Last questions:
>> - do you have any advice to improve tolerance against network failures?
>> - to avoid having a GFS2 only for the VMs' XML files, I've thought of
>> keeping them on each node synced with rsync. Any alternatives?
>> - if I want to have only clustered LVM without any other functions, can
>> you advise a minimal configuration? (for example, I think rgmanager is
>> not necessary)
>> Thank you in advance
> For network redundancy, I use two switches and bonded (mode=1) links 
> with one link going to each switch. This way, losing a NIC or a 
> switch won't break the cluster. Details here:
> https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Network
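(As a sketch of what mode=1 active-backup bonding looks like on CentOS 6 -- device names, IP address and netmask below are examples, not taken from the thread:)

```
# /etc/sysconfig/network-scripts/ifcfg-bond0 (example addressing)
DEVICE=bond0
BONDING_OPTS="mode=1 miimon=100"
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.20.0.5
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for the second
# slave, eth1, with one slave cabled to each switch)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none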
> Using rsync to keep the XML files in sync is fine, if you really don't 
> want to use GFS2.
> You do not need rgmanager for clvmd to work. All you need is the base 
> cluster.conf (and working fencing, as you've seen).
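(For reference, a minimal cluster.conf for a clvmd-only three-node setup might look like the sketch below. The cluster name, node names, blade port numbers and the BladeCenter address/credentials are all placeholders, not taken from this thread's attached configuration:)

```xml
<?xml version="1.0"?>
<cluster name="vmcluster" config_version="1">
  <clusternodes>
    <clusternode name="node4.blade" nodeid="1">
      <fence>
        <method name="power">
          <device name="bc1" port="4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node5.blade" nodeid="2">
      <fence>
        <method name="power">
          <device name="bc1" port="5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node6.blade" nodeid="3">
      <fence>
        <method name="power">
          <device name="bc1" port="6"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="bc1" agent="fence_bladecenter"
                 ipaddr="bc-mgmt.example.com"
                 login="USERID" passwd="PASSW0RD"/>
  </fencedevices>
</cluster>
```

With something like that in place, starting cman and then clvmd on each node should be enough; rgmanager can stay disabled.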
> If you are over-provisioning VMs and need to worry about memory on 
> target systems, then you might want to take a look at pacemaker. It's 
> in tech-preview currently and will replace rgmanager in rhel7 (well, 
> expected to, nothing is guaranteed 'til release day). Pacemaker is 
> designed, as I understand it, to handle conditions like yours. 
> Further, it is *much* better tested than anything you roll yourself. 
> You can use clvmd with pacemaker by tying cman into pacemaker.
> digimer
Perfect, I have the same network configuration. On the other cluster I 
have four switches, so I could create two bonds, one dedicated to 
corosync; I was afraid that a single bond was too little ;)

Thank you again