[CentOS] problem with gfs_controld
sandra-llistes
sandra-llistes at fib.upc.edu  Wed Sep 15 08:39:45 UTC 2010
Hi,
We have two nodes running CentOS 5.5 x86_64 with Cluster Suite + GFS,
offering Samba and NFS services.
Recently one node logged the following messages:
Sep 13 08:19:07 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2
handle 2846d7ad00000000 MSG_PLOCK
Sep 13 08:19:07 NODE1 gfs_controld[3101]: send plock message error -1
Sep 13 08:19:11 NODE1 gfs_controld[3101]: cpg_mcast_joined error 2
handle 2846d7ad00000000 MSG_PLOCK
Sep 13 08:19:11 NODE1 gfs_controld[3101]: send plock message error -1
When this happens, access to the Samba services on the other node begins
to freeze, and this error appears in its logs:
Sep 13 08:08:22 NODE2 kernel: INFO: task smbd:23084 blocked for more
than 120 seconds.
Sep 13 08:08:22 NODE2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 08:08:22 NODE2 kernel: smbd D ffff810001576420 0
23084 6602 23307 19791 (NOTLB)
Sep 13 08:08:22 NODE2 kernel: ffff81003e187e08 0000000000000086
ffff81003e187e24 0000000000000092
Sep 13 08:08:22 NODE2 kernel: ffff810005dbdc38 000000000000000a
ffff81003f4f77a0 ffffffff80309b60
Sep 13 08:08:22 NODE2 kernel: 000062f1773ef4c3 000000000000624f
ffff81003f4f7988 000000008008c597
Sep 13 08:08:22 NODE2 kernel: Call Trace:
Sep 13 08:08:22 NODE2 kernel: [<ffffffff8875cb7d>]
:dlm:dlm_posix_lock+0x172/0x210
Sep 13 08:08:22 NODE2 kernel: [<ffffffff800a1ba4>]
autoremove_wake_function+0x0/0x2e
Sep 13 08:08:22 NODE2 kernel: [<ffffffff8882a5b9>] :gfs:gfs_lock+0x9c/0xa8
Sep 13 08:08:22 NODE2 kernel: [<ffffffff8003a142>] fcntl_setlk+0x11e/0x273
Sep 13 08:08:22 NODE2 kernel: [<ffffffff800b878c>]
audit_syscall_entry+0x180/0x1b3
Sep 13 08:08:22 NODE2 kernel: [<ffffffff8002e7da>] sys_fcntl+0x269/0x2dc
Sep 13 08:08:22 NODE2 kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0
The configuration of the cluster is the following:
<?xml version="1.0"?>
<cluster alias="lcfib" config_version="60" name="lcfib">
<quorumd device="/dev/gfs-webn/quorum" interval="1"
label="quorum" min_score="1" tko="10" votes="2">
<heuristic interval="10" program="/bin/ping -t1 -c1
numIP.1" score="1" tko="5"/>
</quorumd>
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="NODE2.fib.upc.es" nodeid="1" votes="1">
<fence>
<method name="1">
<device lanplus="1" name="NODE2SP"/>
</method>
</fence>
</clusternode>
<clusternode name="NODE1.fib.upc.es" nodeid="2" votes="1">
<fence>
<method name="1">
<device lanplus="1" name="NODE1SP"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman broadcast="yes" expected_votes="4" two_node="0"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" auth="md5"
ipaddr="192.168.13.77" login="" name="NODE2SP" passwd="5jSTv3Mb"/>
<fencedevice agent="fence_ipmilan" auth="md5"
ipaddr="192.168.13.78" login="" name="NODE1SP" passwd="5jSTv3Mb"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="NODE1-NODE2" ordered="1"
restricted="1">
<failoverdomainnode
name="NODE2.fib.upc.es" priority="2"/>
<failoverdomainnode
name="NODE1.fib.upc.es" priority="1"/>
</failoverdomain>
<failoverdomain name="NODE2-NODE1" ordered="1"
restricted="1">
<failoverdomainnode
name="NODE2.fib.upc.es" priority="1"/>
<failoverdomainnode
name="NODE1.fib.upc.es" priority="2"/>
</failoverdomain>
</failoverdomains>
<resources>
<script file="/etc/init.d/fibsmb1" name="fibsmb1"/>
<script file="/etc/init.d/fibsmb2" name="fibsmb2"/>
<clusterfs device="/dev/gfs-webn/gfs-webn"
force_unmount="0" fsid="14417" fstype="gfs" mountpoint="/web" name="web"
options=""/>
<clusterfs device="/dev/gfs-perfils/gfs-assig"
force_unmount="0" fsid="21646" fstype="gfs" mountpoint="/assig"
name="assig" options=""/>
<smb name="FIBSMB1" workgroup="FIBSMB"/>
<smb name="FIBSMB2" workgroup="FIBSMB"/>
<ip address="numIP.111/24" monitor_link="1"/>
<ip address="numIP.110/24" monitor_link="1"/>
<ip address="numIP.112/24" monitor_link="1"/>
</resources>
<service autostart="1" domain="NODE2-NODE1" name="samba"
recovery="disable">
<clusterfs ref="web"/>
<ip ref="numIP.110/24"/>
<ip ref="numIP.112/24"/>
<clusterfs ref="assig"/>
<script ref="fibsmb2"/>
<smb ref="FIBSMB2"/>
</service>
<service domain="NODE2-NODE1" name="sambalin"
recovery="disable">
<clusterfs ref="web"/>
<ip ref="numIP.111/24"/>
<smb ref="FIBSMB1"/>
<script ref="fibsmb1"/>
</service>
</rm>
</cluster>
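As a quick sanity check on the quorum math in the config above (2 node votes + 2 qdisk votes should equal expected_votes="4"), here is a minimal Python sketch that parses a reduced copy of the cluster.conf and verifies the arithmetic. The inline XML is trimmed from the real config; nothing here queries a live cluster:

```python
import xml.etree.ElementTree as ET

# Reduced copy of the cluster.conf above, keeping only vote-related attributes.
CLUSTER_CONF = """\
<cluster alias="lcfib" config_version="60" name="lcfib">
  <quorumd device="/dev/gfs-webn/quorum" label="quorum" votes="2"/>
  <clusternodes>
    <clusternode name="NODE2.fib.upc.es" nodeid="1" votes="1"/>
    <clusternode name="NODE1.fib.upc.es" nodeid="2" votes="1"/>
  </clusternodes>
  <cman broadcast="yes" expected_votes="4" two_node="0"/>
</cluster>
"""

root = ET.fromstring(CLUSTER_CONF)

# Sum the per-node votes (default 1 when the attribute is absent).
node_votes = sum(int(n.get("votes", "1"))
                 for n in root.findall("./clusternodes/clusternode"))

# Quorum-disk votes, if a <quorumd> element is present.
qdisk = root.find("quorumd")
qdisk_votes = int(qdisk.get("votes", "0")) if qdisk is not None else 0

expected = int(root.find("cman").get("expected_votes"))
total = node_votes + qdisk_votes

print("node votes:", node_votes)
print("qdisk votes:", qdisk_votes)
print("total %d, expected_votes %d, consistent: %s"
      % (total, expected, total == expected))
```

For this config the totals line up (1 + 1 + 2 = 4), so the vote arithmetic itself is not the problem; the cpg_mcast_joined errors point more toward the messaging layer than toward quorum accounting.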
I think it's a configuration problem: at some point the cluster nodes
cannot communicate with each other.
Any hints about this?
Thanks,
Sandra
PS: kernel and cluster package versions:
kernel-2.6.18-194.el5
cman-2.0.115-34.el5
kmod-gfs-0.1.34-12.el5.centos
gfs-utils-0.1.20-7.el5