[CentOS] Pacemaker bugs?
Johnny Hughes
johnny at centos.org
Fri Nov 25 15:24:50 UTC 2016
On 11/25/2016 04:30 AM, Andreas Haumer wrote:
> Hi!
>
> I think I stumbled on at least two bugs in the CentOS 7.2 pacemaker package,
> though I'm not quite sure whether or where to report them.
>
> I'm using the following package to set up a 2-node active/passive cluster:
>
> [root@clnode1 ~]# rpm -q pacemaker
> pacemaker-1.1.13-10.el7_2.4.x86_64
>
> The installation is up-to-date on both nodes as of the current PIT.
>
> I currently have the following cluster resources running:
>
> [root@clnode2 ~]# pcs status
> Cluster name: rucluster1
> Last updated: Fri Nov 25 11:26:51 2016 Last change: Fri Nov 25 10:51:32 2016 by root via cibadmin on clnode1
> Stack: corosync
> Current DC: clnode2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
> 2 nodes and 12 resources configured
>
> Online: [ clnode1 clnode2 ]
>
> Full list of resources:
>
> p_ip_cluster (ocf::heartbeat:IPaddr2): Started clnode2
> Master/Slave Set: ms_drbd_r0 [p_drbd_r0]
> Masters: [ clnode2 ]
> Slaves: [ clnode1 ]
> p_fs_drbd1 (ocf::heartbeat:Filesystem): Started clnode2
> p_apache (ocf::heartbeat:apache): Started clnode2
> p_dhcpd (ocf::heartbeat:dhcpd): Started clnode2
> p_named (ocf::heartbeat:named): Started clnode2
> p_slapd (ocf::heartbeat:slapd): Started clnode2
> p_postgres (ocf::heartbeat:pgsql): Started clnode2
> p_nmb (systemd:nmb): Started clnode2
> p_smb (systemd:smb): Started clnode2
> p_winbind (systemd:winbind): Started clnode2
>
> PCSD Status:
> clnode1: Online
> clnode2: Online
>
> Daemon Status:
> corosync: active/enabled
> pacemaker: active/enabled
> pcsd: active/enabled
>
>
> The first bug is rather serious, though a workaround exists!
>
> The cluster works fine, but as soon as I add a cluster resource of
> class "service", the cluster manager software wreaks havoc on node
> failover. In that situation, the lrmd process hangs in an infinite
> loop (neither strace nor ltrace shows any output, so it seems to be
> an internal loop without any system or library calls) and almost any
> call to the cluster manager software (crmsh or pcs) runs into a timeout.
> It's quite hard to recover the whole cluster from this situation.
>
> When I replace the resource class "service" with resource class
> "systemd", everything seems to work just fine.
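
A rough sketch of that workaround, for anyone hitting the same hang. It
assumes `pcs status` prints resources in the same "id (class:agent): state"
layout shown above; the helper name list_service_resources and the
p_example / example names are placeholders, not anything from the original
setup.

```shell
# list_service_resources: read `pcs status` text on stdin and print the IDs
# of resources whose agent is written as service:<name>.
list_service_resources() {
    awk '$2 ~ /^\(service:/ { print $1 }'
}

# Only query the cluster if pcs is actually installed on this machine.
if command -v pcs >/dev/null 2>&1; then
    pcs status 2>/dev/null | list_service_resources
fi

# Hypothetical replacement of one such resource (placeholder names):
#   pcs resource delete p_example
#   pcs resource create p_example systemd:example
```

Deleting and re-creating a resource stops it, so this is probably best done
in a maintenance window.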
>
> I found a rather old, already closed bug for Fedora which looks similar:
>
> <https://bugzilla.redhat.com/show_bug.cgi?id=1117151>
>
>
> Another bug seems to be rather minor: I see the following assertions in the corosync logs:
>
> Nov 25 11:13:56 [3206] clnode1 crmd: error: crm_abort: pcmkRegisterNode: Triggered assert at xml.c:594 : node->type == XML_ELEMENT_NODE
>
> They seem to be related to the drbd resource, but do not appear to cause any functional problem.
>
> For this particular problem I found the following patch:
>
> <https://github.com/ClusterLabs/pacemaker/commit/68c7506aa84c69e5f425ef5f3025a9efb41d13da>
>
>
> Are these already known bugs?
> (I searched the CentOS bugzilla site but couldn't find any ticket
> describing these bugs)
>
>
> Any advice on whether or where I should report them?
>
The new pacemaker built from the RHEL 7.3 source code is now in CR
(pacemaker-1.1.15-11.el7).
An even newer version will land in CR later today:
pacemaker-1.1.15-11.el7_3.2
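
A minimal sketch for checking whether a node is still on the older build.
It assumes the CR repository is defined with the repo id "cr" (as in the
stock CentOS-CR.repo file) and uses GNU `sort -V`, which only approximates
RPM's own version comparison. The helper name version_lt is made up for
this example.

```shell
# version_lt A B: succeeds when version string A sorts strictly before B.
# Uses GNU sort -V, which is close to, but not the same as, RPM's ordering.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

target="1.1.15-11.el7_3.2"   # build named in this message

# Only inspect the package if rpm exists and pacemaker is installed.
if command -v rpm >/dev/null 2>&1 && rpm -q pacemaker >/dev/null 2>&1; then
    installed="$(rpm -q --qf '%{VERSION}-%{RELEASE}' pacemaker)"
    if version_lt "$installed" "$target"; then
        echo "pacemaker $installed is older than $target"
        echo "to update from CR: yum --enablerepo=cr update pacemaker"
    fi
fi
```

The script only prints the suggested yum command rather than running it, so
it is safe to try on a live cluster node.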