Bug 208221 - [new port] net/pacemaker: Scalable High-Availability cluster resource manager
Summary: [new port] net/pacemaker: Scalable High-Availability cluster resource manager
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Tijl Coosemans
URL:
Keywords:
Depends on: 208182
Blocks: 208222
Reported: 2016-03-22 23:25 UTC by David Shane Holden
Modified: 2016-03-25 08:27 UTC (History)

Attachments
pacemaker.shar (203.19 KB, application/x-shar)
2016-03-22 23:25 UTC, David Shane Holden
poudriere testport (293.94 KB, text/x-log)
2016-03-22 23:26 UTC, David Shane Holden

Description David Shane Holden 2016-03-22 23:25:20 UTC
Created attachment 168516
pacemaker.shar

The lib/services/services_linux.c and lib/common/utils.c patches have been committed upstream. The extra/resources/ping and crmd/throttle.c patches are trivial, but I haven't submitted them yet.  The crmd/pengine.c patch just lowers the default message size on FreeBSD, because of the default value of the kern.ipc.maxsockbuf sysctl.  Its only purpose is to let pacemaker work without any tuning.  In my simple testing I was only able to max out the message size by creating hundreds of ping resource agents.  Anyone who runs into a problem can always increase the sysctl and set the pacemaker_ipc_buffer rc variable to something larger.
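
A rough sketch of that last workaround follows. kern.ipc.maxsockbuf and pacemaker_ipc_buffer are the names mentioned above; both sizes are illustrative assumptions rather than tested values, and the run wrapper, the tune_pacemaker_ipc function and the APPLY switch are hypothetical helpers added so that the commands are only printed by default.

```sh
#!/bin/sh
# Sketch only: both sizes are illustrative assumptions, not tested values.
MAXSOCKBUF=${MAXSOCKBUF:-16777216}   # assumed new kern.ipc.maxsockbuf, in bytes
IPC_BUFFER=${IPC_BUFFER:-1048576}    # assumed pacemaker_ipc_buffer, presumably in bytes

# Hypothetical helper: print a command, and run it only when APPLY=1
# (which needs root on a FreeBSD host).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

tune_pacemaker_ipc() {
    # Raise the kernel limit on socket buffer size for the running system.
    # The same setting would go in /etc/sysctl.conf to survive a reboot.
    run sysctl kern.ipc.maxsockbuf="$MAXSOCKBUF" || return 1
    # Store the larger buffer size for the pacemaker rc script in /etc/rc.conf.
    run sysrc pacemaker_ipc_buffer="$IPC_BUFFER"
}

tune_pacemaker_ipc
```

Run as is, this only prints the two commands. With APPLY=1 it would execute them, after which pacemaker would presumably need a restart to pick up the new buffer size.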

The only resource agent I've tested and patched is ping, and I doubt the others will work.  I just haven't gotten around to looking at them.  Pacemaker itself, though, appears to run fine in a simple 2-node zabbix setup.  I also had a 3-node setup across debian8/freebsd9/freebsd10 working.

# crm_mon -1
Last updated: Tue Mar 22 23:11:02 2016          Last change: Thu Mar 17 18:34:38 2016 by hacluster via crmd on col1002.mon.corp.fl1.dpejesh.net
Stack: corosync
Current DC: col1001.mon.corp.fl1.dpejesh.net (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured

Online: [ col1001.mon.corp.fl1.dpejesh.net col1002.mon.corp.fl1.dpejesh.net ]

 zabbixserver   (ocf::heartbeat:zabbixserver):  Started col1001.mon.corp.fl1.dpejesh.net
Comment 1 David Shane Holden 2016-03-22 23:26:31 UTC
Created attachment 168517
poudriere testport
Comment 2 Tijl Coosemans freebsd_committer 2016-03-23 21:09:40 UTC
Would it be ok if I moved corosync and pacemaker under sysutils?  I'm working on an update of heartbeat and that already lives under sysutils.  In gentoo they are all in a sys-cluster category.  Also, if I understand things correctly pacemaker can work with heartbeat instead of corosync.  I wonder if we should have separate ports for that, like pacemaker-corosync and pacemaker-heartbeat.
Comment 3 David Shane Holden 2016-03-23 23:08:24 UTC
That's fine, I initially had them under sysutils in my local ports tree but moved them since they're kind of in that grey area of network/system services.

I don't think pacemaker can be set up to use our port of heartbeat since we're still using 2.1.4.  From my reading, heartbeat 2.1 was an all-in-one type of deal, and with heartbeat 3 they split it up into separate projects, with heartbeat becoming just the messaging layer.  I initially looked at having a config option to support building against our heartbeat port too but ran into problems with it.  If I remember correctly it was because they both shared a number of files and conflicted with one another.  But that was a few months ago and I could be mistaken.  I think a heartbeat3 port and a config option for pacemaker to support corosync/heartbeat3 would make more sense, though, than having 2 separate pacemaker ports.
Comment 4 David Shane Holden 2016-03-23 23:21:49 UTC
I should also add that my reasoning for using net instead of sysutils was I felt crmsh (bug 208222) belonged under net-mgmt, and with that under net-mgmt corosync/pacemaker would fall under the net category.  So if you decide to move these under sysutils then crmsh might belong in there too.
Comment 5 commit-hook freebsd_committer 2016-03-24 15:56:14 UTC
A commit references this bug:

Author: tijl
Date: Thu Mar 24 15:56:08 UTC 2016
New revision: 411799
URL: https://svnweb.freebsd.org/changeset/ports/411799

Log:
  Add net/pacemaker.

  Pacemaker is an advanced, scalable High-Availability cluster resource
  manager for Linux-HA (Heartbeat) and/or Corosync.

  It supports "n-node" clusters with significant capabilities for managing
  resources and dependencies.

  It will run scripts at initialization, when machines go up or down, when
  related resources fail and can be configured to periodically check resource
  health.

  PR:		208221
  Submitted by:	David Shane Holden <dpejesh@yahoo.com>

Changes:
  head/net/Makefile
  head/net/pacemaker/
  head/net/pacemaker/Makefile
  head/net/pacemaker/distinfo
  head/net/pacemaker/files/
  head/net/pacemaker/files/pacemaker.in
  head/net/pacemaker/files/patch-crmd_pengine.c
  head/net/pacemaker/files/patch-crmd_throttle.c
  head/net/pacemaker/files/patch-extra_resources_ping
  head/net/pacemaker/files/patch-lib-common-utils.c
  head/net/pacemaker/files/patch-lib-services-services_linux.c
  head/net/pacemaker/pkg-descr
  head/net/pacemaker/pkg-plist
Comment 6 Tijl Coosemans freebsd_committer 2016-03-24 16:01:45 UTC
I've committed it with a DOCS and MANPAGES option added.
Comment 7 David Shane Holden 2016-03-24 20:20:06 UTC
Thanks Tijl for getting these committed.

I have one question/concern though.  Since the default permissions on /var/run/qb are now root:wheel 1770, pacemaker will require users to chmod it to 1777 or chown it to root:haclient before it will work, since it starts up processes as the hacluster user.  This will also become a problem if qpid gets a port and has its own uid/gid, which will also need access to /var/run/qb if configured to use corosync.  The problem is compounded if pacemaker and qpid are running on the same host.  The way I see it there are 3 options:

1) /var/run/qb set to 1777
2) add a pkg-install script to adjust the group to haclient if it's root:wheel 1770
3) add a pkg-message informing the user to adjust it

This is why I initially used option 1 when I submitted the port.  None of them are really ideal but it seemed to cause the least confusion for a new user and should just work regardless of the machine setup.  Option 2 will kind of work if only pacemaker is installed, but if another package (e.g. qpid) wanted to change the group in a pkg-install script then they'd be stepping on each other's toes.  And we all know option 3 will end up being overlooked by the user, resulting in time spent troubleshooting why pacemaker won't start only to realize it was a permissions problem on /var/run/qb.
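
For illustration, the check behind option 2 could look roughly like the sketch below. The adjust_qb_group function and its arguments are hypothetical and not taken from the submitted port; it reads the mode, owner and group from ls -ld and only changes the group while the directory is still in the expected state.

```sh
#!/bin/sh
# Sketch of option 2: hand a directory to a new group only while it still
# has the expected owner, group and mode 1770. All names are illustrative.
#
# usage: adjust_qb_group DIR OWNER GROUP NEW_GROUP
adjust_qb_group() {
    dir=$1 owner=$2 group=$3 new_group=$4

    [ -d "$dir" ] || return 1

    # ls -ld prints e.g. "drwxrwx--T  2 root  wheel ... /var/run/qb";
    # keep the 10-character mode string plus the owner and group columns.
    set -- $(ls -ld "$dir" | awk '{ print substr($1, 1, 10), $3, $4 }')

    # drwxrwx--T is how ls renders mode 1770 on a directory.
    if [ "$1" = "drwxrwx--T" ] && [ "$2" = "$owner" ] && [ "$3" = "$group" ]; then
        chgrp "$new_group" "$dir" && echo "changed"
    else
        echo "skipped"
    fi
}

# In a pkg-install script the call would be something like:
#   adjust_qb_group /var/run/qb root wheel haclient
```

A second package running the same kind of script with its own group would hit the "skipped" branch once the first one has changed the group, which is the conflict described above.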

What are your thoughts?
Comment 8 Tijl Coosemans freebsd_committer 2016-03-24 20:36:43 UTC
I made a mistake.  I changed 1777 into 1770 and forgot to set the group to haclient.  I'm really reluctant to make it 1777.  That allows any user to interfere with pacemaker by creating files or sockets there with names that pacemaker wants to use.  Isn't it possible to let qpid use haclient as group?
Comment 9 David Shane Holden 2016-03-24 21:16:57 UTC
I was just pondering what qpid would use for USERS/GROUPS in its port Makefile.  To support clustering out of the box it could be qpid/haclient, but that's wrong since services typically get their own dedicated uid/gid.  So it should be qpid/qpid, but in that setup it wouldn't have access to /var/run/qb without some user intervention.  Thinking about it more, I think in qpid's case a pkg-message informing the user to add qpid to haclient as a supplementary group if they want to support clustering probably makes the most sense, especially because it should run fine in stand-alone mode.

If you plan on patching libqb to set the permissions to root:haclient by default I think everything should be good.
Comment 10 commit-hook freebsd_committer 2016-03-24 21:44:51 UTC
A commit references this bug:

Author: tijl
Date: Thu Mar 24 21:43:58 UTC 2016
New revision: 411813
URL: https://svnweb.freebsd.org/changeset/ports/411813

Log:
  Change the group on /var/run/qb to haclient.  I forgot this when I changed
  the mode.

  PR:		208221
  Approved by:	dpejesh@yahoo.com (maintainer)

Changes:
  head/devel/libqb/Makefile
  head/devel/libqb/pkg-plist
Comment 11 David Shane Holden 2016-03-24 21:59:16 UTC
Should GROUPS=haclient be set in the libqb Makefile that you just committed?
Comment 12 commit-hook freebsd_committer 2016-03-25 08:27:33 UTC
A commit references this bug:

Author: tijl
Date: Fri Mar 25 08:26:35 UTC 2016
New revision: 411823
URL: https://svnweb.freebsd.org/changeset/ports/411823

Log:
  Create the haclient group used in pkg-plist.

  PR:		208221

Changes:
  head/devel/libqb/Makefile