Created attachment 168516
The lib/services/services_linux.c and lib/common/utils.c patches have been committed upstream. The extra/resources/ping and crmd/throttle.c patches are trivial, but I haven't submitted them yet. The crmd/pengine.c patch just lowers the default message size on FreeBSD specifically, because of the default value of the kern.ipc.maxsockbuf sysctl. Its only purpose is to let pacemaker work without having to tune that sysctl. In my simple testing I was only able to max out the message size when creating hundreds of ping resource agents. If someone does run into a problem, they can always increase the sysctl and set the pacemaker_ipc_buffer rc variable to something larger.
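A minimal sketch of how the kern.ipc.maxsockbuf ceiling can be observed, assuming (not confirmed in this thread) that the IPC layer sizes its Unix-socket buffers from the configured message size. It binary-searches the largest SO_SNDBUF that setsockopt() accepts on an AF_UNIX datagram socket. On FreeBSD, requests above a limit derived from kern.ipc.maxsockbuf are typically rejected with ENOBUFS; Linux clamps the value silently instead, so there the probe simply returns the upper bound. The function name probe_max_sndbuf and its default bounds are hypothetical.

```python
import errno
import socket


def probe_max_sndbuf(lo: int = 4096, hi: int = 64 * 1024 * 1024) -> int:
    """Binary-search the largest SO_SNDBUF (in bytes) the kernel accepts.

    Hypothetical helper. Assumes acceptance is monotonic: if a size is
    accepted, every smaller size is too. Returns -1 if even `lo` is rejected.
    """
    best = -1
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        while lo <= hi:
            mid = (lo + hi) // 2
            try:
                s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, mid)
            except OSError as exc:
                # FreeBSD reports "buffer too large" as ENOBUFS; anything
                # else is unexpected, so let it propagate.
                if exc.errno != errno.ENOBUFS:
                    raise
                hi = mid - 1
            else:
                best = mid
                lo = mid + 1
    return best


if __name__ == "__main__":
    print(f"largest accepted SO_SNDBUF: {probe_max_sndbuf()} bytes")
```

On a stock FreeBSD system the result should come out somewhat below the kern.ipc.maxsockbuf value, since the kernel reserves part of that limit for mbuf bookkeeping; raising the sysctl raises the ceiling accordingly.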
The only resource agent I've tested and patched was ping; I doubt the others will work, since I just haven't gotten around to looking at them. Pacemaker itself, though, appears to run fine in a simple 2-node Zabbix setup. I also had a 3-node setup across debian8/freebsd9/freebsd10 working.
# crm_mon -1
Last updated: Tue Mar 22 23:11:02 2016 Last change: Thu Mar 17 18:34:38 2016 by hacluster via crmd on col1002.mon.corp.fl1.dpejesh.net
Current DC: col1001.mon.corp.fl1.dpejesh.net (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured
Online: [ col1001.mon.corp.fl1.dpejesh.net col1002.mon.corp.fl1.dpejesh.net ]
zabbixserver (ocf::heartbeat:zabbixserver): Started col1001.mon.corp.fl1.dpejesh.net
Created attachment 168517
Would it be OK if I moved corosync and pacemaker under sysutils? I'm working on an update of heartbeat, and that already lives under sysutils. In Gentoo they are all in a sys-cluster category. Also, if I understand things correctly, pacemaker can work with heartbeat instead of corosync. I wonder if we should have separate ports for that, like pacemaker-corosync and pacemaker-heartbeat.
That's fine. I initially had them under sysutils in my local ports tree but moved them, since they sit in a grey area between network and system services.
I don't think pacemaker can be set up to use our port of heartbeat, since we're still using 2.1.4. From my reading, heartbeat 2.1 was an all-in-one package, and with heartbeat 3 it was split into separate projects, with heartbeat becoming just the messaging layer. I initially looked at having a config option to support building against our heartbeat port too, but ran into problems with it. If I remember correctly, it was because the two shared a number of files and conflicted with one another, but that was a few months ago and I could be mistaken. Either way, I think a heartbeat3 port plus a config option for pacemaker to choose between corosync and heartbeat3 would make more sense than two separate pacemaker ports.
I should also add that I chose net over sysutils because I felt crmsh (bug 208222) belonged under net-mgmt, and with crmsh in net-mgmt, corosync and pacemaker would naturally fall under net. So if you decide to move these under sysutils, crmsh might belong there too.
A commit references this bug:
Date: Thu Mar 24 15:56:08 UTC 2016
New revision: 411799
Pacemaker is an advanced, scalable High-Availability cluster resource
manager for Linux-HA (Heartbeat) and/or Corosync.
It supports "n-node" clusters with significant capabilities for managing
resources and dependencies.
It will run scripts at initialization, when machines go up or down, when
related resources fail and can be configured to periodically check resource health.
Submitted by: David Shane Holden <firstname.lastname@example.org>
I've committed it with DOCS and MANPAGES options added.
Thanks Tijl for getting these committed.
I have one question/concern, though. Since the default permissions on /var/run/qb are now root:wheel 1770, users will have to either chmod it to 1777 or chown it to root:haclient for pacemaker to work, because pacemaker starts its processes as the hacluster user. This will also become a problem if qpid gets a port with its own uid/gid, since qpid will also need access to /var/run/qb when configured to use corosync, and the problem is compounded if pacemaker and qpid run on the same host. The way I see it, there are three options:
1) set /var/run/qb to 1777
2) add a pkg-install script that changes the group to haclient if it is root:wheel 1770
3) add a pkg-message telling the user to adjust it
This is why I initially used option 1 when I submitted the port. None of them are really ideal, but option 1 seemed to cause the least confusion for a new user and should just work regardless of the machine setup. Option 2 will mostly work if only pacemaker is installed, but if another package (e.g. qpid) wanted to change the group in its own pkg-install script, the two would be stepping on each other's toes. And we all know option 3 will end up being overlooked, leaving the user troubleshooting why pacemaker won't start only to discover a permissions problem on /var/run/qb.
What are your thoughts?
I made a mistake: I changed 1777 into 1770 and forgot to set the group to haclient. I'm really reluctant to make it 1777, because that allows any user to interfere with pacemaker by creating files or sockets there with names that pacemaker wants to use. Isn't it possible to let qpid use haclient as its group?
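A simplified sketch of the permission reasoning in this discussion, assuming plain POSIX mode bits (no ACLs). The sticky bit is ignored because it only restricts deleting or renaming other users' entries, not creating new ones. Creating a file or socket in a directory needs write and search permission for exactly one class (owner, group, or other). The function name can_create_in is hypothetical; wheel is gid 0 on FreeBSD, but the hacluster and haclient ids below are made up for illustration.

```python
import stat

# Hypothetical ids for illustration only.
ROOT_UID = 0
WHEEL_GID = 0          # wheel is gid 0 on FreeBSD
HACLUSTER_UID = 1001   # made-up uid for the hacluster user
HACLIENT_GID = 1001    # made-up gid for the haclient group


def can_create_in(mode: int, dir_uid: int, dir_gid: int,
                  uid: int, gids: set[int]) -> bool:
    """Simplified POSIX check: may this user create entries in the directory?

    Only one permission class applies: owner if the uids match, otherwise
    group if the directory's gid is among the user's groups, otherwise other.
    """
    if uid == 0:
        return True  # root bypasses the mode bits
    if uid == dir_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif dir_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


hacluster_groups = {HACLIENT_GID}
for label, mode, gid in [
    ("root:wheel 1770", 0o1770, WHEEL_GID),
    ("root:haclient 1770", 0o1770, HACLIENT_GID),
    ("root:wheel 1777", 0o1777, WHEEL_GID),
]:
    ok = can_create_in(mode, ROOT_UID, gid, HACLUSTER_UID, hacluster_groups)
    print(f"{label}: hacluster can create sockets: {ok}")
```

Under these assumptions, root:wheel 1770 locks hacluster out, while root:haclient 1770 and 1777 both let it in. The difference is that 1777 also lets every other local user create entries there, whereas root:haclient 1770 limits access to haclient members, which a separately owned service could join as a supplementary group.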
I was just pondering what qpid would use for USERS/GROUPS in its port Makefile. To support clustering out of the box it could be qpid/haclient, but that's wrong, since services typically get their own dedicated uid/gid. So it should be qpid/qpid, but in that setup it wouldn't have access to /var/run/qb without some user intervention. On further thought, for qpid a pkg-message telling users to add qpid to haclient as a supplementary group if they want clustering probably makes the most sense, especially since qpid should run fine in stand-alone mode.
If you plan on patching libqb to set the permissions to root:haclient by default, I think everything should be good.
A commit references this bug:
Date: Thu Mar 24 21:43:58 UTC 2016
New revision: 411813
Change the group on /var/run/qb to haclient. I forgot this when I changed
Approved by: email@example.com (maintainer)
Should GROUPS=haclient be set in the libqb Makefile that you just committed?
A commit references this bug:
Date: Fri Mar 25 08:26:35 UTC 2016
New revision: 411823
Create the haclient group used in pkg-plist.