SLURM now requires cgroups for basic functionality. As a result, sysutils/slurm-wlm is currently broken and unlikely to be fixed, unless someone is up for a long-term uphill battle. The port builds and installs, but the daemons don't work without the cgroups plugin. I'd suggest making SLURM support a default-off option for openmpi and mpich.
I'm not against it in principle. I think that not having a base dependency on slurm has its own merit even if slurm-wlm is not broken. I wasn't aware that slurm-wlm was currently broken, so I'll do some reading and run some tests.
Bug 276001
If I understand <https://slurm.schedmd.com/cgroups.html> correctly, cgroups are only an optional feature of SLURM. What am I missing?
(In reply to Jason W. Bacon from comment #0) I'm not sure I understand. Are you saying that slurmctld fails with cgroup, and that it also fails if TaskPlugin is set to task/affinity or task/none? https://slurm.schedmd.com/slurm.conf.html#OPT_TaskPlugin
Is anyone actually *running* SLURM on FreeBSD? It's been a while since I've tried, but I was unable to get anything later than 20.02.7 working. Bug 276001 was my last attempt, and I only got as far as unbreaking the build. I don't recall the details of why the daemons wouldn't run. I have always used task/affinity, and it was no longer working last time I checked. FYI, I ported SLURM to FreeBSD. There are many features that never worked due to Linuxisms (like parsing /proc files with scanf()), and the task of keeping it working has grown with each new release. Upstream isn't interested in supporting other platforms, and in fact only provides significant support to paying customers. I finally gave up and started https://github.com/outpaddling/LPJS/. At any rate, making SLURM an unconditional dependency for MPI seemed questionable to me from the beginning, though I understood the desire to avoid creating multiple packages.
When I proposed an upgrade of sysutils/slurm-wlm in Phabricator D42764 and asked its users to test it, I got no answer on this point, so I guess that it is not widely used, except for the libraries. The question is: what is the impact of removing this dependency on MPICH and OpenMPI?
So, the dependency was in response to my request to support SLURM in OpenMPI. The maintainer at the time decided to make it unconditional rather than use flavors or some other mechanism to create alternative packages. I think a default-off option for SLURM support would be more than adequate at this point. If someone wants to do the work to get SLURM working on FreeBSD again, this can always be reexamined.
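For illustration, a default-off option of the kind suggested above could look roughly like this in a port Makefile. This is only a sketch using the standard ports options framework; the description text, the libslurm.so dependency line, and the assumption that the upstream configure script accepts --with-slurm are illustrative and not taken from the actual net/mpich or net/openmpi ports.

```makefile
# Sketch only: SLURM support as an option that is off by default.
# An option is off unless it is also listed in OPTIONS_DEFAULT.
OPTIONS_DEFINE=		SLURM
SLURM_DESC=		Build with SLURM scheduler support

# Assumed dependency line: pull in sysutils/slurm-wlm only when SLURM is on.
SLURM_LIB_DEPENDS=	libslurm.so:sysutils/slurm-wlm

# Expands to --with-slurm when the option is on, --without-slurm when off
# (assumes the upstream configure script understands these flags).
SLURM_CONFIGURE_WITH=	slurm
```

With something like this, the default package carries no dependency on slurm-wlm, and anyone who needs the integration can enable the option and build from ports.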
I also think that the default option should not depend on Slurm, as I see no reason for it. MPI and batch scheduling are complementary but distinct from each other. In fact, it appears from the above comments that Slurm and MPI are not currently used together on FreeBSD. I'll make flavors of openmpi and mpich with Slurm, with the base being independent of any scheduler. I won't make a TORQUE flavor, as it seems to no longer receive updates. The last TORQUE release dates back 8 years, and sysutils/torque is flagged as deprecated with expiry set to the end of this month. I would have liked to make these changes at the next version upgrade, as there is a chance, even if minor, that they will disrupt some users. I'd rather not do that in a port revision if it can be avoided, but the TORQUE removal is somewhat forcing my hand. Thoughts?
Actually, Torque is not totally abandoned, but its repository layout is a bit strange: see <https://lists.freebsd.org/archives/dev-commits-ports-all/2023-December/093144.html> In December, Moin said that there is no need to remove it, because it could be updated. I have found neither the time nor the motivation to work on it, but in any case we could extend its deprecation date.
I forgot the 2nd URL: <https://lists.freebsd.org/archives/dev-commits-ports-all/2023-December/093196.html>.
(In reply to Thierry Thomas from comment #9) That's good news. I thought that it was abandoned for sure because the port states that it is, but also I couldn't find a link to the source on the official web page, and the github page has not received significant code since 2015. https://github.com/adaptivecomputing/torque/tags But if it's still alive, for sure I'll produce a flavor for it. I started looking into openPBS too, but have not gotten very far.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=09f0bf2ada083c098fadf42160afe96cad9c2660

commit 09f0bf2ada083c098fadf42160afe96cad9c2660
Author:     Torsten Zuehlsdorff <tz@FreeBSD.org>
AuthorDate: 2024-07-08 13:47:54 +0000
Commit:     Torsten Zuehlsdorff <tz@FreeBSD.org>
CommitDate: 2024-07-08 13:49:59 +0000

    net/mpich: Upgrade mpich from 4.2.1 to 4.2.2

    - Remove default slurm dependency and make it an option (Bug 279777)
    - Remove HYDRA dependency on torque (not needed)
    - Revert removal of HYDRA option

    PR:             280184, 279777
    Approved by:    Laurent Chardon <laurent.chardon@gmail.com> (maintainer)

 net/mpich/Makefile  | 20 ++++++++++++--------
 net/mpich/distinfo  |  6 +++---
 net/mpich/pkg-plist | 11 ++++++++---
 3 files changed, 23 insertions(+), 14 deletions(-)
Committed in #280184 :)