Bug 279777 - net/openmpi, net/mpich: Suggestion to drop slurm-wlm dependency
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Torsten Zuehlsdorff
Reported: 2024-06-15 22:32 UTC by Jason W. Bacon
Modified: 2024-07-08 13:54 UTC
laurent.chardon: maintainer-feedback+

Description Jason W. Bacon freebsd_committer freebsd_triage 2024-06-15 22:32:56 UTC
SLURM now requires cgroups for basic functionality.  As a result, sysutils/slurm-wlm is currently broken and unlikely to be fixed, unless someone is up for a long-term uphill battle.  The port builds and installs, but the daemons don't work without the cgroups plugin.

I'd suggest making it a default-off option for openmpi and mpich.
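
A minimal sketch of what such a default-off option might look like in a port Makefile, using the standard ports options helpers. This is an illustration under stated assumptions, not the change that was committed: the option description text, the `libslurm.so` library name, and the `--with-slurm` configure flag are assumptions that would need checking against each port.

```make
# Hypothetical fragment for a port such as net/mpich or net/openmpi.
# SLURM is listed in OPTIONS_DEFINE but NOT in OPTIONS_DEFAULT,
# so it is off unless the user enables it.
OPTIONS_DEFINE=		SLURM
SLURM_DESC=		Enable SLURM resource manager support

# Assumption: sysutils/slurm-wlm installs libslurm.so.
SLURM_LIB_DEPENDS=	libslurm.so:sysutils/slurm-wlm

# Assumption: the upstream configure script accepts these flags.
SLURM_CONFIGURE_ON=	--with-slurm=${LOCALBASE}
SLURM_CONFIGURE_OFF=	--without-slurm
```

With a layout like this, the default package carries no slurm-wlm dependency, and anyone who still wants SLURM support can enable the option when building from ports.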
Comment 1 Laurent Chardon 2024-06-16 15:11:15 UTC
I'm not against it in principle. I think that not having a base dependency on slurm has its own merit, even if slurm-wlm were not broken.

I'll do some reading because I wasn't aware that slurm-wlm was currently broken, and I'll also do some tests.
Comment 2 Laurent Chardon 2024-06-16 15:21:51 UTC
Bug 276001
Comment 3 Thierry Thomas freebsd_committer freebsd_triage 2024-06-16 15:25:44 UTC
If I understand <https://slurm.schedmd.com/cgroups.html> correctly, cgroup support is only an optional feature of SLURM. What am I missing?
Comment 4 Laurent Chardon 2024-06-16 15:49:11 UTC
(In reply to Jason W. Bacon from comment #0)

I'm not sure I understand. Are you saying that slurmctld fails with cgroup, and that it also fails when TaskPlugin is set to task/affinity or task/none?

https://slurm.schedmd.com/slurm.conf.html#OPT_TaskPlugin
Comment 5 Jason W. Bacon freebsd_committer freebsd_triage 2024-06-16 20:21:01 UTC
Is anyone actually *running* SLURM on FreeBSD?

It's been a while since I've tried, but I was unable to get anything later than 20.02.7 working.  Bug 276001 was my last attempt, and I only got as far as unbreaking the build.  I don't recall the details of why the daemons wouldn't run.

I have always used task/affinity, and it was no longer working last time I checked.

FYI, I ported SLURM to FreeBSD.  There are many features that never worked due to Linuxisms (like parsing /proc files with scanf()), and the task of keeping it working has grown with each new release.  Upstream isn't interested in supporting other platforms, and in fact only provides significant support to paying customers.  I finally gave up and started https://github.com/outpaddling/LPJS/.

At any rate, making SLURM an unconditional dependency for MPI seemed questionable to me from the beginning, though I understood the desire to avoid creating multiple packages.
Comment 6 Thierry Thomas freebsd_committer freebsd_triage 2024-06-19 12:28:49 UTC
When I proposed an upgrade of sysutils/slurm-wlm in Phabricator D42764 and asked its users to test it, I got no answer, so I guess that it is not widely used, except for its libraries.

The question is: what is the impact on MPICH and OpenMPI of removing this dependency?
Comment 7 Jason W. Bacon freebsd_committer freebsd_triage 2024-06-19 12:38:42 UTC
So, the dependency was in response to my request to support SLURM in OpenMPI.  The maintainer at the time decided to make it unconditional rather than use flavors or some other mechanism to create alternative packages.

I think a default-off option for SLURM support would be more than adequate at this point.

If someone wants to do the work to get SLURM working on FreeBSD again, this can always be reexamined.
Comment 8 Laurent Chardon 2024-06-24 11:36:55 UTC
I also think that the default option should not depend on Slurm, as I see no reason for it. MPI and batch scheduling are complementary but distinct from each other. In fact, it appears from the above comments that Slurm and MPI are not currently used together on FreeBSD.

I'll make flavors of openmpi and mpich with Slurm, with the base being independent of any scheduler. I won't make a TORQUE flavor, as it no longer seems to receive updates. The last TORQUE release dates back 8 years, and sysutils/torque is flagged as deprecated with its expiry set to the end of this month.
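
As a rough sketch of the flavors approach, assuming the usual ports flavors idiom: the flavor names, the `libslurm.so` library name, and the `--with-slurm` configure flag below are illustrative assumptions, not taken from an actual commit.

```make
# Hypothetical fragment: a scheduler-independent default flavor plus a slurm flavor.
FLAVORS=		default slurm
FLAVOR?=		${FLAVORS:[1]}

# The slurm flavor is packaged under a distinct name, e.g. mpich-slurm.
slurm_PKGNAMESUFFIX=	-slurm

.if ${FLAVOR} == slurm
# Assumption: sysutils/slurm-wlm installs libslurm.so and
# the upstream configure script accepts --with-slurm.
LIB_DEPENDS+=		libslurm.so:sysutils/slurm-wlm
CONFIGURE_ARGS+=	--with-slurm=${LOCALBASE}
.endif
```

Unlike an option, each flavor produces its own binary package, which is the trade-off against "creating multiple packages" mentioned earlier in this thread.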

I would have liked to make these changes at the next version upgrade, as there is a chance, even if minor, that it will create a disruption to some users. I'd rather not do that in a port revision if it can be avoided, but the TORQUE removal is somewhat forcing my hand.

Thoughts?
Comment 9 Thierry Thomas freebsd_committer freebsd_triage 2024-06-24 16:54:33 UTC
Actually Torque is not totally abandoned, but its repository layout is a bit strange: see

<https://lists.freebsd.org/archives/dev-commits-ports-all/2023-December/093144.html>

In December, Moin said that there is no need to remove it, because it could be updated. I found neither the time nor the motivation to work on it, but we could extend its deprecation date anyway.
Comment 10 Thierry Thomas freebsd_committer freebsd_triage 2024-06-24 16:55:19 UTC
I forgot the 2nd URL:
<https://lists.freebsd.org/archives/dev-commits-ports-all/2023-December/093196.html>.
Comment 11 Laurent Chardon 2024-06-24 18:44:21 UTC
(In reply to Thierry Thomas from comment #9)

That's good news. I was sure it was abandoned: the port states that it is, I couldn't find a link to the source on the official web page, and the GitHub repository has not received significant code since 2015.

https://github.com/adaptivecomputing/torque/tags

But if it's still alive, I'll certainly produce a flavor for it.

I started looking into openPBS too, but have not gotten very far.
Comment 12 commit-hook freebsd_committer freebsd_triage 2024-07-08 13:50:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=09f0bf2ada083c098fadf42160afe96cad9c2660

commit 09f0bf2ada083c098fadf42160afe96cad9c2660
Author:     Torsten Zuehlsdorff <tz@FreeBSD.org>
AuthorDate: 2024-07-08 13:47:54 +0000
Commit:     Torsten Zuehlsdorff <tz@FreeBSD.org>
CommitDate: 2024-07-08 13:49:59 +0000

    net/mpich: Upgrade mpich from 4.2.1 to 4.2.2

      - Remove default slurm dependency and make it an option (Bug 279777)
      - Remove HYDRA dependency on torque (not needed)
      - Revert removal of HYDRA option

    PR:             280184, 279777
    Approved by:     Laurent Chardon <laurent.chardon@gmail.com> (maintainer)

 net/mpich/Makefile  | 20 ++++++++++++--------
 net/mpich/distinfo  |  6 +++---
 net/mpich/pkg-plist | 11 ++++++++---
 3 files changed, 23 insertions(+), 14 deletions(-)
Comment 13 Torsten Zuehlsdorff freebsd_committer freebsd_triage 2024-07-08 13:54:08 UTC
Committed in #280184 :)