Bug 253599 - sysutils/slurm-wlm 20.02.1 fails to install after building from ports
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Jason W. Bacon
Depends on:
Reported: 2021-02-17 20:38 UTC by Patrick McMunn
Modified: 2021-02-24 00:07 UTC
bugzilla: maintainer-feedback? (jwb)


Description Patrick McMunn 2021-02-17 20:38:13 UTC
I'm using portmaster. I uninstalled slurm-wlm so it could build. But after it finished building, it failed during installation:

install  -m 0644 /usr/ports/sysutils/slurm-wlm/work/slurm-20.02.1/etc/slurm.conf.example  /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/etc/slurm.conf.sample
====> Compressing man pages (compress-man)
===> Staging rc.d startup script(s)
===>  Installing for slurm-wlm-20.02.1_6
===>  Checking if slurm-wlm is already installed
===>   Registering installation for slurm-wlm-20.02.1_6 as automatic
pkg-static: Unable to access file /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/bin/sh5util:No such file or directory
pkg-static: Unable to access file /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/lib/slurm/acct_gather_profile_hdf5.a:No such file or directory
pkg-static: Unable to access file /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/lib/slurm/acct_gather_profile_hdf5.so:No such file or directory
pkg-static: Unable to access file /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/man/man1/sh5util.1.gz:No such file or directory
pkg-static: Unable to access file /usr/ports/sysutils/slurm-wlm/work/stage/usr/local/share/doc/slurm-20.02.1/html/sh5util.html:No such file or directory
*** Error code 1

make[1]: stopped in /usr/ports/sysutils/slurm-wlm
*** Error code 1

make: stopped in /usr/ports/sysutils/slurm-wlm

===>>> Installation of slurm-wlm-20.02.1_6 (sysutils/slurm-wlm) failed
Comment 1 Patrick McMunn 2021-02-17 21:00:19 UTC
I found that installation succeeds if I disable the HDF5 port option.
Comment 2 Jason W. Bacon freebsd_committer 2021-02-18 14:16:31 UTC
It's working fine for me with hdf5 enabled.

Are all your ports and installed packages up-to-date?

Is your ports tree on the same branch as your packages?

To ensure both of these, you can run auto-update-system (sysutils/auto-admin).


Comment 3 Patrick McMunn 2021-02-23 03:19:52 UTC
I installed auto-admin and tried the command you suggested, but it wanted to update the system using binary packages, so that wasn't going to work for me. I did successfully update the system using "portmaster -atyd". Afterward, I ran "pkg check -da" to check for any broken dependencies. The only things broken were gcc9's dependency on isl and mpich's dependency on slurm-wlm. So I recompiled gcc9 and tried to compile and install slurm-wlm, but I still got the same error as originally reported in this bug.

So I ran "portmaster -fR sysutils/slurm-wlm", which began recompiling all the dependencies of slurm-wlm and then slurm-wlm itself. It successfully recompiled and reinstalled almost all the ports involved, but near the end it apparently got stuck in some kind of infinite recursion or loop involving mpich, hdf5, and slurm-wlm, so I had to Ctrl-C the operation. I then recompiled and installed hdf5 individually and did the same with slurm-wlm, which still failed to install with the same error as initially reported.

But the thing is, the compilation phase of slurm-wlm completes without any obvious error messages. It only fails during installation. An inspection of the work directory shows that the files are in fact missing. I'm running on 13-CURRENT compiled on February 16, but I doubt something about the base system would cause an issue.
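
For anyone wanting to repeat that inspection, a small shell loop can list which expected files are absent from the stage directory. This is only a sketch: the function name missing_from_stage is hypothetical and not part of the ports framework. It reads paths relative to the stage root on standard input and prints the ones that do not exist.

```sh
#!/bin/sh
# Hypothetical helper (not part of the ports framework):
# print every relative path read from stdin that is missing under
# the stage root given as the first argument.
missing_from_stage() {
    stage=$1
    while IFS= read -r rel; do
        # Skip blank lines and packing-list directives such as @dir.
        case $rel in
            ''|@*) continue ;;
        esac
        # Report the path only if nothing exists at that location.
        [ -e "$stage/$rel" ] || printf '%s\n' "$rel"
    done
}
```

For example, `printf 'bin/sh5util\n' | missing_from_stage /usr/ports/sysutils/slurm-wlm/work/stage/usr/local` would print bin/sh5util if that file was never staged.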
Comment 4 Patrick McMunn 2021-02-23 21:39:55 UTC
I finally figured out what the problem was. For some reason, the CXX option for the hdf5 port on my system was disabled. I only have a few ports with options differing from default, and I can't imagine that I would have had any reason to disable this manually. In any case, I enabled the CXX option for hdf5, rebuilt and installed hdf5, and then was able to successfully compile and install slurm-wlm. What doesn't make sense to me is why slurm-wlm never complained. I would think that compilation would have stopped because slurm-wlm's build system found that the dependency was missing. Or I would think the slurm-wlm port would check to make sure the dependency exists if it's needed for successful compilation. So my issue is resolved, but it might be a good idea to add a dependency check to the port since slurm-wlm's build system either doesn't check for it or just silently carries on as if nothing is wrong if the check fails.
Comment 5 Jason W. Bacon freebsd_committer 2021-02-24 00:07:04 UTC
Not sure if there's a way to improve much on this.  Currently we get an install error pointing to hdf5, which is pretty helpful.  slurm-wlm does have a dependency check for libhdf5.so.  I see that the configure script looks for h5cc, which is conditionally installed by hdf5 with the CXX option enabled.  I thought about making slurm depend on libhdf5_cpp.so instead of libhdf5.so, but that would just cause it to rebuild hdf5 (with the same local options) and error out on hdf5 install because it's already installed.  Then there's a possibility that other options like szip are disabled.  Checking that all dependencies are built with the correct options gets awfully hairy, but I'll ponder this for a while and see if an elegant solution comes to mind.
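
Until something better exists, a user could check by hand for the h5cc wrapper mentioned above before building. The sketch below only tests whether an executable h5cc exists under a prefix. The function name have_h5cc and the default prefix /usr/local are assumptions for illustration; this is not how the port or Slurm's configure script performs its own test.

```sh
#!/bin/sh
# Hypothetical pre-build check: succeed if an executable h5cc exists
# under the given prefix (default /usr/local is an assumption).
have_h5cc() {
    prefix=${1:-/usr/local}
    if [ -x "$prefix/bin/h5cc" ]; then
        echo "h5cc found in $prefix/bin"
        return 0
    fi
    # Warn on stderr so the message is visible even when stdout is piped.
    echo "h5cc not found in $prefix/bin; HDF5 support may not be built" >&2
    return 1
}
```

Running `have_h5cc || echo "check the hdf5 port options"` before building slurm-wlm with the HDF5 option would flag the situation reported in this bug.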