Bug 252506 - net/openmpi: incompatible dependencies on devel/hwloc and devel/hwloc2 via sysutils/slurm-wlm
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Danilo Egea Gondolfo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-08 09:11 UTC by Martin Birgmeier
Modified: 2021-01-11 19:47 UTC
CC List: 4 users

See Also:


Attachments
portmaster build log (52.21 KB, application/x-gzip)
2021-01-09 10:41 UTC, Thomas Guymer
openmpi_avx (1.75 KB, patch)
2021-01-09 13:10 UTC, Danilo Egea Gondolfo
portmaster build log 2 (75.37 KB, application/x-gzip)
2021-01-10 10:58 UTC, Thomas Guymer

Description Martin Birgmeier 2021-01-08 09:11:14 UTC
Scenario:
- Using portmaster to install graphics/gdal with most options enabled as recommended by graphics/qgis

Result:
- Amongst others, graphics/gdal depends on net/openmpi
- Amongst others, net/openmpi depends on devel/hwloc and sysutils/slurm-wlm
- Amongst others, sysutils/slurm-wlm depends on devel/hwloc2
- devel/hwloc and devel/hwloc2 cannot be installed simultaneously

Expected result:
- All ports using hwloc should depend on only one version - or -
- Both versions of hwloc should be able to coexist.

Note:
- In my build, devel/hwloc is built first.
- Portmaster then remarks that the dependency on libhwloc.so is fulfilled by it and proceeds with building slurm-wlm; that build does not complain, so maybe it suffices to make slurm-wlm dependent on hwloc instead of hwloc2.

-- Martin
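
The conflict described above can be checked mechanically. Here is a minimal Python sketch, assuming a hand-written dependency map: the `DEPS` table only mirrors the edges named in this report and is not read from the ports tree, and `find_conflicts` is a hypothetical helper name. It walks the transitive dependencies of a port and reports whether both hwloc versions get pulled in.

```python
from collections import deque

# Hypothetical, hand-written dependency edges copied from the report above.
# A real check would read them from the ports tree instead.
DEPS = {
    "graphics/gdal": ["net/openmpi"],
    "net/openmpi": ["devel/hwloc", "sysutils/slurm-wlm"],
    "sysutils/slurm-wlm": ["devel/hwloc2"],
}

# Pairs of ports that cannot be installed at the same time.
CONFLICTS = [("devel/hwloc", "devel/hwloc2")]


def transitive_deps(port, deps):
    """Return every port reachable from `port`, visited breadth-first."""
    seen = set()
    queue = deque(deps.get(port, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(deps.get(current, []))
    return seen


def find_conflicts(port, deps=DEPS, conflicts=CONFLICTS):
    """List the conflicting pairs that `port` pulls in together."""
    reachable = transitive_deps(port, deps)
    return [pair for pair in conflicts
            if pair[0] in reachable and pair[1] in reachable]
```

With the edges listed in this report, `find_conflicts("graphics/gdal")` returns the hwloc/hwloc2 pair, while `find_conflicts("sysutils/slurm-wlm")` returns an empty list.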
Comment 1 Danilo Egea Gondolfo freebsd_committer 2021-01-08 09:32:17 UTC
Hello Martin,

I just committed this https://svnweb.freebsd.org/ports?view=revision&revision=560755

Could you check whether it resolves your issue?
Comment 2 Thomas Guymer 2021-01-08 11:42:33 UTC
I think that I have just come across this too with the following dependency chain "ffmpeg-4.3.1_6,1 >> opencv-core-3.4.1_36 >> eigen-3.3.7 >> fftw3-3.3.8_6 >> openmpi-4.0.5" from portmaster. These are the last few lines of the build log:



op_avx_functions.c:458:5: error: always_inline function '_mm512_max_epi32'
      requires target feature 'avx512f', but would be inlined into function
      'ompi_op_avx_2buff_max_int32_t_avx512' that is compiled without support
      for 'avx512f'
op_avx_functions.c:124:5: note: expanded from macro 'OP_AVX_FUNC'
    OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                  \
    ^
op_avx_functions.c:72:27: note: expanded from macro 'OP_AVX_AVX512_FUNC'
            __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB);  \
                          ^
<scratch space>:46:1: note: expanded from here
_mm512_max_epi32
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
gmake[4]: *** [Makefile:1991: liblocal_ops_avx512_la-op_avx_functions.lo] Error 1
gmake[4]: *** Waiting for unfinished jobs....
gmake[4]: Leaving directory '/usr/ports/net/openmpi/work/openmpi-4.1.0/ompi/mca/op/avx'
gmake[3]: *** [Makefile:3555: all-recursive] Error 1
gmake[3]: Leaving directory '/usr/ports/net/openmpi/work/openmpi-4.1.0/ompi'
gmake[2]: *** [Makefile:1901: all-recursive] Error 1
gmake[2]: Leaving directory '/usr/ports/net/openmpi/work/openmpi-4.1.0'
===> Compilation failed unexpectedly.



Is this the same issue or do you think that I should open up a new ticket?
Comment 3 Jason W. Bacon freebsd_committer 2021-01-08 13:53:36 UTC
My apologies for jumping the gun on the slurm-wlm commit.  I was under a lot of pressure yesterday and dropped the ball on this one.

Is this under control, or should we maybe revert the hwloc->hwloc2 changes and do a coordinated commit after proper testing?

FYI, the following could result in users with a preinstalled hwloc1 not installing hwloc2 with "pkg install openmpi".

libhwloc.so:devel/hwloc2

Could this cause ABI issues when the openmpi package was compiled with hwloc2?

There are some libraries with API/ABI differences that the application derives from the header files.  To be safe, I specified libhwloc.so.15 in slurm-wlm.
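
To illustrate why the versioned name matters, here is a toy Python model. It is not the ports framework's actual lookup: it simply treats a library dependency as met when a file with the requested name is installed. The file sets are assumptions for illustration (in particular the hwloc 1.x soname), and `dependency_satisfied` is a hypothetical name.

```python
# Assumed library files installed by each port, for illustration only.
HWLOC1_FILES = {"libhwloc.so", "libhwloc.so.5"}
HWLOC2_FILES = {"libhwloc.so", "libhwloc.so.15"}


def dependency_satisfied(wanted, installed_libs):
    """Simplified model: the dependency is met if the named file exists."""
    return wanted in installed_libs


# The unversioned name is met by either port, so a preinstalled hwloc 1.x
# would hide the fact that hwloc2 is missing.
unversioned_ok_with_hwloc1 = dependency_satisfied("libhwloc.so", HWLOC1_FILES)

# The versioned name is only met once hwloc2 itself is installed.
versioned_ok_with_hwloc1 = dependency_satisfied("libhwloc.so.15", HWLOC1_FILES)
versioned_ok_with_hwloc2 = dependency_satisfied("libhwloc.so.15", HWLOC2_FILES)
```

Under these assumptions only the versioned dependency tells the two ports apart.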
Comment 4 Jason W. Bacon freebsd_committer 2021-01-08 14:07:22 UTC
FYI, there are other potential conflicts:

FreeBSD orca.acadix  bacon /usr/ports 414: port-grep 'hwloc.so' 
devel/hpx/Makefile:		libhwloc.so:devel/hwloc
lang/pocl/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc \
math/mprime/Makefile:		libhwloc.so:devel/hwloc				\
net/aluminum/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc \
net/mpich/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc2	\
net/openmpi/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc2 \
net/openmpi3/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc \
net-p2p/xmrig/Makefile:HWLOC_LIB_DEPENDS+=		libhwloc.so:devel/hwloc
science/gromacs/Makefile:LIB_DEPENDS=	libhwloc.so:devel/hwloc
security/snort3/Makefile:		libhwloc.so:devel/hwloc \
sysutils/slurm-wlm/Makefile:HWLOC_LIB_DEPENDS=	libhwloc.so.15:devel/hwloc2
www/trafficserver/Makefile:		libhwloc.so:devel/hwloc \

FreeBSD orca.acadix  bacon /usr/ports 415: grep -l openmpi `port-grep 'hwloc.so' -l`
net/aluminum/Makefile
net/openmpi/Makefile
net/openmpi3/Makefile
science/gromacs/Makefile

FreeBSD orca.acadix  bacon /usr/ports 416: grep -l mpich `port-grep 'hwloc.so' -l`
net/mpich/Makefile
science/gromacs/Makefile
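
The same survey can be grouped by hwloc origin with a few lines of Python. This is a sketch that assumes its input has the `path/Makefile:content` shape of the port-grep output above; `group_by_hwloc` is a hypothetical helper, not part of any ports tool.

```python
import re

# Matches "<category>/<port>/Makefile:... libhwloc.so[.N]:devel/hwloc[2]".
LINE_RE = re.compile(
    r"^(?P<port>[^/\s]+/[^/\s]+)/Makefile:"
    r".*?libhwloc\.so(?:\.\d+)?:(?P<origin>devel/hwloc2?)\b"
)


def group_by_hwloc(lines):
    """Map each hwloc origin to the ports that depend on it."""
    groups = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            groups.setdefault(match.group("origin"), []).append(match.group("port"))
    return groups
```

Feeding it the lines from the first listing above would put net/mpich, net/openmpi and sysutils/slurm-wlm under devel/hwloc2 and the remaining ports under devel/hwloc.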
Comment 5 Martin Birgmeier 2021-01-08 15:56:41 UTC
Danilo,

Thank you for your patches; they work nicely for me:

[0]# pkg query %rn-%rv hwloc2-2.3.0 
openmpi-4.1.0
mpich-3.4
slurm-wlm-20.02.1_5
[0]# pkg query %rn-%rv openmpi-4.1.0 
[0]# pkg query %rn-%rv mpich-3.4    
arpack-ng-3.8.0
vtk8-8.2.0
[0]# pkg query %rn-%rv slurm-wlm-20.02.1_5 
openmpi-4.1.0
mpich-3.4
[0]# 

Jason (comment 3 and comment 4): For the ports I need, the new hwloc2 is the better choice, so going back to hwloc would not be preferred.

-- Martin
Comment 6 Jason W. Bacon freebsd_committer 2021-01-08 17:18:40 UTC
(In reply to Martin Birgmeier from comment #5)

For sure hwloc2 is preferred.  I'm only suggesting a brief temporary roll-back to avoid failures in other dependent ports like gromacs until we verify that they all work with hwloc2.
Comment 7 Jason W. Bacon freebsd_committer 2021-01-08 18:03:14 UTC
I'm testing the remaining ports that use hwloc and will submit PRs with appropriate patches.  So far lang/pocl has failed, but it has only a simple, direct dependency on hwloc, so it's not urgent.
Comment 8 Jason W. Bacon freebsd_committer 2021-01-08 20:57:41 UTC
Committed devel/hpx, math/mprime, and science/gromacs.

Still open:

252523 	math/mprime
252528 	www/trafficserver
252522 	lang/pocl
252527 	security/snort3
252525 	net-p2p/xmrig
Comment 9 Thomas Guymer 2021-01-09 10:41:07 UTC
Created attachment 221418
portmaster build log

I have attached the (compressed) output of "portmaster openmpi" on my FreeBSD machine. The compilation of OpenMPI 4.1.0 fails due to AVX512 issues. Is this related to hwloc2? At the top of the output, portmaster reports that "All dependencies are up to date". If this is a new issue then I am happy to open another bug report. For reference, the FreeBSD machine that I am attempting this on has an Intel i7-4770 CPU. Wikipedia claims that this CPU does not have AVX512, only AVX and AVX2 -- so why is OpenMPI 4.1.0 trying to compile with that extension without checking?
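
Rather than relying on Wikipedia, the CPU feature flags can be read from the boot messages. Below is a minimal Python sketch, assuming that the kernel prints feature lists in the form `Name=0x...<FLAG1,FLAG2,...>` on lines containing the word "Features" and that AVX-512 Foundation shows up as `AVX512F`. Both points are assumptions about the log format, and `cpu_features` and `supports` are hypothetical helper names.

```python
import re

# Captures the comma-separated flag list between "<" and ">".
FEATURE_LIST_RE = re.compile(r"<([^>]*)>")


def cpu_features(boot_log_text):
    """Collect every flag listed on lines that mention 'Features'."""
    features = set()
    for line in boot_log_text.splitlines():
        if "Features" not in line:
            continue
        for group in FEATURE_LIST_RE.findall(line):
            features.update(name.strip() for name in group.split(",") if name.strip())
    return features


def supports(boot_log_text, feature):
    """Return True if `feature` appears in the parsed flag lists."""
    return feature in cpu_features(boot_log_text)
```

On FreeBSD the text to pass in would typically be the contents of /var/run/dmesg.boot, for example `supports(open("/var/run/dmesg.boot").read(), "AVX512F")`.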
Comment 10 Danilo Egea Gondolfo freebsd_committer 2021-01-09 12:21:11 UTC
Hi Thomas,

It's a known issue with OpenMPI 4.1. I didn't catch it because my machine supports AVX512. Can you try the workaround suggested here [1]? Try adding --enable-mca-no-build=op-avx to the CONFIGURE_ARGS list.



[1] - https://github.com/open-mpi/ompi/issues/8306
Comment 11 Danilo Egea Gondolfo freebsd_committer 2021-01-09 13:10:53 UTC
Created attachment 221420
openmpi_avx
Comment 12 Danilo Egea Gondolfo freebsd_committer 2021-01-09 13:11:58 UTC
Can you try the attached patch? I added an option to enable AVX (disabled by default). I might remove it once upstream fixes the AVX auto-detection.
Comment 13 Jason W. Bacon freebsd_committer 2021-01-09 14:48:53 UTC
(In reply to Thomas Guymer from comment #9)

The build failure for lang/pocl is also related to avx and it only happens with hwloc2, not hwloc.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252522
Comment 14 Thomas Guymer 2021-01-10 10:58:47 UTC
Created attachment 221438
portmaster build log 2

(In reply to Danilo Egea Gondolfo from comment #10)

I added "op-avx" to the comma-separated list of names in the pre-existing "--enable-mca-no-build" entry of "CONFIGURE_ARGS" in the Makefile and rebuilt it. The build appears to have succeeded; however, the installation failed with the following message:

===>  Installing for openmpi-4.1.0
===>  Checking if openmpi is already installed
===>   Registering installation for openmpi-4.1.0
pkg-static: Unable to access file /usr/ports/net/openmpi/work/stage/usr/local/mpi/openmpi/lib/openmpi/mca_op_avx.la:No such file or directory
pkg-static: Unable to access file /usr/ports/net/openmpi/work/stage/usr/local/mpi/openmpi/lib/openmpi/mca_op_avx.so:No such file or directory

...I have attached the full (compressed) build log.

Thanks
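
For anyone repeating the manual edit described above, the list manipulation can be sketched in Python. This assumes the configure arguments are already split into a list of strings; `add_no_build_component` is a hypothetical helper, and the component names in the comments are placeholders rather than the port's real list.

```python
PREFIX = "--enable-mca-no-build="


def add_no_build_component(configure_args, component):
    """Return a copy of the arguments with `component` in the no-build list."""
    result = []
    found = False
    for arg in configure_args:
        if arg.startswith(PREFIX):
            found = True
            # Split the existing comma-separated names, e.g. "comp-a,comp-b".
            names = [name for name in arg[len(PREFIX):].split(",") if name]
            if component not in names:
                names.append(component)
            arg = PREFIX + ",".join(names)
        result.append(arg)
    if not found:
        # No existing list: add a new argument with just this component.
        result.append(PREFIX + component)
    return result
```

Extending the existing entry instead of adding a second `--enable-mca-no-build` argument avoids relying on how configure treats a repeated option.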
Comment 15 Thomas Guymer 2021-01-10 11:02:51 UTC
(In reply to Danilo Egea Gondolfo from comment #10)

It is also interesting to note that, in the attached build log #2, even though the build worked this time (due to adding "op-avx" into the Makefile like you suggested), configure still tries to determine whether it can build "op-avx", like this:

--- MCA component op:avx (m4 configuration macro)
checking for MCA component op:avx compile mode... dso
checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... yes
checking if _mm512_loadu_si512 generates code that can be compiled... yes
checking if _mm512_mullo_epi64 generates code that can be compiled... yes
checking for AVX2 support (no additional flags)... yes
checking if _mm256_loadu_si256 generates code that can be compiled... no
checking for AVX support (no additional flags)... yes
checking for SSE4.1 support... yes
checking for SSE3 support... yes
checking if MCA component op:avx can compile... no

... fortunately it concludes that it cannot compile the "op-avx" component.
Comment 16 Thomas Guymer 2021-01-10 11:11:17 UTC
(In reply to Danilo Egea Gondolfo from comment #12)

I don't know how to apply patches, so I just manually added the "--enable-mca-no-build=op-avx" line to the Makefile and I manually removed the "%%MPIDIR%%/lib/openmpi/mca_op_avx.la" and "%%MPIDIR%%/lib/openmpi/mca_op_avx.so" lines from the pkg-plist and then the installation worked for me - thank you!

This patch gets two thumbs up from me (I did *not* test the new option to "make configure").
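
The pkg-static errors in comment 14 come from pkg-plist naming files that the build no longer stages. Here is a rough Python sketch of that check, assuming plist entries are plain relative paths with `%%NAME%%` placeholders and that `%%MPIDIR%%` expands to `mpi/openmpi` (inferred from the paths in the error message). `missing_from_stage` is a hypothetical helper.

```python
def missing_from_stage(plist_entries, staged_files, substitutions):
    """Return plist paths (after substitution) that are absent from the stage."""
    missing = []
    for entry in plist_entries:
        entry = entry.strip()
        # Skip blank lines and plist keywords such as "@dir".
        if not entry or entry.startswith("@"):
            continue
        path = entry
        for key, value in substitutions.items():
            path = path.replace("%%" + key + "%%", value)
        if path not in staged_files:
            missing.append(path)
    return missing
```

With the two mca_op_avx entries still in the plist and the component not built, both paths would be reported as missing, which matches the two errors shown in comment 14.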
Comment 17 commit-hook freebsd_committer 2021-01-10 12:03:47 UTC
A commit references this bug:

Author: danilo
Date: Sun Jan 10 12:03:36 UTC 2021
New revision: 561056
URL: https://svnweb.freebsd.org/changeset/ports/561056

Log:
  net/openmpi: Add an option to enable AVX support

  OpenMPI 4.1 fails to detect if the host supports AVX instructions and will fail to build if it doesn't [1].

  Also, add the ABI version to the hwloc2 library dependency. If the user has devel/hwloc (and not devel/hwloc2) installed it will satisfy the dependency check anyway and link against the wrong lib.

  [1] - https://github.com/open-mpi/ompi/issues/8306

  PR:		252506

Changes:
  head/net/openmpi/Makefile
  head/net/openmpi/pkg-plist
Comment 18 Danilo Egea Gondolfo freebsd_committer 2021-01-10 12:06:12 UTC
Just committed the patch. Please update your ports tree and try to build it again.
Comment 19 Thomas Guymer 2021-01-10 13:11:53 UTC
It took a while for the patch to get pushed out to all of the mirrors, but it finally arrived. I can confirm that unsetting AVX in "make config" enables OpenMPI 4.1 to be installed on my non-AVX512 system correctly - no manual edits required. Thank you very much.
Comment 20 Danilo Egea Gondolfo freebsd_committer 2021-01-11 19:47:33 UTC
Fixed, thanks!