Created attachment 225361: v1 (apply via "git am"). After ports 697c7df81364, mpich can offload some work to the GPU; this can be disabled at runtime by setting MPIR_CVAR_ENABLE_GPU=0 in environ(7). Currently, only lang/intel-compute-runtime provides an L0 driver. Can someone check for regressions?
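For anyone testing: a minimal sketch of the two usual ways to pass MPIR_CVAR_ENABLE_GPU=0 through environ(7). A plain `sh -c` child stands in for the MPI launcher so the snippet runs without mpich installed; the `mpiexec -n 2 ./app` line in the comment is only an illustration, and `./app` is a hypothetical MPI program.

```shell
#!/bin/sh
# Start from a clean state so the effect of each form is visible.
unset MPIR_CVAR_ENABLE_GPU

# 1. Per-command: only this one child process sees the variable.
#    With MPICH this would look like:
#      MPIR_CVAR_ENABLE_GPU=0 mpiexec -n 2 ./app
#    (./app is a hypothetical MPI program.)
one_shot=$(MPIR_CVAR_ENABLE_GPU=0 sh -c 'echo "${MPIR_CVAR_ENABLE_GPU:-unset}"')
echo "per-command child sees: $one_shot"

# The invoking shell is unaffected, so the next child does not see it.
after=$(sh -c 'echo "${MPIR_CVAR_ENABLE_GPU:-unset}"')
echo "next child sees: $after"

# 2. Session-wide: export once, and every later child inherits it.
export MPIR_CVAR_ENABLE_GPU=0
exported=$(sh -c 'echo "${MPIR_CVAR_ENABLE_GPU:-unset}"')
echo "after export, child sees: $exported"
```

The per-command form is probably the safer one for a regression check, since it leaves the rest of the session on the default setting.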
Thanks for the notification, but I have no GPU-capable device at the moment.
(In reply to Thierry Thomas from comment #1) That's why I'm asking to "check for regressions". Enabling L0 shouldn't break GPU-less or GPU-incompatible setups.
(In reply to Jan Beich from comment #2) OK, everything seems fine on my workstation (without CUDA) and under my workload.

BTW, unrelated to your change, stage-qa displays this error:

Error: /usr/local/bin/hydra_nameserver is linked to /usr/local/lib/libtorque.so.2 from sysutils/torque but it is not declared as a dependency
Warning: you need LIB_DEPENDS+=libtorque.so:sysutils/torque

Maybe Torque should be defined as an option, and disabled when not requested?
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=88e134883dd2a2a78b909fdee59257513afe0c77

commit 88e134883dd2a2a78b909fdee59257513afe0c77
Author:     Jan Beich <jbeich@FreeBSD.org>
AuthorDate: 2021-05-29 17:29:17 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-05-30 21:43:06 +0000

    net/mpich: enable L0 by default for GPU support

    To disable at runtime set MPIR_CVAR_ENABLE_GPU=0 via environ(7).

    PR:        256244
    Tested by: thierry

 net/mpich/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
It looks like this one can be closed?
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=b5815e7648a8e5307a20a234befa00e34306319d

commit b5815e7648a8e5307a20a234befa00e34306319d
Author:     Henrik Gulbrandsen <henrik@gulbra.net>
AuthorDate: 2021-08-12 14:35:20 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-09-06 22:25:00 +0000

    net/mpich: unbreak optimized runtime after 88e134883dd2

    Runtime may fail without a L0 driver like intel-compute-runtime e.g.,

    $ mpivars
    Abort(268484367) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(153): gpu_init failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=268484367
    :
    system msg for write_line failure : Bad file descriptor
    Attempting to use an MPI routine before initializing MPICH

    $ MPIR_CVAR_ENABLE_GPU=0 mpivars
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Segmentation fault

    PR: 256244 (for tracking)

 net/mpich/Makefile                      |  2 +-
 net/mpich/files/patch-l0-fallback (new) | 44 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)
A commit in branch 2021Q3 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=fd490a171c3da0d7bcb9a5f3ee3b4b46075dfa9e

commit fd490a171c3da0d7bcb9a5f3ee3b4b46075dfa9e
Author:     Henrik Gulbrandsen <henrik@gulbra.net>
AuthorDate: 2021-08-12 14:35:20 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-09-06 22:25:57 +0000

    net/mpich: unbreak optimized runtime after 88e134883dd2

    Runtime may fail without a L0 driver like intel-compute-runtime e.g.,

    $ mpivars
    Abort(268484367) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(153): gpu_init failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=268484367
    :
    system msg for write_line failure : Bad file descriptor
    Attempting to use an MPI routine before initializing MPICH

    $ MPIR_CVAR_ENABLE_GPU=0 mpivars
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Abort(2139535) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
    MPIR_Init_thread(159)......:
    MPID_Init(591).............:
    MPIDI_SHM_mpi_init_hook(22):
    MPIDI_IPC_mpi_init_hook(36):
    MPIDI_GPU_mpi_init_hook(79): gpu_get_dev_count failed
    [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=2139535
    :
    system msg for write_line failure : Bad file descriptor
    Segmentation fault

    PR: 256244 (for tracking)

    (cherry picked from commit b5815e7648a8e5307a20a234befa00e34306319d)

 net/mpich/Makefile                      |  2 +-
 net/mpich/files/patch-l0-fallback (new) | 44 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)