Created attachment 221408 [details]
Build output

When trying to build net/mpich-3.4, multiple build errors occur while compiling src/mpl/src/gpu/mpl_gpu_ze.lo. The build output is attached, but it looks like a missing/incorrect dependency. As an example:

```
src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'; did you mean 'dev_handle'?
    *dev_handle = device_handles[dev_id];
                  ^~~~~~~~~~~~~~
                  dev_handle
src/gpu/mpl_gpu_ze.c:278:66: note: 'dev_handle' declared here
int MPL_gpu_get_dev_handle(int dev_id, MPL_gpu_device_handle_t * dev_handle)
                                                                 ^
```
Could you please describe your platform? (uname -mrU)
(In reply to Thierry Thomas from comment #1)
Of course.

> uname -mrU
12.2-RELEASE-p1 amd64 1202000
The error seems to be caused by the file /usr/local/include/level_zero/ze_api.h, which I am not familiar with: it does not exist on my machines! Could you please report the output of

pkg which /usr/local/include/level_zero/ze_api.h

The various config.log files could also be interesting, especially if you change the options.

For comparison, the output of a build session in a clean jail (poudriere) is available at: https://people.freebsd.org/~thierry/mpich-3.4.log and it does not display anything related to the reported error in src/gpu/mpl_gpu_ze.c.
(In reply to Thierry Thomas from comment #3)

> pkg which /usr/local/include/level_zero/ze_api.h
/usr/local/include/level_zero/ze_api.h was installed by package level-zero-1.0.26

I'm not sure where that package came from, since it wasn't a dependency for anything. Removing it allowed the build to complete as planned.

I did see a block in the configuration stage that refers to level-zero:

> checking level_zero/ze_api.h usability... no
> checking level_zero/ze_api.h presence... no
> checking for level_zero/ze_api.h... no
> checking for zeInit in -lze_loader... no

Seems to be the source of the issue.
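For anyone who wants to repeat the header check by hand, here is a minimal sketch that approximates the "usability" test from the configure output above: it tries to compile a file that only includes the header. The helper name `header_usable` is hypothetical (it is not part of mpich or the ports tree), and the real autoconf check does more than this.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the compiler can include the given header.
# CC and CPPFLAGS are read from the environment; this is only an
# approximation of what an autoconf header check does.
header_usable() {
    # $1 = header name, e.g. level_zero/ze_api.h
    printf '#include <%s>\nint main(void) { return 0; }\n' "$1" |
        ${CC:-cc} ${CPPFLAGS:-} -x c -fsyntax-only - >/dev/null 2>&1
}

if header_usable level_zero/ze_api.h; then
    echo "level_zero/ze_api.h is visible to the compiler"
else
    echo "level_zero/ze_api.h is not visible to the compiler"
fi
```

If the compiler does not search /usr/local/include by default, run it with `CPPFLAGS=-I/usr/local/include` so that headers installed by packages are found.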
(In reply to Nick from comment #0)
> src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'

device_handles doesn't show up in any change under https://github.com/oneapi-src/level-zero. Which version of level-zero is expected by mpich? Does it build on Linux?

For example, src/pm/hydra2/mpl/src/gpu/mpl_gpu_ze.c has "ze_device_handle_t *global_ze_devices_handle;". Maybe device_handles is a leftover from before global_ze_devices_handle was renamed, e.g. https://github.com/pmodels/mpich/commit/4c1ed41821b4
A commit references this bug:

Author: jbeich
Date: Sat Jan 9 17:51:20 UTC 2021
New revision: 560881
URL: https://svnweb.freebsd.org/changeset/ports/560881

Log:
  net/mpich: unbreak with level-zero after r560756

  level-zero is pulled as a build-only dependency of intel-compute-runtime.
  mpich support for level-zero is broken and uses pre-1.0 API (before r545238).

  src/gpu/mpl_gpu_ze.c:123:11: warning: implicit declaration of function 'zeDriverGetMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverGetMemIpcHandle(global_ze_driver_handle, ptr, ipc_handle);
            ^
  src/gpu/mpl_gpu_ze.c:139:9: warning: implicit declaration of function 'zeDriverOpenMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
          zeDriverOpenMemIpcHandle(global_ze_driver_handle,
          ^
  src/gpu/mpl_gpu_ze.c:140:70: error: no member named 'global_dev_id' in 'struct _ze_ipc_mem_handle_t'
                                   global_ze_devices_handle[ipc_handle.global_dev_id],
                                                            ~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:141:45: error: no member named 'handle' in 'struct _ze_ipc_mem_handle_t'
                                   ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                   ~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:141:53: error: use of undeclared identifier 'ZE_IPC_MEMORY_FLAG_NONE'; did you mean 'ZE_IPC_MEMORY_FLAG_TBD'?
                                   ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                                      ^~~~~~~~~~~~~~~~~~~~~~~
  src/gpu/mpl_gpu_ze.c:156:11: warning: implicit declaration of function 'zeDriverCloseMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverCloseMemIpcHandle(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:171:11: warning: implicit declaration of function 'zeDriverGetMemAllocProperties' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverGetMemAllocProperties(global_ze_driver_handle, ptr, &ptr_attr, &device);
            ^
  src/gpu/mpl_gpu_ze.c:202:25: error: use of undeclared identifier 'ZE_DEVICE_MEM_ALLOC_FLAG_DEFAULT'
      device_desc.flags = ZE_DEVICE_MEM_ALLOC_FLAG_DEFAULT;
                          ^
  src/gpu/mpl_gpu_ze.c:204:17: error: no member named 'version' in 'struct _ze_device_mem_alloc_desc_t'
      device_desc.version = ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT;
      ~~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:204:27: error: use of undeclared identifier 'ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT'
      device_desc.version = ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT;
                            ^
  src/gpu/mpl_gpu_ze.c:208:11: warning: implicit declaration of function 'zeDriverAllocDeviceMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverAllocDeviceMem(global_ze_driver_handle, &device_desc,
            ^
  src/gpu/mpl_gpu_ze.c:223:23: error: use of undeclared identifier 'ZE_HOST_MEM_ALLOC_FLAG_DEFAULT'
      host_desc.flags = ZE_HOST_MEM_ALLOC_FLAG_DEFAULT;
                        ^
  src/gpu/mpl_gpu_ze.c:224:15: error: no member named 'version' in 'struct _ze_host_mem_alloc_desc_t'
      host_desc.version = ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT;
      ~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:224:25: error: use of undeclared identifier 'ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT'
      host_desc.version = ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT;
                          ^
  src/gpu/mpl_gpu_ze.c:229:11: warning: implicit declaration of function 'zeDriverAllocHostMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverAllocHostMem(global_ze_driver_handle, &host_desc, size, mem_alignment, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:240:11: warning: implicit declaration of function 'zeDriverFreeMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverFreeMem(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:251:11: warning: implicit declaration of function 'zeDriverFreeMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverFreeMem(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'; did you mean 'dev_handle'?
      *dev_handle = device_handles[dev_id];
                    ^~~~~~~~~~~~~~

  PR: 252536
  Reported by: Nick, thierry

Changes:
  head/net/mpich/Makefile
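The implicit-declaration warnings in the log above mean that the installed header does not declare the functions mpich calls. A rough way to confirm this on an affected machine is to search the header for those names. This is only a sketch: `header_declares` is a hypothetical helper, and a plain text search can also match comments, so treat a hit as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the header file mentions the symbol
# as a whole word. This is a text search, not a real C parse.
header_declares() {
    # $1 = path to header, $2 = symbol name
    grep -qw -- "$2" "$1"
}

# Header path as reported earlier in this PR; symbol names are taken from
# the diagnostics in the commit log.
hdr=/usr/local/include/level_zero/ze_api.h
for sym in zeDriverAllocDeviceMem zeDriverAllocHostMem zeDriverFreeMem; do
    if [ -r "$hdr" ] && header_declares "$hdr" "$sym"; then
        echo "$sym: found in $hdr"
    else
        echo "$sym: not found in $hdr"
    fi
done
```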
Nick, could we close this PR?
Yes, I believe so.
Closed after submitter's feedback. Thanks!
You might want to have a look at the following MPICH comments/developments, which might be related to this bug entry:

https://github.com/nwchemgit/nwchem/issues/463#issuecomment-953060633
https://github.com/pmodels/mpich/pull/5623
(In reply to Edoardo Aprà from comment #10) Thanks for the feedback! Have you been able to reproduce the reported issue?
I have been able to reproduce the issue reported in https://github.com/nwchemgit/nwchem/issues/463#issue-1034059467 on a VirtualBox image of FreeBSD 13. It only shows up when the current mpich port is used (no problems with either openmpi or mpich2). Valgrind shows memory issues in the mpich layer even prior to the fatal segv.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=a46966bb3496e0cf8100f6acdd671d4fb90c9cdb

commit a46966bb3496e0cf8100f6acdd671d4fb90c9cdb
Author:     Jan Beich <jbeich@FreeBSD.org>
AuthorDate: 2021-10-27 21:54:51 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-10-27 22:19:27 +0000

    net/mpich: replace L0 fix with upstream version

    PR: 252536
    Reported by: Edoardo Aprà

 net/mpich/Makefile                   |  1 -
 net/mpich/files/patch-l0-1.4.1 (new) | 50 ++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=dea82318648e46c157874bd1079f50b23c9c08d0

commit dea82318648e46c157874bd1079f50b23c9c08d0
Author:     Jan Beich <jbeich@FreeBSD.org>
AuthorDate: 2021-10-27 23:48:16 +0000
Commit:     Jan Beich <jbeich@FreeBSD.org>
CommitDate: 2021-10-28 00:08:04 +0000

    net/mpich: switch L0=off to --without-ze after 697c7df81364

    https://github.com/pmodels/mpich/commit/67b1e07851fe
    https://github.com/pmodels/mpich/commit/84ae6243139c

    PR: 252536
    Reported by: Edoardo Aprà

 net/mpich/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)