Bug 252536 - net/mpich: Build fails on mpl/src/gpu/mpl_gpu_ze.c
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Thierry Thomas
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-09 09:01 UTC by Nick
Modified: 2021-01-09 17:52 UTC
CC List: 2 users

See Also:


Attachments
Build output (6.77 KB, text/plain)
2021-01-09 09:01 UTC, Nick

Description Nick 2021-01-09 09:01:45 UTC
Created attachment 221408
Build output

When trying to build net/mpich-3.4, multiple build errors occur while compiling src/mpl/src/gpu/mpl_gpu_ze.lo.  The full build output is attached; the cause looks like a missing or incorrect dependency.  For example:

```
src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'; did you mean 'dev_handle'?
    *dev_handle = device_handles[dev_id];
                  ^~~~~~~~~~~~~~
                  dev_handle
src/gpu/mpl_gpu_ze.c:278:66: note: 'dev_handle' declared here
int MPL_gpu_get_dev_handle(int dev_id, MPL_gpu_device_handle_t * dev_handle)
                                                                 ^
```
Comment 1 Thierry Thomas freebsd_committer 2021-01-09 09:14:35 UTC
Could you please describe your platform? (uname -mrU)
Comment 2 Nick 2021-01-09 09:18:47 UTC
(In reply to Thierry Thomas from comment #1)

Of course.  

> uname -mrU
12.2-RELEASE-p1 amd64 1202000
Comment 3 Thierry Thomas freebsd_committer 2021-01-09 14:04:48 UTC
The error seems to be caused by the file
/usr/local/include/level_zero/ze_api.h
which I am not familiar with: it does not exist on my machines!

Could you please report the output of
pkg which /usr/local/include/level_zero/ze_api.h

The various config.log files could also be useful, especially if you change the options.

For comparison, the output of a build session in a clean jail (poudriere) is available at:
https://people.freebsd.org/~thierry/mpich-3.4.log
and it shows nothing related to the reported error in src/gpu/mpl_gpu_ze.c.
Comment 4 Nick 2021-01-09 15:20:03 UTC
(In reply to Thierry Thomas from comment #3)

> pkg which /usr/local/include/level_zero/ze_api.h
/usr/local/include/level_zero/ze_api.h was installed by package level-zero-1.0.26

I'm not sure where that package came from, since it wasn't a dependency for anything.  Removing it allowed the build to complete as planned.

I did see a block in the configuration stage that refers to level-zero:
> checking level_zero/ze_api.h usability... no
> checking level_zero/ze_api.h presence... no
> checking for level_zero/ze_api.h... no
> checking for zeInit in -lze_loader... no

Seems to be the source of the issue.
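
For context, a configure-time header probe like the one quoted above usually ends up as a preprocessor macro that switches an optional code path on or off, which would explain why the mere presence of level_zero/ze_api.h changes what gets compiled. The following is only a rough sketch of that idea: it uses the compiler's `__has_include` instead of autoconf, and `HAVE_ZE_HEADER`, `ze_header_present`, `gpu_backend_name` and `describe_gpu_backend` are made-up names for illustration, not MPICH's actual macros or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Probe for the header at compile time. __has_include is standard in C23 and
 * available earlier as an extension in Clang and recent GCC; the nested #if
 * keeps compilers without it from choking on the probe itself. */
#if defined(__has_include)
#  if __has_include(<level_zero/ze_api.h>)
#    define HAVE_ZE_HEADER 1   /* hypothetical macro name */
#  endif
#endif
#ifndef HAVE_ZE_HEADER
#  define HAVE_ZE_HEADER 0
#endif

/* True when the Level Zero header was visible while compiling this file. */
bool ze_header_present(void)
{
    return HAVE_ZE_HEADER != 0;
}

/* Name of the GPU backend that this hypothetical build would select. */
const char *gpu_backend_name(void)
{
    return ze_header_present() ? "level-zero" : "none";
}

/* Write a one-line summary into buf; returns what snprintf returns. */
int describe_gpu_backend(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "level_zero/ze_api.h: %s, GPU backend: %s",
                    ze_header_present() ? "yes" : "no", gpu_backend_name());
}
```

Calling `describe_gpu_backend` on a machine with the level-zero package installed should report the header as present, and as absent once the package is removed. That matches the behaviour observed here, where the optional code is only compiled when the header is found.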
Comment 5 Jan Beich freebsd_committer 2021-01-09 16:59:24 UTC
(In reply to Nick from comment #0)
> src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'

device_handles doesn't show up in any change under https://github.com/oneapi-src/level-zero. Which version of level-zero is expected by mpich? Does it build on Linux?

For example, src/pm/hydra2/mpl/src/gpu/mpl_gpu_ze.c has "ze_device_handle_t *global_ze_devices_handle;". Maybe device_handles is a leftover from before it was renamed to global_ze_devices_handle, e.g.,

https://github.com/pmodels/mpich/commit/4c1ed41821b4
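
If the rename guess is right, the failing function might only need to index the renamed global. Below is an untested sketch of what that could look like, not the upstream fix. The function signature and the `global_ze_devices_handle` declaration are taken from this report; everything else is a stand-in so the snippet compiles without Level Zero installed: the `ze_device_stub` struct, the typedefs, the error code values, and the `global_ze_device_count` bounds check are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types (assumptions): the real ones come from level_zero/ze_api.h
 * and from MPL's own headers. */
struct ze_device_stub { int ordinal; };
typedef struct ze_device_stub *ze_device_handle_t;
typedef ze_device_handle_t MPL_gpu_device_handle_t;

/* Stand-in return codes (values are assumptions). */
#define MPL_SUCCESS 0
#define MPL_ERR_GPU_INTERNAL 1

/* Global named as in src/pm/hydra2/mpl/src/gpu/mpl_gpu_ze.c. */
ze_device_handle_t *global_ze_devices_handle = NULL;
/* Hypothetical device count, added only to make the bounds check possible. */
int global_ze_device_count = 0;

int MPL_gpu_get_dev_handle(int dev_id, MPL_gpu_device_handle_t * dev_handle)
{
    assert(dev_handle != NULL);

    /* Reject lookups before initialization or outside the device table. */
    if (global_ze_devices_handle == NULL || dev_id < 0 ||
        dev_id >= global_ze_device_count)
        return MPL_ERR_GPU_INTERNAL;

    /* Was: *dev_handle = device_handles[dev_id]; (undeclared identifier) */
    *dev_handle = global_ze_devices_handle[dev_id];
    return MPL_SUCCESS;
}
```

This only addresses the error at line 280. The other errors quoted in this bug come from API differences in level-zero itself and would not be fixed by it.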
Comment 6 commit-hook freebsd_committer 2021-01-09 17:52:00 UTC
A commit references this bug:

Author: jbeich
Date: Sat Jan  9 17:51:20 UTC 2021
New revision: 560881
URL: https://svnweb.freebsd.org/changeset/ports/560881

Log:
  net/mpich: unbreak with level-zero after r560756

  level-zero is pulled as a build-only dependency of intel-compute-runtime.
  mpich support for level-zero is broken and uses pre-1.0 API (before r545238).

  src/gpu/mpl_gpu_ze.c:123:11: warning: implicit declaration of function 'zeDriverGetMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverGetMemIpcHandle(global_ze_driver_handle, ptr, ipc_handle);
            ^
  src/gpu/mpl_gpu_ze.c:139:9: warning: implicit declaration of function 'zeDriverOpenMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
          zeDriverOpenMemIpcHandle(global_ze_driver_handle,
          ^
  src/gpu/mpl_gpu_ze.c:140:70: error: no member named 'global_dev_id' in 'struct _ze_ipc_mem_handle_t'
                                   global_ze_devices_handle[ipc_handle.global_dev_id],
                                                            ~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:141:45: error: no member named 'handle' in 'struct _ze_ipc_mem_handle_t'
                                   ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                   ~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:141:53: error: use of undeclared identifier 'ZE_IPC_MEMORY_FLAG_NONE'; did you mean 'ZE_IPC_MEMORY_FLAG_TBD'?
                                   ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                                      ^~~~~~~~~~~~~~~~~~~~~~~
  src/gpu/mpl_gpu_ze.c:156:11: warning: implicit declaration of function 'zeDriverCloseMemIpcHandle' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverCloseMemIpcHandle(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:171:11: warning: implicit declaration of function 'zeDriverGetMemAllocProperties' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverGetMemAllocProperties(global_ze_driver_handle, ptr, &ptr_attr, &device);
            ^
  src/gpu/mpl_gpu_ze.c:202:25: error: use of undeclared identifier 'ZE_DEVICE_MEM_ALLOC_FLAG_DEFAULT'
      device_desc.flags = ZE_DEVICE_MEM_ALLOC_FLAG_DEFAULT;
                          ^
  src/gpu/mpl_gpu_ze.c:204:17: error: no member named 'version' in 'struct _ze_device_mem_alloc_desc_t'
      device_desc.version = ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT;
      ~~~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:204:27: error: use of undeclared identifier 'ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT'
      device_desc.version = ZE_DEVICE_MEM_ALLOC_DESC_VERSION_CURRENT;
                            ^
  src/gpu/mpl_gpu_ze.c:208:11: warning: implicit declaration of function 'zeDriverAllocDeviceMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverAllocDeviceMem(global_ze_driver_handle, &device_desc,
            ^
  src/gpu/mpl_gpu_ze.c:223:23: error: use of undeclared identifier 'ZE_HOST_MEM_ALLOC_FLAG_DEFAULT'
      host_desc.flags = ZE_HOST_MEM_ALLOC_FLAG_DEFAULT;
                        ^
  src/gpu/mpl_gpu_ze.c:224:15: error: no member named 'version' in 'struct _ze_host_mem_alloc_desc_t'
      host_desc.version = ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT;
      ~~~~~~~~~ ^
  src/gpu/mpl_gpu_ze.c:224:25: error: use of undeclared identifier 'ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT'
      host_desc.version = ZE_HOST_MEM_ALLOC_DESC_VERSION_CURRENT;
                          ^
  src/gpu/mpl_gpu_ze.c:229:11: warning: implicit declaration of function 'zeDriverAllocHostMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverAllocHostMem(global_ze_driver_handle, &host_desc, size, mem_alignment, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:240:11: warning: implicit declaration of function 'zeDriverFreeMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverFreeMem(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:251:11: warning: implicit declaration of function 'zeDriverFreeMem' is invalid in C99 [-Wimplicit-function-declaration]
      ret = zeDriverFreeMem(global_ze_driver_handle, ptr);
            ^
  src/gpu/mpl_gpu_ze.c:280:19: error: use of undeclared identifier 'device_handles'; did you mean 'dev_handle'?
      *dev_handle = device_handles[dev_id];
                    ^~~~~~~~~~~~~~

  PR:		252536
  Reported by:	Nick, thierry

Changes:
  head/net/mpich/Makefile