Bug 278845

Summary: OpenMP Patches to prevent locking, hanging and CPU limiting to single core.
Product: Base System
Reporter: cbl
Component: threads
Assignee: Dimitry Andric <dim>
Status: Closed FIXED
Severity: Affects Only Me
CC: dim
Priority: ---
Flags: dim: mfc-stable14+, dim: mfc-stable13+
Version: 14.0-STABLE
Hardware: Any
OS: Any

Description cbl 2024-05-07 22:49:04 UTC
We worked with the LLVM OpenMP developers to get two fixes merged upstream for serious bugs we have been experiencing.

Fix #1 - Prevents forked processes from hanging:
https://github.com/llvm/llvm-project/pull/88539
Original issue reported: https://github.com/llvm/llvm-project/issues/86684

Fix #2 - Makes forked child processes use affinity_none. Previously, all child processes were limited to a single CPU core.
https://github.com/llvm/llvm-project/pull/91391
Original issue reported: https://github.com/llvm/llvm-project/issues/91098

Fix #2 is only needed for LLVM 16.x and later; my original issue shows the problem does not occur with 14.x or 15.x. Since FreeBSD 14.0 and 13.3 ship LLVM 16.x and 17.x, the bug was triggered when we upgraded servers to either release.
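A quick way to check for the single-core symptom is to compare the CPU affinity mask of a forked child with that of its parent. The following is a minimal, Linux-only sketch: it assumes glibc's sched_getaffinity() and CPU_COUNT() (FreeBSD's native interface is cpuset_getaffinity(2)), and the helper names allowed_cpus() and allowed_cpus_in_child() are hypothetical. In a real reproducer the parent would first run an OpenMP parallel region so the runtime is initialized before fork(), and the child would run its own parallel region before measuring; this sketch leaves OpenMP out, so parent and child should report the same count.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of CPUs the calling thread may run on (Linux/glibc API). */
static int allowed_cpus(void) {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    return -1;
  return CPU_COUNT(&set);
}

/* Forks a child, has it report allowed_cpus() through a pipe, and
 * returns that value, or -1 on error. */
static int allowed_cpus_in_child(void) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: in a real reproducer, an OpenMP region would run here first. */
    close(fds[0]);
    int n = allowed_cpus();
    ssize_t w = write(fds[1], &n, sizeof(n));
    _exit(w == (ssize_t)sizeof(n) ? 0 : 1);
  }
  /* Parent: read the child's answer and reap it. */
  close(fds[1]);
  int n = -1;
  if (read(fds[0], &n, sizeof(n)) != (ssize_t)sizeof(n))
    n = -1;
  close(fds[0]);
  waitpid(pid, NULL, 0);
  return n;
}
```

With the affected runtime, a child count that drops to 1 while the parent reports more CPUs would be consistent with the behaviour described for Fix #2.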

Hoping we can get both into 14.1 in time for its upcoming release.
Comment 1 Dimitry Andric freebsd_committer freebsd_triage 2024-05-08 11:26:07 UTC
Fix #1 was committed upstream as https://github.com/llvm/llvm-project/commit/5300a6731e98fbcf7bca68374e934de737166698, and it applies with a little fuzz (there were some other changes that inserted DragonFlyBSD handling, which changed the context lines). So I can import that right away.

Fix #2 has been approved upstream but is not yet committed. I would prefer to wait until it is committed and has "cooked" a little; then I will import it as well.
Comment 3 commit-hook freebsd_committer freebsd_triage 2024-05-08 18:48:08 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=22b3e7898ecdf90887a9536fab5b9a6f7a291723

commit 22b3e7898ecdf90887a9536fab5b9a6f7a291723
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 18:44:28 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-08 18:45:45 +0000

    Merge commit 73bb8d9d92f6 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix child processes to use affinity_none (#91391)

      When a child process is forked with OpenMP already initialized, the
      child process resets its affinity mask and sets proc-bind-var to false
      so that the entire original affinity mask is used. This patch corrects
      an issue with the affinity initialization code setting affinity to
      compact instead of none for this special case of forked children.

      The test trying to catch this was only testing explicit setting of
      KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

      Fixes: #91098

    This should fix OpenMP processes sometimes getting stuck on a single CPU
    core.

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

 contrib/llvm-project/openmp/runtime/src/kmp_settings.cpp | 2 ++
 1 file changed, 2 insertions(+)
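The decision this commit corrects can be pictured with a small model. This is a hypothetical, simplified sketch, not the kmp_settings.cpp code: the enum aff_type_t, the struct aff_state_t and the function choose_affinity() are invented for illustration, and the default for a freshly started process is an assumption. It only encodes what the commit message states: a forked child with no KMP_AFFINITY setting ended up with compact instead of none, while an explicit KMP_AFFINITY=none (the case the existing test covered) already worked.

```c
#include <stdbool.h>

/* Hypothetical, simplified affinity types; not the runtime's own enum. */
typedef enum {
  AFF_UNSET,  /* KMP_AFFINITY not set in the environment */
  AFF_NONE,   /* threads may run anywhere in the process mask */
  AFF_COMPACT /* threads are bound to places, packed closely */
} aff_type_t;

/* Hypothetical per-process state relevant to the decision. */
typedef struct {
  bool in_forked_child; /* OpenMP was initialized before fork() */
  aff_type_t requested; /* value taken from KMP_AFFINITY, if any */
} aff_state_t;

/* Chooses the affinity type at (re)initialization. with_fix selects the
 * post-#91391 behaviour in this model. */
static aff_type_t choose_affinity(const aff_state_t *s, bool with_fix) {
  /* In this model an explicit value is returned unchanged; the upstream
   * test only exercised KMP_AFFINITY=none, which already worked. */
  if (s->requested != AFF_UNSET)
    return s->requested;
  if (s->in_forked_child)
    /* The child resets its mask and proc-bind-var is false, so it should
     * use the whole inherited mask. Before the fix, this case ended up
     * as compact instead. */
    return with_fix ? AFF_NONE : AFF_COMPACT;
  /* Hypothetical default for a freshly started process. */
  return AFF_NONE;
}
```

The added test run for "no KMP_AFFINITY setting" corresponds to the in_forked_child/AFF_UNSET case in this model, which the explicit-none test never reached.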
Comment 4 commit-hook freebsd_committer freebsd_triage 2024-05-08 18:48:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=da15ed2e982180198f77a0fa26628e6d414cb10e

commit da15ed2e982180198f77a0fa26628e6d414cb10e
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 16:55:08 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-08 18:45:44 +0000

    Merge commit 5300a6731e98 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix re-locking hang found in issue 86684 (#88539)

      This was initially reported here (including stacktraces):
      https://stackoverflow.com/questions/78183545/does-compiling-imagick-with-openmp-enabled-in-freebsd-13-2-cause-sched-yield

      If `__kmp_register_library_startup()` detects that another instance of
      the library is present, `__kmp_is_address_mapped()` is eventually
      called, which uses `kmpc_malloc()` to allocate memory. This function
      calls `__kmp_entry_thread()` to access the thread-local memory pool,
      which is a bad idea during initialization. This macro internally calls
      `__kmp_get_global_thread_id_reg()` which sets the bootstrap lock at the
      beginning (before calling `__kmp_register_library_startup()`).

      The fix is to use `KMP_INTERNAL_MALLOC()`/`KMP_INTERNAL_FREE()` instead
      of `kmpc_malloc()`/`kmpc_free()`. `KMP_INTERNAL_MALLOC` and
      `KMP_INTERNAL_FREE` do not use any bootstrap locks. They just translate
      to `malloc()`/`free()` and are meant to be used during library
      initialization before other library-specific allocators have been
      initialized.

      Fixes: #86684

    This should fix OpenMP processes sometimes getting locked with 100% CPU
    usage, endlessly calling sched_yield(2).

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

 contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
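The re-locking hang described in this commit can be modelled with a non-recursive spin lock. In the sketch below, the lock, the allocator thread_pool_alloc() and the startup function library_startup() are all hypothetical stand-ins: the comments map them to the runtime functions named in the commit message, but this is not the runtime's code. The startup path takes the bootstrap lock and then allocates; the pre-fix allocator needs the same lock, which a test-and-set lock cannot grant to its current holder, so a real acquire would spin in sched_yield() forever. To keep the sketch terminating, the allocator uses a try-lock and reports the would-be hang, while the fixed path calls plain malloc() as KMP_INTERNAL_MALLOC() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the runtime's non-recursive bootstrap lock:
 * a test-and-set spin lock whose waiters call sched_yield(). */
static atomic_flag bootstrap_lock = ATOMIC_FLAG_INIT;

static bool lock_try(void) { return !atomic_flag_test_and_set(&bootstrap_lock); }

static void lock_acquire(void) {
  /* If the calling thread already holds the lock, this loops forever,
   * burning CPU in sched_yield(): the symptom reported in this PR. */
  while (!lock_try())
    sched_yield();
}

static void lock_release(void) { atomic_flag_clear(&bootstrap_lock); }

/* Hypothetical allocator that needs per-thread state and therefore takes
 * the bootstrap lock (cf. kmpc_malloc() -> __kmp_entry_thread() ->
 * __kmp_get_global_thread_id_reg()). It uses lock_try() and reports the
 * would-be self-deadlock instead of spinning in lock_acquire(). */
static void *thread_pool_alloc(size_t n, bool *would_hang) {
  if (!lock_try()) {
    *would_hang = true;
    return NULL;
  }
  void *p = malloc(n);
  lock_release();
  *would_hang = false;
  return p;
}

/* Hypothetical startup path that allocates while holding the lock
 * (cf. __kmp_register_library_startup() -> __kmp_is_address_mapped()).
 * Returns true if initialization completes, false if it would hang. */
static bool library_startup(bool use_internal_malloc) {
  bool would_hang = false;
  void *buf;
  lock_acquire();
  if (use_internal_malloc)
    buf = malloc(4096); /* like KMP_INTERNAL_MALLOC(): no runtime locks */
  else
    buf = thread_pool_alloc(4096, &would_hang); /* pre-fix behaviour */
  free(buf); /* free(NULL) is a no-op */
  lock_release();
  return !would_hang;
}
```

The model shows why the fix only swaps allocators: nothing about the lock changes, but the startup path no longer re-enters code that needs it.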
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-05-11 08:57:41 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=426e07d791641e80e90af89d52008635a35e4794

commit 426e07d791641e80e90af89d52008635a35e4794
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 16:55:08 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-11 08:56:22 +0000

    Merge commit 5300a6731e98 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix re-locking hang found in issue 86684 (#88539)

      This was initially reported here (including stacktraces):
      https://stackoverflow.com/questions/78183545/does-compiling-imagick-with-openmp-enabled-in-freebsd-13-2-cause-sched-yield

      If `__kmp_register_library_startup()` detects that another instance of
      the library is present, `__kmp_is_address_mapped()` is eventually
      called, which uses `kmpc_malloc()` to allocate memory. This function
      calls `__kmp_entry_thread()` to access the thread-local memory pool,
      which is a bad idea during initialization. This macro internally calls
      `__kmp_get_global_thread_id_reg()` which sets the bootstrap lock at the
      beginning (before calling `__kmp_register_library_startup()`).

      The fix is to use `KMP_INTERNAL_MALLOC()`/`KMP_INTERNAL_FREE()` instead
      of `kmpc_malloc()`/`kmpc_free()`. `KMP_INTERNAL_MALLOC` and
      `KMP_INTERNAL_FREE` do not use any bootstrap locks. They just translate
      to `malloc()`/`free()` and are meant to be used during library
      initialization before other library-specific allocators have been
      initialized.

      Fixes: #86684

    This should fix OpenMP processes sometimes getting locked with 100% CPU
    usage, endlessly calling sched_yield(2).

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

    (cherry picked from commit da15ed2e982180198f77a0fa26628e6d414cb10e)

 contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-05-11 08:57:43 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=91df7d335dd44fa3cf506b35987d791502613ed4

commit 91df7d335dd44fa3cf506b35987d791502613ed4
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 18:44:28 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-11 08:56:28 +0000

    Merge commit 73bb8d9d92f6 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix child processes to use affinity_none (#91391)

      When a child process is forked with OpenMP already initialized, the
      child process resets its affinity mask and sets proc-bind-var to false
      so that the entire original affinity mask is used. This patch corrects
      an issue with the affinity initialization code setting affinity to
      compact instead of none for this special case of forked children.

      The test trying to catch this was only testing explicit setting of
      KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

      Fixes: #91098

    This should fix OpenMP processes sometimes getting stuck on a single CPU
    core.

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

    (cherry picked from commit 22b3e7898ecdf90887a9536fab5b9a6f7a291723)

 contrib/llvm-project/openmp/runtime/src/kmp_settings.cpp | 2 ++
 1 file changed, 2 insertions(+)
Comment 7 commit-hook freebsd_committer freebsd_triage 2024-05-11 08:57:44 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7b966dcc3ac5e413c668da4a6d567f9478321806

commit 7b966dcc3ac5e413c668da4a6d567f9478321806
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 16:55:08 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-11 08:56:33 +0000

    Merge commit 5300a6731e98 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix re-locking hang found in issue 86684 (#88539)

      This was initially reported here (including stacktraces):
      https://stackoverflow.com/questions/78183545/does-compiling-imagick-with-openmp-enabled-in-freebsd-13-2-cause-sched-yield

      If `__kmp_register_library_startup()` detects that another instance of
      the library is present, `__kmp_is_address_mapped()` is eventually
      called, which uses `kmpc_malloc()` to allocate memory. This function
      calls `__kmp_entry_thread()` to access the thread-local memory pool,
      which is a bad idea during initialization. This macro internally calls
      `__kmp_get_global_thread_id_reg()` which sets the bootstrap lock at the
      beginning (before calling `__kmp_register_library_startup()`).

      The fix is to use `KMP_INTERNAL_MALLOC()`/`KMP_INTERNAL_FREE()` instead
      of `kmpc_malloc()`/`kmpc_free()`. `KMP_INTERNAL_MALLOC` and
      `KMP_INTERNAL_FREE` do not use any bootstrap locks. They just translate
      to `malloc()`/`free()` and are meant to be used during library
      initialization before other library-specific allocators have been
      initialized.

      Fixes: #86684

    This should fix OpenMP processes sometimes getting locked with 100% CPU
    usage, endlessly calling sched_yield(2).

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

    (cherry picked from commit da15ed2e982180198f77a0fa26628e6d414cb10e)

 contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-05-11 08:57:46 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e2de08bf70f4343ebcb455dedf1b77ac0d67f5ca

commit e2de08bf70f4343ebcb455dedf1b77ac0d67f5ca
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 18:44:28 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-11 08:56:35 +0000

    Merge commit 73bb8d9d92f6 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix child processes to use affinity_none (#91391)

      When a child process is forked with OpenMP already initialized, the
      child process resets its affinity mask and sets proc-bind-var to false
      so that the entire original affinity mask is used. This patch corrects
      an issue with the affinity initialization code setting affinity to
      compact instead of none for this special case of forked children.

      The test trying to catch this was only testing explicit setting of
      KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

      Fixes: #91098

    This should fix OpenMP processes sometimes getting stuck on a single CPU
    core.

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    MFC after:      3 days

    (cherry picked from commit 22b3e7898ecdf90887a9536fab5b9a6f7a291723)

 contrib/llvm-project/openmp/runtime/src/kmp_settings.cpp | 2 ++
 1 file changed, 2 insertions(+)
Comment 9 commit-hook freebsd_committer freebsd_triage 2024-05-12 18:09:31 UTC
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ba0bd7cea412c6dc51ebfebd4000a543e49013bd

commit ba0bd7cea412c6dc51ebfebd4000a543e49013bd
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 18:44:28 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-12 18:08:46 +0000

    Merge commit 73bb8d9d92f6 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix child processes to use affinity_none (#91391)

      When a child process is forked with OpenMP already initialized, the
      child process resets its affinity mask and sets proc-bind-var to false
      so that the entire original affinity mask is used. This patch corrects
      an issue with the affinity initialization code setting affinity to
      compact instead of none for this special case of forked children.

      The test trying to catch this was only testing explicit setting of
      KMP_AFFINITY=none. Add test run for no KMP_AFFINITY setting.

      Fixes: #91098

    This should fix OpenMP processes sometimes getting stuck on a single CPU
    core.

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    Approved by:    re (cperciva)
    MFC after:      3 days

    (cherry picked from commit 22b3e7898ecdf90887a9536fab5b9a6f7a291723)
    (cherry picked from commit 91df7d335dd44fa3cf506b35987d791502613ed4)

 contrib/llvm-project/openmp/runtime/src/kmp_settings.cpp | 2 ++
 1 file changed, 2 insertions(+)
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-05-12 18:09:34 UTC
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=aec52a27f987eef4f21c07f83cdc783fbdede2db

commit aec52a27f987eef4f21c07f83cdc783fbdede2db
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2024-05-08 16:55:08 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2024-05-12 18:08:18 +0000

    Merge commit 5300a6731e98 from llvm-project (by Jonathan Peyton):

      [OpenMP] Fix re-locking hang found in issue 86684 (#88539)

      This was initially reported here (including stacktraces):
      https://stackoverflow.com/questions/78183545/does-compiling-imagick-with-openmp-enabled-in-freebsd-13-2-cause-sched-yield

      If `__kmp_register_library_startup()` detects that another instance of
      the library is present, `__kmp_is_address_mapped()` is eventually
      called, which uses `kmpc_malloc()` to allocate memory. This function
      calls `__kmp_entry_thread()` to access the thread-local memory pool,
      which is a bad idea during initialization. This macro internally calls
      `__kmp_get_global_thread_id_reg()` which sets the bootstrap lock at the
      beginning (before calling `__kmp_register_library_startup()`).

      The fix is to use `KMP_INTERNAL_MALLOC()`/`KMP_INTERNAL_FREE()` instead
      of `kmpc_malloc()`/`kmpc_free()`. `KMP_INTERNAL_MALLOC` and
      `KMP_INTERNAL_FREE` do not use any bootstrap locks. They just translate
      to `malloc()`/`free()` and are meant to be used during library
      initialization before other library-specific allocators have been
      initialized.

      Fixes: #86684

    This should fix OpenMP processes sometimes getting locked with 100% CPU
    usage, endlessly calling sched_yield(2).

    PR:             278845
    Reported by:    Cassidy B. Larson <cbl@cbl.us>
    Approved by:    re (cperciva)
    MFC after:      3 days

    (cherry picked from commit da15ed2e982180198f77a0fa26628e6d414cb10e)
    (cherry picked from commit 426e07d791641e80e90af89d52008635a35e4794)

 contrib/llvm-project/openmp/runtime/src/z_Linux_util.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 11 Dimitry Andric freebsd_committer freebsd_triage 2024-05-12 18:10:14 UTC
All merged, in time for 14.1-RELEASE.
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2024-10-25 13:08:55 UTC
*** Bug 278703 has been marked as a duplicate of this bug. ***