Bug 242272

Summary: LinuxKPI combines all RCU and SRCU domains together, leading to deadlock
Product: Base System Reporter: Andrew Boyer <aboyer>
Component: kernAssignee: Hans Petter Selasky <hselasky>
Status: Closed FIXED    
Severity: Affects Some People CC: emaste, hselasky, kib
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Possible patch
none
Split RCU domains none

Description Andrew Boyer 2019-11-27 18:57:21 UTC
In that other OS, all RCU code is in the one domain, and each SRCU user gets its own domain. (More or less.) They are all combined together in FreeBSD since it has only a bare-bones implementation in the KPI.

This is a problem because ib_uverbs holds an SRCU read lock when calling into the provider's destroy_cq function. Providers may expect to be able to use RCU primitives when tearing down, but calling synchronize_rcu() or synchronize_srcu() will lead to a deadlock, even on a completely separate SRCU domain.

To fix this will require adding real multiple-domain support to the KPI.
Comment 1 commit-hook freebsd_committer 2019-11-29 13:34:15 UTC
A commit references this bug:

Author: pkubaj
Date: Fri Nov 29 13:33:14 UTC 2019
New revision: 518649
URL: https://svnweb.freebsd.org/changeset/ports/518649

Log:
  games/stuntrally: fix build on PPC with clang and ARM

  Upstream PR:
  https://github.com/stuntrally/stuntrally/pull/18

  We had until now CXXFLAGS_gcc=-Wno-narrowing, but it looks like this was incorrect because it did not fix the original issue.

  PR:		242272
  Approved by:	linimon (mentor), amdmi3 (maintainer)

Changes:
  head/games/stuntrally/Makefile
  head/games/stuntrally/files/patch-source_editor_CApp.h
Comment 2 Hans Petter Selasky freebsd_committer 2020-03-03 15:54:25 UTC
Sorry I didn't see this earlier on. Has this deadlock been observed in practice?
Comment 3 Andrew Boyer 2020-03-03 15:56:56 UTC
It was easy to hit in code that we have not yet upstreamed.

The design has changed in the meantime and we're not using RCU to protect our QP/CQ objects any more.

This was a right pain to debug though - maybe it could be documented as a XXX somewhere?
Comment 4 Hans Petter Selasky freebsd_committer 2020-03-03 16:01:27 UTC
Would it help if SRCU and RCU were separated?

That all SRCU had one domain and all RCU had another domain?

The RCU implementation in the LinuxKPI does not have as many asserts and debug knobs as the EPOCH() implementation in the kernel.

--HPS
Comment 5 Andrew Boyer 2020-03-03 18:43:14 UTC
That would work for what we tried to do. SRCU should still be marked XXX IMO.
Comment 6 Andrew Boyer 2020-03-03 18:43:55 UTC
Created attachment 212127 [details]
Possible patch

Attached is a quick & dirty split of RCU/SRCU.
Comment 7 Hans Petter Selasky freebsd_committer 2020-04-07 12:41:27 UTC
Created attachment 213153 [details]
Split RCU domains

A fix for this issue will soon be upstreamed.
Comment 8 commit-hook freebsd_committer 2020-04-08 17:10:44 UTC
A commit references this bug:

Author: hselasky
Date: Wed Apr  8 17:09:46 UTC 2020
New revision: 359727
URL: https://svnweb.freebsd.org/changeset/base/359727

Log:
  Clone the RCU interface into a sleepable and a non-sleepable part
  in the LinuxKPI.

  This allows synchronize RCU to be used inside a SRCU read section.
  No functional change intended.

  Bump the __FreeBSD_version to force recompilation of external kernel modules.

  PR:		242272
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies

Changes:
  head/sys/compat/linuxkpi/common/include/linux/rcupdate.h
  head/sys/compat/linuxkpi/common/src/linux_rcu.c
  head/sys/sys/param.h
Comment 9 commit-hook freebsd_committer 2020-04-15 04:56:39 UTC
A commit references this bug:

Author: hselasky
Date: Wed Apr 15 04:56:28 UTC 2020
New revision: 359957
URL: https://svnweb.freebsd.org/changeset/base/359957

Log:
  MFC r359727:
  Clone the RCU interface into a sleepable and a non-sleepable part
  in the LinuxKPI.

  This allows synchronize RCU to be used inside a SRCU read section.
  No functional change intended.

  Bump the __FreeBSD_version to force recompilation of external kernel modules.

  PR:		242272
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/12/
  stable/12/sys/compat/linuxkpi/common/include/linux/rcupdate.h
  stable/12/sys/compat/linuxkpi/common/src/linux_rcu.c
  stable/12/sys/sys/param.h
Comment 10 commit-hook freebsd_committer 2020-04-15 05:03:42 UTC
A commit references this bug:

Author: hselasky
Date: Wed Apr 15 05:02:50 UTC 2020
New revision: 359958
URL: https://svnweb.freebsd.org/changeset/base/359958

Log:
  MFC r359727:
  Clone the RCU interface into a sleepable and a non-sleepable part
  in the LinuxKPI.

  This allows synchronize RCU to be used inside a SRCU read section.
  No functional change intended.

  Bump the __FreeBSD_version to force recompilation of external kernel modules.

  PR:		242272
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/compat/linuxkpi/common/include/linux/rcupdate.h
  stable/11/sys/compat/linuxkpi/common/src/linux_rcu.c
  stable/11/sys/sys/param.h
Comment 11 commit-hook freebsd_committer 2020-08-11 12:18:03 UTC
A commit references this bug:

Author: hselasky
Date: Tue Aug 11 12:17:47 UTC 2020
New revision: 364109
URL: https://svnweb.freebsd.org/changeset/base/364109

Log:
  Need to clone the task struct fields related to RCU aswell in the
  LinuxKPI after r359727. This fixes a minor regression issue. Else the
  priority tracking won't work properly when both sleepable and
  non-sleepable RCU is in use on the same thread.

  Bump the __FreeBSD_version to force recompilation of external kernel
  modules.

  PR:		242272
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies

Changes:
  head/sys/compat/linuxkpi/common/include/linux/sched.h
  head/sys/compat/linuxkpi/common/src/linux_rcu.c
  head/sys/sys/param.h
Comment 12 commit-hook freebsd_committer 2020-08-19 13:33:28 UTC
A commit references this bug:

Author: hselasky
Date: Wed Aug 19 13:32:56 UTC 2020
New revision: 364390
URL: https://svnweb.freebsd.org/changeset/base/364390

Log:
  MFC r364109:
  Need to clone the task struct fields related to RCU aswell in the
  LinuxKPI after r359727. This fixes a minor regression issue. Else the
  priority tracking won't work properly when both sleepable and
  non-sleepable RCU is in use on the same thread.

  Bump the __FreeBSD_version to force recompilation of external kernel
  modules.

  PR:		242272
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/12/
  stable/12/sys/compat/linuxkpi/common/include/linux/sched.h
  stable/12/sys/compat/linuxkpi/common/src/linux_rcu.c
  stable/12/sys/sys/param.h
Comment 13 commit-hook freebsd_committer 2020-08-19 13:36:30 UTC
A commit references this bug:

Author: hselasky
Date: Wed Aug 19 13:35:32 UTC 2020
New revision: 364391
URL: https://svnweb.freebsd.org/changeset/base/364391

Log:
  MFC r364109:
  Need to clone the task struct fields related to RCU aswell in the
  LinuxKPI after r359727. This fixes a minor regression issue. Else the
  priority tracking won't work properly when both sleepable and
  non-sleepable RCU is in use on the same thread.

  Bump the __FreeBSD_version to force recompilation of external kernel
  modules.

  PR:		242272
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/compat/linuxkpi/common/include/linux/sched.h
  stable/11/sys/compat/linuxkpi/common/src/linux_rcu.c
  stable/11/sys/sys/param.h