Bug 244703 - sys.netpfil.pf.nat.exhaust panics kernel
Summary: sys.netpfil.pf.nat.exhaust panics kernel
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: tests
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-testing (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-09 22:48 UTC by Li-Wen Hsu
Modified: 2021-01-29 01:21 UTC
CC List: 2 users

See Also:


Description Li-Wen Hsu freebsd_committer freebsd_triage 2020-03-09 22:48:42 UTC
The FreeBSD-head-amd64-test job started failing intermittently with build:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14511/

and in the following builds:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14512/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14515/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14516/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14517/
...
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14528/

Console log:

sys/netpfil/pf/nat:exhaust  ->  panic: epair_qflush: ifp=0xfffff800ae6d5800, epair_softc gone? sc=0

cpuid = 0
time = 1583444839
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe002c6fe880
vpanic() at vpanic+0x185/frame 0xfffffe002c6fe8e0
panic() at panic+0x43/frame 0xfffffe002c6fe940
epair_qflush() at epair_qflush+0x1a8/frame 0xfffffe002c6fe990
if_down() at if_down+0x12d/frame 0xfffffe002c6fe9c0
if_detach_internal() at if_detach_internal+0x2de/frame 0xfffffe002c6fea20
if_vmove() at if_vmove+0x3c/frame 0xfffffe002c6fea70
vnet_if_return() at vnet_if_return+0x50/frame 0xfffffe002c6fea90
vnet_destroy() at vnet_destroy+0x130/frame 0xfffffe002c6feac0
prison_deref() at prison_deref+0x29d/frame 0xfffffe002c6feb00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe002c6feb80
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe002c6febb0
fork_exit() at fork_exit+0x80/frame 0xfffffe002c6febf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002c6febf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100011 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10874b6(%rip)
db:0:kdb.enter.panic> show pcpu
cpuid        = 0
dynamic pcpu = 0x788d80
curthread    = 0xfffffe0008688c00: pid 0 tid 100011 critnest 1 "thread taskq"
curpcb       = 0xfffffe0008689110
fpcurthread  = none
idlethread   = 0xfffffe000a1f6300: tid 100003 "idle: cpu0"
self         = 0xffffffff82210000
curpmap      = 0xffffffff81d9ea50
tssp         = 0xffffffff82210384
rsp0         = 0xfffffe002c6fecc0
kcr3         = 0x8000000002119002
ucr3         = 0xffffffffffffffff
scr3         = 0x1012c8f99
gs32p        = 0xffffffff82210404
ldt          = 0xffffffff82210444
tss          = 0xffffffff82210434
tlb gen      = 307136
curvnet      = 0xfffff80003438380
spin locks held:
db:0:kdb.enter.panic>

It can be reproduced by running `kyua test sys.netpfil.pf.nat.exhaust` in a loop in a VM booted from this test image:

https://artifact.ci.freebsd.org/snapshot/head/r358683/amd64/amd64/disk-test.img.xz
Comment 1 commit-hook freebsd_committer freebsd_triage 2020-03-10 19:19:07 UTC
A commit references this bug:

Author: lwhsu
Date: Tue Mar 10 19:18:25 UTC 2020
New revision: 358852
URL: https://svnweb.freebsd.org/changeset/base/358852

Log:
  Skip sys.netpfil.pf.nat.exhaust on amd64 in CI as it sometimes panics kernel

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netpfil/pf/nat.sh
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2020-03-11 00:26:04 UTC
This looks like the same issue as bug 238870.
Comment 3 commit-hook freebsd_committer freebsd_triage 2020-03-12 19:11:36 UTC
A commit references this bug:

Author: lwhsu
Date: Thu Mar 12 19:10:54 UTC 2020
New revision: 358918
URL: https://svnweb.freebsd.org/changeset/base/358918

Log:
  MFC r358852:

  Skip sys.netpfil.pf.nat.exhaust on amd64 in CI as it sometimes panics kernel

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/nat.sh
Comment 4 commit-hook freebsd_committer freebsd_triage 2020-03-13 16:45:05 UTC
A commit references this bug:

Author: lwhsu
Date: Fri Mar 13 16:44:48 UTC 2020
New revision: 358961
URL: https://svnweb.freebsd.org/changeset/base/358961

Log:
  Skip sys.netpfil.pf.nat.exhaust on all platforms as it not only fails on amd64

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netpfil/pf/nat.sh
Comment 5 commit-hook freebsd_committer freebsd_triage 2020-03-13 17:11:08 UTC
A commit references this bug:

Author: lwhsu
Date: Fri Mar 13 17:10:53 UTC 2020
New revision: 358964
URL: https://svnweb.freebsd.org/changeset/base/358964

Log:
  MFC r358961:

  Skip sys.netpfil.pf.nat.exhaust on all platforms as it not only fails on amd64

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/nat.sh
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-04-20 14:19:21 UTC
A commit references this bug:

Author: lwhsu
Date: Mon Apr 20 14:18:56 UTC 2020
New revision: 360120
URL: https://svnweb.freebsd.org/changeset/base/360120

Log:
  Temporarily disable sys.netinet.divert.* on i386

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netinet/divert.sh
Comment 7 commit-hook freebsd_committer freebsd_triage 2020-09-08 14:54:31 UTC
A commit references this bug:

Author: kp
Date: Tue Sep  8 14:54:11 UTC 2020
New revision: 365457
URL: https://svnweb.freebsd.org/changeset/base/365457

Log:
  net: mitigate vnet / epair cleanup races

  There's a race where dying vnets move their interfaces back to their original
  vnet, and if_epair cleanup (where deleting one interface also deletes the other
  end of the epair). This is commonly triggered by the pf tests, but also by
  cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue work around the
  panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870
  Reviewed by:	lutz_donnerhacke.de
  MFC after:	4 days
  Differential Revision:	https://reviews.freebsd.org/D26324

Changes:
  head/sys/net/if.c
  head/sys/net/if_epair.c
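For readers following the mitigation described in r365457, a minimal userspace sketch of its two guards is below. It is a simplified model, not the committed code: struct model_ifnet, model_epair_qflush() and model_if_vmove() are hypothetical stand-ins for struct ifnet, epair_qflush() and the re-attach path in if.c, and the dying field stands in for however the kernel marks an interface as DYING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct ifnet. */
struct model_ifnet {
	const char *name;
	void *softc;    /* NULL once the epair pair has been torn down */
	bool dying;     /* interface is already being destroyed */
	bool attached;  /* currently on some vnet's interface list */
};

/*
 * Models the epair_qflush() guard: instead of panicking when the softc
 * is already gone, treat the flush as a no-op.
 * Returns 1 if a flush would have been performed, 0 if it was skipped.
 */
static int
model_epair_qflush(struct model_ifnet *ifp)
{
	assert(ifp != NULL);
	if (ifp->softc == NULL)
		return (0);
	/* A real driver would drain its queues via the softc here. */
	return (1);
}

/*
 * Models the "do not re-attach DYING interfaces" guard on the path that
 * returns an interface to its home vnet.
 * Returns true if the interface was re-attached.
 */
static bool
model_if_vmove(struct model_ifnet *ifp)
{
	assert(ifp != NULL);
	if (ifp->dying)
		return (false);
	ifp->attached = true;
	return (true);
}
```

For example, an interface whose softc is NULL and whose dying flag is set is neither flushed nor re-attached, while a live one is handled normally. As the commit log says, these guards only narrow the window; the race between the two cleanup paths is still there.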
Comment 8 commit-hook freebsd_committer freebsd_triage 2020-09-12 12:46:20 UTC
A commit references this bug:

Author: kp
Date: Sat Sep 12 12:45:32 UTC 2020
New revision: 365659
URL: https://svnweb.freebsd.org/changeset/base/365659

Log:
  MFC r365457:

  net: mitigate vnet / epair cleanup races

  There's a race where dying vnets move their interfaces back to their original
  vnet, and if_epair cleanup (where deleting one interface also deletes the other
  end of the epair). This is commonly triggered by the pf tests, but also by
  cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue work around the
  panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870

Changes:
_U  stable/12/
  stable/12/sys/net/if.c
  stable/12/sys/net/if_epair.c
Comment 9 commit-hook freebsd_committer freebsd_triage 2020-09-12 18:59:05 UTC
A commit references this bug:

Author: kp
Date: Sat Sep 12 18:58:36 UTC 2020
New revision: 365669
URL: https://svnweb.freebsd.org/changeset/base/365669

Log:
  MFC r365457:

  net: mitigate vnet / epair cleanup races

  There's a race where dying vnets move their interfaces back to their original
  vnet, and if_epair cleanup (where deleting one interface also deletes the other
  end of the epair). This is commonly triggered by the pf tests, but also by
  cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue work around the
  panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870
  Approved by:	re (gjb)

Changes:
_U  releng/12.2/
  releng/12.2/sys/net/if.c
  releng/12.2/sys/net/if_epair.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2020-12-01 16:24:29 UTC
A commit references this bug:

Author: kp
Date: Tue Dec  1 16:24:00 UTC 2020
New revision: 368237
URL: https://svnweb.freebsd.org/changeset/base/368237

Log:
  if: Fix panic when destroying vnet and epair simultaneously

  When destroying a vnet and an epair (with one end in the vnet) we often
  panicked. This was the result of the destruction of the epair, which destroys
  both ends simultaneously, happening while vnet_if_return() was moving the
  struct ifnet to its home vnet. This can result in a freed ifnet being re-added
  to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
  used.

  Prevent this race by ensuring that vnet_if_return() cannot run at the same time
  as if_detach() or epair_clone_destroy().

  PR:		238870, 234985, 244703, 250870
  MFC after:	2 weeks
  Sponsored by:	Modirum MDPay
  Differential Revision:	https://reviews.freebsd.org/D27378

Changes:
  head/sys/net/if.c
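The fix in r368237 makes vnet_if_return() and the if_detach()/epair_clone_destroy() teardown mutually exclusive. The sketch below models that idea in userspace, assuming a single pthread mutex; the commit log does not say which lock primitive the kernel uses, and model_vnet_if_return(), model_epair_destroy(), model_detach_lock and struct model_ifnet are hypothetical names. Without the lock, the return path could see the interface as alive, lose the CPU to the destroy path, and then put a freed interface back on the home list, which is the scenario the commit log describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified view of one epair end that lives in a child vnet. */
struct model_ifnet {
	bool alive;         /* false once the epair has been destroyed */
	bool in_child_vnet; /* attached to the dying child vnet */
	bool on_home_list;  /* linked on the home vnet's interface list */
};

/* Assumption: one lock serializes both paths, as the commit log describes. */
static pthread_mutex_t model_detach_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models vnet_if_return(): move the interface back to its home vnet. */
static void *
model_vnet_if_return(void *arg)
{
	struct model_ifnet *ifp = arg;

	pthread_mutex_lock(&model_detach_lock);
	/* The liveness check and the re-link happen under the same lock. */
	if (ifp->alive && ifp->in_child_vnet) {
		ifp->in_child_vnet = false;
		ifp->on_home_list = true;
	}
	pthread_mutex_unlock(&model_detach_lock);
	return (NULL);
}

/* Models epair_clone_destroy()/if_detach(): unlink everywhere, then "free". */
static void *
model_epair_destroy(void *arg)
{
	struct model_ifnet *ifp = arg;

	pthread_mutex_lock(&model_detach_lock);
	ifp->in_child_vnet = false;
	ifp->on_home_list = false;
	ifp->alive = false;
	pthread_mutex_unlock(&model_detach_lock);
	return (NULL);
}

/*
 * Runs both paths concurrently once. Returns true if the invariant held:
 * a destroyed interface is never left on the home vnet's list.
 */
static bool
model_run_once(void)
{
	struct model_ifnet ifp = {
		.alive = true, .in_child_vnet = true, .on_home_list = false
	};
	pthread_t ret_thr, destroy_thr;

	if (pthread_create(&ret_thr, NULL, model_vnet_if_return, &ifp) != 0)
		abort();
	if (pthread_create(&destroy_thr, NULL, model_epair_destroy, &ifp) != 0)
		abort();
	pthread_join(ret_thr, NULL);
	pthread_join(destroy_thr, NULL);

	/* The destroy path always runs, so the interface must end up dead. */
	assert(!ifp.alive);
	return (!(ifp.on_home_list && !ifp.alive));
}
```

Either ordering is safe under the lock: if the return path runs first, the destroy path later unlinks the interface from the home list; if the destroy path runs first, the return path sees a dead interface and does nothing. Dropping the lock reopens the window between the liveness check and the re-link.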
Comment 11 commit-hook freebsd_committer freebsd_triage 2020-12-15 15:34:33 UTC
A commit references this bug:

Author: kp
Date: Tue Dec 15 15:33:29 UTC 2020
New revision: 368663
URL: https://svnweb.freebsd.org/changeset/base/368663

Log:
  MFC r368237:

  if: Fix panic when destroying vnet and epair simultaneously

  When destroying a vnet and an epair (with one end in the vnet) we often
  panicked. This was the result of the destruction of the epair, which destroys
  both ends simultaneously, happening while vnet_if_return() was moving the
  struct ifnet to its home vnet. This can result in a freed ifnet being re-added
  to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
  used.

  Prevent this race by ensuring that vnet_if_return() cannot run at the same time
  as if_detach() or epair_clone_destroy().

  PR:		238870, 234985, 244703, 250870
  Sponsored by:	Modirum MDPay

Changes:
_U  stable/12/
  stable/12/sys/net/if.c
Comment 12 commit-hook freebsd_committer freebsd_triage 2021-01-29 01:06:13 UTC
A commit in branch releng/12.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e0c15f45abd4bd5165e11b557a8c90d0faf5cfeb

commit e0c15f45abd4bd5165e11b557a8c90d0faf5cfeb
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-01-18 21:55:53 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-01-29 00:58:55 +0000

    MFC r368237: if: Fix panic when destroying vnet and epair simultaneously

    When destroying a vnet and an epair (with one end in the vnet) we often
    panicked. This was the result of the destruction of the epair, which destroys
    both ends simultaneously, happening while vnet_if_return() was moving the
    struct ifnet to its home vnet. This can result in a freed ifnet being re-added
    to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
    used.

    Prevent this race by ensuring that vnet_if_return() cannot run at the same time
    as if_detach() or epair_clone_destroy().

    PR:             238870, 234985, 244703, 250870
    Sponsored by:   Modirum MDPay
    Approved by:    so

 sys/net/if.c     | 147 +++++++++++++++++++++++++++++++++++++------------------
 sys/net/if_var.h |  24 ++-------
 2 files changed, 104 insertions(+), 67 deletions(-)
Comment 13 commit-hook freebsd_committer freebsd_triage 2021-01-29 01:21:17 UTC
A commit in branch releng/12.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e682b62c96e94c60d830e4414215032e0d4f8dad

commit e682b62c96e94c60d830e4414215032e0d4f8dad
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2020-09-12 16:33:05 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-01-29 01:14:24 +0000

    MFC r368237: if: Fix panic when destroying vnet and epair simultaneously

    When destroying a vnet and an epair (with one end in the vnet) we often
    panicked. This was the result of the destruction of the epair, which destroys
    both ends simultaneously, happening while vnet_if_return() was moving the
    struct ifnet to its home vnet. This can result in a freed ifnet being re-added
    to the home vnet V_ifnet list. That in turn panics the next time the ifnet is
    used.

    Prevent this race by ensuring that vnet_if_return() cannot run at the same time
    as if_detach() or epair_clone_destroy().

    PR:             238870, 234985, 244703, 250870
    Sponsored by:   Modirum MDPay
    Approved by:    so

 sys/net/if.c     | 147 +++++++++++++++++++++++++++++++++++++------------------
 sys/net/if_var.h |  24 ++-------
 2 files changed, 104 insertions(+), 67 deletions(-)