Bug 244703 - sys.netpfil.pf.nat.exhaust panics kernel
Summary: sys.netpfil.pf.nat.exhaust panics kernel
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: tests
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-testing (Nobody)
URL:
Keywords:
Depends on:
Blocks:
Reported: 2020-03-09 22:48 UTC by Li-Wen Hsu
Modified: 2020-09-12 18:59 UTC
CC List: 1 user

See Also:


Description Li-Wen Hsu freebsd_committer 2020-03-09 22:48:42 UTC
The FreeBSD-head-amd64-test job started failing randomly in:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14511/

and in subsequent builds:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14512/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14515/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14516/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14517/
...
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/14528/

Console log:

sys/netpfil/pf/nat:exhaust  ->  panic: epair_qflush: ifp=0xfffff800ae6d5800, epair_softc gone? sc=0

cpuid = 0
time = 1583444839
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe002c6fe880
vpanic() at vpanic+0x185/frame 0xfffffe002c6fe8e0
panic() at panic+0x43/frame 0xfffffe002c6fe940
epair_qflush() at epair_qflush+0x1a8/frame 0xfffffe002c6fe990
if_down() at if_down+0x12d/frame 0xfffffe002c6fe9c0
if_detach_internal() at if_detach_internal+0x2de/frame 0xfffffe002c6fea20
if_vmove() at if_vmove+0x3c/frame 0xfffffe002c6fea70
vnet_if_return() at vnet_if_return+0x50/frame 0xfffffe002c6fea90
vnet_destroy() at vnet_destroy+0x130/frame 0xfffffe002c6feac0
prison_deref() at prison_deref+0x29d/frame 0xfffffe002c6feb00
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe002c6feb80
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe002c6febb0
fork_exit() at fork_exit+0x80/frame 0xfffffe002c6febf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002c6febf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100011 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10874b6(%rip)
db:0:kdb.enter.panic> show pcpu
cpuid        = 0
dynamic pcpu = 0x788d80
curthread    = 0xfffffe0008688c00: pid 0 tid 100011 critnest 1 "thread taskq"
curpcb       = 0xfffffe0008689110
fpcurthread  = none
idlethread   = 0xfffffe000a1f6300: tid 100003 "idle: cpu0"
self         = 0xffffffff82210000
curpmap      = 0xffffffff81d9ea50
tssp         = 0xffffffff82210384
rsp0         = 0xfffffe002c6fecc0
kcr3         = 0x8000000002119002
ucr3         = 0xffffffffffffffff
scr3         = 0x1012c8f99
gs32p        = 0xffffffff82210404
ldt          = 0xffffffff82210444
tss          = 0xffffffff82210434
tlb gen      = 307136
curvnet      = 0xfffff80003438380
spin locks held:
db:0:kdb.enter.panic>

It can be reproduced by running the test in a loop (e.g. `kyua test sys/netpfil/pf/nat:exhaust` from /usr/tests) in a VM booted from this image:

https://artifact.ci.freebsd.org/snapshot/head/r358683/amd64/amd64/disk-test.img.xz
Comment 1 commit-hook freebsd_committer 2020-03-10 19:19:07 UTC
A commit references this bug:

Author: lwhsu
Date: Tue Mar 10 19:18:25 UTC 2020
New revision: 358852
URL: https://svnweb.freebsd.org/changeset/base/358852

Log:
  Skip sys.netpfil.pf.nat.exhaust on amd64 in CI as it sometimes panics kernel

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netpfil/pf/nat.sh
Comment 2 Kristof Provost freebsd_committer 2020-03-11 00:26:04 UTC
This looks like the same issue as bug 238870.
Comment 3 commit-hook freebsd_committer 2020-03-12 19:11:36 UTC
A commit references this bug:

Author: lwhsu
Date: Thu Mar 12 19:10:54 UTC 2020
New revision: 358918
URL: https://svnweb.freebsd.org/changeset/base/358918

Log:
  MFC r358852:

  Skip sys.netpfil.pf.nat.exhaust on amd64 in CI as it sometimes panics kernel

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/nat.sh
Comment 4 commit-hook freebsd_committer 2020-03-13 16:45:05 UTC
A commit references this bug:

Author: lwhsu
Date: Fri Mar 13 16:44:48 UTC 2020
New revision: 358961
URL: https://svnweb.freebsd.org/changeset/base/358961

Log:
  Skip sys.netpfil.pf.nat.exhaust on all platforms as it fails not only on amd64

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netpfil/pf/nat.sh
Comment 5 commit-hook freebsd_committer 2020-03-13 17:11:08 UTC
A commit references this bug:

Author: lwhsu
Date: Fri Mar 13 17:10:53 UTC 2020
New revision: 358964
URL: https://svnweb.freebsd.org/changeset/base/358964

Log:
  MFC r358961:

  Skip sys.netpfil.pf.nat.exhaust on all platforms as it fails not only on amd64

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/nat.sh
Comment 6 commit-hook freebsd_committer 2020-04-20 14:19:21 UTC
A commit references this bug:

Author: lwhsu
Date: Mon Apr 20 14:18:56 UTC 2020
New revision: 360120
URL: https://svnweb.freebsd.org/changeset/base/360120

Log:
  Temporarily disable sys.netinet.divert.* on i386

  PR:		244703
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netinet/divert.sh
Comment 7 commit-hook freebsd_committer 2020-09-08 14:54:31 UTC
A commit references this bug:

Author: kp
Date: Tue Sep  8 14:54:11 UTC 2020
New revision: 365457
URL: https://svnweb.freebsd.org/changeset/base/365457

Log:
  net: mitigate vnet / epair cleanup races

  There's a race between dying vnets moving their interfaces back to their
  original vnet and if_epair cleanup (where deleting one interface also deletes
  the other end of the epair). This is commonly triggered by the pf tests, but
  also by cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue, work around
  the panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870
  Reviewed by:	lutz_donnerhacke.de
  MFC after:	4 days
  Differential Revision:	https://reviews.freebsd.org/D26324

Changes:
  head/sys/net/if.c
  head/sys/net/if_epair.c
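The two mitigations described in r365457 (tolerating a NULL softc in epair_qflush() and not re-attaching DYING interfaces) can be illustrated with a simplified user-space model. This is only a sketch under stated assumptions: model_ifnet, model_softc, model_qflush(), model_vmove() and model_epair_destroy() are hypothetical names, not the real ifnet/epair code in sys/net/if.c or sys/net/if_epair.c, and the model has no threads or locking, so it shows the effect of the guards, not the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; NOT struct ifnet / epair_softc. */
struct model_softc {
	int queued;			/* packets waiting in the queue */
};

struct model_ifnet {
	const char *name;
	struct model_softc *softc;	/* NULL once the pair is torn down */
	bool dying;			/* stands in for the DYING flag */
	int vnet;			/* id of the vnet the interface is in */
};

/*
 * Mitigation 1 (modelled): tolerate a missing softc instead of panicking
 * with "epair_softc gone?". Returns the number of packets flushed.
 */
static int
model_qflush(struct model_ifnet *ifp)
{
	struct model_softc *sc = ifp->softc;
	int n;

	if (sc == NULL)
		return (0);	/* previously: panic */
	n = sc->queued;
	sc->queued = 0;
	return (n);
}

/*
 * Mitigation 2 (modelled): refuse to move an interface that is already
 * dying. Returns true if the interface was moved to target_vnet.
 */
static bool
model_vmove(struct model_ifnet *ifp, int target_vnet)
{
	if (ifp->dying)
		return (false);
	/* Detaching brings the interface down, which flushes its queue,
	 * mirroring if_down() -> epair_qflush() in the backtrace above. */
	model_qflush(ifp);
	ifp->vnet = target_vnet;
	return (true);
}

/* Destroying one end of an epair also tears down its peer. */
static void
model_epair_destroy(struct model_ifnet *a, struct model_ifnet *b)
{
	a->softc = NULL;
	b->softc = NULL;
	a->dying = true;
	b->dying = true;
}
```

In this model, once both ends are torn down a late flush returns 0 instead of reaching the panic path, and the dying interface is not moved back to another vnet. As the commit log notes, guards like these make the panic far less likely without fixing the underlying race.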
Comment 8 commit-hook freebsd_committer 2020-09-12 12:46:20 UTC
A commit references this bug:

Author: kp
Date: Sat Sep 12 12:45:32 UTC 2020
New revision: 365659
URL: https://svnweb.freebsd.org/changeset/base/365659

Log:
  MFC r365457:

  net: mitigate vnet / epair cleanup races

  There's a race between dying vnets moving their interfaces back to their
  original vnet and if_epair cleanup (where deleting one interface also deletes
  the other end of the epair). This is commonly triggered by the pf tests, but
  also by cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue, work around
  the panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870

Changes:
_U  stable/12/
  stable/12/sys/net/if.c
  stable/12/sys/net/if_epair.c
Comment 9 commit-hook freebsd_committer 2020-09-12 18:59:05 UTC
A commit references this bug:

Author: kp
Date: Sat Sep 12 18:58:36 UTC 2020
New revision: 365669
URL: https://svnweb.freebsd.org/changeset/base/365669

Log:
  MFC r365457:

  net: mitigate vnet / epair cleanup races

  There's a race between dying vnets moving their interfaces back to their
  original vnet and if_epair cleanup (where deleting one interface also deletes
  the other end of the epair). This is commonly triggered by the pf tests, but
  also by cleanup of vnet jails.

  As we've not yet been able to fix the root cause of the issue, work around
  the panic by not dereferencing a NULL softc in epair_qflush() and by not
  re-attaching DYING interfaces.

  This isn't a full fix, but makes a very common panic far less likely.

  PR:		244703, 238870
  Approved by:	re (gjb)

Changes:
_U  releng/12.2/
  releng/12.2/sys/net/if.c
  releng/12.2/sys/net/if_epair.c