Bug 238870 - sys.netpfil.pf.names.names and sys.netpfil.pf.synproxy.synproxy cause panic
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: tests
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-testing mailing list
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2019-06-28 20:11 UTC by Li-Wen Hsu
Modified: 2019-07-09 09:10 UTC
CC List: 3 users

See Also:


Attachments
epair.patch (637 bytes, patch)
2019-06-28 22:23 UTC, Kristof Provost
ifp_dying.patch (413 bytes, patch)
2019-07-05 12:11 UTC, Kristof Provost

Description Li-Wen Hsu freebsd_committer 2019-06-28 20:11:29 UTC
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11696/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11697/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11698/
https://ci.freebsd.org/job/FreeBSD-head-amd64-test/11699/

These test runs panic while executing sys.netpfil.pf.names.names or sys.netpfil.pf.synproxy.synproxy, with the message:

00:53:22.137 sys/netpfil/pf/names:names  ->  panic: epair_qflush: ifp=0xfffff80063f15800, epair_softc gone? sc=0

`cd /usr/tests/sys/netpfil/pf && kyua test names:names` can trigger this.

Note: this should not be related to r349508 (the revision of build 11696); it can be reproduced on an earlier revision, e.g. r349507.
Comment 1 Kristof Provost freebsd_committer 2019-06-28 20:28:15 UTC
The backtrace suggests this is an epair teardown problem:

panic: epair_qflush: ifp=0xfffff80063f15800, epair_softc gone? sc=0

cpuid = 0
time = 1561746423
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0024260750
vpanic() at vpanic+0x19d/frame 0xfffffe00242607a0
panic() at panic+0x43/frame 0xfffffe0024260800
epair_qflush() at epair_qflush+0x1ba/frame 0xfffffe0024260850
if_down() at if_down+0x11d/frame 0xfffffe0024260880
if_detach_internal() at if_detach_internal+0x704/frame 0xfffffe0024260900
if_vmove() at if_vmove+0x3c/frame 0xfffffe0024260950
vnet_if_return() at vnet_if_return+0x48/frame 0xfffffe0024260970
vnet_destroy() at vnet_destroy+0x124/frame 0xfffffe00242609a0
prison_deref() at prison_deref+0x29d/frame 0xfffffe00242609e0
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe0024260a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe0024260a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0024260ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0024260ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

The pf tests create and destroy many epair interfaces as they start up and stop vnet jails.
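
For context, one such create/destroy cycle looks roughly like the sketch below. It is a simplified illustration and not the test suite's actual helper code: the function name `epair_jail_cycle` and the jail name passed to it are made up for this example, and it assumes a root shell on a kernel with VIMAGE and if_epair available.

```shell
# Sketch only: one create/destroy cycle of an epair inside a vnet jail.
# epair_jail_cycle is a hypothetical helper, not part of the pf test suite.
epair_jail_cycle() {
    ejc_jail=$1

    # "ifconfig epair create" prints the name of the "a" side, e.g. epair0a.
    ejc_a=$(ifconfig epair create) || return 1
    ejc_b="${ejc_a%a}b"

    # Start a persistent vnet jail and move the "b" side into it.
    # On failure, clean up the pair we just created.
    jail -c name="$ejc_jail" persist vnet vnet.interface="$ejc_b" || {
        ifconfig "$ejc_a" destroy
        return 1
    }

    # Removing the jail tears down its vnet, which hands the interface
    # back to the host (the vnet_destroy -> vnet_if_return path above).
    jail -r "$ejc_jail"

    # Destroying one side of the pair destroys both sides.
    ifconfig "$ejc_a" destroy
    echo "cycled $ejc_a/$ejc_b through jail $ejc_jail"
}
```

Calling `epair_jail_cycle demo` repeatedly approximates the load the tests put on the epair teardown code, although the real tests also configure addresses and pf rules inside the jail.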
Comment 2 Kristof Provost freebsd_committer 2019-06-28 22:23:40 UTC
Created attachment 205406
epair.patch

This seems to fix the panic, and I *think* it might even be correct. I'd quite like bz@ to take a look though.
Comment 3 commit-hook freebsd_committer 2019-06-29 12:20:06 UTC
A commit references this bug:

Author: lwhsu
Date: Sat Jun 29 12:19:57 UTC 2019
New revision: 349539
URL: https://svnweb.freebsd.org/changeset/base/349539

Log:
  Skip sys.netpfil.pf.names.names and sys.netpfil.pf.synproxy.synproxy
  temporarily because kernel panics when flushing epair queue.

  PR:		238870
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/netpfil/pf/names.sh
  head/tests/sys/netpfil/pf/synproxy.sh
Comment 4 Kristof Provost freebsd_committer 2019-06-29 12:22:10 UTC
(In reply to commit-hook from comment #3)
I'm pretty sure the other pf tests (and the netipsec tests) risk triggering this too.
Comment 5 Li-Wen Hsu freebsd_committer 2019-06-29 12:25:24 UTC
I also believe so.  The other questions are why this only started happening recently, and why sys.netpfil.pf.names.names is the test that triggers the panic most often.
Comment 6 Kristof Provost freebsd_committer 2019-06-29 12:27:33 UTC
(In reply to Li-Wen Hsu from comment #5)
The names test does trigger very unusual behaviour, in that it provokes a situation where we have two struct ifnets with the same name. That might be a factor.

I think it's also racy, which might mean that any random change in the network stack suddenly makes this more likely to occur.
Comment 7 Kristof Provost freebsd_committer 2019-07-05 12:11:11 UTC
Created attachment 205530
ifp_dying.patch

A second experimental patch. This solves a different problem, which also manifests during the pf tests.

It looks like there's a race, where the epair deletes its ifp while the jail shuts down. The jail shutdown moves the ifp back into its original vnet, while the epair delete is causing it to be deleted. As a result we end up with a freed ifp in V_ifnet, and when we run into it later the box panics.

I think this is likely fallout from the epoch-ification, so mmacy@ should take a look at this.
Comment 8 Kristof Provost freebsd_committer 2019-07-05 12:13:04 UTC
To reproduce the issue that ifp_dying.patch works around (I don't think it's a fully correct fix), apply epair.patch, revert r349539, `kldload pfsync`, `cd /usr/tests/sys/netpfil/pf` and run `while true; do sudo kyua test; done`. It shouldn't take more than 10 minutes to panic the box.
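
Because a panic takes the whole box down, it can help to record how far the loop got before each iteration. A minimal sketch of such a wrapper follows; `run_logged` and the log path in the usage line are hypothetical names chosen for this example, not anything from the test suite.

```shell
# Sketch only: rerun a command a bounded number of times, appending the
# iteration number to a log file before each run so that the count has a
# better chance of surviving a panic. run_logged is a hypothetical helper.
run_logged() {
    rl_max=$1
    rl_log=$2
    shift 2

    rl_i=0
    while [ "$rl_i" -lt "$rl_max" ]; do
        rl_i=$((rl_i + 1))
        echo "iteration $rl_i" >> "$rl_log"
        sync    # ask for pending writes to be flushed before the risky run
        "$@" || echo "iteration $rl_i: command exited non-zero" >> "$rl_log"
    done
    echo "finished $rl_i iterations"
}
```

For example, after the setup steps above: `cd /usr/tests/sys/netpfil/pf && run_logged 50 /var/tmp/pr238870.log sudo kyua test`. After a reboot, the last line of the log shows which iteration was running when the box went down.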
Comment 9 Li-Wen Hsu freebsd_committer 2019-07-06 00:53:57 UTC
This has started happening in 12 now, so the fix also needs to be MFC'd.
Comment 11 commit-hook freebsd_committer 2019-07-09 09:10:27 UTC
A commit references this bug:

Author: lwhsu
Date: Tue Jul  9 09:09:52 UTC 2019
New revision: 349858
URL: https://svnweb.freebsd.org/changeset/base/349858

Log:
  MFC r349539

  Skip sys.netpfil.pf.names.names and sys.netpfil.pf.synproxy.synproxy
  temporarily because kernel panics when flushing epair queue.

  PR:		238870
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/12/
  stable/12/tests/sys/netpfil/pf/names.sh
  stable/12/tests/sys/netpfil/pf/synproxy.sh