Bug 219251

Summary: [Panic] [VIMAGE] [pf] panic when creating/destroying multiple vnet jails
Product: Base System
Reporter: Reshad Patuck <reshadpatuck1>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED
Severity: Affects Only Me
CC: andrei, bz, kp, zeon
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223959
Attachments:
Description                    Flags
core.txt file for the panic    none
Test script to cause panic     none

Description Reshad Patuck 2017-05-13 07:50:13 UTC
Created attachment 182558
core.txt file for the panic

While trying a workaround for bug #219250 ([Panic] [VIMAGE] panic: in6_ifattach_linklocal (assigning ipv6 addresses in vnet jail)), I disabled IPv6 in the jails.

Now, while running the same tests to create and destroy 20 vnet jails with pf and netgraph, I get this panic (pf_state_expires).

I have attached the core.txt for the panic to this post.

The issue seems to occur while cleaning up existing pf states.

Please let me know if there is any other information you require.

Thanks

---

(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:232
#1  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:318
#2  0xffffffff803a308b in db_dump (dummy=<optimized out>, dummy2=<error reading variable: access outside bounds of object referenced via synthetic pointer>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:546
#3  0xffffffff803a2e7f in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:453
#4  0xffffffff803a2bb4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#5  0xffffffff803a5c7f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:248
#6  0xffffffff80a95673 in kdb_trap (type=9, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#7  0xffffffff80ef15b2 in trap_fatal (frame=0xfffffe10433a17f0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:796
#8  0xffffffff80ef0bbd in trap (frame=0xfffffe10433a17f0) at /usr/src/sys/amd64/amd64/trap.c:197
#9  <signal handler called>
#10 counter_u64_fetch_inline (p=<optimized out>) at ./machine/counter.h:57
#11 counter_u64_fetch (c=0xdeadc0dedeadc0de) at /usr/src/sys/kern/subr_counter.c:55
#12 0xffffffff82666c8f in pf_state_expires (state=0xfffff80570246cb8) at /usr/src/sys/netpfil/pf/pf.c:1529
#13 0xffffffff82666618 in pf_purge_expired_states (i=5026, maxcheck=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:1689
#14 0xffffffff82666469 in pf_purge_thread (unused=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:1451
#15 0xffffffff80a147d4 in fork_exit (callout=0xffffffff82666320 <pf_purge_thread>, arg=0x0, frame=0xfffffe10433a19c0) at /usr/src/sys/kern/kern_fork.c:1038
#16 <signal handler called>

---
Comment 1 Reshad Patuck 2017-05-16 08:06:36 UTC
Created attachment 182621
Test script to cause panic

This seems to be triggered by running pf on both the host and the guests and reloading pf on the host multiple times.

I have attached a script which recreates this crash.

The script starts 10 vnet jails with pf in them and then stops the jails.
It reloads pf on the host system after starting or stopping each jail.

The crash does not occur at a fixed number of iterations, but does generally happen within 10 iterations.
You may have to run the test script more than once or increase the iterations if it does not crash.

When pf is not reloaded on the host (by commenting out the pfctl -f line in reload_host_pf), the test runs successfully.
I tested this with up to 99 iterations, and the test completed successfully multiple times.

The ZFS snapshot zroot/jails/12jail@base used by the script is a FreeBSD 12-CURRENT jail created using bsdinstall.
Comment 2 commit-hook freebsd_committer freebsd_triage 2017-07-09 17:56:48 UTC
A commit references this bug:

Author: kp
Date: Sun Jul  9 17:56:39 UTC 2017
New revision: 320848
URL: https://svnweb.freebsd.org/changeset/base/320848

Log:
  pf: Fix vnet purging

  pf_purge_thread() breaks up the work of iterating all states (in
  pf_purge_expired_states()) and tracks progress in the idx variable.

  If multiple vnets exist this results in pf_purge_thread() only calling
  pf_purge_expired_states() for part of the states (the first part of the
  first vnet, second part of the second vnet and so on).
  Combined with the mark-and-sweep approach to cleaning up old rules (in
  V_pf_unlinked_rules) that resulted in pf freeing rules that were still
  referenced by states. This in turn caused panics when pf_state_expires()
  encounters that state and attempts to access the rule.

  We need to track the progress per vnet, not globally, so idx is moved
  into a per-vnet V_pf_purge_idx.

  PR:		219251
  Sponsored by:	Hackathon Essen 2017

Changes:
  head/sys/netpfil/pf/pf.c
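
The purge-index problem described in the log above can be illustrated with a small userland model. This is only a sketch and not the pf.c code: the names (struct sim, purge_chunk, run_shared_idx, run_per_vnet_idx, count_unvisited) and the sizes (3 vnets, 12 slots per vnet, 4 slots scanned per call) are hypothetical and were picked so that the effect is exact and easy to see. With one index shared by all vnets, each vnet only ever has the same third of its table scanned; with one index per vnet, every slot is reached after three passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NVNETS 3   /* hypothetical number of vnets */
#define NSLOTS 12  /* hypothetical state-table size per vnet */
#define CHUNK  4   /* slots scanned per purge call */

/* visited[v][s] is set once slot s of vnet v has been scanned. */
struct sim {
    bool visited[NVNETS][NSLOTS];
};

/*
 * Scan CHUNK slots of one vnet starting at idx and return the index
 * where the next call should resume (wrapping at NSLOTS).
 */
size_t purge_chunk(struct sim *sim, int vnet, size_t idx)
{
    assert(vnet >= 0 && vnet < NVNETS && idx < NSLOTS);
    for (int n = 0; n < CHUNK; n++) {
        sim->visited[vnet][idx] = true;
        idx = (idx + 1) % NSLOTS;
    }
    return idx;
}

/* Model of the reported behaviour: one index shared by every vnet. */
void run_shared_idx(struct sim *sim, int passes)
{
    size_t idx = 0;

    memset(sim, 0, sizeof(*sim));
    for (int p = 0; p < passes; p++)
        for (int v = 0; v < NVNETS; v++)
            idx = purge_chunk(sim, v, idx);
}

/* Model of the fix: progress is tracked separately for each vnet. */
void run_per_vnet_idx(struct sim *sim, int passes)
{
    size_t idx[NVNETS] = {0};

    memset(sim, 0, sizeof(*sim));
    for (int p = 0; p < passes; p++)
        for (int v = 0; v < NVNETS; v++)
            idx[v] = purge_chunk(sim, v, idx[v]);
}

/* Count the slots, over all vnets, that were never scanned. */
int count_unvisited(const struct sim *sim)
{
    int missing = 0;

    for (int v = 0; v < NVNETS; v++)
        for (int s = 0; s < NSLOTS; s++)
            if (!sim->visited[v][s])
                missing++;
    return missing;
}
```

Under these assumptions, run_shared_idx(&s, 10) followed by count_unvisited(&s) reports 24 slots that were never scanned, no matter how many passes are run, while run_per_vnet_idx(&s, 3) leaves 0. In the model, the unscanned slots stand in for states that the purge pass never sees in a given vnet.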
Comment 3 Reshad Patuck 2017-07-09 18:06:55 UTC
(In reply to commit-hook from comment #2)

Thanks,

I will test this patch in a few days and let you know whether it fixes the issue.
Comment 4 Reshad Patuck 2017-07-12 10:27:11 UTC
(In reply to commit-hook from comment #2)


Hey, I have been running this patch and it looks like everything is working.

Test methodology
- Running the test on an unpatched system (build r319808) panics the system.
- After updating to r320850 and rerunning the test, there are no panics.

At this point the updated system has been up for a few hours since I ran the test.

I will continue reloading pf and watch for panics.

Thanks,

Reshad
Comment 5 Andrei 2018-01-29 11:55:37 UTC
*** Bug 225528 has been marked as a duplicate of this bug. ***
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2019-02-13 02:17:53 UTC
*** Bug 223959 has been marked as a duplicate of this bug. ***