Created attachment 182558
core.txt file for the panic
While trying a workaround for bug #219250, "[Panic] [VIMAGE] panic: in6_ifattach_linklocal (assigning ipv6 addresses in vnet jail)", I disabled IPv6 in the jails.
Now, while running the same tests to create and destroy 20 vnet jails with pf and netgraph, I get this panic (pf_state_expires).
I have attached the core.txt for the panic to this post.
The issue seems to occur while cleaning up existing pf states.
Please let me know if there is any other information you require.
#0 __curthread () at ./machine/pcpu.h:232
#1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:318
#2 0xffffffff803a308b in db_dump (dummy=<optimized out>, dummy2=<error reading variable: access outside bounds of object referenced via synthetic pointer>, dummy3=<unavailable>, dummy4=<unavailable>) at /usr/src/sys/ddb/db_command.c:546
#3 0xffffffff803a2e7f in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=<optimized out>) at /usr/src/sys/ddb/db_command.c:453
#4 0xffffffff803a2bb4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#5 0xffffffff803a5c7f in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:248
#6 0xffffffff80a95673 in kdb_trap (type=9, code=0, tf=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#7 0xffffffff80ef15b2 in trap_fatal (frame=0xfffffe10433a17f0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:796
#8 0xffffffff80ef0bbd in trap (frame=0xfffffe10433a17f0) at /usr/src/sys/amd64/amd64/trap.c:197
#9 <signal handler called>
#10 counter_u64_fetch_inline (p=<optimized out>) at ./machine/counter.h:57
#11 counter_u64_fetch (c=0xdeadc0dedeadc0de) at /usr/src/sys/kern/subr_counter.c:55
#12 0xffffffff82666c8f in pf_state_expires (state=0xfffff80570246cb8) at /usr/src/sys/netpfil/pf/pf.c:1529
#13 0xffffffff82666618 in pf_purge_expired_states (i=5026, maxcheck=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:1689
#14 0xffffffff82666469 in pf_purge_thread (unused=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:1451
#15 0xffffffff80a147d4 in fork_exit (callout=0xffffffff82666320 <pf_purge_thread>, arg=0x0, frame=0xfffffe10433a19c0) at /usr/src/sys/kern/kern_fork.c:1038
#16 <signal handler called>
Created attachment 182621
Test script to cause panic
This seems to be an issue with running pf on both the host and the guests, combined with reloading pf on the host multiple times.
I have attached a script which recreates this crash.
The script starts 10 vnet jails with pf in them and then stops the jails.
It reloads pf on the host system after starting or stopping each jail.
The crash does not occur at a fixed number of iterations, but does generally happen within 10 iterations.
You may have to run the test script more than once or increase the iterations if it does not crash.
When pf is not reloaded on the host (by commenting out the pfctl -f line in reload_host_pf), the test runs successfully.
This was tested up to 99 iterations, and the test ran successfully multiple times.
The ZFS snapshot zroot/jails/12jail@base is a FreeBSD 12-CURRENT jail created using bsdinstall.
A commit references this bug:
Date: Sun Jul 9 17:56:39 UTC 2017
New revision: 320848
pf: Fix vnet purging
pf_purge_thread() breaks up the work of iterating all states (in
pf_purge_expired_states()) and tracks progress in the idx variable.
If multiple vnets exist this results in pf_purge_thread() only calling
pf_purge_expired_states() for part of the states (the first part of the
first vnet, second part of the second vnet and so on).
Combined with the mark-and-sweep approach to cleaning up old rules (in
V_pf_unlinked_rules) that resulted in pf freeing rules that were still
referenced by states. This in turn caused panics when pf_state_expires()
encounters that state and attempts to access the rule.
We need to track the progress per vnet, not globally, so idx is moved
into a per-vnet V_pf_purge_idx.
Sponsored by: Hackathon Essen 2017
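The commit message describes two things: a shared scan index and the mark-and-sweep cleanup of old rules. Together they free a rule that is still in use. The small userspace model below tries to make that concrete. It is a sketch under stated assumptions, not kernel code:

- All names and sizes are hypothetical: struct vnet_model, purge_rows, sweep, purge_tick, run_model, NVNETS, NROWS and CHUNK.
- Rows stand in for state hash rows.
- States are never expired.
- The model assumes that the unlinked-rule sweep runs only when a vnet's scan index wraps back to 0.
- Each vnet has one unlinked rule that the state in row 0 still references.

```c
/*
 * Userspace model of the pf purge bug fixed in r320848. Hypothetical
 * names and sizes; this is not the kernel implementation.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NVNETS 2 /* hypothetical number of vnets */
#define NROWS  8 /* hypothetical state hash size (pf_hashmask + 1) */
#define CHUNK  2 /* rows scanned per call, playing the role of maxcheck */

struct vnet_model {
	int      row_rule[NROWS]; /* rule id used by the state in each row */
	bool     rule_unlinked;   /* rule 1 sits on the unlinked-rules list */
	bool     rule_marked;     /* stand-in for the PFRULE_REFS flag */
	bool     rule_freed;      /* rule 1 was freed by a sweep */
	unsigned purge_idx;       /* per-vnet progress (the V_pf_purge_idx fix) */
};

/* Loosely like pf_purge_expired_states(): scan CHUNK rows starting at i,
 * mark the unlinked rule if a scanned state references it, and return the
 * next row, or 0 once the end of the table is reached (pass complete). */
static unsigned
purge_rows(struct vnet_model *vn, unsigned i)
{
	for (int n = 0; n < CHUNK; n++) {
		assert(i < NROWS);
		if (vn->row_rule[i] == 1)
			vn->rule_marked = true;
		if (++i >= NROWS)
			return 0;
	}
	return i;
}

/* Mark-and-sweep step: if no scanned state marked the unlinked rule during
 * the pass, take it off the list and free it; then clear the mark. */
static void
sweep(struct vnet_model *vn)
{
	if (vn->rule_unlinked && !vn->rule_marked) {
		vn->rule_unlinked = false;
		vn->rule_freed = true;
	}
	vn->rule_marked = false;
}

/* One wakeup of the purge thread. With shared == true one index is carried
 * from vnet to vnet (pre-r320848 behaviour); otherwise each vnet resumes
 * from its own purge_idx. A sweep runs whenever the index wraps to 0. */
static void
purge_tick(struct vnet_model vnets[NVNETS], bool shared, unsigned *global_idx)
{
	for (int v = 0; v < NVNETS; v++) {
		unsigned *idx = shared ? global_idx : &vnets[v].purge_idx;

		*idx = purge_rows(&vnets[v], *idx);
		if (*idx == 0)
			sweep(&vnets[v]);
	}
}

/* Give every vnet an unlinked rule that the state in row 0 still uses, run
 * `ticks` wakeups, and count the vnets that freed that rule while it was
 * still referenced (a use-after-free in the real kernel). */
static int
run_model(bool shared, int ticks)
{
	struct vnet_model vnets[NVNETS];
	unsigned global_idx = 0;
	int freed = 0;

	memset(vnets, 0, sizeof(vnets));
	for (int v = 0; v < NVNETS; v++) {
		vnets[v].row_rule[0] = 1;
		vnets[v].rule_unlinked = true;
	}
	for (int t = 0; t < ticks; t++)
		purge_tick(vnets, shared, &global_idx);
	for (int v = 0; v < NVNETS; v++)
		if (vnets[v].rule_freed)
			freed++;
	return freed;
}
```

With the shared index, the two vnets scan different rows. Vnet 0 only ever scans rows 0, 1, 4 and 5, and vnet 1 only ever scans rows 2, 3, 6 and 7. As a result, vnet 1's sweep frees the rule that its row 0 still references, and run_model(true, 4) returns 1. In this configuration vnet 0 never sweeps at all. With a per-vnet index, every sweep follows a complete pass over that vnet's rows, and run_model(false, 4) returns 0.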
(In reply to commit-hook from comment #2)
Will test this patch in a few days and will let you know if it is fixed.
(In reply to commit-hook from comment #2)
Hey, I have been running this patch and it looks like everything is working.
- Running the test on an unpatched system (build r319808) panics the system.
- After updating to r320850 and rerunning the test, there are no panics.
The updated system has now been up for a few hours since I ran the test.
I will continue reloading pf and watch for panics.