Created attachment 184470
freebsd 11.1-rc3 (generic) with zfs boostrap loader, kernel panic. pf purge.

how to reproduce?
1) get a FreeBSD 11.1-RC3/amd64 install ISO;
2) install it following its defaults; use "Auto ZFS" for disk partitions;
3) boot into your new FreeBSD box;
4) as root, run "sysrc -f /boot/loader.conf pf_load=YES";
5) reboot.

previous revisions may also be affected; I first noticed this behavior on r320730. the kernel panic was reproduced both in a virtual environment and on bare metal.

*) workaround: changing "boot options" to enable verbose (option number 6) lets the machine boot without a panic.

I'm building a DEBUG kernel to attach more details and info ASAP; if anyone can build and reproduce it too, feel free to add more data.
I've installed the official 11.1-RC3 (https://download.freebsd.org/ftp/releases/amd64/amd64/ISO-IMAGES/11.1/FreeBSD-11.1-RC3-amd64-disc1.iso) into a VirtualBox machine (64-bit, 2 GB RAM, 16 GB disk) and followed your instructions, but could not reproduce your problem (I rebooted 5 times).

Unrelated: why do you load pf at the /boot/loader.conf stage? The rc script /etc/rc.d/pf already loads pf automatically.
Same here, unable to reproduce with bhyve (amd64, Root on ZFS, 2 GB memory).
kristof, olivier, thank you for writing back (:

I was also unable to reproduce my steps running an 11.1-RC3 guest here on my workstation with bhyve, qemu or even virtualbox. my host OS is FreeBSD 12.0-CURRENT/amd64 (r320640, 1200037).

the "main" hypervisor I used, the one that gave me the kernel panic, is a linux-based OS with kvm: proxmox 5.0-23/af4267bf (debian 6.3.0-18, kernel 4.10.15-1-pve).

I built a DEBUG kernel and just tested it; 11.1-PRERELEASE (r320976+) worked just fine. the sad thing is that I completely forgot to remove pf from the KERNCONF after adding the DEBUG options, so I'm rebuilding it and will test again, following the very same procedure I reported before.

olivier, the reason I am using loader.conf is: when one uses rc.conf, the rc.d/pf script doesn't load the module if there's no pf.conf available, or it loads the module quite "late". so I was just making sure that pf.ko would be loaded no matter what.
Hi, I can reproduce this. This is a divide by zero in sys/netpfil/pf/pf.c pf_purge_thread().

https://svnweb.freebsd.org/base/stable/11/sys/netpfil/pf/pf.c?annotate=316640#l1446

The V_pf_default_rule.timeout array hasn't been initialized yet. This happens presumably because pf_load() is getting called before pfattach_vnet() (i.e. pf_load_vnet()). Does anyone know what determines the order, or how to enforce this type of "dependency"?

It smells like a race condition, which could be why not everyone can reproduce it, but on my hardware it's 100%. I also suspect this has to do with EARLY_AP_STARTUP, because I have yet to see the same panic with EARLY_AP_STARTUP disabled. Still testing.

BTW, my kernel config is simple:

include GENERIC
device carp
device pf
#nooptions EARLY_AP_STARTUP
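To make the failure mode in this comment concrete, here is a minimal userland sketch in C. It is not the code from pf.c: the struct, the `demo_` names, the array size and the default interval of 10 are all assumptions made for illustration. It only shows that a divisor derived from a still-zeroed timeout array is zero until an attach-style routine has run, and it returns -1 at the point where an unguarded division would trap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real pf.c definitions. */
enum { DEMO_TM_INTERVAL = 0, DEMO_TM_MAX = 4 };

struct demo_rule {
    unsigned int timeout[DEMO_TM_MAX];  /* all zero until demo_attach() runs */
};

/* Mimics an attach routine filling in defaults (the value 10 is assumed). */
static void demo_attach(struct demo_rule *r)
{
    assert(r != NULL);
    r->timeout[DEMO_TM_INTERVAL] = 10;
}

/*
 * Computes how many hash slots one purge pass would cover.
 * Returns -1 instead of dividing when the interval is still zero;
 * a division without this check would fault at this point.
 */
static long demo_purge_step(const struct demo_rule *r, unsigned int hashmask)
{
    unsigned int divisor = r->timeout[DEMO_TM_INTERVAL] * 10;

    if (divisor == 0)
        return -1;
    return (long)(hashmask / divisor);
}
```

Calling `demo_purge_step()` on a zero-initialized `struct demo_rule` reports the problem; calling it after `demo_attach()` yields a normal quotient. Which of the two happens depends only on ordering, which fits the race suspected above.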
Created attachment 184503 [GIF]
freebsd 11.1-rc3 (generic) with zfs boostrap loader, kernel panic. pf purge.
(In reply to Paul Herman from comment #4)
I'm not sure how that would happen, but loos@ committed something to head a while back that will likely fix it:

------------------------------------------------------------------------
r312943 | loos | 2017-01-29 03:17:52 +0100 (Sun, 29 Jan 2017) | 8 lines

Do not run the pf purge thread while the VNET variables are not
initialized, this can cause a divide by zero (if the VNET initialization
takes to long to complete).

Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)

We probably want to MFC that one. Can you confirm that fixes the problem for you?
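As an illustration of the approach the commit log describes (skip purge work until initialization has finished), here is a minimal userland sketch in C. The `guard_` names, the `active` flag and the numeric values are hypothetical and are not taken from r312943; the actual diff is the reference for what was committed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-instance state; not the real VNET variables. */
struct guard_vnet {
    bool active;            /* set once initialization has finished */
    unsigned int interval;  /* zero until then */
    unsigned int hashmask;
};

/* Mimics initialization: fill in the values first, publish the flag last. */
static void guard_vnet_init(struct guard_vnet *v)
{
    v->interval = 10;       /* assumed default */
    v->hashmask = 32767;    /* assumed table size minus one */
    v->active = true;
}

/*
 * One iteration of a purge loop. Returns the number of slots that
 * would be covered, or 0 when the round is skipped because the
 * instance is not initialized yet.
 */
static unsigned int guard_purge_once(const struct guard_vnet *v)
{
    if (!v->active)
        return 0;           /* not ready: try again on the next wakeup */

    assert(v->interval != 0);
    return v->hashmask / (v->interval * 10);
}
```

The design point is that the loop never touches the divisor before the flag is set, so a slow initialization only delays purging instead of crashing. A real kernel implementation would also need the appropriate locking or memory ordering around the flag, which this single-threaded sketch leaves out.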
Nice catch. Yep, applied r312943 to pf.c, panic gone.
I also confirm that the panic is gone. Thanks!

The machine I patched has the very same config, and it's running as a guest OS in the very same hypervisor as the one that was crashing before.

BTW, I used the 'src.txz' shipped with 11.1-RC3 (r320976) to apply the patch (https://svnweb.freebsd.org/base/head/sys/netpfil/pf/pf.c?view=patch&r1=312943&r2=312942&pathrev=312943), then compiled and installed the patched kernel and its modules.

# date
Thu Jul 20 08:41:35 UTC 2017
# uname -a
FreeBSD freebsd11 11.1-RC3 FreeBSD 11.1-RC3 #0: Thu Jul 20 08:39:04 UTC 2017 root@freebsd11:/usr/obj/usr/src/sys/GENERIC amd64
# diff /boot/kernel.generic/pf.ko /boot/kernel/pf.ko
Files /boot/kernel.generic/pf.ko and /boot/kernel/pf.ko differ
# grep pf /boot/loader.conf /etc/rc.conf
/boot/loader.conf:pf_load="YES"
# uptime
8:41AM up 1 min, 1 users, load averages: 0.15, 0.08, 0.03
A commit references this bug:

Author: kp
Date: Thu Jul 20 17:15:19 UTC 2017
New revision: 321296
URL: https://svnweb.freebsd.org/changeset/base/321296

Log:
  MFC r312943

  Do not run the pf purge thread while the VNET variables are not
  initialized, this can cause a divide by zero (if the VNET initialization
  takes to long to complete).

  PR: 220830

Changes:
_U stable/11/
  stable/11/sys/netpfil/pf/pf.c
Adding re@. The fix is now in stable/11 -- should we do an errata for this one?
(In reply to Xin LI from comment #10) It's on my todo list. Real soon now(tm)!
For posterity, seconding Paul's assessment that this started tripping with the introduction of EARLY_AP_STARTUP. On this system I hit it (during testing) 100% of the time with EAP, and never without. https://lists.freebsd.org/pipermail/freebsd-stable/2017-June/087245.html
The errata notice has been sent so this can be closed.
I hit this same problem with FreeBSD 11.1 Release, and the fix to sys/netpfil/pf/pf.c described here also stopped the panics on boot. Should it not be added to Release 11.1 as well? https://www.freebsd.org/releases/11.1R/errata.html
(In reply to Kris from comment #14)
There was an errata notice for this problem: https://www.freebsd.org/security/advisories/FreeBSD-EN-17:08.pf.asc

I'm not quite sure why it's not listed in the 11.1 release errata, other than perhaps that it was not known/fixed at that point.