Bug 258196

Summary: Kernel panic on pf_free_state with HardenedBSD
Product: Base System
Reporter: Théo Bertin <theo.bertin>
Component: kern
Assignee: freebsd-pf (Nobody) <pf>
Status: Closed Overcome By Events
Severity: Affects Some People
CC: dewayne, kp
Priority: ---
Keywords: crash
Version: 12.2-STABLE   
Hardware: amd64   
OS: Any   

Description Théo Bertin 2021-09-01 14:50:10 UTC
Hi everyone,

We're currently experiencing kernel panics on HardenedBSD machines tracking 12.2-STABLE:
uname -a:
FreeBSD [redacted] 12.2-STABLE-HBSD FreeBSD 12.2-STABLE-HBSD #0 : Tue Aug 10 20:14:33 UTC 2021     ro...@ci-12.md.hardenedbsd.lan:/usr/obj/usr/src/amd64.amd64/sys/HARDENEDBSD  amd64

Since a kernel update around July, some of our machines have been rebooting at random times after a kernel panic:
/var/crash/info.0:
Dump header from device: /dev/da1
 Architecture: amd64
 Architecture Version: 2
 Dump Length: 626298880
 Blocksize: 512
 Compression: none
 Dumptime: Tue Aug 31 14:22:24 2021
 Hostname: [redacted]
 Magic: FreeBSD Kernel Dump
 Version String: FreeBSD 12.2-STABLE-HBSD #0 : Tue Aug 10 20:14:33 UTC 2021
   ro...@ci-12.md.hardenedbsd.lan:/usr/obj/usr/src/amd64.amd64/sys/HARDENEDBSD
 Panic String: pf_free_state: timeout 0
 Dump Parity: 436929362
 Bounds: 0
 Dump Status: good

/var/log/messages:
Sep  1 05:23:08 [redacted] syslogd: kernel boot file is /boot/kernel/kernel
Sep  1 05:23:08 [redacted] kernel: [4048] panic: pf_free_state: timeout 0
Sep  1 05:23:08 [redacted] kernel: [4048] cpuid = 2
Sep  1 05:23:08 [redacted] kernel: [4048] time = 1630473734
Sep  1 05:23:08 [redacted] kernel: [4048] __HardenedBSD_version = 1200060 __FreeBSD_version = 1202508
Sep  1 05:23:08 [redacted] kernel: [4048] version = FreeBSD 12.2-STABLE-HBSD #0 : Tue Aug 10 20:14:33 UTC 2021
Sep  1 05:23:08 [redacted] kernel: [4048]     ro...@ci-12.md.hardenedbsd.lan:/usr/obj/usr/src/amd64.amd64/sys/HARDENEDBSD
Sep  1 05:23:08 [redacted] kernel: [4048] KDB: stack backtrace:
Sep  1 05:23:08 [redacted] kernel: [4048] #0 0xffffffff80b9f12b at kdb_backtrace+0x6b
Sep  1 05:23:08 [redacted] kernel: [4048] #1 0xffffffff80b581e0 at vpanic+0x180
Sep  1 05:23:08 [redacted] kernel: [4048] #2 0xffffffff80b57fe3 at panic+0x43
Sep  1 05:23:08 [redacted] kernel: [4048] #3 0xffffffff82923622 at pf_free_state+0xb2
Sep  1 05:23:08 [redacted] kernel: [4048] #4 0xffffffff8292cdde at pf_test_rule+0x312e
Sep  1 05:23:08 [redacted] kernel: [4048] #5 0xffffffff82930282 at pf_test6+0x772
Sep  1 05:23:08 [redacted] kernel: [4048] #6 0xffffffff8293abe9 at pf_check6_out+0x59
Sep  1 05:23:08 [redacted] kernel: [4048] #7 0xffffffff80c7095a at pfil_run_hooks+0xaa
Sep  1 05:23:08 [redacted] kernel: [4048] #8 0xffffffff80da192f at ip6_output+0x15af
Sep  1 05:23:08 [redacted] kernel: [4048] #9 0xffffffff80d67317 at tcp_output+0x1d37
Sep  1 05:23:08 [redacted] kernel: [4048] #10 0xffffffff80d7acb0 at tcp6_usr_connect+0x2f0
Sep  1 05:23:08 [redacted] kernel: [4048] #11 0xffffffff80be75fc at soconnectat+0xdc
Sep  1 05:23:08 [redacted] kernel: [4048] #12 0xffffffff80beea8e at kern_connectat+0xfe
Sep  1 05:23:08 [redacted] kernel: [4048] #13 0xffffffff80bee965 at sys_connect+0x75
Sep  1 05:23:08 [redacted] kernel: [4048] #14 0xffffffff81023556 at amd64_syscall+0x2b6
Sep  1 05:23:08 [redacted] kernel: [4048] #15 0xffffffff80ffa99e at fast_syscall_common+0xf8
Sep  1 05:23:08 [redacted] kernel: [4048] Uptime: 1h7m28s
Sep  1 05:23:08 [redacted] kernel: [4048] Dumping 580 out of 4056 MB:..3%..12%..23%..31%..42%..53%..61%..72%..83%..91%
Sep  1 05:23:08 [redacted] kernel: [4048] Dump complete
Sep  1 05:23:08 [redacted] kernel: [4048] Automatic reboot in 15 seconds - press a key on the console to abort
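
When comparing panics across several machines, the panic string and the frame list can be pulled out of /var/log/messages with a short script. The following is only a minimal sketch in Python, assuming the log line layout shown above; the function name parse_panic and the "host" placeholder in the sample lines are hypothetical, not part of any FreeBSD tool.

```python
import re

# Matches backtrace lines such as:
#   "... kernel: [4048] #3 0xffffffff82923622 at pf_free_state+0xb2"
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")
# Matches the lowercase "panic: ..." line printed by the kernel.
PANIC_RE = re.compile(r"panic: (.+)$")


def parse_panic(lines):
    """Return (panic_string, frames) from syslog lines of a kernel panic.

    frames is a list of (index, function, offset) tuples in log order.
    """
    panic = None
    frames = []
    for line in lines:
        m = PANIC_RE.search(line)
        if m and panic is None:
            panic = m.group(1).strip()
            continue
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return panic, frames


# Sample lines copied from the log above, with the hostname replaced.
sample = [
    "Sep  1 05:23:08 host kernel: [4048] panic: pf_free_state: timeout 0",
    "Sep  1 05:23:08 host kernel: [4048] #3 0xffffffff82923622 at pf_free_state+0xb2",
    "Sep  1 05:23:08 host kernel: [4048] #4 0xffffffff8292cdde at pf_test_rule+0x312e",
]
panic, frames = parse_panic(sample)
print(panic)
for idx, func, off in frames:
    print(idx, func, hex(off))
```

Grouping reports by the panic string and the first few pf frames makes it easier to tell whether different machines are hitting the same code path.
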

A bug has already been filed with HardenedBSD, but the problem seems to lie in PF's source code itself:
https://groups.google.com/a/hardenedbsd.org/g/users/c/rX7-zJQWnu0

It is worth noting that all the impacted machines use an IPv6 stack.
Comment 1 Théo Bertin 2021-09-02 09:15:27 UTC
Updating the Importance, as this problem likely affects most people running the latest PF changes on IPv6 stacks.
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2021-09-02 10:31:25 UTC
Can this problem be reproduced on FreeBSD? How is it reproduced on HardenedBSD?

I'm afraid this is not the right place to report HardenedBSD issues.
Comment 3 Théo Bertin 2021-09-22 13:23:17 UTC
I couldn't find a way to consistently reproduce the problem, either on HardenedBSD or FreeBSD.
However, after looking further into the trace and the commits touching the code involved, the problem certainly lies between commits bc6cf5a56 and 2f6dd4a29, and most likely between 5372a43bf and 2f6dd4a29.

Our impacted systems are on (HardenedBSD) commits 4fc0cb929 (FreeBSD commit eed85dd1a) and 1969e37a9 (FreeBSD commit 398bfe63e) which include pf code changes.

However, some later changes have not yet been carried over into a HardenedBSD patch, such as commit a37c697b8da9, which might resolve our problem.

Do you have any insight into that?
Comment 4 Théo Bertin 2021-10-14 08:30:16 UTC
The latest HardenedBSD patch, which includes the newest FreeBSD code, seems to have resolved the problem for us.