Created attachment 228009 [details] Console log Hello, on a system with slightly over 200 configured VLANs "pfctl -s info" causes a panic while going multiuser. This was not the case with CURRENT n248803 but happens every time with CURRENT main-n249314-71a1ae7cebd. A console log of the panic, including some information from the debugger, is in the attachment. Regards Ralf
I've not been able to reproduce this on amd64 so far, and I don't have access to an aarch64 system. Can you try getting a backtrace to pintpoint where it's panicking? The log you've attached does point at pf_getstatus(), which matches the trigger (pfctl -s info), but it's not clear to me where we could be triggering a page fault.
(In reply to Kristof Provost from comment #1) A copy of the OS installed on an equivalent machine is in my lab. So I could try. Do you have any hints how extract the needed information with the debugger?
Created attachment 228105 [details] Console log with show lock, locks, witness etc statements I tried to add some hopefully helpful show statements. Any hints how to debug it better are welcome.
(In reply to rz-rpi03 from comment #3) Thanks for that already, but it turns out not to really have any clues at the moment. It looks like the stack trace that db produces is not what I'd expect to see, and it doesn't really tell me where in pf_getstatus() things go wrong. On amd64 I'd usually dump a core to swap (with the db> dump command) and analyse that along with the object files (/usr/obj, if I've built my own kernel) to get a backtrace with kgdb.
Can you run 'addr2line -r /usr/lib/debug/boot/kernel/kernel.debug ffff000000758dd4'. That will tell us which line the invalid memory access occurs.
(In reply to Andrew Turner from comment #5) Sure. It is copyinout(), so not very specific. root@:~ # addr2line -e /usr/lib/debug/boot/kernel.n249314-defect/kernel.debug ffff000000758dd4 /usr/src/sys/arm64/arm64/copyinout.S:177 root@:~ #
(In reply to Kristof Provost from comment #4) I moved the contents of the SD card to a bigger one and added a swap partition to get a place to store the kernel dump. Unfortunately the dump command triggers another panic. db> dump Dumping 123 out of 921 MB:panic: data pending interrupt pushed through SDHCI framework So there is no kernel dump created. I think there was a fix for this in the last few days. If that is the case I can build a newer version of CURRENT and try that.
(In reply to rz-rpi03 from comment #7) Ah, thanks. That's actually somewhat useful. copyinout() gets called (as copyout()) in only one place in the function, and I suspect I know what's going on as well. I shouldn't be running copyout() while holding the pf rules lock. Happily we've already built the nvlist at that point, and it's safe to release it earlier. It'll just take a minor bit of rearranging the code. I think I can fix it tomorrow.
Can you give this patch a try? https://reviews.freebsd.org/D32076
(In reply to Kristof Provost from comment #9) I am working on it. Can't try with an actual CURRENT because of a new issue with the sdhci driver. Going back to not so current CURENT.
(In reply to Kristof Provost from comment #9) The patch is fine. I double verified it. A unpatched kernel panics, a patched does not. Thank you. Ralf
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=cb13059663e455b3fc69c293dadec53c164490dc commit cb13059663e455b3fc69c293dadec53c164490dc Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-09-23 08:39:49 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-09-23 19:56:59 +0000 pf: fix pagefault in pf_getstatus() We can't copyout() while holding a lock, in case it triggers a page fault. Release the lock before copyout, which is safe because we've already copied all the data into the nvlist. PR: 258601 Reviewed by: mjg MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32076 sys/netpfil/pf/pf_ioctl.c | 3 +++ 1 file changed, 3 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3c9253c2f6d1f076ef35a5538b207a5cc866480f commit 3c9253c2f6d1f076ef35a5538b207a5cc866480f Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-09-23 08:39:49 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-09-30 11:51:32 +0000 pf: fix pagefault in pf_getstatus() We can't copyout() while holding a lock, in case it triggers a page fault. Release the lock before copyout, which is safe because we've already copied all the data into the nvlist. PR: 258601 Reviewed by: mjg MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32076 (cherry picked from commit cb13059663e455b3fc69c293dadec53c164490dc) sys/netpfil/pf/pf_ioctl.c | 3 +++ 1 file changed, 3 insertions(+)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9419d273e4718ee8c768865cd73a3b907f365d8d commit 9419d273e4718ee8c768865cd73a3b907f365d8d Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2021-09-23 08:39:49 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2021-09-30 07:54:44 +0000 pf: fix pagefault in pf_getstatus() We can't copyout() while holding a lock, in case it triggers a page fault. Release the lock before copyout, which is safe because we've already copied all the data into the nvlist. PR: 258601 Reviewed by: mjg MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32076 (cherry picked from commit cb13059663e455b3fc69c293dadec53c164490dc) sys/netpfil/pf/pf_ioctl.c | 3 +++ 1 file changed, 3 insertions(+)