Bug 258601 - kernel panic/pagefault caused by pfctl -s info
Summary: kernel panic/pagefault caused by pfctl -s info
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Only Me
Assignee: freebsd-pf (Nobody)
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2021-09-19 11:05 UTC by rz-rpi03
Modified: 2021-09-30 12:22 UTC (History)
3 users (show)

See Also:


Attachments
Console log (6.45 KB, text/plain)
2021-09-19 11:05 UTC, rz-rpi03
no flags Details
Console log with show lock, locks, witness etc statements (78.75 KB, text/plain)
2021-09-21 20:01 UTC, rz-rpi03
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description rz-rpi03 2021-09-19 11:05:23 UTC
Created attachment 228009 [details]
Console log

Hello,

on a system with slightly over 200 configured VLANs "pfctl -s info"
causes a panic while going multiuser.

This was not the case with CURRENT n248803 but happens every time with
CURRENT main-n249314-71a1ae7cebd.

A console log of the panic, including some information from the debugger,
is in the attachment.

Regards
  Ralf
Comment 1 Kristof Provost freebsd_committer 2021-09-21 12:49:14 UTC
I've not been able to reproduce this on amd64 so far, and I don't have access to an aarch64 system.

Can you try getting a backtrace to pintpoint where it's panicking? The log you've attached does point at pf_getstatus(), which matches the trigger (pfctl -s info), but it's not clear to me where we could be triggering a page fault.
Comment 2 rz-rpi03 2021-09-21 18:24:50 UTC
(In reply to Kristof Provost from comment #1)
A copy of the OS installed on an equivalent machine is in my lab.
So I could try.

Do you have any hints how extract the needed information with the debugger?
Comment 3 rz-rpi03 2021-09-21 20:01:06 UTC
Created attachment 228105 [details]
Console log with show lock, locks, witness etc statements

I tried to add some hopefully helpful show statements.
Any hints how to debug it better are welcome.
Comment 4 Kristof Provost freebsd_committer 2021-09-21 20:41:14 UTC
(In reply to rz-rpi03 from comment #3)
Thanks for that already, but it turns out not to really have any clues at the moment.

It looks like the stack trace that db produces is not what I'd expect to see, and it doesn't really tell me where in pf_getstatus() things go wrong.

On amd64 I'd usually dump a core to swap (with the db> dump command) and analyse that along with the object files (/usr/obj, if I've built my own kernel) to get a backtrace with kgdb.
Comment 5 Andrew Turner freebsd_committer 2021-09-22 16:22:55 UTC
Can you run 'addr2line -r /usr/lib/debug/boot/kernel/kernel.debug ffff000000758dd4'. That will tell us which line the invalid memory access occurs.
Comment 6 rz-rpi03 2021-09-22 16:43:54 UTC
(In reply to Andrew Turner from comment #5)

Sure. It is copyinout(), so not very specific.

root@:~ # addr2line -e /usr/lib/debug/boot/kernel.n249314-defect/kernel.debug ffff000000758dd4
/usr/src/sys/arm64/arm64/copyinout.S:177 
root@:~ #
Comment 7 rz-rpi03 2021-09-22 16:45:48 UTC
(In reply to Kristof Provost from comment #4)
I moved the contents of the SD card to a bigger one and added a swap
partition to get a place to store the kernel dump.

Unfortunately the dump command triggers another panic.

db> dump
Dumping 123 out of 921 MB:panic: data pending interrupt pushed through SDHCI framework

So there is no kernel dump created.

I think there was a fix for this in the last few days.
If that is the case I can build a newer version of CURRENT and try that.
Comment 8 Kristof Provost freebsd_committer 2021-09-22 17:57:39 UTC
(In reply to rz-rpi03 from comment #7)
Ah, thanks. That's actually somewhat useful. copyinout() gets called (as copyout()) in only one place in the function, and I suspect I know what's going on as well. I shouldn't be running copyout() while holding the pf rules lock. Happily we've already built the nvlist at that point, and it's safe to release it earlier. It'll just take a minor bit of rearranging the code.

I think I can fix it tomorrow.
Comment 9 Kristof Provost freebsd_committer 2021-09-23 08:43:36 UTC
Can you give this patch a try? https://reviews.freebsd.org/D32076
Comment 10 rz-rpi03 2021-09-23 11:36:50 UTC
(In reply to Kristof Provost from comment #9)
I am working on it.
Can't try with an actual CURRENT because of a new issue with the sdhci driver.
Going back to not so current CURENT.
Comment 11 rz-rpi03 2021-09-23 14:42:23 UTC
(In reply to Kristof Provost from comment #9)
The patch is fine.
I double verified it. A unpatched kernel panics, a patched does not.

Thank you.

Ralf
Comment 12 commit-hook freebsd_committer 2021-09-23 19:58:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cb13059663e455b3fc69c293dadec53c164490dc

commit cb13059663e455b3fc69c293dadec53c164490dc
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-09-23 08:39:49 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-09-23 19:56:59 +0000

    pf: fix pagefault in pf_getstatus()

    We can't copyout() while holding a lock, in case it triggers a page
    fault.
    Release the lock before copyout, which is safe because we've already
    copied all the data into the nvlist.

    PR:             258601
    Reviewed by:    mjg
    MFC after:      1 week
    Sponsored by:   Modirum MDPay
    Differential Revision:  https://reviews.freebsd.org/D32076

 sys/netpfil/pf/pf_ioctl.c | 3 +++
 1 file changed, 3 insertions(+)
Comment 13 commit-hook freebsd_committer 2021-09-30 12:22:06 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3c9253c2f6d1f076ef35a5538b207a5cc866480f

commit 3c9253c2f6d1f076ef35a5538b207a5cc866480f
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-09-23 08:39:49 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-09-30 11:51:32 +0000

    pf: fix pagefault in pf_getstatus()

    We can't copyout() while holding a lock, in case it triggers a page
    fault.
    Release the lock before copyout, which is safe because we've already
    copied all the data into the nvlist.

    PR:             258601
    Reviewed by:    mjg
    MFC after:      1 week
    Sponsored by:   Modirum MDPay
    Differential Revision:  https://reviews.freebsd.org/D32076

    (cherry picked from commit cb13059663e455b3fc69c293dadec53c164490dc)

 sys/netpfil/pf/pf_ioctl.c | 3 +++
 1 file changed, 3 insertions(+)
Comment 14 commit-hook freebsd_committer 2021-09-30 12:22:08 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9419d273e4718ee8c768865cd73a3b907f365d8d

commit 9419d273e4718ee8c768865cd73a3b907f365d8d
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-09-23 08:39:49 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2021-09-30 07:54:44 +0000

    pf: fix pagefault in pf_getstatus()

    We can't copyout() while holding a lock, in case it triggers a page
    fault.
    Release the lock before copyout, which is safe because we've already
    copied all the data into the nvlist.

    PR:             258601
    Reviewed by:    mjg
    MFC after:      1 week
    Sponsored by:   Modirum MDPay
    Differential Revision:  https://reviews.freebsd.org/D32076

    (cherry picked from commit cb13059663e455b3fc69c293dadec53c164490dc)

 sys/netpfil/pf/pf_ioctl.c | 3 +++
 1 file changed, 3 insertions(+)