Running `cd /usr/tests/ && kyua test` on RISC-V built today (commit 65618fdda0f272a823e6701966421bdca0efa301) results in the following panic:

sys/netgraph/ng_macfilter_test:main -> WARNING: attempt to domain_add(netgraph) after domainfinalize()
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex ng_node (ng_node) r = 0 (0xffffffd002139c70) locked @ /local/scratch/alr48/cheri/freebsd/sys/netgraph/ng_base.c:2325
stack backtrace:
#0 0xffffffc00031de5a at witness_checkorder+0xe78
#1 0xffffffc00031ef58 at witness_warn+0x3f4
#2 0xffffffc00054619a at do_trap_supervisor+0x3a8
#3 0xffffffc000545e56 at do_trap_supervisor+0x64
#4 0xffffffc000536718 at cpu_exception_handler_supervisor+0x68
t[0] == 0xffffffd07fd97280
t[1] == 0xffffffc0a5552b6c
t[2] == 0xffffffc0a4d6b1d0
t[3] == 0xffffffc0002999de
t[4] == 0x0000000000000000
t[5] == 0xffffffc0a60c52b0
t[6] == 0x0000000000000001
s[0] == 0x0000000000000000
s[1] == 0xffffffd002139c88
s[2] == 0xffffffd002139c00
s[3] == 0x0000000000001000
s[4] == 0xffffffd002139c68
s[5] == 0x00000000ffffffff
s[6] == 0xffffffd002139c00
s[7] == 0x0000000000000001
s[8] == 0x0000000000004f68
s[9] == 0xffffffd0268a03b0
s[10] == 0xffffffd001448cf0
s[11] == 0x0000000000000000
a[0] == 0x0000000000000002
a[1] == 0x0000000000000000
a[2] == 0x0000000000000000
a[3] == 0x0000000000100000
a[4] == 0x0000000000000000
a[5] == 0xffffffc0007d0f68
a[6] == 0xffffffd07fd8d380
a[7] == 0x0000000000000027
ra == 0xffffffc0a554df30
sp == 0xffffffc0a60c5870
gp == 0x0000000000000000
tp == 0xffffffd026ddf980
sepc == 0xffffffc0a554df3e
sstatus == 0x8000000200006120
panic: Fatal page fault at 0xffffffc0a554df3e: 0000000000000000
cpuid = 0
time = 1612390054
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_fetch_ksymtab() at db_fetch_ksymtab+0x15c
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x26
do_trap_supervisor() at do_trap_supervisor+0x500
do_trap_supervisor() at do_trap_supervisor+0x64
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x68
--- exception 13, tval = 0
KDB: enter: panic
[ thread pid 90029 tid 103721 ]
Stopped at kdb_enter+0x44: sd zero,0(a0)
db>
Seems to be reproducible. I just rebuilt the image and ran just that test:

```
To change this login announcement, see motd(5).
root@freebsd-riscv64:~ # cd /usr/tests/
root@freebsd-riscv64:/usr/tests # kyua test sys/netgraph/
sys/netgraph/ng_macfilter_test:main -> WARNING: attempt to domain_add(netgraph) after domainfinalize()
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex ng_node (ng_node) r = 0 (0xffffffd002139a70) locked @ /local/scratch/alr48/cheri/freebsd/sys/netgraph/ng_base.c:2325
stack backtrace:
#0 0xffffffc00031de56 at witness_checkorder+0xe78
#1 0xffffffc00031ef54 at witness_warn+0x3f4
#2 0xffffffc00054619a at do_trap_supervisor+0x3a8
#3 0xffffffc000545e56 at do_trap_supervisor+0x64
#4 0xffffffc000536718 at cpu_exception_handler_supervisor+0x68
t[0] == 0xffffffd07fd97280
t[1] == 0xffffffc0a4e0fb6c
t[2] == 0xffffffc098c77750
t[3] == 0xffffffc0002999d6
t[4] == 0x0000000000000000
t[5] == 0xffffffc0985482b0
t[6] == 0x0000000000000001
s[0] == 0x0000000000000000
s[1] == 0xffffffd002139a88
s[2] == 0xffffffd002139a00
s[3] == 0x0000000000001000
s[4] == 0xffffffd002139a68
s[5] == 0x00000000ffffffff
s[6] == 0xffffffd002139a00
s[7] == 0x0000000000000001
s[8] == 0x0000000000004f68
s[9] == 0xffffffd009ea3b10
s[10] == 0xffffffd001448d70
s[11] == 0x0000000000000000
a[0] == 0x0000000000000002
a[1] == 0x0000000000000000
a[2] == 0x0000000000000000
a[3] == 0x0000000000100000
a[4] == 0x0000000000000000
a[5] == 0xffffffc0007d6a40
a[6] == 0xffffffd07fd8d380
a[7] == 0x0000000000000027
ra == 0xffffffc0a4e0af30
sp == 0xffffffc098548870
gp == 0x0000000000000000
tp == 0xffffffd009e04780
sepc == 0xffffffc0a4e0af3e
sstatus == 0x8000000200006120
panic: Fatal page fault at 0xffffffc0a4e0af3e: 0000000000000000
cpuid = 0
time = 1612391189
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_fetch_ksymtab() at db_fetch_ksymtab+0x15c
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x148
panic() at panic+0x26
do_trap_supervisor() at do_trap_supervisor+0x500
do_trap_supervisor() at do_trap_supervisor+0x64
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x68
--- exception 13, tval = 0
KDB: enter: panic
[ thread pid 781 tid 100044 ]
Stopped at kdb_enter+0x44: sd zero,0(a0)
db>
```

I start QEMU with:

/local/scratch/alr48/cheri/output/sdk/bin/qemu-system-riscv64cheri -M virt -m 2048 -nographic -bios default -kernel /local/scratch/alr48/cheri/output/freebsd-riscv64/boot/kernel/kernel -drive if=none,file=/local/scratch/alr48/cheri/output/freebsd-riscv64.img,id=drv,format=raw -device virtio-blk-device,drive=drv -device virtio-net-device,netdev=net0 -netdev 'user,id=net0,ipv6=off'
(In reply to Alex Richardson from comment #1)

Within QEMU, I can reproduce the panic with only the following command:

$ ngctl mkpeer vtnet0: macfilter lower ether

I have not yet had the chance to debug much further. My guess is that the netgraph module has been subtly broken on the platform all along, and r368443 was just the first to add tests for it.
Could you point me to instructions on how to set up a test environment for this type of CPU?
If you have space in your home directory, the fastest way would be:

$ git clone https://github.com/CTSRD-CHERI/cheribuild
$ cd cheribuild
$ ./cheribuild.py qemu build-and-run-freebsd-riscv64
.... wait for the git clone + build to complete ....

You should now have a QEMU login prompt:

root # cd /usr/tests && kyua test sys/netgraph/
I think I've narrowed this down to a pretty unsatisfying bug. See the linked review.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0d3b3beeb253e09b2b6b3805065594aecc7e2c2f

commit 0d3b3beeb253e09b2b6b3805065594aecc7e2c2f
Author: Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-04 17:52:45 +0000
Commit: Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-04 20:59:58 +0000

    riscv: fix errors in some atomic type aliases

    This appears to be a copy-and-paste error that has simply been
    overlooked. The tree contains only two calls to any of the affected
    variants, but recent additions to the test suite started exercising the
    call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
    panics.

    Apparently, the issue was inherited from the arm64 atomic header. That
    instance was addressed in c90baf6817a0, but the fix did not make its way
    to RISC-V.

    Note that the particular test case ng_macfilter_test:main still appears
    to fail on this platform, but this change reduces the panic to a
    timeout.

    PR: 253237
    Reported by: Jenkins, arichardson
    Reviewed by: kp, arichardson
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D29064

 sys/riscv/include/atomic.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
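To make the class of bug described in the commit message concrete, here is a minimal user-space sketch in C11. The commit message does not quote the diff, so the specific mistake modeled here (a "clear" alias that expands to an "add" primitive) is an assumption, and every `demo_` / `DEMO_` name is hypothetical rather than taken from sys/riscv/include/atomic.h or ng_base.c. It only shows why a wrong alias for a release-ordered clear would leave a flags word in a state the caller never intended.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag bit standing in for a "writer active" marker. */
#define DEMO_WRITER_ACTIVE 0x2u

/* Release-ordered add: *p += v. */
static void
demo_add_rel_32(_Atomic uint32_t *p, uint32_t v)
{
	atomic_fetch_add_explicit(p, v, memory_order_release);
}

/* Release-ordered clear: *p &= ~v. */
static void
demo_clear_rel_32(_Atomic uint32_t *p, uint32_t v)
{
	atomic_fetch_and_explicit(p, ~v, memory_order_release);
}

/*
 * ASSUMED shape of the copy-and-paste error: the "clear" alias points at
 * the "add" primitive. The fixed alias points at the matching primitive.
 */
#define demo_clear_rel_int_buggy demo_add_rel_32
#define demo_clear_rel_int_fixed demo_clear_rel_32

/* Drop the writer flag through the buggy alias; return the new flags. */
static uint32_t
demo_leave_write_buggy(uint32_t flags)
{
	_Atomic uint32_t f;

	atomic_init(&f, flags);
	demo_clear_rel_int_buggy(&f, DEMO_WRITER_ACTIVE);
	return (atomic_load(&f));
}

/* Drop the writer flag through the fixed alias; return the new flags. */
static uint32_t
demo_leave_write_fixed(uint32_t flags)
{
	_Atomic uint32_t f;

	atomic_init(&f, flags);
	demo_clear_rel_int_fixed(&f, DEMO_WRITER_ACTIVE);
	return (atomic_load(&f));
}

#ifdef DEMO_MAIN
int
main(void)
{
	uint32_t buggy = demo_leave_write_buggy(DEMO_WRITER_ACTIVE);
	uint32_t fixed = demo_leave_write_fixed(DEMO_WRITER_ACTIVE);

	/* The buggy alias adds the mask instead of clearing it. */
	assert(buggy != 0);
	assert(fixed == 0);
	printf("buggy: 0x%x, fixed: 0x%x\n", (unsigned)buggy, (unsigned)fixed);
	return (0);
}
#endif
```

Built with `-DDEMO_MAIN`, the buggy path turns a flags word of 0x2 into 0x4 while the fixed path yields 0x0. Under the stated assumption, that would be consistent with the commit message's observation that a rarely used alias went unnoticed until a test exercised it.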
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cc24f5bc6f6eb56a959bd23ebb051d3bf6ebf670

commit cc24f5bc6f6eb56a959bd23ebb051d3bf6ebf670
Author: Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-04 17:52:45 +0000
Commit: Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-08 14:03:01 +0000

    riscv: fix errors in some atomic type aliases

    This appears to be a copy-and-paste error that has simply been
    overlooked. The tree contains only two calls to any of the affected
    variants, but recent additions to the test suite started exercising the
    call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
    panics.

    Apparently, the issue was inherited from the arm64 atomic header. That
    instance was addressed in c90baf6817a0, but the fix did not make its way
    to RISC-V.

    Note that the particular test case ng_macfilter_test:main still appears
    to fail on this platform, but this change reduces the panic to a
    timeout.

    PR: 253237
    Reported by: Jenkins, arichardson
    Reviewed by: kp, arichardson

    (cherry picked from commit 0d3b3beeb253e09b2b6b3805065594aecc7e2c2f)

 sys/riscv/include/atomic.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=03572a87a84cde47f672480d3c5485713b7c39fb

commit 03572a87a84cde47f672480d3c5485713b7c39fb
Author: Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2021-03-04 17:52:45 +0000
Commit: Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2021-03-08 23:04:25 +0000

    riscv: fix errors in some atomic type aliases

    This appears to be a copy-and-paste error that has simply been
    overlooked. The tree contains only two calls to any of the affected
    variants, but recent additions to the test suite started exercising the
    call to atomic_clear_rel_int() in ng_leave_write(), reliably causing
    panics.

    Apparently, the issue was inherited from the arm64 atomic header. That
    instance was addressed in c90baf6817a0, but the fix did not make its way
    to RISC-V.

    Note that the particular test case ng_macfilter_test:main still appears
    to fail on this platform, but this change reduces the panic to a
    timeout.

    PR: 253237
    Reported by: Jenkins, arichardson
    Reviewed by: kp, arichardson
    Approved by: re (gjb)

    (cherry picked from commit 0d3b3beeb253e09b2b6b3805065594aecc7e2c2f)
    (cherry picked from commit cc24f5bc6f6eb56a959bd23ebb051d3bf6ebf670)

 sys/riscv/include/atomic.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)