Trying to run Linux Firefox binary from Ubuntu Focal seems to trigger panic on amd64 FreeBSD 15:

__curthread () at /usr/home/trasz/git/freebsd-src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/home/trasz/git/freebsd-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1) at /usr/home/trasz/git/freebsd-src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b4ee90 in kern_reboot (howto=260) at /usr/home/trasz/git/freebsd-src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4f38f in vpanic (fmt=0xffffffff811406d9 "%s: fam out of bounds (%d < %d)", ap=ap@entry=0xfffffe00fc3a2b30) at /usr/home/trasz/git/freebsd-src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b4f133 in panic (fmt=<unavailable>) at /usr/home/trasz/git/freebsd-src/sys/kern/kern_shutdown.c:894
#5  0xffffffff80cbbee3 in rt_tables_get_rnh_ptr (table=<optimized out>, family=<optimized out>) at /usr/home/trasz/git/freebsd-src/sys/net/route/route_tables.c:372
#6  rt_tables_get_rnh (table=<optimized out>, family=<optimized out>) at /usr/home/trasz/git/freebsd-src/sys/net/route/route_tables.c:387
#7  0xffffffff80dc626d in dump_rtable_fib (wa=0xfffffe00fc3a2bc8, fibnum=0, family=255) at /usr/home/trasz/git/freebsd-src/sys/netlink/route/rt.c:599
#8  handle_rtm_dump (nlp=0xfffff803cbde4700, fibnum=0, family=255, hdr=0xfffff8039edb6800, nw=<unavailable>) at /usr/home/trasz/git/freebsd-src/sys/netlink/route/rt.c:682
#9  rtnl_handle_getroute (hdr=0xfffff8039edb6800, nlp=0xfffff803cbde4700, npt=0xfffffe00fc3a2dc0) at /usr/home/trasz/git/freebsd-src/sys/netlink/route/rt.c:1028
#10 0xffffffff80dbe552 in rtnl_handle_message (hdr=0xfffff8039edb6800, npt=0xfffffe00fc3a2dc0) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_route.c:104
#11 0xffffffff80dbbeaa in nl_receive_message (hdr=0xfffff8039edb6800, remaining_length=<optimized out>, nlp=0xfffff803cbde4700, npt=0xfffffe00fc3a2dc0) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_io.c:506
#12 nl_process_mbuf (m=0xfffff800099fd000, nlp=0xfffff803cbde4700) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_io.c:580
#13 nl_process_received_one (nlp=0xfffff803cbde4700) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_io.c:293
#14 nl_process_received (nlp=0xfffff803cbde4700) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_io.c:320
#15 nl_taskqueue_handler (_arg=0xfffff803cbde4700, pending=<optimized out>) at /usr/home/trasz/git/freebsd-src/sys/netlink/netlink_io.c:371
#16 0xffffffff80bb497b in taskqueue_run_locked (queue=queue@entry=0xfffff8002400cc00) at /usr/home/trasz/git/freebsd-src/sys/kern/subr_taskqueue.c:512
#17 0xffffffff80bb5a33 in taskqueue_thread_loop (arg=arg@entry=0xfffff803cbde4760) at /usr/home/trasz/git/freebsd-src/sys/kern/subr_taskqueue.c:824
#18 0xffffffff80b04f02 in fork_exit (callout=0xffffffff80bb5960 <taskqueue_thread_loop>, arg=0xfffff803cbde4760, frame=0xfffffe00fc3a2f40) at /usr/home/trasz/git/freebsd-src/sys/kern/kern_fork.c:1160
#19 <signal handler called>
#20 0x00000008011306c6 in ?? ()
(In reply to Edward Tomasz Napierala from comment #0)
> amd64 FreeBSD 15:

Which version, exactly? Reproducible with an updated OS?

(Reading this alongside bug 274538 comment 1.)
Ah; sorry; I've just verified it's still happening with:

FreeBSD pustak 15.0-CURRENT FreeBSD 15.0-CURRENT #69 main-n266018-d2abbfede534-dirty: Wed Oct 18 11:33:02 BST 2023 root@pustak:/usr/obj/usr/home/trasz/git/freebsd-src/amd64.amd64/sys/GENERIC amd64
I've cc'd melifaro@ as this looks to be related to netlink reading the routing tables.
This is still happening with yesterday's CURRENT. It looks like the bug is caused by rtnl_handle_getroute() being called with family=255, which then trips an assertion in rt_tables_get_rnh_ptr(). I'm not sure where this value, which I assume came from userspace, should be handled. I've pinged glebius@; he's done some Netlink work recently.
I guess the 255 comes from the return (-1) in sys/compat/linux/linux.c:linux_to_bsd_domain(). Can you please first modify your kernel so that it doesn't panic, e.g. in netlink_io.c after line 284 just return as if msg_from_linux had failed. Then please add a print of the actual value of domain in linux_to_bsd_domain().
Thank you! It's 17, which appears to be PF_PACKET.
Adding Dmitry and Alexander here. TL;DR version for them: an application (Firefox) sends a NETLINK_ROUTE message with AF_PACKET in it. linux_to_bsd_domain() fails to find an analog in FreeBSD and returns 0xffffffff. Later that gets truncated to 0xff, and rt_tables_get_rnh_ptr() panics. How should we fix that? At what level should we report an EOPNOTSUPP (or maybe other) error? My guess is that it should live in NetLink, because it is NetLink that doesn't check the return value of linux_to_bsd_domain(). The latter honestly reports "I don't know".
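For readers following along, here is a minimal userland sketch of the truncation described above. It is not the kernel code: every demo_* name and DEMO_AF_MAX are hypothetical stand-ins, and the 8-bit family field is an assumption based on the 0xff value seen in the backtrace. It only shows how an int -1 stored into an 8-bit field becomes 255 and then fails a bounds check of the kind that produced "fam out of bounds".

```c
#include <stdint.h>

/* Hypothetical upper bound for this sketch; not the kernel's AF_MAX. */
#define DEMO_AF_MAX 42

/* Stand-in for a converter that reports "unknown domain" as -1. */
static int
demo_linux_to_bsd_domain(int linux_domain)
{
	switch (linux_domain) {
	case 2:		/* Linux AF_INET */
		return (2);
	case 10:	/* Linux AF_INET6 */
		return (28);	/* FreeBSD AF_INET6 */
	default:	/* e.g. 17, Linux AF_PACKET */
		return (-1);
	}
}

/* Stores the result in an 8-bit family field without checking for -1. */
static uint8_t
demo_store_family(int linux_domain)
{
	return ((uint8_t)demo_linux_to_bsd_domain(linux_domain));
}

/* Returns 1 if the family could safely index a DEMO_AF_MAX-entry table. */
static int
demo_family_in_bounds(uint8_t family)
{
	return (family < DEMO_AF_MAX);
}
```

With these definitions, demo_store_family(17) yields 255, and demo_family_in_bounds() rejects it, which is the userland analogue of the panic in frame #5.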
Created attachment 249197
check if it is just a typo

Can you please try out this patch, after reverting any previous changes?
See also https://reviews.freebsd.org/D44375
Negative; the s/254/255/ patch doesn't seem to fix the panic I'm having.
Can you please apply both

https://reviews.freebsd.org/D44375
https://reviews.freebsd.org/D44392

and check if that fixes the panic?
Sorry, no joy - with those two applied it still panics like before, with similar backtrace.
On Mon Mar 18 16:56:14 2024 UTC, trasz@FreeBSD.org wrote:
> Sorry, no joy - with those two applied it still panics like before, with similar
> backtrace.

I have just updated both revisions and corrected a mistake. Can you please try again?
Bingo! Those two fix my panic. Thank you :)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b977dd1ea5fbc2df3f1279330be4d089322eb2cf

commit b977dd1ea5fbc2df3f1279330be4d089322eb2cf
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2024-03-29 20:35:51 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-03-29 20:35:51 +0000

    linux: make linux_netlink_p->msg_from_linux be able to fail

    The KPI for this function was misleading. From the NetLink perspective it
    looked like a function that: a) allocates new hdr, b) can fail. Neither
    was true. Let the function return a error code instead of returning the
    same hdr it was passed to. In case if future Linux NetLink compatibility
    support calls for reallocating header, pass hdr as pointer to pointer.

    With KPI that returns a error, propagate domain conversion errors all the
    way up to NetLink module. This fixes panic when unknown domain is
    converted to 0xff and this invalid value is passed into NetLink
    processing.

    PR: 274536
    Reviewed by: melifaro
    Differential Revision: https://reviews.freebsd.org/D44392

 sys/compat/linux/linux_netlink.c | 58 ++++++++++++++++++++++++++--------------
 sys/netlink/netlink_io.c         | 22 +++++++--------
 sys/netlink/netlink_linux.h      |  2 +-
 3 files changed, 48 insertions(+), 34 deletions(-)
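As an illustration of the KPI shape that commit message describes (return an error code, take hdr as pointer to pointer, let the caller propagate the failure), here is a small userland sketch. It is not the committed code: struct demo_hdr, the demo_* functions and DEMO_AF_UNKNOWN are hypothetical, and the choice of EOPNOTSUPP is an assumption taken from the question raised earlier in this thread rather than from the actual change.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical message header carrying only an address family. */
struct demo_hdr {
	uint8_t family;
};

/* Hypothetical "conversion failed" marker for this sketch. */
#define DEMO_AF_UNKNOWN UINT8_MAX

/* Maps a Linux family number to a FreeBSD one, or DEMO_AF_UNKNOWN. */
static uint8_t
demo_convert_family(uint8_t linux_family)
{
	switch (linux_family) {
	case 2:		/* Linux AF_INET */
		return (2);
	case 10:	/* Linux AF_INET6 */
		return (28);	/* FreeBSD AF_INET6 */
	default:
		return (DEMO_AF_UNKNOWN);
	}
}

/*
 * Converts the header in place and returns 0, or returns an errno value
 * and leaves the header untouched. hdrp is a pointer to pointer so that
 * a future version could swap in a reallocated header.
 */
static int
demo_msg_from_linux(struct demo_hdr **hdrp)
{
	uint8_t family = demo_convert_family((*hdrp)->family);

	if (family == DEMO_AF_UNKNOWN)
		return (EOPNOTSUPP);	/* assumed error code */
	(*hdrp)->family = family;
	return (0);
}

/* Caller side: stop before dispatching if the conversion failed. */
static int
demo_receive(struct demo_hdr *hdr)
{
	int error = demo_msg_from_linux(&hdr);

	if (error != 0)
		return (error);
	/* A real receiver would dispatch the converted message here. */
	return (0);
}
```

The point of the sketch is only that an unknown family (17 in this PR) never reaches the code that indexes per-family tables, because the caller sees a non-zero return first.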
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9d4a08d162d87ba120f418a1a71facd2c631b549

commit 9d4a08d162d87ba120f418a1a71facd2c631b549
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2024-03-29 20:35:37 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-03-29 20:35:37 +0000

    linux: use sa_family_t for address family conversions

    Express "conversion failed" with maximum possible value. This allows to
    reduce number of size/signedness conversion in the code that utilizes the
    functions.

    PR: 274536
    Reviewed by: melifaro
    Differential Revision: https://reviews.freebsd.org/D44375

 sys/compat/linux/linux.c        | 18 +++++++++---------
 sys/compat/linux/linux_common.h |  5 +++--
 sys/compat/linux/linux_socket.c |  9 +++++----
 3 files changed, 17 insertions(+), 15 deletions(-)
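The idea in that commit message, expressing "conversion failed" as the maximum value of sa_family_t instead of a signed -1, can be sketched in userland as follows. This is an assumption-laden illustration, not the committed code: DEMO_AF_UNKNOWN and demo_linux_to_bsd_domain() are hypothetical names, and the Linux-side numbers are hard-coded so the sketch builds on any POSIX host.

```c
#include <sys/socket.h>

/*
 * Hypothetical sentinel: the largest value sa_family_t can hold.
 * sa_family_t is an unsigned integer type, so casting -1 wraps to its maximum.
 */
#define DEMO_AF_UNKNOWN ((sa_family_t)-1)

/*
 * Converts a Linux address family number to the host's value.
 * Both the argument and the result use sa_family_t, so callers never
 * mix signed ints with narrow unsigned fields.
 */
static sa_family_t
demo_linux_to_bsd_domain(sa_family_t linux_domain)
{
	switch (linux_domain) {
	case 2:		/* Linux AF_INET */
		return (AF_INET);
	case 10:	/* Linux AF_INET6 */
		return (AF_INET6);
	default:	/* e.g. 17, Linux AF_PACKET */
		return (DEMO_AF_UNKNOWN);
	}
}
```

A caller then compares the result against DEMO_AF_UNKNOWN in the same type it stores, so the failure marker cannot change meaning when it is narrowed.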