Summary: | Kernel crash in sys/netlink/route/iface.c:124 | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Mike Cui <cuicui> | ||||
Component: | kern | Assignee: | freebsd-net (Nobody) <net> | ||||
Status: | New --- | ||||||
Severity: | Affects Some People | CC: | emaste, jhibbits, kp, loos, melifaro, mizhka, nyandarknessgirl, unspam, zlei | ||||
Priority: | --- | Keywords: | crash | ||||
Version: | 14.0-RELEASE | ||||||
Hardware: | arm64 | ||||||
OS: | Any | ||||||
Attachments: |
|
Presumably the crash happened at sys/netlink/route/iface.c:124, in function get_operstate_ether(), which calls if_ioctl(), and ifp->if_ioctl is NULL for some reason.

It also appears that the NETLINK subsystem was added in 13.2, but NETLINK is not included in the base kernel in 13.2, whereas it is in 14.0. Is NETLINK required in 14.0? Can I just compile it out of the kernel myself?

That's interesting. The cause is fairly obvious. e6000sw creates struct ifnets that don't have an ioctl handler, and that's triggering this crash.

The following is probably the best fix, assuming that the ioctl-less e6000sw ifnet is intentional:

diff --git a/sys/net/if.c b/sys/net/if.c
index 9f44223af0dd..c3c27fbf678f 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -4871,6 +4871,9 @@ if_resolvemulti(if_t ifp, struct sockaddr **srcs, struct sockaddr *dst)
 int
 if_ioctl(if_t ifp, u_long cmd, void *data)
 {
+	if (ifp->if_ioctl == NULL)
+		return (EOPNOTSUPP);
+
 	return (ifp->if_ioctl(ifp, cmd, data));
 }

On 2023-12-25 09:44:27 UTC, kp@freebsd.org wrote:
> That's interesting. The cause is fairly obvious. e6000sw creates struct ifnets
> that don't have an ioctl handler, and that's triggering this crash.

Well, without Netlink, there were always ways to call if_ioctl() and that would crash on stable/13.

Mike, can you please provide full trace of the crash?

Can you please point me to the sources of the e6000sw?

That code lives in sys/dev/etherswitch/e6000sw/e6000sw.c

It creates a struct ifnet for each port in e6000sw_attach() / e6000sw_init_interface(). It never actually attaches that ifnet, though. I believe it's only created so e6000sw can call into the mii code, which is also how I think we eventually end up in the panicking stack.

There's a link state event, which calls do_link_state_change() -> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() -> get_operstate_ether(). That wants to know if the link is up or down, so it tries to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.
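To make the effect of the proposed guard concrete, here is a minimal userspace sketch of the same pattern. It is not kernel code: struct model_ifnet, MODEL_GET_MEDIA and every model_* function are hypothetical names invented for this illustration, and the caller only loosely mirrors what get_operstate_ether() is described as doing in this thread.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the kernel's struct ifnet.
 * Only the handler pointer is modeled; all model_* names are invented
 * for this sketch and do not exist in FreeBSD.
 */
struct model_ifnet {
	int (*if_ioctl)(struct model_ifnet *, unsigned long, void *);
};

/* Hypothetical command number standing in for SIOCGIFMEDIA. */
#define MODEL_GET_MEDIA 1UL

/* Dispatch wrapper with the same NULL guard as the proposed patch. */
static int
model_if_ioctl(struct model_ifnet *ifp, unsigned long cmd, void *data)
{
	if (ifp->if_ioctl == NULL)
		return (EOPNOTSUPP);

	return (ifp->if_ioctl(ifp, cmd, data));
}

/* Example handler: reports the link as active for MODEL_GET_MEDIA. */
static int
model_media_ioctl(struct model_ifnet *ifp, unsigned long cmd, void *data)
{
	(void)ifp;
	if (cmd != MODEL_GET_MEDIA)
		return (EINVAL);
	*(int *)data = 1;
	return (0);
}

/*
 * Caller that asks for the link state through the wrapper.
 * Returns 1 if the link is active, 0 if it is not, and -1 if the
 * state cannot be queried (for example, no handler is installed).
 */
static int
model_link_state(struct model_ifnet *ifp)
{
	int active = 0;

	if (model_if_ioctl(ifp, MODEL_GET_MEDIA, &active) != 0)
		return (-1);
	return (active ? 1 : 0);
}
```

With a model_ifnet whose if_ioctl member is NULL, model_link_state() returns -1 rather than jumping through a NULL pointer; with model_media_ioctl installed it returns 1. The sketch assumes the caller treats any non-zero return as "state unknown".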
Here's the relevant bit of backtrace:

#7 0x0000000000000000 in ?? ()
#8 0xffff0000006f87f4 in get_operstate_ether (ifp=0xffffa00002f7d000, pstate=<optimized out>) at /usr/src/sys/netlink/route/iface.c:124
#9 get_operstate (ifp=0xffffa00002f7d000, pstate=<optimized out>) at /usr/src/sys/netlink/route/iface.c:181
#10 dump_iface (nw=nw@entry=0xffff0000877e0780, ifp=ifp@entry=0xffffa00002f7d000, hdr=hdr@entry=0xffff0000877e07c0, if_flags_mask=if_flags_mask@entry=0) at /usr/src/sys/netlink/route/iface.c:310
#11 0xffff0000006f80cc in rtnl_handle_ifevent (ifp=0xffffa00002f7d000, nlmsg_type=<optimized out>, if_flags_mask=0) at /usr/src/sys/netlink/route/iface.c:1411
#12 0xffff0000005f9cb8 in do_link_state_change (arg=0xffffa00002f7d000, pending=1) at /usr/src/sys/net/if.c:2181
#13 0xffff000000525bf0 in taskqueue_run_locked (queue=queue@entry=0xffffa0000136d300) at /usr/src/sys/kern/subr_taskqueue.c:512
#14 0xffff00000052594c in taskqueue_run (queue=0xffffa0000136d300) at /usr/src/sys/kern/subr_taskqueue.c:527

Any chance we can get this fixed for 14.1? It would be nice if 14.1 worked out of the box without having to cross compile my own custom kernel.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=43387b4e574043b78a58c8bcb7575161b055fce1

commit 43387b4e574043b78a58c8bcb7575161b055fce1
Author: Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2024-05-06 09:39:08 +0000
Commit: Kristof Provost <kp@FreeBSD.org>
CommitDate: 2024-05-06 09:39:08 +0000

if: guard against if_ioctl being NULL

There are situations where an struct ifnet has a NULL if_ioctl pointer. For example, e6000sw creates such struct ifnets for each of its ports so it can call into the MII code. If there is then a link state event this calls do_link_state_change() -> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() -> get_operstate_ether(). That wants to know if the link is up or down, so it tries to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.
Guard against this, and return EOPNOTSUPP.

PR: 275920
MFC ater: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")

sys/net/if.c | 3 +++
1 file changed, 3 insertions(+)

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9a8a26aefb366ef6f497d48547a1562a1de566c1

commit 9a8a26aefb366ef6f497d48547a1562a1de566c1
Author: Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2024-05-06 09:39:08 +0000
Commit: Kristof Provost <kp@FreeBSD.org>
CommitDate: 2024-05-12 16:12:04 +0000

if: guard against if_ioctl being NULL

There are situations where an struct ifnet has a NULL if_ioctl pointer. For example, e6000sw creates such struct ifnets for each of its ports so it can call into the MII code. If there is then a link state event this calls do_link_state_change() -> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() -> get_operstate_ether(). That wants to know if the link is up or down, so it tries to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR: 275920
MFC ater: 3 days
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 43387b4e574043b78a58c8bcb7575161b055fce1)

sys/net/if.c | 3 +++
1 file changed, 3 insertions(+)

Gentle reminder: can this be cherry-picked to releng/14.1?

A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fecd303882565954f58d984e4a13735e080b5263

commit fecd303882565954f58d984e4a13735e080b5263
Author: Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2024-05-06 09:39:08 +0000
Commit: Kristof Provost <kp@FreeBSD.org>
CommitDate: 2024-05-20 07:38:40 +0000

if: guard against if_ioctl being NULL

There are situations where an struct ifnet has a NULL if_ioctl pointer. For example, e6000sw creates such struct ifnets for each of its ports so it can call into the MII code.
If there is then a link state event this calls do_link_state_change() -> rtnl_handle_ifevent() -> dump_iface() -> get_operstate() -> get_operstate_ether(). That wants to know if the link is up or down, so it tries to ioctl(SIOCGIFMEDIA), which doesn't go well if if_ioctl is NULL.

Guard against this, and return EOPNOTSUPP.

PR: 275920
MFC ater: 3 days
Approved by: re (cperciva)
Sponsored by: Rubicon Communications, LLC ("Netgate")
(cherry picked from commit 43387b4e574043b78a58c8bcb7575161b055fce1)
(cherry picked from commit 9a8a26aefb366ef6f497d48547a1562a1de566c1)

sys/net/if.c | 3 +++
1 file changed, 3 insertions(+)

This also affects the if_re driver with a Realtek RTL8169 NIC in some cases. Reproducible on FreeBSD 14.1 and 15.0-CURRENT.

My RVVM virtual machine emulates rtl8169 and triggers this bug (apparently): https://github.com/LekKit/RVVM/issues/131

KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x36
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x122
panic() at panic+0x26
page_fault_handler() at page_fault_handler+0x22a
do_trap_supervisor() at do_trap_supervisor+0x6c
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- exception 12, tval = 0
(null)() at 0
if_ioctl() at if_ioctl+0xc
dump_iface() at dump_iface+0x10e
rtnl_handle_ifevent() at rtnl_handle_ifevent+0x74
rtnl_handle_ifattach() at rtnl_handle_ifattach+0x48
if_attach_internal() at if_attach_internal+0x33a
if_attach() at if_attach+0xe
ether_ifattach() at ether_ifattach+0x32
.Lpcrel_hi106() at .Lpcrel_hi106+0x46
device_attach() at device_attach+0x36a
device_probe_and_attach() at device_probe_and_attach+0x72
pci_driver_added() at pci_driver_added+0x102
devclass_driver_added() at devclass_driver_added+0x34
devclass_add_driver() at devclass_add_driver+0xfc
driver_module_handler() at driver_module_handler+0x6a
module_register_init() at module_register_init+0xa8
linker_load_module() at linker_load_module+0x9e6
kern_kldload() at kern_kldload+0x14e
sys_kldload() at sys_kldload+0x54
do_trap_user() at do_trap_user+0x1de
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- syscall (304, FreeBSD ELF64, kldload)

Same issue here with 15-Current and 14.1 stable

Still happening in both 15-CURRENT and 14.1-STABLE

root@freebsd:~ # kldload if_re
re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet> mem 0x40000000-0x400000ff irq 9 at device 1.0 on pci0
re0: Chip rev. 0x00800000
re0: MAC rev. 0x00000000
t[0]: 0x0000000000000800
t[1]: 0xffffffc071c0b26c
t[2]: 0x49f17f2d48bb1c18
t[3]: 0xffffffc0003f699c
t[4]: 0xda0f930be87ea53e
t[5]: 0x9b63889a118d9c5c
t[6]: 0x7ffe55743bd38475
s[0]: 0xffffffc050c9a280
s[1]: 0xffffffd00264ea80
s[2]: 0xffffffc050c9a2b8
s[3]: 0x0000000000000000
s[4]: 0xffffffd002498810
s[5]: 0xffffffc050c9a358
s[6]: 0xffffffc0009add18
s[7]: 0xffffffc0007f2b62
s[8]: 0xffffffc068700138
s[9]: 0xffffffc068703130
s[10]: 0xffffffc000717948
s[11]: 0x0000000000000801
a[0]: 0xffffffd00265e000
a[1]: 0xffffffc050c9a2b8
a[2]: 0xffffffd00264ea80
a[3]: 0x0000000000000000
a[4]: 0x00000000c0306938
a[5]: 0x0000000000000010
a[6]: 0x0000000000000020
a[7]: 0x0000000000000016
ra: 0xffffffc0003f6a8c
sp: 0xffffffc050c9a250
gp: 0xffffffc0007f1520
tp: 0xffffffc0009c0700
sepc: 0x0000000000000000
sstatus: 0x0000000200000120
stval : 0x0000000000000000
panic: Fatal page fault at 0: 0
cpuid = 0
time = 1728370984
KDB: stack backtrace:
#0 0xffffffc00033755e at kdb_backtrace+0x4a
#1 0xffffffc0002f2878 at vpanic+0x10c
#2 0xffffffc0002f2768 at panic+0x26
#3 0xffffffc0005a171c at page_fault_handler+0x210
#4 0xffffffc0005a0e50 at do_trap_supervisor+0x52
#5 0xffffffc000591344 at cpu_exception_handler_supervisor+0x74
#6 0xffffffc0003edd5a at if_ioctl+0xc
#7 0xffffffc0004aac66 at dump_iface+0x130
#8 0xffffffc0004aa5da at rtnl_handle_ifevent+0x7c
#9 0xffffffc0004aa8e0 at rtnl_handle_ifattach+0x48
#10 0xffffffc0003e79a4 at if_attach_internal+0x36c
#11 0xffffffc0003e762c at if_attach+0xe
#12 0xffffffc0003f20f6 at ether_ifattach+0x34
#13 0xffffffc071c068e2 at .Lpcrel_hi95+0x46
#14 0xffffffc0003281ca at device_attach+0x36a
#15 0xffffffc000327e44 at device_probe_and_attach+0x3e
#16 0xffffffc000125f94 at pci_driver_added+0x102
#17 0xffffffc000325bfc at devclass_driver_added+0x34

FreeBSD 15-CURRENT backtrace

root@freebsd:~ # kldload if_re
re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet> mem 0x40000000-0x400000ff irq 9 at device 1.0 on pci0
re0: Chip rev. 0x00800000
re0: MAC rev. 0x00000000
t[0]: 0x0000000000000000
t[1]: 0xffffffc07560b49c (.Lpcrel_hi235 + 0x4a2)
t[2]: 0x0000000000010000
t[3]: 0xffffffc0004232ba (ifmedia_ioctl)
t[4]: 0x744380940d76cf29
t[5]: 0x35930f09d23e5346
t[6]: 0x4134e5b735e0dc80
s[0]: 0xffffffc054f232b0 ($d.1 + 0x51ee95a8)
s[1]: 0xffffffd000b43900
s[2]: 0xffffffc054f232f0 ($d.1 + 0x51ee95e8)
s[3]: 0x0000000000000000
s[4]: 0xffffffd0571ce000
s[5]: 0xffffffd02c09302c
s[6]: 0xffffffc000a6bd40 (__stack_chk_guard)
s[7]: 0xffffffc0008a8666 ($d.0 + 0xe)
s[8]: 0xffffffc07557c138 ($d.1 + 0x72542430)
s[9]: 0xffffffc07557f130 ($d.1 + 0x72545428)
s[10]: 0xffffffc0007a1350 (pci_find_cap_desc)
s[11]: 0x0000000000000801
a[0]: 0xffffffd0571ce000
a[1]: 0xffffffc054f232f0 ($d.1 + 0x51ee95e8)
a[2]: 0xffffffd000b43900
a[3]: 0x0000000000000000
a[4]: 0x00000000c0306938
a[5]: 0x0000000000000010
a[6]: 0xffffffd02c09301c
a[7]: 0x0000000000000016
ra: 0xffffffc0004233aa (ifmedia_ioctl + 0xf0)
sp: 0xffffffc054f23280 ($d.1 + 0x51ee9578)
gp: 0xffffffc0008a7730 (__global_pointer$)
tp: 0xffffffc000b2ca80 (__pcpu)
sepc: 0x0000000000000000
sstatus: 0x0000000200000120
stval : 0x0000000000000000
panic: Fatal page fault at 0: 0
cpuid = 0
time = 1728371624
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x36
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x122
panic() at panic+0x26
page_fault_handler() at page_fault_handler+0x22a
do_trap_supervisor() at do_trap_supervisor+0x6c
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- exception 12, tval = 0
(null)() at 0
if_ioctl() at if_ioctl+0xc
dump_iface() at dump_iface+0x10e
rtnl_handle_ifevent() at rtnl_handle_ifevent+0x74
rtnl_handle_ifattach() at rtnl_handle_ifattach+0x48
if_attach_internal() at if_attach_internal+0x33a
if_attach() at if_attach+0xe
ether_ifattach() at ether_ifattach+0x32
.Lpcrel_hi105() at .Lpcrel_hi105+0x46
device_attach() at device_attach+0x36a
device_probe_and_attach() at device_probe_and_attach+0x72
pci_driver_added() at pci_driver_added+0x102
devclass_driver_added() at devclass_driver_added+0x34
devclass_add_driver() at devclass_add_driver+0xfc
driver_module_handler() at driver_module_handler+0x6a
module_register_init() at module_register_init+0xa8
linker_load_module() at linker_load_module+0x9e6
kern_kldload() at kern_kldload+0x14e
sys_kldload() at sys_kldload+0x54
do_trap_user() at do_trap_user+0x1e0
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- syscall (304, FreeBSD ELF64, kldload) |
Created attachment 247235: kernel crashdump

I have an espressobin v7 which was running fine on 13.2-RELEASE. However, after upgrading to 14.0-RELEASE, the kernel crashes with a vm_fault immediately after loading the e6000sw module. Attached is the full kernel crash dump core.txt file.