Reproduction steps:
-------------------
1. Create an AWS EC2 instance from "FreeBSD 14.0-CURRENT-amd64-20230323 UEFI", ami-02dbe14b26d93d722, in us-east-1 (or from any newer AMI whose name starts with "FreeBSD 14.0-CURRENT-amd64-").
2. Run: kldunload if_ena.ko

Result:
-------
The kernel panics every time (100% reproducible).

Core dump stack:

__curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
59        __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /root/freebsd-src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1) at /root/freebsd-src/sys/kern/kern_shutdown.c:407
#2  0xffffffff80bedc6c in kern_reboot (howto=260) at /root/freebsd-src/sys/kern/kern_shutdown.c:528
#3  0xffffffff80bee18f in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe01fef62ae0)
    at /root/freebsd-src/sys/kern/kern_shutdown.c:972
#4  0xffffffff80bedf13 in panic (fmt=<unavailable>) at /root/freebsd-src/sys/kern/kern_shutdown.c:896
#5  0xffffffff810e2b39 in trap_fatal (frame=0xfffffe01fef62b70, eva=0) at /root/freebsd-src/sys/amd64/amd64/trap.c:954
#6  <signal handler called>
#7  dump_sa (nw=nw@entry=0xfffffe01fef62d08, attr=attr@entry=1, sa=0xdeadc0dedeadc0de)
    at /root/freebsd-src/sys/netlink/route/iface.c:210
#8  0xffffffff80e5659a in dump_iface (nw=nw@entry=0xfffffe01fef62d08, ifp=ifp@entry=0xfffff80109bbe800,
    hdr=hdr@entry=0xfffffe01fef62d48, if_flags_mask=if_flags_mask@entry=0)
    at /root/freebsd-src/sys/netlink/route/iface.c:279
#9  0xffffffff80e55e7b in rtnl_handle_ifevent (ifp=0xfffff80109bbe800, nlmsg_type=<optimized out>, if_flags_mask=0)
    at /root/freebsd-src/sys/netlink/route/iface.c:943
#10 0xffffffff80d1fc1d in do_link_state_change (arg=0xfffff80109bbe800, pending=1) at /root/freebsd-src/sys/net/if.c:2205
#11 0xffffffff80c5233a in taskqueue_run_locked (queue=queue@entry=0xfffff80106ce7100)
    at /root/freebsd-src/sys/kern/subr_taskqueue.c:514
#12 0xffffffff80c5224d in taskqueue_run (queue=0xfffff80106ce7100) at /root/freebsd-src/sys/kern/subr_taskqueue.c:529
#13 0xffffffff80ba8126 in intr_event_execute_handlers (ie=0xfffff80106a9d300, p=<optimized out>)
    at /root/freebsd-src/sys/kern/kern_intr.c:1207
#14 ithread_execute_handlers (ie=0xfffff80106a9d300, p=<optimized out>) at /root/freebsd-src/sys/kern/kern_intr.c:1220
#15 ithread_loop (arg=arg@entry=0xfffff80106c951c0) at /root/freebsd-src/sys/kern/kern_intr.c:1308
#16 0xffffffff80ba45c0 in fork_exit (callout=0xffffffff80ba7eb0 <ithread_loop>, arg=0xfffff80106c951c0,
    frame=0xfffffe01fef62f40) at /root/freebsd-src/sys/kern/kern_fork.c:1102
#17 <signal handler called>
(kgdb)

Initial investigation results:
------------------------------
1. I printed ifp->if_addr->ifa_addr inside do_link_state_change(), and its value is 0xdeadc0dedeadc0de.
2. At first I suspected a kernel issue, so I tried to find the kernel commit that introduced it:
   The last non-crashing instance uses the AMI (us-east-1) "FreeBSD 14.0-CURRENT-amd64-20230316 UEFI", ami-0d80d8baae9fea731; uname -a shows kernel commit hash cee09bda03c8.
   The first crashing instance uses the AMI (us-east-1) "FreeBSD 14.0-CURRENT-amd64-20230323 UEFI", ami-02dbe14b26d93d722; uname -a shows kernel commit hash b5d43972e394.
   However, on a crashing AMI the issue reproduced no matter which kernel I built and installed from source. On a non-crashing AMI it never reproduced, no matter which kernel I built and installed from source.
   So I concluded it is a userland issue. I then built and installed only userland (without the kernel) until I found the commit that introduced the crash. The command used was:
       make buildworld -j`sysctl -n hw.ncpu` && make installworld -j`sysctl -n hw.ncpu` && reboot
   The culprit is https://reviews.freebsd.org/D39048. The commit before it does not crash; that commit and all later ones do.
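The value printed in step 1 points to a use-after-free rather than a NULL pointer. FreeBSD kernels built with INVARIANTS (as -CURRENT snapshots typically are) overwrite freed UMA memory with the 0xdeadc0de pattern. Reading ifa_addr succeeded and returned that pattern, which is consistent with ifp->if_addr still pointing at an ifaddr that had already been freed.

The minimal userland sketch below illustrates this. It uses hypothetical stand-in structs (fake_ifnet, fake_ifaddr) and a hypothetical simulate_debug_free() helper that mimics the poisoning instead of calling free(). None of these are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for struct ifaddr / struct ifnet (not the kernel definitions). */
struct fake_ifaddr {
    void *ifa_addr;              /* would point to a struct sockaddr */
};

struct fake_ifnet {
    struct fake_ifaddr *if_addr; /* may still point at an ifaddr that was freed */
};

/* Junk pattern written over freed memory by INVARIANTS kernels (0xdeadc0de on 32-bit). */
#define JUNK_PATTERN ((uintptr_t)0xdeadc0dedeadc0deULL)

/*
 * Mimic a debug free: overwrite every pointer-sized word with the junk
 * pattern but keep the memory mapped, so a stale reader gets junk instead
 * of faulting right away. The memory is intentionally NOT released here.
 */
static void simulate_debug_free(void *p, size_t len)
{
    uintptr_t junk = JUNK_PATTERN;

    for (size_t off = 0; off + sizeof(junk) <= len; off += sizeof(junk))
        memcpy((char *)p + off, &junk, sizeof(junk));
}

/* The guard used in dump_iface(): it only rules out NULL. */
static int if_addr_guard_passes(const struct fake_ifnet *ifp)
{
    return ifp->if_addr != NULL;
}

/* What a stale reader would hand to dump_sa(), as an integer for inspection. */
static uintptr_t stale_ifa_addr(const struct fake_ifnet *ifp)
{
    return (uintptr_t)ifp->if_addr->ifa_addr;
}
```

The poisoned ifaddr is still reachable through a non-NULL pointer, so the `ifp->if_addr != NULL` check passes. The stale read then yields the junk value, which ends up as the sa argument to dump_sa().

Comparing against the pattern would not be a reliable fix. Only debug kernels write it, and the freed memory may be reallocated and overwritten before the stale reader runs.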
Relevant discussions:
---------------------
I first commented in https://reviews.freebsd.org/D39048, which started an email thread where the following was written:

Zhenlei Huang <zlei@FreeBSD.org>:
=================================
iface.c:210

That might be line 214. Also be aware that `sa == 0xdeadc0dedeadc0de`.

```
static bool
dump_iface(struct nl_writer *nw, struct ifnet *ifp, const struct nlmsghdr *hdr,
    int if_flags_mask)
{
...
	if ((ifp->if_addr != NULL)) {
		dump_sa(nw, IFLA_ADDRESS, ifp->if_addr->ifa_addr);
	}
...
}
```

There is probably a race between ifp destruction and interface status event handling. `ifp` might be freed before this event handler, rtnl_handle_ifevent(), runs, so only checking `ifp->if_addr != NULL` is not enough.
=================================

Fix thoughts:
-------------
My first thought was to change the dump_iface() code that @zlei pointed out so that it checks whether ifp->if_addr->ifa_addr != 0xdeadc0dedeadc0de. However, I didn't find any existing code that does this, nor a #define for 0xdeadc0dedeadc0de that I could use, so I suspect this is not the right approach.
I would appreciate any suggestions on how to tackle this.
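A common way to keep a deferred handler from touching freed memory is to take a reference when the work is queued and drop it when the handler finishes. The object is then freed only after both detach and the handler are done, and the handler can check a "dying" flag instead of reading torn-down state. This is general background on the race described above and not necessarily what D39614 does.

The single-threaded sketch below models this pattern. All names (struct obj, obj_create(), obj_detach(), enqueue_link_task(), run_link_task()) are hypothetical. A real kernel version would need atomic reference counts (e.g. refcount(9)) and proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object standing in for an interface; not a kernel API. */
struct obj {
    int refcount;   /* outstanding holders; a kernel would use atomic refcount(9) ops */
    bool dying;     /* set once detach has started (loosely modeled on IFF_DYING) */
};

static int objects_freed; /* number of objects actually freed, for checking */

/* Allocate with one reference owned by the creator (the "driver"). */
static struct obj *obj_create(void)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (o != NULL)
        o->refcount = 1;
    return o;
}

/* Drop one reference; free only when the last holder is gone. */
static void obj_release(struct obj *o)
{
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
    }
}

/* Queuing deferred work takes its own reference on the object. */
static void enqueue_link_task(struct obj *o, struct obj **pending)
{
    o->refcount++;
    *pending = o;
}

/* Detach marks the object as dying and drops the creator's reference. */
static void obj_detach(struct obj *o)
{
    o->dying = true;
    obj_release(o);
}

/*
 * Deferred handler: the object is still allocated because the queued task
 * holds a reference. It skips reporting for a dying object, then drops the
 * task's reference, which may free the object. Returns true if it reported.
 */
static bool run_link_task(struct obj **pending)
{
    struct obj *o = *pending;
    bool reported = !o->dying;

    *pending = NULL;
    obj_release(o);
    return reported;
}
```

In the intended usage, detach runs while a link-state task is still queued, as in the kldunload scenario. The object survives until the handler drops the last reference. The handler sees the dying flag rather than freed memory, and the object is freed exactly once.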
I've created https://reviews.freebsd.org/D39614, which should fix the issue. Any chance you could test it?
(In reply to Alexander V. Chernikov from comment #1)
I've tested https://reviews.freebsd.org/D39614 and it solves the issue:
1. Updated kernel and world to the latest main => the issue reproduced.
2. Applied the fix to the kernel sources, then built and installed the kernel => the issue no longer reproduces.
I can reproduce a similar panic on my laptop via `devctl disable iwm0`.
It looks like ena was fixed by:

commit c59a5fbd8a2ef68ed0842cbb1df4450edd654129
Author: Arthur Kiyanovski <akiyano@amazon.com>
Date:   Sun May 21 12:31:54 2023 +0000

    ena: Fix driver unload crash
^Triage: apparently fixed Sun May 21 12:31:54 2023 +0000.