The following silly test case exposes a race in in6_unlink_ifa(). The kernel panics because the second thread into in6_unlink_ifa() attempts to remove the address from &ifp->if_addrhead after the first thread has already removed and freed it:

while :
do
	/etc/rc.d/netif restart igb1 &
	/etc/rc.d/netif restart igb1 &
	wait
	sleep 5
done

-----------------------------------------
The panic thread:

Unread portion of the kernel message buffer:
panic: Bad link elm 0xfffff815570d7400 next->prev != elm

Thread 1455 (Thread 102370):
(struct thread *)0xfffff811a0e49000, tid 102370
ifconfig :: (struct proc *)0xfffff815570d6000, pid 5784
args: /sbin/ifconfig igb1 inet6 fe80::225:90ff:fec9:a5fd -alias
#11 0xffffffff804bf103 in panic (fmt=<value optimized out>) at sys/kern/kern_shutdown.c:690
#12 0xffffffff8067e6f4 in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1292
#13 0xffffffff8067c30b in in6_control (so=<value optimized out>, cmd=<value optimized out>, data=<value optimized out>, ifp=<value optimized out>, td=<value optimized out>) at sys/netinet6/in6.c:699
#14 0xffffffff805aef80 in ifioctl (so=<value optimized out>, cmd=2166384921, data=0xfffff80158647c00 "igb1", td=0xfffff811a0e49000) at sys/net/if.c:2859
#15 0xffffffff80524ab4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#16 0xffffffff8052476e in sys_ioctl (td=0xfffff811a0e49000, uap=0xfffffe1b8e3afa30) at sys/kern/sys_generic.c:745

(kgdb) frame 12
#12 0xffffffff8067e6f4 in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1292
1292 TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);

Note: To show clearly where the panic occurred, I wrapped in6_unlink_ifa() in #pragma clang optimize off/on.
Without this the offending frame (#12) looks like:

#12 0xffffffff8067edba in in6_unlink_ifa (ia=0xfffff819e5dd5200, ifp=<value optimized out>) at fnv_hash.h:29

-----------------------------------------
The thread that removed the address:

Thread 1456 (Thread 101967):
(struct thread *)0xfffff81557641000, tid 101967
ifconfig :: (struct proc *)0xfffff81557595000, pid 5785
args: /sbin/ifconfig igb1 inet6 fe80::225:90ff:fec9:a5fd -alias
#2 0xffffffff8078425a in trap (frame=0xfffffe1a5ddf1f30) at sys/amd64/amd64/trap.c:185
#3 0xffffffff80768863 in nmi_calltrap () at sys/amd64/amd64/exception.S:510
#4 0xffffffff80510032 in smp_rendezvous_cpus (map={__bits = 0xfffffe1b8e6ce580}, setup_func=0xffffffff8050fe80 <smp_no_rendevous_barrier>, action_func=<value optimized out>, teardown_func=<value optimized out>, arg=<value optimized out>) at cpufunc.h:339
#5 0xffffffff804b98ae in _rm_wlock (rm=0xffffffff80eeeac0) at sys/kern/kern_rmlock.c:558
#6 0xffffffff804b9b14 in _rm_wlock_debug (rm=0xffffffff80eeeac0, file=0xffffffff80895d8d "sys/netinet6/in6.c", line=1301) at sys/kern/kern_rmlock.c:610
#7 0xffffffff8067e7ae in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1301
#8 0xffffffff8067c30b in in6_control (so=<value optimized out>, cmd=<value optimized out>, data=<value optimized out>, ifp=<value optimized out>, td=<value optimized out>) at sys/netinet6/in6.c:699
#9 0xffffffff805aef80 in ifioctl (so=<value optimized out>, cmd=2166384921, data=0xfffff81557272200 "igb1", td=0xfffff81557641000) at sys/net/if.c:2859
#10 0xffffffff80524ab4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#11 0xffffffff8052476e in sys_ioctl (td=0xfffff81557641000, uap=0xfffffe1b8e6cea30) at sys/kern/sys_generic.c:745

(kgdb) frame 7
#7 0xffffffff8067e7ae in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1301
1301 IN6_IFADDR_WLOCK();
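For readers unfamiliar with the "Bad link elm ... next->prev != elm" message: it appears to come from the consistency check that the kernel's debug queue macros run before unlinking a TAILQ element. Below is a minimal userland sketch of that invariant, not the kernel code. struct node, next_link_ok() and stale_after_remove() are hypothetical names, and the sketch assumes a <sys/queue.h> whose TAILQ_REMOVE leaves the removed element's own pointers untouched (as non-debug builds generally do).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type standing in for struct ifaddr. */
struct node {
	int id;
	TAILQ_ENTRY(node) link;
};

TAILQ_HEAD(node_head, node);

/*
 * True when elm's forward link is still consistent: either elm has no
 * successor, or the successor's back pointer refers to elm's own
 * tqe_next field. This mirrors the "next->prev != elm" condition.
 */
static bool
next_link_ok(struct node *elm)
{
	struct node *next = TAILQ_NEXT(elm, link);

	return (next == NULL || next->link.tqe_prev == &elm->link.tqe_next);
}

/*
 * Replays the two deleters one after the other: the first unlinks b,
 * and the second would then see stale links on b. Returns true when
 * the stale state is detected.
 */
static bool
stale_after_remove(void)
{
	struct node_head head = TAILQ_HEAD_INITIALIZER(head);
	struct node a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	TAILQ_INSERT_TAIL(&head, &a, link);
	TAILQ_INSERT_TAIL(&head, &b, link);
	TAILQ_INSERT_TAIL(&head, &c, link);
	assert(next_link_ok(&b));	/* b sits between a and c */

	TAILQ_REMOVE(&head, &b, link);	/* first deleter unlinks b */

	/* c's back pointer now refers to a, so b's view is stale. */
	return (!next_link_ok(&b));
}
```

In the traces above the second thread reaches TAILQ_REMOVE() at in6.c:1292 with the same ia the first thread already unlinked, which is the state this check rejects.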
Patching in6_unlink_ifa() with something like the diff below only pushes the crash to a double free in in6_leavegroup():

#11 0xffffffff804bf103 in panic (fmt=<value optimized out>)
#12 0xffffffff8073765e in uma_dbg_free (zone=0xfffff81b7ffce000, ...)
#13 0xffffffff807370d4 in uma_zfree_arg (zone=0xfffff81b7ffce000, ...)
#14 0xffffffff8049a91b in free (addr=0xfffff801eb278c20, mtp=0xffffffff80b5a980)
#15 0xffffffff80684b6f in in6_leavegroup (imm=0xfffff801eb278c20)
#16 0xffffffff8067e83b in in6_purgeaddr (ifa=0xfffff8015b270600)
#17 0xffffffff8067c355 in in6_control ()

% diff -du in6.c.orig in6.c
--- in6.c.orig 2018-01-24 16:15:52.742977158 -0700
+++ in6.c 2018-01-24 16:17:29.140814668 -0700
@@ -1288,8 +1288,16 @@
 	int remove_lle;

 	IF_ADDR_WLOCK(ifp);
-	TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
+		if (ifa->ifa_addr->sa_family == AF_INET6 &&
+		    (struct in6_ifaddr *)ifa == ia) {
+			TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+			break;
+		}
+	}
 	IF_ADDR_WUNLOCK(ifp);
+	if (ifa == NULL)
+		return;
 	ifa_free(&ia->ia_ifa);	/* if_addrhead */
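The idea behind the diff above (walk the list and unlink only if the address is still on it) can be tried in userland. This is a rough sketch under stated assumptions, not the kernel fix: struct addr, addr_unlink_once() and simulate_two_deleters() are hypothetical names, the two deleters run one after the other instead of concurrently, and in the kernel the walk would sit under IF_ADDR_WLOCK(ifp).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct ifaddr; not the kernel definition. */
struct addr {
	int id;
	int refcount;
	TAILQ_ENTRY(addr) link;
};

TAILQ_HEAD(addr_head, addr);

/*
 * Unlink target only if it is still on head. Returns true when this
 * caller performed the unlink and dropped the list's reference, false
 * when another caller got there first. Only list members are
 * dereferenced; target itself is just compared by pointer.
 */
static bool
addr_unlink_once(struct addr_head *head, struct addr *target)
{
	struct addr *it;

	TAILQ_FOREACH(it, head, link) {
		if (it == target) {
			TAILQ_REMOVE(head, target, link);
			target->refcount--;	/* the list's reference */
			return (true);
		}
	}
	return (false);
}

/* Two deleters go after the same address; count how many unlinked it. */
static int
simulate_two_deleters(void)
{
	struct addr_head head = TAILQ_HEAD_INITIALIZER(head);
	struct addr a = { .id = 1, .refcount = 1 };
	struct addr b = { .id = 2, .refcount = 1 };
	int unlinks = 0;

	TAILQ_INSERT_TAIL(&head, &a, link);
	TAILQ_INSERT_TAIL(&head, &b, link);

	unlinks += addr_unlink_once(&head, &a);	/* first deleter */
	unlinks += addr_unlink_once(&head, &a);	/* second deleter */

	assert(a.refcount == 0);		/* dropped exactly once */
	assert(TAILQ_FIRST(&head) == &b);	/* b is untouched */
	return (unlinks);
}
```

This only guards the list removal. The in6_leavegroup() trace above shows that other per-address cleanup still runs twice, so a guard of this kind moves the crash and does not close the race.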
Admittedly, the test case listed is specious, but the following is a bit less so:

while :
do
	setfib 1 arp -d $ADDR &
	setfib 1 ping -c 4 $ADDR &
	setfib 1 arp -d $ADDR &
	/etc/rc.d/netif restart igb1 &
	setfib 1 ping -c 4 $ADDR &
	setfib 1 arp -d $ADDR &
	wait
	sleep 5
done

panic: in6_ifattach_linklocal: ia == NULL, ifp=0xfffff8012150f800

(struct thread *)0xfffff8171e964560, tid 102323
ifconfig :: (struct proc *)0xfffff8171e298588, pid 48226
args: /sbin/ifconfig igb1 up
#9 0xffffffff804ffacb in kdb_enter (why=0xffffffff80862eef "panic", msg=<value optimized out>) at cpufunc.h:63
#10 0xffffffff804bf0c3 in vpanic (fmt=<value optimized out>, ap=0xfffffe1b8ef285c0) at sys/kern/kern_shutdown.c:752
#11 0xffffffff804bef16 in kassert_panic (fmt=<value optimized out>) at sys/kern/kern_shutdown.c:649
#12 0xffffffff80682613 in in6_ifattach (ifp=0xfffff8012150f800, altifp=<value optimized out>) at sys/netinet6/in6_ifattach.c:506
#13 0xffffffff805ade5a in if_up (ifp=0xfffff8012150f800) at sys/net/if.c:2155
#14 0xffffffff805af399 in ifioctl (so=<value optimized out>, cmd=<value optimized out>, data=0xfffffe1b8ef28880 "igb1", td=<value optimized out>) at sys/net/if.c:2459
#15 0xffffffff80524ad4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#16 0xffffffff8052478e in sys_ioctl (td=0xfffff8171e964560, uap=0xfffffe1b8ef28a30) at sys/kern/sys_generic.c:745
I have a test case for epair that produces this panic on HEAD, up for review here: https://reviews.freebsd.org/D20498
Test case was committed by asomers@ in base r349009.

For any proposed patches in this issue, please include them as attachments.

The diff in comment 1 appears to be only a prototype: it changes the crash location but does not actually resolve the issue.
Was the test case really for this PR and not for PR201466?
I just reproduced a similar race on 15.0-CURRENT. In my case the race is not between two in6_control() calls but rather between in6_control() and the DAD timer. I'll try to look at it.
This particular race of in6_control() vs in6_control() was fixed by f5a365e51feea75d1e5ebc86c53808d8cae7b6d7. The fix is in 14.0-RELEASE. The issue I reproduced today is slightly different and it is not worth keeping this bug open.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6ee181b9d5f00cfe337419370ba4590d939cce84

commit 6ee181b9d5f00cfe337419370ba4590d939cce84
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2024-01-11 04:51:53 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-01-11 04:51:53 +0000

    tests/net: enable if_clone_test:epair_ipv6_up_stress

    The panic mentioned was fixed in f5a365e51feea75d1e5ebc86c53808d8cae7b6d7.

    PR: 225438

 tests/sys/net/if_clone_test.sh | 1 -
 1 file changed, 1 deletion(-)