Bug 225438 - panic in6_unlink_ifa() due to race
Summary: panic in6_unlink_ifa() due to race
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Gleb Smirnoff
URL:
Keywords: crash, needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2018-01-24 22:40 UTC by Dave Baukus
Modified: 2024-01-11 04:53 UTC
CC List: 7 users

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Description Dave Baukus 2018-01-24 22:40:07 UTC
The following silly test case exposes a race in in6_unlink_ifa()
that panics because the second thread into in6_unlink_ifa() attempts
to remove from &ifp->if_addrhead an address that the first thread
has already removed and freed:

while :
do
        /etc/rc.d/netif restart igb1 &
        /etc/rc.d/netif restart igb1 &
        wait
        sleep 5
done

-----------------------------------------
The panic thread:

Unread portion of the kernel message buffer:
panic: Bad link elm 0xfffff815570d7400 next->prev != elm

Thread 1455 (Thread 102370):
(struct thread *)0xfffff811a0e49000, tid 102370
   ifconfig :: (struct proc *)0xfffff815570d6000, pid 5784
   args: /sbin/ifconfig igb1 inet6 fe80::225:90ff:fec9:a5fd -alias

#11 0xffffffff804bf103 in panic (fmt=<value optimized out>) at sys/kern/kern_shutdown.c:690
#12 0xffffffff8067e6f4 in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1292
#13 0xffffffff8067c30b in in6_control (so=<value optimized out>, cmd=<value optimized out>, data=<value optimized out>, ifp=<value optimized out>, td=<value optimized out>) at sys/netinet6/in6.c:699
#14 0xffffffff805aef80 in ifioctl (so=<value optimized out>, cmd=2166384921, data=0xfffff80158647c00 "igb1", td=0xfffff811a0e49000) at sys/net/if.c:2859
#15 0xffffffff80524ab4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#16 0xffffffff8052476e in sys_ioctl (td=0xfffff811a0e49000, uap=0xfffffe1b8e3afa30) at sys/kern/sys_generic.c:745

(kgdb) frame 12
#12 0xffffffff8067e6f4 in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1292

1292            TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);

Note:
In order to show clearly where the panic occurred, I wrapped
in6_unlink_ifa() in #pragma clang optimize off/on.
Without this, the offending frame (#12) looks like:

#12 0xffffffff8067edba in in6_unlink_ifa (ia=0xfffff819e5dd5200, ifp=<value optimized out>) at fnv_hash.h:29
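
The panic string "Bad link elm ... next->prev != elm" is the message of a consistency check that FreeBSD's <sys/queue.h> can compile into TAILQ_REMOVE(): before unlinking an element it verifies that the element's successor still points back at it. The sketch below is a userland model, not kernel code. It assumes hypothetical `struct elm`/`struct head` types with TAILQ-style links (`next`, plus `prev` pointing at the previous element's `next` field) and a hypothetical `checked_remove()` that reports the failed check instead of panicking. Removing the same element a second time, using its stale links, trips the same check the second ifconfig thread hit. In the kernel the element had also been freed, so its stale links could hold anything; the model keeps the memory alive so the failure is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for TAILQ_ENTRY/TAILQ_HEAD: prev points at the
 * previous element's next field (or at head->first for the first one). */
struct elm {
	int id;
	struct elm *next;
	struct elm **prev;
};

struct head {
	struct elm *first;
	struct elm **last;	/* address of the last element's next field */
};

/* Results of the two pre-unlink checks modelled here. */
enum unlink_result {
	UNLINK_OK,
	BAD_NEXT_PREV,	/* kernel: "Bad link elm %p next->prev != elm" */
	BAD_PREV_NEXT	/* kernel: "Bad link elm %p prev->next != elm" */
};

static void
head_init(struct head *h)
{
	h->first = NULL;
	h->last = &h->first;
}

static void
insert_tail(struct head *h, struct elm *e)
{
	e->next = NULL;
	e->prev = h->last;
	*h->last = e;
	h->last = &e->next;
}

/* Validate e's links, then unlink it; report instead of panicking. */
static enum unlink_result
checked_remove(struct head *h, struct elm *e)
{
	assert(h != NULL && e != NULL);
	if (e->next != NULL && e->next->prev != &e->next)
		return BAD_NEXT_PREV;
	if (*e->prev != e)
		return BAD_PREV_NEXT;

	if (e->next != NULL)
		e->next->prev = e->prev;
	else
		h->last = e->prev;
	*e->prev = e->next;
	/* e keeps its old, now stale, next/prev values. */
	return UNLINK_OK;
}
```

With a list a -> b -> c, the first `checked_remove(&h, &b)` returns UNLINK_OK; a second call on `b` returns BAD_NEXT_PREV, because `c.prev` now points at `a.next` rather than at `b.next`.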

-----------------------------------------
The thread that removed the address:

Thread 1456 (Thread 101967):
(struct thread *)0xfffff81557641000, tid 101967
   ifconfig :: (struct proc *)0xfffff81557595000, pid 5785
   args: /sbin/ifconfig igb1 inet6 fe80::225:90ff:fec9:a5fd -alias

#2  0xffffffff8078425a in trap (frame=0xfffffe1a5ddf1f30) at sys/amd64/amd64/trap.c:185
#3  0xffffffff80768863 in nmi_calltrap () at sys/amd64/amd64/exception.S:510
#4  0xffffffff80510032 in smp_rendezvous_cpus (map={__bits = 0xfffffe1b8e6ce580}, setup_func=0xffffffff8050fe80 <smp_no_rendevous_barrier>, action_func=<value optimized out>, teardown_func=<value optimized out>, arg=<value optimized out>) at cpufunc.h:339
#5  0xffffffff804b98ae in _rm_wlock (rm=0xffffffff80eeeac0) at sys/kern/kern_rmlock.c:558
#6  0xffffffff804b9b14 in _rm_wlock_debug (rm=0xffffffff80eeeac0, file=0xffffffff80895d8d "sys/netinet6/in6.c", line=1301) at sys/kern/kern_rmlock.c:610
#7  0xffffffff8067e7ae in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1301
#8  0xffffffff8067c30b in in6_control (so=<value optimized out>, cmd=<value optimized out>, data=<value optimized out>, ifp=<value optimized out>, td=<value optimized out>) at sys/netinet6/in6.c:699
#9  0xffffffff805aef80 in ifioctl (so=<value optimized out>, cmd=2166384921, data=0xfffff81557272200 "igb1", td=0xfffff81557641000) at sys/net/if.c:2859
#10 0xffffffff80524ab4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#11 0xffffffff8052476e in sys_ioctl (td=0xfffff81557641000, uap=0xfffffe1b8e6cea30) at sys/kern/sys_generic.c:745

(kgdb) frame 7
#7  0xffffffff8067e7ae in in6_unlink_ifa (ia=0xfffff815570d7400, ifp=0xfffff8012150f800) at sys/netinet6/in6.c:1301
1301            IN6_IFADDR_WLOCK();
Comment 1 Dave Baukus 2018-01-24 23:21:38 UTC
Patching in6_unlink_ifa() with something like the diff below only pushes the crash to
a double free in in6_leavegroup():

#11 0xffffffff804bf103 in panic (fmt=<value optimized out>) 
#12 0xffffffff8073765e in uma_dbg_free (zone=0xfffff81b7ffce000, ...)
#13 0xffffffff807370d4 in uma_zfree_arg (zone=0xfffff81b7ffce000, ...)
#14 0xffffffff8049a91b in free (addr=0xfffff801eb278c20, mtp=0xffffffff80b5a980)
#15 0xffffffff80684b6f in in6_leavegroup (imm=0xfffff801eb278c20) 
#16 0xffffffff8067e83b in in6_purgeaddr (ifa=0xfffff8015b270600) 
#17 0xffffffff8067c355 in in6_control ()


% diff -du in6.c.orig  in6.c
--- in6.c.orig  2018-01-24 16:15:52.742977158 -0700
+++ in6.c       2018-01-24 16:17:29.140814668 -0700
@@ -1288,8 +1288,16 @@
        int remove_lle;
 
        IF_ADDR_WLOCK(ifp);
-       TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+       TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
+               if (ifa->ifa_addr->sa_family == AF_INET6 &&
+                   (struct in6_ifaddr *)ifa == ia) {
+                       TAILQ_REMOVE(&ifp->if_addrhead, &ia->ia_ifa, ifa_link);
+                       break;
+               }
+       }
        IF_ADDR_WUNLOCK(ifp);
+       if (ifa == NULL)
+               return;
        ifa_free(&ia->ia_ifa);                  /* if_addrhead */
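
The patch above lets the second in6_unlink_ifa() caller return early, but the double-free trace shows the crash moving to in6_leavegroup() called from in6_purgeaddr(), i.e. to teardown work that the new check in in6_unlink_ifa() does not guard. One general way to avoid that is to decide ownership once, before any teardown: look the address up under the list lock, unlink it only if it is still present, and let only the caller that did the unlink perform every remaining step. The userland sketch below illustrates that pattern with a pthread mutex and hypothetical names (`struct addr`, `addr_unlink_claim()`, `addr_teardown()`). It is not the fix that was eventually committed (f5a365e51feea75d1e5ebc86c53808d8cae7b6d7), whose approach this report does not describe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address object on a singly linked list. */
struct addr {
	struct addr *next;
	int teardowns;		/* counts teardown work, for illustration */
};

struct addr_list {
	pthread_mutex_t lock;
	struct addr *first;
};

/*
 * Look the address up under the lock and unlink it only if it is still
 * present.  Returns true for exactly one caller: the one that owns the
 * rest of the teardown.
 */
static bool
addr_unlink_claim(struct addr_list *l, struct addr *a)
{
	bool found = false;

	assert(l != NULL && a != NULL);
	pthread_mutex_lock(&l->lock);
	for (struct addr **pp = &l->first; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == a) {
			*pp = a->next;
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return found;
}

/*
 * Claim first, then do all other teardown (leaving multicast groups,
 * dropping references, freeing).  A caller that loses the race touches
 * nothing else.
 */
static bool
addr_teardown(struct addr_list *l, struct addr *a)
{
	if (!addr_unlink_claim(l, a))
		return false;
	a->teardowns++;		/* stands in for in6_leavegroup() etc. */
	return true;
}
```

Calling `addr_teardown()` twice on the same address runs the teardown once and returns false the second time. In the kernel the losing caller would additionally need to hold its own reference (ifa_ref()) so the object cannot be freed and reused at the same address while it searches; the sketch sidesteps that by never freeing.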
Comment 2 Dave Baukus 2018-01-26 00:25:05 UTC
Admittedly, the test case listed above is specious, but the following is a bit less so:

while :
do
        setfib 1 arp -d $ADDR &
        setfib 1 ping -c 4 $ADDR &
        setfib 1 arp -d $ADDR &
        /etc/rc.d/netif restart igb1 &
        setfib 1 ping -c 4 $ADDR &
        setfib 1 arp -d $ADDR &

        wait
        sleep 5
done

panic: in6_ifattach_linklocal: ia == NULL, ifp=0xfffff8012150f800

(struct thread *)0xfffff8171e964560, tid 102323
   ifconfig :: (struct proc *)0xfffff8171e298588, pid 48226
   args: /sbin/ifconfig igb1 up

#9  0xffffffff804ffacb in kdb_enter (why=0xffffffff80862eef "panic", msg=<value optimized out>) at cpufunc.h:63
#10 0xffffffff804bf0c3 in vpanic (fmt=<value optimized out>, ap=0xfffffe1b8ef285c0) at sys/kern/kern_shutdown.c:752
#11 0xffffffff804bef16 in kassert_panic (fmt=<value optimized out>) at sys/kern/kern_shutdown.c:649
#12 0xffffffff80682613 in in6_ifattach (ifp=0xfffff8012150f800, altifp=<value optimized out>) at sys/netinet6/in6_ifattach.c:506
#13 0xffffffff805ade5a in if_up (ifp=0xfffff8012150f800) at sys/net/if.c:2155
#14 0xffffffff805af399 in ifioctl (so=<value optimized out>, cmd=<value optimized out>, data=0xfffffe1b8ef28880 "igb1", td=<value optimized out>) at sys/net/if.c:2459
#15 0xffffffff80524ad4 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#16 0xffffffff8052478e in sys_ioctl (td=0xfffff8171e964560, uap=0xfffffe1b8ef28a30) at sys/kern/sys_generic.c:745
Comment 3 Ryan Moeller 2019-06-02 01:21:35 UTC
I have a test case for epair that produces this panic on HEAD, up for review here: https://reviews.freebsd.org/D20498
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2019-06-27 10:46:45 UTC
The test case was committed by asomers@ in base r349009.

For any proposed patches in this issue, please include them as attachments.

The diff in comment 1 appears to be only a prototype: it does not actually resolve the issue, it only changes the crash location.
Comment 5 Piotr Pawel Stefaniak freebsd_committer freebsd_triage 2021-10-09 12:24:59 UTC
Was the test case really for this PR and not for PR201466?
Comment 6 Gleb Smirnoff freebsd_committer freebsd_triage 2024-01-10 19:52:22 UTC
I just reproduced a similar race on 15.0-CURRENT. In my case it is a race
not between two in6_control()s but rather between in6_control() and the DAD
timer. I'll try to look at it.
Comment 7 Gleb Smirnoff freebsd_committer freebsd_triage 2024-01-10 20:10:57 UTC
This particular race of in6_control() vs in6_control() was fixed
by f5a365e51feea75d1e5ebc86c53808d8cae7b6d7.  The fix is in 14.0-RELEASE.

The issue I reproduced today is slightly different and it is not worth
keeping this bug open.
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-01-11 04:53:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6ee181b9d5f00cfe337419370ba4590d939cce84

commit 6ee181b9d5f00cfe337419370ba4590d939cce84
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2024-01-11 04:51:53 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-01-11 04:51:53 +0000

    tests/net: enable if_clone_test:epair_ipv6_up_stress

    The panic mentioned was fixed in f5a365e51feea75d1e5ebc86c53808d8cae7b6d7.

    PR:     225438

 tests/sys/net/if_clone_test.sh | 1 -
 1 file changed, 1 deletion(-)