Bug 271101 - cxgbe(4): panic due to lock recursion while creating tracing interface
Summary: cxgbe(4): panic due to lock recursion while creating tracing interface
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-04-27 15:58 UTC by Navdeep Parhar
Modified: 2024-01-03 07:09 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Navdeep Parhar freebsd_committer freebsd_triage 2023-04-27 15:58:53 UTC
cxgbe registers an ifnet cloner with the kernel so that a pseudo ifnet can be created for use with tracing filters.  This code seems to panic after the recent netlink changes.  These are the steps to reproduce (T6 or T5):

# ifconfig cc0 up
# ifconfig t6nex0 create

# ifconfig cxl0 up
# ifconfig t5nex0 create

The reason for the panic is that the new code calls the driver ioctl (tracer_ioctl) during the creation of the interface (ether_ifattach).  This pseudo-ifnet driver has always used the same lock in both routines.

#2  0xffffffff810e7a29 in vpanic (fmt=0xffffffff81b56047 "_sx_xlock_hard: recursed on non-recursive sx %s @ %s:%d\n", ap=0xfffffe0094b2e3b0) at /root/ws/t7/sys/kern/kern_shutdown.c:972
#3  0xffffffff810e726e in panic (fmt=0xffffffff81b56047 "_sx_xlock_hard: recursed on non-recursive sx %s @ %s:%d\n") at /root/ws/t7/sys/kern/kern_shutdown.c:896
#4  0xffffffff810f8b81 in _sx_xlock_hard (sx=0xffffffff833c9698 <t4_trace_lock>, x=18446735277694109504, opts=0, file=0xffffffff833a809d "/root/ws/t7/sys/dev/cxgbe/t4_tracer.c", line=487) at /root/ws/t7/sys/kern/kern_sx.c:628
#5  0xffffffff810f884b in _sx_xlock (sx=0xffffffff833c9698 <t4_trace_lock>, opts=0, file=0xffffffff833a809d "/root/ws/t7/sys/dev/cxgbe/t4_tracer.c", line=487) at /root/ws/t7/sys/kern/kern_sx.c:332
#6  0xffffffff8338fde0 in tracer_ioctl (ifp=0xfffff80001935800, cmd=3224398136, data=0xfffffe0094b2e698 "") at /root/ws/t7/sys/dev/cxgbe/t4_tracer.c:487
#7  0xffffffff814a20e7 in get_operstate_ether (ifp=0xfffff80001935800, pstate=0xfffffe0094b2e728) at /root/ws/t7/sys/netlink/route/iface.c:125
#8  0xffffffff814a1a2b in get_operstate (ifp=0xfffff80001935800, pstate=0xfffffe0094b2e728) at /root/ws/t7/sys/netlink/route/iface.c:182
#9  0xffffffff814a166d in dump_iface (nw=0xfffffe0094b2e770, ifp=0xfffff80001935800, hdr=0xfffffe0094b2e7b0, if_flags_mask=0) at /root/ws/t7/sys/netlink/route/iface.c:269
#10 0xffffffff814a0f5d in rtnl_handle_ifevent (ifp=0xfffff80001935800, nlmsg_type=16, if_flags_mask=0) at /root/ws/t7/sys/netlink/route/iface.c:952
#11 0xffffffff814a1212 in rtnl_handle_ifattach (arg=0x0, ifp=0xfffff80001935800) at /root/ws/t7/sys/netlink/route/iface.c:960
#12 0xffffffff812a6def in if_attach_internal (ifp=0xfffff80001935800, vmove=false) at /root/ws/t7/sys/net/if.c:958
#13 0xffffffff812a67b9 in if_attach (ifp=0xfffff80001935800) at /root/ws/t7/sys/net/if.c:773
#14 0xffffffff812b9a0d in ether_ifattach (ifp=0xfffff80001935800, lla=0xfffffe0094b2e908 "") at /root/ws/t7/sys/net/if_ethersubr.c:1002
#15 0xffffffff8338ef92 in t4_cloner_create (ifc=0xfffff800014a2600, name=0xfffffe0094b2ea50 "t6nex0", len=16, params=0x0) at /root/ws/t7/sys/dev/cxgbe/t4_tracer.c:196
#16 0xffffffff812b6f0a in ifc_advanced_create_wrapper (ifc=0xfffff800014a2600, name=0xfffffe0094b2ea50 "t6nex0", maxlen=16, ifc_data=0xfffffe0094b2eaa0, ifpp=0xfffffe0094b2ea20) at /root/ws/t7/sys/net/if_clone.c:468
#17 0xffffffff812b5894 in if_clone_createif (ifc=0xfffff800014a2600, name=0xfffffe0094b2ea50 "t6nex0", len=16, ifd=0xfffffe0094b2eaa0, ifpp=0xfffffe0094b2ea20) at /root/ws/t7/sys/net/if_clone.c:323
#18 0xffffffff812b56e7 in ifc_create_ifp (name=0xfffffe0094b2ed98 "t6nex0", ifd=0xfffffe0094b2eaa0, ifpp=0xfffffe0094b2ea98) at /root/ws/t7/sys/net/if_clone.c:205
#19 0xffffffff812b5983 in if_clone_create (name=0xfffffe0094b2ed98 "t6nex0", len=16, params=0x0) at /root/ws/t7/sys/net/if_clone.c:218
#20 0xffffffff812ae6ae in ifioctl (so=0xfffff800040dc000, cmd=3223349628, data=0xfffffe0094b2ed98 "t6nex0", td=0xfffff800049fc740) at /root/ws/t7/sys/net/if.c:3100
Comment 1 Alexander V. Chernikov freebsd_committer freebsd_triage 2023-04-27 16:45:16 UTC
Thank you for sharing the report!
That was not certainly the desired outcome.

The feature of Netlink in question is providing the "link" operstate per RFC 2863.
Generally, Netlink is going to fetch more interface data (like min/max MTU, LRO/GSO, other capabilities etc) which may trigger either driver ioctl or event some new nl_ioctl - that's something not polished yet.

Speaking of this particular situation - I understand that calling driver ioctl from within the event handler is something never done before (and there was not a lot of users of ioctl interface within the kernel either).

I see the following options as the path forward:

1) Do not ask for operstate on init. It can serve as a workaround, but may introduce silent "state miss" for other interfaces that are oper-up from the beginning. Additionally, the similar task needs to be done for the other future queries

2) Cut the event handler callchain in half (e.g. store the arrival/departure/events, reference ifp and call it from the dedicated taskq). It can do the trick, but may introduce more events reordering with other ifaddr/ifstate/route related events. The worst part (to me) is that such solution sort of says that's something wrong with the event callpoints.

3) Update the driver (either by using recursive sx or existing sx prior to calling ether_ifattach().

Personally, in my mental model ether_ifattach() call serves as the notification like "hey, this interface is ready to be used as any other interface" to the rest of the system. I actually lean to option (3) for exactly this reason. I'm happy to change something inside the Netlink code so we don't run into issues with netlink<>driver interaction like this, but I'm not sure I see a good solution on Netlink part.

What do you think?
Comment 2 Navdeep Parhar freebsd_committer freebsd_triage 2023-04-27 17:06:19 UTC
I tried to see if making the driver sx recursive would fix the problem but ran into a different assertion in if_clone.c


diff --git a/sys/dev/cxgbe/t4_tracer.c b/sys/dev/cxgbe/t4_tracer.c
index 00c403b59bba..25316eec77f8 100644
--- a/sys/dev/cxgbe/t4_tracer.c
+++ b/sys/dev/cxgbe/t4_tracer.c
@@ -233,7 +233,7 @@ void
 t4_tracer_modload(void)
 {
 
-       sx_init(&t4_trace_lock, "T4/T5 tracer lock");
+       sx_init_flags(&t4_trace_lock, "T4/T5 tracer lock", SX_RECURSE);
        t4_cloner = if_clone_advanced(t4_cloner_name, 0, t4_cloner_match,
            t4_cloner_create, t4_cloner_destroy);
 }


(kgdb) p panicstr
$1 = 0xffffffff8219e380 <vpanic[buf]> "Assertion ifd->ifp != NULL failed at /root/ws/dev/sys/net/if_clone.c:415"
(kgdb) bt
#0  0xffffffff80fdb06b in doadump (textdump=1) at /root/ws/dev/sys/kern/kern_shutdown.c:407
#1  0xffffffff80fdaca9 in kern_reboot (howto=260) at /root/ws/dev/sys/kern/kern_shutdown.c:528
#2  0xffffffff80fdb7c9 in vpanic (fmt=0xffffffff819ba49d "Assertion %s failed at %s:%d", ap=0xfffffe0088fc8970) at /root/ws/dev/sys/kern/kern_shutdown.c:972
#3  0xffffffff80fdb00e in panic (fmt=0xffffffff819ba49d "Assertion %s failed at %s:%d") at /root/ws/dev/sys/kern/kern_shutdown.c:896
#4  0xffffffff811a96e0 in if_clone_createif_nl (ifc=0xfffff80001c72300, ifname=0xfffffe0088fc8d98 "t6nex0", ifd=0xfffffe0088fc8a10) at /root/ws/dev/sys/net/if_clone.c:415
#5  0xffffffff811a949f in ifc_create_ifp (name=0xfffffe0088fc8d98 "t6nex0", ifd=0xfffffe0088fc8aa0, ifpp=0xfffffe0088fc8a98) at /root/ws/dev/sys/net/if_clone.c:215
#6  0xffffffff811a9803 in if_clone_create (name=0xfffffe0088fc8d98 "t6nex0", len=16, params=0x0) at /root/ws/dev/sys/net/if_clone.c:243
#7  0xffffffff811a235e in ifioctl (so=0xfffff80009186780, cmd=3223349628, data=0xfffffe0088fc8d98 "t6nex0", td=0xfffff80009db9740) at /root/ws/dev/sys/net/if.c:3048
#8  0xffffffff810a0f53 in soo_ioctl (fp=0xfffff80001f512d0, cmd=3223349628, data=0xfffffe0088fc8d98, active_cred=0xfffff8000198e100, td=0xfffff80009db9740) at /root/ws/dev/sys/kern/sys_socket.c:267
#9  0xffffffff8109234e in fo_ioctl (fp=0xfffff80001f512d0, com=3223349628, data=0xfffffe0088fc8d98, active_cred=0xfffff8000198e100, td=0xfffff80009db9740) at /root/ws/dev/sys/sys/file.h:367
#10 0xffffffff81092124 in kern_ioctl (td=0xfffff80009db9740, fd=3, com=3223349628, data=0xfffffe0088fc8d98 "t6nex0") at /root/ws/dev/sys/kern/sys_generic.c:807
#11 0xffffffff81091d81 in sys_ioctl (td=0xfffff80009db9740, uap=0xfffff80009db9b38) at /root/ws/dev/sys/kern/sys_generic.c:715
#12 0xffffffff817a68f9 in syscallenter (td=0xfffff80009db9740) at /root/ws/dev/sys/amd64/amd64/../../kern/subr_syscall.c:190
#13 0xffffffff817a603b in amd64_syscall (td=0xfffff80009db9740, traced=0) at /root/ws/dev/sys/amd64/amd64/trap.c:1199
#14 <signal handler called>
#15 0x00001b45efb6750a in ?? ()
Backtrace stopped: Cannot access memory at address 0x1b45eba4f018
(kgdb)
Comment 3 Alexander V. Chernikov freebsd_committer freebsd_triage 2023-04-28 09:33:12 UTC
(In reply to Navdeep Parhar from comment #2)
Ack. The reason is that t4_tracer is using older 13.x cloner KPI. The new KPI requires the creation function to provide pointer to the created ifnet and 13.x wrapper tries to guess it by using the requested interface name.
As t4_tracer uses different name ('if_initname(ifp, name, ifd->unit);') and does not update the provided name pointer, the search fails, resulting in the panic later on.

I've created https://reviews.freebsd.org/D39865 to switch t4_tracer to the new KPI.
Afer the conversion, I'm going to delete if_clone_advanced() functions in HEAD, as t4_tracer is the only remaining customer.
Comment 4 Navdeep Parhar freebsd_committer freebsd_triage 2024-01-03 07:09:50 UTC
This was fixed in e203cb393fe0 some time ago.

commit e203cb393fe0
Author: Navdeep Parhar <np@FreeBSD.org>
Date:   Sat Sep 9 12:39:15 2023 -0700

    cxgbe(4): Fix tracing with netlink enabled kernels.
    
    1. The tracing ifnet's name must match the nexus name.  One way to do
       this is to not use a unit number.
    
    2. Do not hold the tracer lock around ether_ifattach or ether_ifdetach.
       netlink calls back into the driver (see stack below) and that leads
       to illegal lock recursion.
    
       tracer_ioctl
       if_ioctl
       get_operstate_ether
       get_operstate
       dump_iface
       rtnl_handle_ifevent
       rtnl_handle_ifattach
       if_attach_internal
       if_attach
       ether_ifattach
       t4_cloner_create
    
    MFC after:      3 days
    Sponsored by:   Chelsio Communications