# cd /var/crash && kgdb /boot/kernel/kernel vmcore.last GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd13.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/net/if_vlan.c:1185 cpuid = 1 time = 1625786527 KDB: stack backtrace: #0 0xffffffff80bfece5 at kdb_backtrace+0x65 #1 0xffffffff80bb46d1 at vpanic+0x181 #2 0xffffffff80bb44a3 at panic+0x43 #3 0xffffffff80cd88de at vlan_transmit+0x1de #4 0xffffffff80cce39b at ether_output_frame+0xab #5 0xffffffff8210c6d8 at ng_apply_item+0xa8 #6 0xffffffff8210c45e at ng_snd_item+0x2ce #7 0xffffffff835247f2 at ng_pppoe_disconnect+0x152 #8 0xffffffff8210ac23 at ng_destroy_hook+0x1e3 #9 0xffffffff8210c729 at ng_apply_item+0xf9 #10 0xffffffff8210c45e at ng_snd_item+0x2ce #11 0xffffffff8210aae9 at ng_destroy_hook+0xa9 #12 0xffffffff8210a8c5 at ng_rmnode+0x65 #13 0xffffffff8210cb03 at ng_apply_item+0x4d3 #14 0xffffffff8210c45e at ng_snd_item+0x2ce #15 0xffffffff82f35dcf at ngc_send+0x1af #16 0xffffffff80c51c47 at sosend_generic+0x5f7 #17 0xffffffff80c51f56 at sosend+0x66 Uptime: 7d5h6m15s Dumping 3807 out of 16148 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff80bb42fa in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #3 0xffffffff80bb4740 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:919 #4 0xffffffff80bb44a3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:843 #5 0xffffffff80cd88de in vlan_transmit (ifp=0xfffff8003742e800, m=0xfffff8004de1a900) at /usr/src/sys/net/if_vlan.c:1185 #6 0xffffffff80cce39b in ether_output_frame (ifp=0xfffff8003742e800, ifp@entry=<error reading variable: value is not available>, m=<unavailable>, m@entry=<error reading variable: value is not available>) at /usr/src/sys/net/if_ethersubr.c:508 #7 0xffffffff8210c6d8 in ng_apply_item (node=<unavailable>, node@entry=0xfffff800423e8700, item=<unavailable>, item@entry=0xfffff8038da1ad80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2407 #8 0xffffffff8210c45e in ng_snd_item (item=item@entry=0xfffff8038da1ad80, flags=flags@entry=0) at /usr/src/sys/netgraph/ng_base.c:2324 #9 0xffffffff835247f2 in ng_pppoe_disconnect (hook=0xfffff80037b6bf00, hook@entry=<error reading variable: value is not available>) at /usr/src/sys/netgraph/ng_pppoe.c:2070 #10 0xffffffff8210ac23 in ng_destroy_hook (hook=0xfffff80037b6bf00, hook@entry=<error reading variable: value is not available>) at /usr/src/sys/netgraph/ng_base.c:1224 #11 0xffffffff8210c729 in ng_apply_item (node=<unavailable>, node@entry=0xfffff80054599d00, item=<unavailable>, item@entry=0xfffff800a06dad80, rw=1) at /usr/src/sys/netgraph/ng_base.c:2478 #12 0xffffffff8210c45e in ng_snd_item (item=0xfffff800a06dad80, flags=0, flags@entry=934723328) at /usr/src/sys/netgraph/ng_base.c:2324 #13 0xffffffff8210e471 in ng_send_fn1 (node=0x0, hook=<optimized out>, hook@entry=0xfffff80037b6bf00, fn=<optimized out>, arg1=<optimized out>, arg1@entry=0x0, arg2=-511, arg2@entry=0, flags=<optimized out>, flags@entry=0) at /usr/src/sys/netgraph/ng_base.c:3730 #14 0xffffffff8210aae9 in ng_send_fn (node=<unavailable>, hook=0xfffff80037b6bf00, fn=<unavailable>, arg1=0x0, arg2=0) at /usr/src/sys/netgraph/ng_base.c:3708 #15 ng_rmhook_self (hook=0xfffff80037b6bf00) at /usr/src/sys/netgraph/ng_base.c:1639 #16 ng_destroy_hook (hook=0xfffff80034c07380) at /usr/src/sys/netgraph/ng_base.c:1199 #17 0xffffffff8210a8c5 in ng_rmnode (node=0xfffff80037e6c500, dummy1=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>) at /usr/src/sys/netgraph/ng_base.c:745 #18 0xffffffff8210cb03 in ng_generic_msg (here=0xfffff80037e6c500, item=0xfffff801210fd480, lasthook=<optimized out>) at /usr/src/sys/netgraph/ng_base.c:2527 #19 ng_apply_item (node=<optimized out>, node@entry=0xfffff80037e6c500, item=<optimized out>, item@entry=0xfffff801210fd480, rw=<optimized out>) at /usr/src/sys/netgraph/ng_base.c:2441 #20 0xffffffff8210c45e in ng_snd_item (item=item@entry=0xfffff801210fd480, flags=flags@entry=0) at /usr/src/sys/netgraph/ng_base.c:2324 #21 0xffffffff82f35dcf in ngc_send (so=<unavailable>, so@entry=<error reading variable: value is not available>, flags=<unavailable>, flags@entry=<error reading variable: value is not available>, m=0xfffff803d5b65400, m@entry=<error reading variable: value is not available>, addr=<unavailable>, addr@entry=<error reading variable: value is not available>, control=<unavailable>, control@entry=<error reading variable: value is not available>, td=<unavailable>, td@entry=<error reading variable: value is not available>) at /usr/src/sys/netgraph/ng_socket.c:345 #22 0xffffffff80c51c47 in sosend_generic (so=0xfffff80042011760, so@entry=<error reading variable: value is not available>, addr=0xfffff80141708f80, addr@entry=<error reading variable: value is not available>, uio=<optimized out>, uio@entry=<error reading variable: value is not available>, top=0xfffff803d5b65400, top@entry=<error reading variable: value is not available>, control=<optimized out>, control@entry=<error reading variable: value is not available>, flags=0, flags@entry=<error reading variable: value is not available>, td=0xfffffe0110c06c80, td@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/uipc_socket.c:1755 #23 0xffffffff80c51f56 in sosend (so=<unavailable>, so@entry=0xfffff80042011760, addr=<unavailable>, uio=<unavailable>, uio@entry=0xfffffe01112de9a8, top=<unavailable>, top@entry=0x0, control=control@entry=0x0, flags=<unavailable>, flags@entry=0, td=0xfffffe0110c06c80) at /usr/src/sys/kern/uipc_socket.c:1805 #24 0xffffffff80c58ceb in kern_sendit (td=<optimized out>, td@entry=0xfffffe0110c06c80, s=5, mp=<optimized out>, mp@entry=0xfffffe01112dea80, flags=0, control=0x0, segflg=segflg@entry=UIO_USERSPACE) at /usr/src/sys/kern/uipc_syscalls.c:798 #25 0xffffffff80c5908b in sendit (td=0xfffffe0110c06c80, td@entry=<unavailable>, s=<unavailable>, mp=mp@entry=0xfffffe01112dea80, flags=<unavailable>) at /usr/src/sys/kern/uipc_syscalls.c:723 #26 0xffffffff80c58e9d in sys_sendto (td=<unavailable>, td@entry=<error reading variable: value is not available>, uap=<unavailable>, uap@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/uipc_syscalls.c:841 #27 0xffffffff8101b59e in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189 #28 amd64_syscall (td=0xfffffe0110c06c80, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1184 #29 <signal handler called> #30 0x00000008008bf6da in ?? () Backtrace stopped: Cannot access memory at address 0x7fffdfffdab8 (kgdb) f 5 #5 0xffffffff80cd88de in vlan_transmit (ifp=0xfffff8003742e800, m=0xfffff8004de1a900) at /usr/src/sys/net/if_vlan.c:1185 1185 NET_EPOCH_ASSERT(); (kgdb) p *ifp $1 = {if_link = {cstqe_next = 0xfffff800378a3800}, if_clones = {le_next = 0x0, le_prev = 0xfffff800378a3808}, if_groups = {cstqh_first = 0xfffff800422da7f0, cstqh_last = 0xfffff800422da7c8}, if_alloctype = 6 '\006', if_numa_domain = 255 '\377', if_softc = 0xfffff800423d1e00, if_llsoftc = 0x0, if_l2com = 0xfffff800423e8700, if_dname = 0xffffffff8158fd60 <vlanname> "vlan", if_dunit = 0, if_index = 5, if_index_reserved = 0, if_xname = "vlan0\000\000\000\000\000\000\000\000\000\000", if_description = 0x0, if_flags = 34947, if_drv_flags = 64, if_capabilities = 524546, if_capenable = 524546, if_linkmib = 0x0, if_linkmiblen = 0, if_refcount = 2, if_type = 135 '\207', if_addrlen = 6 '\006', if_hdrlen = 4 '\004', if_link_state = 2 '\002', if_mtu = 1508, if_metric = 0, if_baudrate = 100000000, if_hwassist = 4119, if_epoch = 18, if_lastchange = {tv_sec = 1625163373, tv_usec = 405364}, if_snd = {ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 50, ifq_mtx = {lock_object = { lo_name = 0xfffff8003742e858 "vlan0", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, ifq_drv_head = 0x0, ifq_drv_tail = 0x0, ifq_drv_len = 0, ifq_drv_maxlen = 0, altq_type = 0, altq_flags = 0, altq_disc = 0x0, altq_ifp = 0xfffff8003742e800, altq_enqueue = 0x0, altq_dequeue = 0x0, altq_request = 0x0, altq_clfier = 0x0, altq_classify = 0x0, altq_tbr = 0x0, altq_cdnr = 0x0}, if_linktask = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0xffffffff80cc4270 <do_link_state_change>, ta_context = 0xfffff8003742e800}, if_addmultitask = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0xffffffff80cc4580 <if_siocaddmulti>, ta_context = 0xfffff8003742e800}, if_addr_lock = {lock_object = { lo_name = 0xffffffff81102568 "if_addr_lock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, if_addrhead = {cstqh_first = 0xfffff800423e8d00, cstqh_last = 0xfffff800423e8d28}, if_multiaddrs = {cstqh_first = 0x0, cstqh_last = 0xfffff8003742e9d8}, if_amcount = 0, if_addr = 0xfffff800423e8d00, if_hw_addr = 0xfffff800422da7d0, if_broadcastaddr = 0xffffffff8158ee70 <etherbroadcastaddr> "\377\377\377\377\377\377", if_afdata_lock = { lock_object = {lo_name = 0xffffffff81199fe8 "if_afdata", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, if_afdata = {0x0, 0x0, 0xfffff800422d03e0, 0x0 <repeats 25 times>, 0xfffff80037d20dc0, 0x0 <repeats 14 times>}, if_afdata_initialized = 2, if_fib = 0, if_vnet = 0xfffff8000419d600, if_home_vnet = 0xfffff8000419d600, if_vlantrunk = 0x0, if_bpf = 0xfffff800423d1c80, if_pcount = 0, if_bridge = 0x0, if_lagg = 0x0, if_pf_kif = 0x0, if_carp = 0x0, if_label = 0x0, if_netmap = 0x0, if_output = 0xffffffff80cd9a00 <vlan_output>, if_input = 0xffffffff80cce980 <ether_input>, if_bridge_input = 0x0, if_bridge_output = 0x0, if_bridge_linkstate = 0x0, if_start = 0x0, if_ioctl = 0xffffffff80cd8930 <vlan_ioctl>, if_init = 0xffffffff80cd86f0 <vlan_init>, if_resolvemulti = 0xffffffff80cceb20 <ether_resolvemulti>, if_qflush = 0xffffffff80cd8920 <vlan_qflush>, if_transmit = 0xffffffff80cd8700 <vlan_transmit>, if_reassign = 0xffffffff80cced00 <ether_reassign>, if_get_counter = 0xffffffff80cc46a0 <if_get_counter_default>, if_requestencap = 0xffffffff80ccec30 <ether_requestencap>, if_counters = {0xfffffe018dcf6228, 0xfffffe018dcf6220, 0xfffffe018dcf6218, 0xfffffe018dcf6210, 0xfffffe018dcf6208, 0xfffffe018dcf6200, 0xfffffe018dcf61f8, 0xfffffe018dcf61f0, 0xfffffe018dcf61e8, 0xfffffe018dcf61e0, 0xfffffe018dcf61d8, 0xfffffe018dcf61d0}, if_hw_tsomax = 65518, if_hw_tsomaxsegcount = 35, if_hw_tsomaxsegsize = 2048, if_snd_tag_alloc = 0xffffffff80cd8ff0 <vlan_snd_tag_alloc>, if_snd_tag_modify = 0xffffffff80cd90f0 <vlan_snd_tag_modify>, if_snd_tag_query = 0xffffffff80cd9110 <vlan_snd_tag_query>, if_snd_tag_free = 0xffffffff80cd9130 <vlan_snd_tag_free>, if_next_snd_tag = 0xffffffff80cd9180 <vlan_next_snd_tag>, if_ratelimit_query = 0xffffffff80cd9190 <vlan_ratelimit_query>, if_ratelimit_setup = 0x0, if_pcp = 0 '\000', if_debugnet_methods = 0x0, if_epoch_ctx = {data = {0x0, 0x0}}, if_ispare = {0, 0, 0, 0}} (kgdb) This host runs FreeBSD stable/13-n246143-6aee7855180-dirty from a week and a half ago, GENERIC but with INVARIANTS enabled. Interface alc0 has a VLAN interface vlan0 over which mpd5 runs PPPoE to my ISP, which had an outage, thus a LCP echo request timeout occurred; the panic occurred as mpd was in the process of tearing down its ng0 interface. Its last log message was: Jul 9 01:22:01 server mpd[997]: [P] LCP: LayerDown
I think this is a bug in the ng_ppoe(4) code. Disconnecting hooks are called outside of NET_EPOCH, but ng_pppoe_disconnect() calls NG_SEND_DATA_ONLY() which should be called in NET_EPOCH. Please try the following patch: diff --git a/sys/netgraph/ng_pppoe.c b/sys/netgraph/ng_pppoe.c index 295a136cc55..e07f77b9d54 100644 --- a/sys/netgraph/ng_pppoe.c +++ b/sys/netgraph/ng_pppoe.c @@ -2037,6 +2037,7 @@ ng_pppoe_disconnect(hook_p hook) log(LOG_NOTICE, "ng_pppoe[%x]: session out of " "mbufs\n", node->nd_ID); else { + struct epoch_tracker et; struct pppoe_full_hdr *wh; struct pppoe_tag *tag; int msglen = strlen(SIGNOFF); @@ -2067,8 +2068,11 @@ ng_pppoe_disconnect(hook_p hook) m->m_pkthdr.len = m->m_len = sizeof(*wh) + sizeof(*tag) + msglen; wh->ph.length = htons(sizeof(*tag) + msglen); + + NET_EPOCH_ENTER(et); NG_SEND_DATA_ONLY(error, privp->ethernet_hook, m); + NET_EPOCH_EXIT(et); } } if (sp->state == PPPOE_LISTENING)
Test methodology: Unplug cable carrying the PPPoE session from the NTU, plug into a switch so link stays up, wait. Observed behaviour pre patch: same panic as ng0 gets torn down Observed behaviour with above patch applied: normal operation continues after mpd tears down the link, and the PPPoE link comes back up as soon as the cable is plugged back into the NTU. Success! Thanks, Alexandr Fedorov!
Apologies for the typo, Aleksandr
(In reply to Aleksandr Fedorov from comment #1) There are several other places where ng_pppoe sends data on the ethernet hook without entering an epoch section - I think they all need to be updated?
(In reply to Mark Johnston from comment #4) I think the previous fix closed all similar problems: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248958 and https://reviews.freebsd.org/D26226 I have not yet found other problem areas in the ng_pppoe(4) code. The main idea behind introducing NET_EPOCH in netgraph (4) is: - Data processing (NG_SEND_DATA_ *) should always be in the NET_EPOCH section. - Processing of service messages (NG_SEND_MSG_ *) should not be in the NET_EPOCH section. Because message handling (* _rcvmsg, connect/disconnect hooks, node constructor/destructor) allows sleeping and other forbidden operations for NET_EPOCH section. But on many nodes, some data processing routines generate messages, and message processing routines generate data (as in this PR). Unfortunately, there are cases that are not so easy to fix without reworking ng_base.c.
(In reply to Aleksandr Fedorov from comment #5) Is this the same issue as bug 248958? If so, I note the changed weren't tagged for MFC. Should those changes be merged to stable/* branches? If so, please re-open bug 248958, set mfc-stable{13,12} flags to "?" there, and close this issue as a duplicate of it Thanks
It's not the same bug. It manifests in a completely different file. At most it's the same class of bug introduced with epoch(9).
Same file but a different line, apologies again
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b27e6e91d0ad1f87b296f7583d4f5d938d7a997c commit b27e6e91d0ad1f87b296f7583d4f5d938d7a997c Author: Aleksandr Fedorov <afedorov@FreeBSD.org> AuthorDate: 2022-02-09 19:00:50 +0000 Commit: Aleksandr Fedorov <afedorov@FreeBSD.org> CommitDate: 2022-02-09 19:00:50 +0000 ng pppoe(4): Add the required NET_EPOCH section to the hook disconnection function. Disconnecting hooks are called outside of NET_EPOCH, but ng_pppoe_disconnect() calls NG_SEND_DATA_ONLY() which should be called in NET_EPOCH. PR: 257067 Reported by: niels=freebsd@bakker.net Reviewed by: vmaffione (mentor), glebius, donner Approved by: vmaffione (mentor), glebius, donner Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D34185 sys/netgraph/ng_pppoe.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=85cd9f7e989bab02aee32a5f25d517567a0cc928 commit 85cd9f7e989bab02aee32a5f25d517567a0cc928 Author: Aleksandr Fedorov <afedorov@FreeBSD.org> AuthorDate: 2022-02-09 19:00:50 +0000 Commit: Aleksandr Fedorov <afedorov@FreeBSD.org> CommitDate: 2022-02-13 12:05:45 +0000 ng pppoe(4): Add the required NET_EPOCH section to the hook disconnection function. Disconnecting hooks are called outside of NET_EPOCH, but ng_pppoe_disconnect() calls NG_SEND_DATA_ONLY() which should be called in NET_EPOCH. PR: 257067 Reported by: niels=freebsd@bakker.net Reviewed by: vmaffione (mentor), glebius, donner Approved by: vmaffione (mentor), glebius, donner Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D34185 (cherry picked from commit b27e6e91d0ad1f87b296f7583d4f5d938d7a997c) sys/netgraph/ng_pppoe.c | 4 ++++ 1 file changed, 4 insertions(+)
stable/12 doesn't support NET_EPOCH, so there is no need to do MFC.