Bug 241646 - netgraph/ng_bridge: Kernel panic after r353030 (ng_bridge_timeout())
Summary: netgraph/ng_bridge: Kernel panic after r353030 (ng_bridge_timeout())
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Gleb Smirnoff
URL:
Keywords: crash, regression
Depends on:
Blocks:
 
Reported: 2019-11-01 17:42 UTC by Aleksandr Fedorov
Modified: 2019-11-02 18:53 UTC (History)
5 users (show)

See Also:
koobs: maintainer-feedback? (glebius)


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksandr Fedorov freebsd_committer freebsd_triage 2019-11-01 17:42:49 UTC
I observe the following kernel panic after r353030:

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
panic: ng_bridge_timeout: links: 1 != 0
cpuid = 0
time = 1572351202
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe007184d890
vpanic() at vpanic+0x17e/frame 0xfffffe007184d8f0
panic() at panic+0x43/frame 0xfffffe007184d950
ng_bridge_timeout() at ng_bridge_timeout+0x1e1/frame 0xfffffe007184d990
ng_apply_item() at ng_apply_item+0xee/frame 0xfffffe007184da10
ng_snd_item() at ng_snd_item+0x2b0/frame 0xfffffe007184da50
ng_callout_trampoline() at ng_callout_trampoline+0x3f/frame 0xfffffe007184da70
softclock_call_cc() at softclock_call_cc+0x14b/frame 0xfffffe007184db20
softclock() at softclock+0x7c/frame 0xfffffe007184db50
ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe007184dbb0
fork_exit() at fork_exit+0x80/frame 0xfffffe007184dbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe007184dbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1h6m40s
Dumping 5780 out of 65374 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread ()
    at /afedorov/vstack-develop-freebsd/sys/amd64/include/pcpu_aux.h:55
55      /afedorov/vstack-develop-freebsd/sys/amd64/include/pcpu_aux.h: No such file or directory.
(kgdb) #0  __curthread ()
    at /afedorov/vstack-develop-freebsd/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_shutdown.c:392
#2  0xffffffff80bc0750 in kern_reboot (howto=260)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_shutdown.c:479
#3  0xffffffff80bc0ba6 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_shutdown.c:908
#4  0xffffffff80bc0903 in panic (fmt=<unavailable>)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_shutdown.c:835
#5  0xffffffff837481e1 in ng_bridge_timeout (node=0xfffff80023a7ac00, 
    hook=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
    at /afedorov/vstack-develop-freebsd/sys/netgraph/ng_bridge.c:1021
#6  0xffffffff82e6b80e in ng_apply_item (node=0xfffff80023a7ac00, 
    item=0xfffff8024fbf3c00, rw=<unavailable>)
    at /afedorov/vstack-develop-freebsd/sys/netgraph/ng_base.c:2474
#7  0xffffffff82e6b520 in ng_snd_item (item=0xfffff8024fbf3c00, flags=0)
    at /afedorov/vstack-develop-freebsd/sys/netgraph/ng_base.c:2320
#8  0xffffffff82e6d8ef in ng_callout_trampoline (arg=<unavailable>)
    at /afedorov/vstack-develop-freebsd/sys/netgraph/ng_base.c:3774
#9  0xffffffff80bdb1fb in softclock_call_cc (c=<optimized out>, 
    cc=0xffffffff81db4e80 <cc_cpu>, direct=<optimized out>)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_timeout.c:740
#10 0xffffffff80bdb5ac in softclock (arg=0xffffffff81db4e80 <cc_cpu>)
    at /afedorov/vstack-develop-freebsd/sys/kern/kern_timeout.c:878


Panic occurs at https://svnweb.freebsd.org/base/head/sys/netgraph/ng_bridge.c?revision=353030&view=markup#l1021

The ng_bridge_timeout () function calls NG_NODE_FOREACH_HOOK (node, ng_bridge_unmute, &counter, ret).

But the ng_bridge_unmute () function does not change the value of the 'counter' variable. So, KASSERT () is triggered.

The ng_bridge_unmute () function seems to have bugs.

--------------------------------
static int
ng_bridge_unmute(hook_p hook, void *arg)
{
	link_p link = NG_HOOK_PRIVATE(hook);
	node_p node = NG_HOOK_NODE(hook);
	priv_p priv = NG_NODE_PRIVATE(node);
	int *counter = arg; <<<< Get the int pointer!!!

	if (link->loopCount != 0) {
		link->loopCount--;
		if (link->loopCount == 0 && priv->conf.debugLevel >= 2) {
			log(LOG_INFO, "ng_bridge: %s:"
			    " restoring looped back %s\n",
			    ng_bridge_nodename(node), NG_HOOK_NAME(hook));
		}
	}
	counter++; <<<<< Increase address of local ponter
	return (1);
}
----------------------------------

As a workaround, I used the following patch:

diff --git a/sys/netgraph/ng_bridge.c b/sys/netgraph/ng_bridge.c
index cd649f0db1ce..44cb1330ee35 100644
--- a/sys/netgraph/ng_bridge.c
+++ b/sys/netgraph/ng_bridge.c
@@ -977,7 +977,8 @@ ng_bridge_unmute(hook_p hook, void *arg)
                            ng_bridge_nodename(node), NG_HOOK_NAME(hook));
                }
        }
-       counter++;
+
+       (*counter)++;
        return (1);
 }

But it seems to me that the 'counter' variable is not needed at all, like KASSERT ().
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-11-02 01:59:44 UTC
Thank you for the report, analysis and patch Aleksandr. Could you please include the patch against HEAD/CURRENT as an attachment please
Comment 2 Gleb Smirnoff freebsd_committer freebsd_triage 2019-11-02 03:09:39 UTC
Thanks a lot for submission!
Comment 3 commit-hook freebsd_committer freebsd_triage 2019-11-02 03:10:04 UTC
A commit references this bug:

Author: glebius
Date: Sat Nov  2 03:09:17 UTC 2019
New revision: 354244
URL: https://svnweb.freebsd.org/changeset/base/354244

Log:
  Fix regression from r353026.  Pointer was increased instead of value
  pointed to.

  PR:		241646
  Submitted by:	Aleksandr Fedorov <aleksandr.fedorov itglobal.com>

Changes:
  head/sys/netgraph/ng_bridge.c
Comment 4 Lutz Donnerhacke freebsd_committer freebsd_triage 2019-11-02 18:53:56 UTC
Thank you for fixing this bug.
The real solution is to remove the counter, it's a historic artifact.