Bug 221990 - panic: Assertion reclaimable == delta failed at ../../../net/iflib.c:1947
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Stephen Hurd
Keywords: patch
 
Reported: 2017-09-02 06:58 UTC by Peter Holm
Modified: 2017-10-31 17:51 UTC
Description Peter Holm freebsd_committer freebsd_triage 2017-09-02 06:58:52 UTC
I see this panic during stress tests of the kernel. This is the latest one:

 0170901 23:54:32 all (113/133): holdcnt02.sh
panic: Assertion reclaimable == delta failed at ../../../net/iflib.c:1947
cpuid = 6
time = 1504303247
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe07c7810760
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe07c7810810
vpanic() at vpanic+0x19f/frame 0xfffffe07c7810890
kassert_panic() at kassert_panic+0x139/frame 0xfffffe07c7810900
_task_fn_rx() at _task_fn_rx+0xa3c/frame 0xfffffe07c78109f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x119/frame 0xfffffe07c7810a40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xab/frame 0xfffffe07c7810a70
fork_exit() at fork_exit+0x84/frame 0xfffffe07c7810ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe07c7810ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Details @ https://people.freebsd.org/~pho/stress/log/iflib001.txt

Here's a slightly older one, with some more debug info:
https://people.freebsd.org/~pho/stress/log/iflib002.txt
Comment 1 Peter Holm freebsd_committer freebsd_triage 2017-10-02 19:11:46 UTC
This panic is a real showstopper for me when I run stress tests.
Comment 2 Kevin Bowling freebsd_committer freebsd_triage 2017-10-09 22:52:29 UTC
Can you comment out the MPASS while we investigate?  delta is only computed under INVARIANTS, so the calculation may simply have drifted.  Otherwise the mp_ring code might have a subtle concurrency issue, hinted at by sbahra@, that we don't know about yet.
Comment 3 Peter Holm freebsd_committer freebsd_triage 2017-10-10 10:45:13 UTC
Yes,

Index: /usr/src/sys/net/iflib.c
===================================================================
--- /usr/src/sys/net/iflib.c    (revision 323151)
+++ /usr/src/sys/net/iflib.c    (working copy)
@@ -1944,7 +1944,9 @@ __iflib_fl_refill_lt(if_ctx_t ctx, iflib_fl_t fl,
 #endif
 
        MPASS(fl->ifl_credits <= fl->ifl_size);
-       MPASS(reclaimable == delta);
+       if (reclaimable != delta)
+               printf("reclaimable = %d, not %d. %s\n", reclaimable, delta,
+                   __func__);
 
        if (reclaimable > 0)
                _iflib_fl_refill(ctx, fl, min(max, reclaimable));

This works for me.
Comment 4 Peter Holm freebsd_committer freebsd_triage 2017-10-20 04:04:33 UTC
I'm still able to run into problems, even with this "fix".

After running stress tests I get:

reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt

An "init 1" followed by "exit" does not recover from this mode.
Comment 5 Stephen Hurd freebsd_committer freebsd_triage 2017-10-26 17:48:59 UTC
Can you try with this:

Index: sys/net/iflib.c
===================================================================
--- sys/net/iflib.c	(revision 324937)
+++ sys/net/iflib.c	(working copy)
@@ -1931,6 +1931,7 @@
 
 	}
 done:
+	MPASS(n == i == 0);
 	DBG_COUNTER_INC(rxd_flush);
 	if (fl->ifl_pidx == 0)
 		pidx = fl->ifl_size - 1;

It looks like ifl_credits could get out of sync in the error paths here, but I'm not sure you're hitting any of them.
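
Note on the assertion in the patch above: in C, == associates left to right, so n == i == 0 parses as (n == i) == 0, which holds exactly when n != i. If the intent was that both counters are zero at done: (an assumption, since the comment does not say), the conjunction has to be written out. A minimal standalone sketch; the helper names are hypothetical and not part of iflib:

```c
#include <assert.h>
#include <stdbool.h>

/* What `n == i == 0` computes: the result of (n == i) compared with 0. */
static bool
chained_check(int n, int i)
{
	return ((n == i) == 0);
}

/* Assumed intent: both values are zero. */
static bool
both_zero(int n, int i)
{
	return (n == 0 && i == 0);
}
```

With n = 0 and i = 0, chained_check() is false while both_zero() is true, so the chained form would fire in exactly the case the explicit form accepts.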
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-10-31 17:51:35 UTC
A commit references this bug:

Author: shurd
Date: Tue Oct 31 17:50:43 UTC 2017
New revision: 325241
URL: https://svnweb.freebsd.org/changeset/base/325241

Log:
  Fix PR221990 - Assertion at iflib.c:1947

  ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill() as they
  use different update logic.  Use the same update logic for both, and add a
  final call to isc_rxd_refill() to handle early exits from the loop.
  final call to isc_rxd_refill() to handle early exits from the loop.

  PR:		221990
  Reported by:	pho
  Reviewed by:	sbruno
  Approved by:	sbruno (mentor)
  Sponsored by:	Limelight Networks
  Differential Revision:	https://reviews.freebsd.org/D12798

Changes:
  head/sys/net/iflib.c
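
To illustrate the failure mode the commit log describes, here is a toy model of a ring whose producer index advances per buffer while its credit counter is only updated per batch. It is a sketch under stated assumptions, not the iflib code: struct toy_fl, TOY_BATCH, the fail_at parameter, and the final_flush switch are all hypothetical, and the "never completely full" simplification is mine.

```c
#include <assert.h>

#define TOY_BATCH 4	/* hypothetical batch size for credit updates */

struct toy_fl {
	int size;	/* ring size */
	int pidx;	/* producer index, advanced per buffer */
	int cidx;	/* consumer index */
	int credits;	/* buffers the counter believes are in the ring */
};

/* Free slots derived from the credit counter. */
static int
toy_reclaimable(const struct toy_fl *fl)
{
	return (fl->size - fl->credits - 1);
}

/* Free slots derived from the indices (toy: ring never completely full). */
static int
toy_delta(const struct toy_fl *fl)
{
	int in_use = (fl->pidx - fl->cidx + fl->size) % fl->size;

	return (fl->size - in_use - 1);
}

/*
 * Add up to count buffers. Allocation number fail_at (0-based) fails;
 * pass -1 for no failure. When final_flush is 0, buffers added since
 * the last full batch are never credited if the loop exits early.
 */
static void
toy_refill(struct toy_fl *fl, int count, int fail_at, int final_flush)
{
	int pending = 0;

	for (int n = 0; n < count; n++) {
		if (n == fail_at)
			break;			/* simulated allocation failure */
		fl->pidx = (fl->pidx + 1) % fl->size;
		if (++pending == TOY_BATCH) {
			fl->credits += pending;	/* per-batch update */
			pending = 0;
		}
	}
	if (final_flush)
		fl->credits += pending;		/* cover the partial batch */
}
```

Starting from an empty 16-slot ring, toy_refill(&fl, 8, 6, 0) leaves pidx at 6 but credits at 4, so toy_reclaimable() returns 11 while toy_delta() returns 9. The same call with final_flush set to 1 credits the partial batch and both functions return 9, which is the kind of agreement the assertion at iflib.c:1947 checks for.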