I see this panic during stress tests of the kernel. This is the latest one:

0170901 23:54:32 all (113/133): holdcnt02.sh
panic: Assertion reclaimable == delta failed at ../../../net/iflib.c:1947
cpuid = 6
time = 1504303247
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe07c7810760
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe07c7810810
vpanic() at vpanic+0x19f/frame 0xfffffe07c7810890
kassert_panic() at kassert_panic+0x139/frame 0xfffffe07c7810900
_task_fn_rx() at _task_fn_rx+0xa3c/frame 0xfffffe07c78109f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x119/frame 0xfffffe07c7810a40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xab/frame 0xfffffe07c7810a70
fork_exit() at fork_exit+0x84/frame 0xfffffe07c7810ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe07c7810ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Details @ https://people.freebsd.org/~pho/stress/log/iflib001.txt

Here's a slightly older one, with some more debug info:
https://people.freebsd.org/~pho/stress/log/iflib002.txt
This panic is a real showstopper for me when I run stress tests.
Can you comment out the MPASS while we investigate? delta is only computed under INVARIANTS, so the calculation may simply have drifted. Otherwise the mp_ring code might have a subtle concurrency issue, hinted at by sbahra@, that we don't know about yet.
Yes,

Index: /usr/src/sys/net/iflib.c
===================================================================
--- /usr/src/sys/net/iflib.c	(revision 323151)
+++ /usr/src/sys/net/iflib.c	(working copy)
@@ -1944,7 +1944,9 @@ __iflib_fl_refill_lt(if_ctx_t ctx, iflib_fl_t fl,
 #endif
 
 	MPASS(fl->ifl_credits <= fl->ifl_size);
-	MPASS(reclaimable == delta);
+	if (reclaimable != delta)
+		printf("reclaimable = %d, not %d. %s\n", reclaimable, delta,
+		    __func__);
 
 	if (reclaimable > 0)
 		_iflib_fl_refill(ctx, fl, min(max, reclaimable));

This works for me.
I'm still able to get into problems, even with this "fix". After running stress tests I get:

reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt
reclaimable = 1, not 3. __iflib_fl_refill_lt

An "init 1" followed by "exit" does not recover from this mode.
Can you try with this:

Index: sys/net/iflib.c
===================================================================
--- sys/net/iflib.c	(revision 324937)
+++ sys/net/iflib.c	(working copy)
@@ -1931,6 +1931,7 @@
 	}
 
 done:
+	MPASS(n == i == 0);
 	DBG_COUNTER_INC(rxd_flush);
 	if (fl->ifl_pidx == 0)
 		pidx = fl->ifl_size - 1;

It looks like ifl_credits could get out of sync in the error paths here, but I'm not sure you're hitting any of them.
A commit references this bug:

Author: shurd
Date: Tue Oct 31 17:50:43 UTC 2017
New revision: 325241
URL: https://svnweb.freebsd.org/changeset/base/325241

Log:
  Fix PR221990 - Assertion at iflib.c:1947

  ifl_pidx and ifl_credits are going out of sync in _iflib_fl_refill()
  as they use different update logic. Use the same update logic for
  both, and add a final call to isc_rxd_refill() to handle early exits
  from the loop.

  PR: 221990
  Reported by: pho
  Reviewed by: sbruno
  Approved by: sbruno (mentor)
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D12798

Changes:
  head/sys/net/iflib.c
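
For readers following along, here is a minimal userland sketch of the kind of drift the commit log describes. It is not the iflib code: struct toy_fl, RING_SIZE, BATCH, refill_buggy(), refill_fixed() and flush() are all hypothetical names, and the exact way the real counters diverged is an assumption made for illustration. The buggy variant advances the producer index per buffer but adds credits only per flushed batch, so an early exit leaves them out of step. The fixed variant updates both in one place and does a final flush after the loop.

```c
#include <assert.h>

#define RING_SIZE 16	/* hypothetical ring size */
#define BATCH      4	/* hypothetical flush batch size */

/* Toy free list; field names are illustrative, not the iflib layout. */
struct toy_fl {
	int pidx;	/* producer index: next slot to fill */
	int credits;	/* number of slots currently filled */
};

/*
 * Buggy variant: pidx advances per buffer, credits only per flushed batch.
 * An early return (simulated allocation failure at buffer fail_at) skips
 * the final credit update, so credits lags behind pidx.
 */
static void
refill_buggy(struct toy_fl *fl, int count, int fail_at)
{
	int pending = 0;

	for (int i = 0; i < count; i++) {
		if (i == fail_at)
			return;		/* pending buffers never credited */
		fl->pidx = (fl->pidx + 1) % RING_SIZE;
		if (++pending == BATCH) {
			fl->credits += pending;
			pending = 0;
		}
	}
	fl->credits += pending;
}

/* Update both counters in one place so they cannot diverge. */
static void
flush(struct toy_fl *fl, int n)
{
	assert(n > 0 && n <= RING_SIZE);
	fl->pidx = (fl->pidx + n) % RING_SIZE;
	fl->credits += n;
}

/* Fixed variant: same update logic for both, plus a final flush. */
static void
refill_fixed(struct toy_fl *fl, int count, int fail_at)
{
	int pending = 0;

	for (int i = 0; i < count; i++) {
		if (i == fail_at)
			break;		/* fall through to the final flush */
		if (++pending == BATCH) {
			flush(fl, pending);
			pending = 0;
		}
	}
	if (pending > 0)
		flush(fl, pending);
}
```

Starting from an empty toy_fl and asking for 10 buffers with a simulated failure at buffer 6, refill_buggy() ends with pidx at 6 and credits at 4, while refill_fixed() ends with both at 6.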