Bug 243126 - iflib: Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531 with if_vmx(4)
Summary: iflib: Assertion fl->ifl_cidx == cidx failed at /usr/src/sys/net/iflib.c:2531...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-net mailing list
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2020-01-06 11:11 UTC by Andriy Gapon
Modified: 2020-02-05 18:42 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andriy Gapon freebsd_committer 2020-01-06 11:11:11 UTC
This happened with FreeBSD head r351653 and if_em interface in a VMware guest.

I have a crash dump.
This is the stack trace:
#11 kdb_enter (why=0xffffffff80bdbcb7 "panic", msg=0xffffffff80bdbcb7 "panic") at /usr/src/sys/kern/subr_kdb.c:483
#12 0xffffffff8077cff1 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:987
#13 0xffffffff8077c2cd in kassert_panic (fmt=0xffffffff80b4f01d "Assertion %s failed at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:879
#14 0xffffffff808b721e in rxd_frag_to_sd (rxq=0xfffffe00003fe000, irf=<optimized out>, unload=<error reading variable: Cannot access memory at address 0x0>, sd=0xfffffe0011a54900, pf_rv=0xfffffe0011a549b0, ri=0xfffffe0011a54960)
    at /usr/src/sys/net/iflib.c:2531
#15 0xffffffff808b10e4 in assemble_segments (rxq=<optimized out>, ri=0x1dd05ea00000000, sd=0xfffff80002d1b470, pf_rv=0x100000000) at /usr/src/sys/net/iflib.c:2594
#16 iflib_rxd_pkt_get (rxq=<optimized out>, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2671
#17 iflib_rxeof (rxq=<optimized out>, budget=16) at /usr/src/sys/net/iflib.c:2803
#18 _task_fn_rx (context=<optimized out>) at /usr/src/sys/net/iflib.c:3795
#19 0xffffffff807c72a9 in gtaskqueue_run_locked (queue=0xfffff80002922600) at /usr/src/sys/kern/subr_gtaskqueue.c:377
#20 0xffffffff807c7028 in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:558
#21 0xffffffff8073ce34 in fork_exit (callout=0xffffffff807c6fa0 <gtaskqueue_thread_loop>, arg=0xfffffe00007f9008, frame=0xfffffe0011a54ac0) at /usr/src/sys/kern/kern_fork.c:1048

I can provide any additional information.
Also, I can do some debugging myself, but I need some help with that as I don't know iflib code at all.
Comment 1 Andriy Gapon freebsd_committer 2020-01-07 11:40:44 UTC
Correction, the panic happened with vmxnet3 network driver.
The VM was later switched to em as a workaround and that got me confused.

Some data from the crash:
(kgdb) fr 14
#14 0xffffffff808b721e in rxd_frag_to_sd (rxq=0xfffffe00003fe000, irf=<optimized out>, unload=<error reading variable: Cannot access memory at address 0x0>, sd=0xfffffe0011a54900, pf_rv=0xfffffe0011a549b0, ri=0xfffffe0011a54960)
    at /usr/src/sys/net/iflib.c:2531
2531    /usr/src/sys/net/iflib.c: No such file or directory.
(kgdb) p cidx
$1 = 142

(kgdb) p *rxq
$3 = {ifr_ctx = 0xfffff80002d22400, ifr_fl = 0xfffff80002d1a000, ifr_rx_irq = 0, pfil = 0xfffff8000436fb80, ifr_cq_cidx = 477, ifr_id = 0, ifr_nfl = 2 '\002', ifr_ntxqirq = 1 '\001', ifr_txqid = "\000\000\000",
  ifr_fl_offset = 1 '\001', ifr_lc = {ifp = 0xfffff80002d1a800, lro_mbuf_data = 0xfffffe00da9c4000, lro_queued = 1753531, lro_flushed = 339087, lro_bad_csum = 0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 256,
    lro_ackcnt_lim = 65535, lro_length_lim = 65535, lro_hashsz = 251, lro_hash = 0xfffff80002fa5000, lro_active = {lh_first = 0x0}, lro_free = {lh_first = 0xfffffe00da9c5360}}, ifr_task = {gt_task = {ta_link = {stqe_next = 0x0},
      ta_flags = 2, ta_priority = 0, ta_func = 0xffffffff808b0bd0 <_task_fn_rx>, ta_context = 0xfffffe00003fe000}, gt_taskqueue = 0xfffff80002922600, gt_list = {le_next = 0x0, le_prev = 0xfffffe00117e98a8},
    gt_uniq = 0xfffffe00003fe000, gt_name = "rxq0", '\000' <repeats 27 times>, gt_dev = 0xfffff80002d3b000, gt_irq = 0xfffff80002d17900, gt_cpu = 0}, ifr_filter_info = {ifi_filter = 0xffffffff80af8510 <vmxnet3_rxq_intr>,
    ifi_filter_arg = 0xfffff80002fab800, ifi_task = 0xfffffe00003fe090, ifi_ctx = 0xfffffe00003fe000}, ifr_ifdi = 0xfffff80002d17d80, ifr_frags = {{irf_flid = 0 '\000', irf_idx = 142, irf_len = 1514}, {irf_flid = 1 '\001',
      irf_idx = 76, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 77, irf_len = 1762}, {irf_flid = 1 '\001', irf_idx = 53, irf_len = 1038}, {irf_flid = 1 '\001', irf_idx = 114, irf_len = 2048}, {irf_flid = 1 '\001',
      irf_idx = 115, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 116, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 117, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 118, irf_len = 2048}, {irf_flid = 1 '\001',
      irf_idx = 119, irf_len = 1906}, {irf_flid = 1 '\001', irf_idx = 184, irf_len = 1306}, {irf_flid = 1 '\001', irf_idx = 17, irf_len = 706}, {irf_flid = 1 '\001', irf_idx = 3, irf_len = 2048}, {irf_flid = 1 '\001',
      irf_idx = 4, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 5, irf_len = 2048}, {irf_flid = 1 '\001', irf_idx = 6, irf_len = 1202}, {irf_flid = 0 '\000', irf_idx = 0, irf_len = 0} <repeats 48 times>}}

(kgdb) p *ri
$4 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 1514, iri_cidx = 477, iri_ifp = 0xfffff80002d1a800, iri_frags = 0xfffffe00003fe140, iri_flowid = 600473664, iri_csum_flags = 251658240, iri_csum_data = 65535, iri_flags = 0 '\000',
  iri_nfrags = 1 '\001', iri_rsstype = 130 '\202', iri_pad = 0 '\000'}


(kgdb) fr 17
#17 iflib_rxeof (rxq=<optimized out>, budget=16) at /usr/src/sys/net/iflib.c:2803
2803    in /usr/src/sys/net/iflib.c
(kgdb) i loc
ctx = <optimized out>
scctx = <optimized out>
lro_possible = <error reading variable lro_possible (Cannot access memory at address 0x0)>
v4_forwarding = <error reading variable v4_forwarding (Cannot access memory at address 0x0)>
v6_forwarding = <error reading variable v6_forwarding (Cannot access memory at address 0x0)>
sctx = 0xffffffff810e7780 <vmxnet3_sctx_init>
rx_pkts = 1
rx_bytes = 1514
mh = 0x0
mt = 0x0
ifp = 0xfffff80002d1a800
cidxp = 0xfffffe00003fe020
avail = 17
i = <error reading variable i (Cannot access memory at address 0x0)>
fl = <optimized out>
m = 0x0
budget_left = 16
ri = <optimized out>
err = <optimized out>
mf = <optimized out>
lro_enabled = <optimized out>

(kgdb) p *cidxp
$1 = 477

(kgdb) p *$5.ifc_sctx
$7 = {isc_magic = 3405705229, isc_driver = 0xffffffff810e7900 <vmxnet3_iflib_driver>, isc_q_align = 512, isc_tx_maxsize = 65536, isc_tx_maxsegsize = 16383, isc_tso_maxsize = 65550, isc_tso_maxsegsize = 16383,
  isc_rx_maxsize = 16383, isc_rx_maxsegsize = 16383, isc_rx_nsegments = 1, isc_admin_intrcnt = 1, isc_vendor_info = 0xffffffff810e7930 <vmxnet3_vendor_info_array>, isc_driver_version = 0xffffffff80ba63e4 "2",
  isc_parse_devinfo = 0x0, isc_nrxd_min = {32, 32, 32, 0, 0, 0, 0, 0}, isc_nrxd_default = {256, 256, 256, 0, 0, 0, 0, 0}, isc_nrxd_max = {2048, 2048, 2048, 0, 0, 0, 0, 0}, isc_ntxd_min = {32, 32, 0, 0, 0, 0, 0, 0},
  isc_ntxd_default = {512, 512, 0, 0, 0, 0, 0, 0}, isc_ntxd_max = {4096, 4096, 0, 0, 0, 0, 0, 0}, isc_nfl = 2, isc_ntxqs = 2, isc_nrxqs = 3, __spare0__ = 0, isc_tx_reclaim_thresh = 0, isc_flags = 9, isc_name = 0x0}

(kgdb) p *rxq->ifr_ctx
$5 = {ops = 0xfffff80002d18000, ifc_softc = 0xfffff80002d22000, ifc_dev = 0xfffff80002d3b000, ifc_ifp = 0xfffff80002d1a800, ifc_cpus = {__bits = {255, 0, 0, 0}}, ifc_sctx = 0xffffffff810e7780 <vmxnet3_sctx_init>,
  ifc_softc_ctx = {isc_vectors = 9, isc_nrxqsets = 8, isc_ntxqsets = 8, __spare0__ = 0, __spare1__ = 0, isc_msix_bar = 24, isc_tx_nsegments = 32, isc_ntxd = {512, 512, 0, 0, 0, 0, 0, 0}, isc_nrxd = {512, 256, 256, 0, 0, 0, 0,
      0}, isc_txqsizes = {8192, 8192, 0, 0, 0, 0, 0, 0}, isc_rxqsizes = {8192, 4096, 4096, 0, 0, 0, 0, 0}, isc_txd_size = "\000\000\000\000\000\000\000", isc_rxd_size = "\000\000\000\000\000\000\000",
    isc_tx_tso_segments_max = 32, isc_tx_tso_size_max = 65532, isc_tx_tso_segsize_max = 16383, isc_tx_csum_flags = 5654, isc_capabilities = 6621115, isc_capenable = 6554555, isc_rss_table_size = 128, isc_rss_table_mask = 127,
    isc_nrxqsets_max = 8, isc_ntxqsets_max = 8, __spare2__ = 0, isc_intr = IFLIB_INTR_MSIX, isc_max_frame_size = 1522, isc_min_frame_size = 0, isc_pause_frames = 0, __spare3__ = 0, __spare4__ = 0, __spare5__ = 0, __spare6__ = 0,
    __spare7__ = 0, __spare8__ = 0, __spare9__ = 0x0, isc_disable_msix = 0, isc_txrx = 0xffffffff810e7680 <vmxnet3_txrx>, isc_media = 0x0}, ifc_ctx_sx = {lock_object = {lo_name = 0xffffffff80be034d "iflib ctx lock",
      lo_flags = 36896768, lo_data = 0, lo_witness = 0x0}, sx_lock = 1}, ifc_state_mtx = {lock_object = {lo_name = 0xfffff80002dac7d0 "vmx0", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0},
  ifc_txqs = 0xfffffe00117e5000, ifc_rxqs = 0xfffffe00003fe000, ifc_if_flags = 34819, ifc_flags = 112, ifc_max_fl_buf_size = 2048, ifc_rx_mbuf_sz = 2048, ifc_link_state = 2, ifc_watchdog_events = 0, ifc_led_dev = 0x0,
  ifc_msix_mem = 0xfffff80002d15480, ifc_legacy_irq = {ii_res = 0x0, __spare0__ = 0, ii_tag = 0x0}, ifc_admin_task = {gt_task = {ta_link = {stqe_next = 0x0}, ta_flags = 2, ta_priority = 0,
      ta_func = 0xffffffff808acd60 <_task_fn_admin>, ta_context = 0xfffff80002d22400}, gt_taskqueue = 0xfffff80002923500, gt_list = {le_next = 0x0, le_prev = 0xfffffe00007fb000}, gt_uniq = 0xfffff80002d22400,
    gt_name = "admin", '\000' <repeats 26 times>, gt_dev = 0x0, gt_irq = 0x0, gt_cpu = -1}, ifc_vflr_task = {gt_task = {ta_link = {stqe_next = 0x0}, ta_flags = 0, ta_priority = 0, ta_func = 0x0, ta_context = 0x0},
    gt_taskqueue = 0x0, gt_list = {le_next = 0x0, le_prev = 0x0}, gt_uniq = 0x0, gt_name = '\000' <repeats 31 times>, gt_dev = 0x0, gt_irq = 0x0, gt_cpu = 0}, ifc_filter_info = {
    ifi_filter = 0xffffffff80af8550 <vmxnet3_event_intr>, ifi_filter_arg = 0xfffff80002d22000, ifi_task = 0xfffff80002d225f8, ifi_ctx = 0xfffff80002d22400}, ifc_media = {ifm_mask = -268435456, ifm_media = 0,
    ifm_cur = 0xfffff8000437fcc0, ifm_list = {lh_first = 0xfffff8000437fcc0}, ifm_change = 0xffffffff808b6d40 <iflib_media_change>, ifm_status = 0xffffffff808b6de0 <iflib_media_status>}, ifc_mediap = 0xfffff80002d22708,
  ifc_sysctl_node = 0xfffff80002d16d80, ifc_sysctl_ntxqs = 0, ifc_sysctl_nrxqs = 0, ifc_sysctl_qs_eq_override = 0, ifc_sysctl_rx_budget = 0, ifc_sysctl_tx_abdicate = 0, ifc_sysctl_core_offset = 0,
  ifc_sysctl_separate_txrx = 0 '\000', ifc_sysctl_ntxds = {0, 0, 0, 0, 0, 0, 0, 0}, ifc_sysctl_nrxds = {0, 0, 0, 0, 0, 0, 0, 0}, ifc_txrx = {ift_txd_encap = 0xffffffff80af5190 <vmxnet3_isc_txd_encap>,
    ift_txd_flush = 0xffffffff80af5430 <vmxnet3_isc_txd_flush>, ift_txd_credits_update = 0xffffffff80af5480 <vmxnet3_isc_txd_credits_update>, ift_rxd_available = 0xffffffff80af5560 <vmxnet3_isc_rxd_available>,
    ift_rxd_pkt_get = 0xffffffff80af5690 <vmxnet3_isc_rxd_pkt_get>, ift_rxd_refill = 0xffffffff80af5970 <vmxnet3_isc_rxd_refill>, ift_rxd_flush = 0xffffffff80af5a50 <vmxnet3_isc_rxd_flush>,
    ift_legacy_intr = 0xffffffff80af5ab0 <vmxnet3_legacy_intr>}, ifc_vlan_attach_event = 0xfffff80002da2300, ifc_vlan_detach_event = 0xfffff80002da22c0, ifc_mac = {octet = "\000PV\243\265\275"}}

(kgdb) p $5.ifc_softc_ctx
$8 = {isc_vectors = 9, isc_nrxqsets = 8, isc_ntxqsets = 8, __spare0__ = 0, __spare1__ = 0, isc_msix_bar = 24, isc_tx_nsegments = 32, isc_ntxd = {512, 512, 0, 0, 0, 0, 0, 0}, isc_nrxd = {512, 256, 256, 0, 0, 0, 0, 0},
  isc_txqsizes = {8192, 8192, 0, 0, 0, 0, 0, 0}, isc_rxqsizes = {8192, 4096, 4096, 0, 0, 0, 0, 0}, isc_txd_size = "\000\000\000\000\000\000\000", isc_rxd_size = "\000\000\000\000\000\000\000", isc_tx_tso_segments_max = 32,
  isc_tx_tso_size_max = 65532, isc_tx_tso_segsize_max = 16383, isc_tx_csum_flags = 5654, isc_capabilities = 6621115, isc_capenable = 6554555, isc_rss_table_size = 128, isc_rss_table_mask = 127, isc_nrxqsets_max = 8,
  isc_ntxqsets_max = 8, __spare2__ = 0, isc_intr = IFLIB_INTR_MSIX, isc_max_frame_size = 1522, isc_min_frame_size = 0, isc_pause_frames = 0, __spare3__ = 0, __spare4__ = 0, __spare5__ = 0, __spare6__ = 0, __spare7__ = 0,
  __spare8__ = 0, __spare9__ = 0x0, isc_disable_msix = 0, isc_txrx = 0xffffffff810e7680 <vmxnet3_txrx>, isc_media = 0x0}

(kgdb) p/x $7.isc_flags
$9 = 0x9

So, it seems that because isc_flags has IFLIB_HAS_RXCQ bit, cidx was taken from rxq->ifr_cq_cidx (477) in iflib_rxeof().
But rxd_frag_to_sd() asserts that fl->ifl_cidx == irf->irf_idx, but if we look at rxq->ifr_frags, none of the fragments has that ID.
In fact:
(kgdb) p rxq->ifr_frags[0]
$11 = {irf_flid = 0 '\000', irf_idx = 142, irf_len = 1514}
And that's where the assertion trips, 142 != 477.
Comment 2 Andriy Gapon freebsd_committer 2020-01-07 12:05:07 UTC
Looking at vmxnet3_isc_rxd_pkt_get, I see that both iri_cidx and ifr_cq_cidx will be set to cqidx after the last fragment.  irf_idx of each fragment is set to rxd_idx of the rspective descriptor.

Getting back to the assertion in rxd_frag_to_sd():

(kgdb) p *ri
$4 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 1514, iri_cidx = 477, iri_ifp = 0xfffff80002d1a800, iri_frags = 0xfffffe00003fe140, iri_flowid = 600473664, iri_csum_flags = 251658240, iri_csum_data = 65535, iri_flags = 0 '\000',
  iri_nfrags = 1 '\001', iri_rsstype = 130 '\202', iri_pad = 0 '\000'}

(kgdb) p ri->iri_frags[0]
$13 = {irf_flid = 0 '\000', irf_idx = 142, irf_len = 1514}

So, irf_idx = 142 in the first and the only fragment.
irf_flid  is zero, so:

$15 = {ifl_cidx = 141, ifl_pidx = 140, ifl_credits = 254, ifl_gen = 1 '\001', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002daa8c0, ifl_fragidx = 139, ifl_size = 256, ifl_buf_size = 2048, ifl_cltype = 1, 
  ifl_zone = 0xfffff80002957000, ifl_sds = {ifsd_map = 0xfffff80002d2d800, ifsd_m = 0xfffff80002d1b800, ifsd_cl = 0xfffff80002d1b000, ifsd_ba = 0xfffff80002d1e000}, ifl_rxq = 0xfffffe00003fe000, ifl_id = 0 '\000', 
  ifl_buf_tag = 0xfffff80002cf8100, ifl_ifdi = 0xfffff80002d17da8, ifl_bus_addrs = {32051296256, 32050450432, 31627747328, 31625582592, 31637278720, 6269554688, 6269536256, 6272448512, 6512416768, 6269433856, 5922217984, 
    6273015808, 31638720512, 6261741568, 6268790784, 29797609472, 6271569920, 6271576064, 6271574016, 6271580160, 6271578112, 6271584256, 6271582208, 6271588352, 6271739904, 6271733760, 6271735808, 6271729664, 6271731712, 
    6271725568, 6271727616, 6271721472}, ifl_vm_addrs = {0xfffff8077667f800 ...

So, ifl_cidx is 141 and that's the actual problem.
477 that I mentioned in the previous comment is irrelevant.
The real problem is ifl_cidx != irf_idx, 141 != 142.

Still no clue how that could happen.
Comment 3 Andriy Gapon freebsd_committer 2020-01-07 12:50:48 UTC
I wonder if irf_idx can "jump ahead" of ifl_cidx because of a skipped zero-length packet?

Hmm, it seems like it could be the case.
Given the current value of iri_cidx/ifr_cq_cidx, I examined three descriptors before that index:

(kgdb) p $19.vxcr_u.rxcd[476]
$21 = {rxd_idx = 142, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 2, no_csum = 0, pad2 = 0, rss_hash = 600473664, len = 1514, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 1, udp = 0, tcp = 1, ipcsum_ok = 1, ipv6 = 0, 
  ipv4 = 1, fragment = 0, fcs = 0, type = 3, gen = 1}

(kgdb) p $19.vxcr_u.rxcd[475]
$22 = {rxd_idx = 141, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 0, pad2 = 0, rss_hash = 0, len = 0, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}

(kgdb) p $19.vxcr_u.rxcd[474]
$23 = {rxd_idx = 140, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 2, no_csum = 0, pad2 = 0, rss_hash = 600473664, len = 66, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 1, udp = 0, tcp = 1, ipcsum_ok = 1, ipv6 = 0, 
  ipv4 = 1, fragment = 0, fcs = 0, type = 3, gen = 1}

The descriptor at 476 with rxd_idx = 142 seems like the current packet.
And the previous descriptor at 475 with rxd_idx = 141 is a zero-length packet: eop = 1, sop = 1, len = 0.
The packet before it is a normal packet again: eop = 1, sop = 1,  len = 66.

Now, how to fix the problem?
I see two ways:
- a driver can notify iflib of a zero length packet (via a new callback), so that iflib can skip the corresponding entry in the appropriate free list
- rxd_frag_to_sd() can skip through fl entries until ifl_cidx becomes equal to irf_idx
Comment 4 Andriy Gapon freebsd_committer 2020-01-07 13:02:02 UTC
I wonder if vmxnet3_isc_rxd_pkt_get() should not hide those zero-length packets.
It could return packets with irf_len = 0.
I see that there is some provision for that case in assemble_segments(), but not sure if it is really expected.
Especially, in rxd_frag_to_sd().

Could any of iflib developers please help?
Thank you!
Comment 5 Andriy Gapon freebsd_committer 2020-01-16 11:53:37 UTC
Still looking for help from iflib developers.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-21 03:40:32 UTC
@Andriy Is this a recent regression/change in behaviour? Any chance of bisection?
Comment 7 Andriy Gapon freebsd_committer 2020-01-21 05:53:09 UTC
(In reply to Kubilay Kocak from comment #6)
That's certainly a fallout from iflib migration.
There have not been that many changes to vmx code.
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-22 03:02:05 UTC
(In reply to Andriy Gapon from comment #7)

Could you request feedback from developers that were involved in that migration by setting maintainer-feedback ? <email> flags. Thanks!