Bug 259458

Summary: iflib_rxeof NULL pointer crash with vmxnet3 driver
Product: Base System Reporter: Andriy Gapon <avg>
Component: kernAssignee: Andriy Gapon <avg>
Status: Closed FIXED    
Severity: Affects Only Me CC: markj, net
Priority: --- Keywords: crash
Version: 12.2-RELEASEFlags: koobs: mfc-stable13?
koobs: mfc-stable12?
Hardware: Any   
OS: Any   

Description Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 13:15:55 UTC
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff809d4bad
stack pointer           = 0x28:0xfffffe00c85cba40
frame pointer           = 0x28:0xfffffe00c85cba40
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_0)
trap number             = 12
panic: page fault
cpuid = 0
time = 1634262163
KDB: stack backtrace:
 stack1 db_trace_self_wrapper+0x2b vpanic+0x182 panic+0x43 trap_fatal+0x391 trap_pfault+0x4f trap+0x286 calltrap+0x8
 stack2 memcpy_erms+0x1d _task_fn_rx+0x2b9 gtaskqueue_run_locked+0xed gtaskqueue_thread_loop+0x7e fork_exit+0x6d fork_trampoline+0xe
KDB: enter: panic


#11 kdb_enter (why=0xffffffff80b17af9 "panic", msg=0xffffffff80b17af9 "panic") at /usr/src/sys/kern/subr_kdb.c:486
#12 0xffffffff8073ddce in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:975
#13 0xffffffff8073dc23 in panic (fmt=0xffffffff81178120 <cnputs_mtx+24> "") at /usr/src/sys/kern/kern_shutdown.c:909
#14 0xffffffff809d8b31 in trap_fatal (frame=0xfffffe00c85cb980, eva=0) at /usr/src/sys/amd64/amd64/trap.c:921
#15 0xffffffff809d8b8f in trap_pfault (frame=0xfffffe00c85cb980, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739
#16 0xffffffff809d8256 in trap (frame=0xfffffe00c85cb980) at /usr/src/sys/amd64/amd64/trap.c:405
#17 <signal handler called>
#18 memcpy_erms () at /usr/src/sys/amd64/amd64/support.S:577
#19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737
#20 iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2879
#21 _task_fn_rx (context=<optimized out>) at /usr/src/sys/net/iflib.c:3868
#22 0xffffffff807808bd in gtaskqueue_run_locked (queue=0xfffff800020c7200) at /usr/src/sys/kern/subr_gtaskqueue.c:362
#23 0xffffffff8078068e in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537
#24 0xffffffff8070792d in fork_exit (callout=0xffffffff80780610 <gtaskqueue_thread_loop>, arg=0xfffffe00007f8008, frame=0xfffffe00c85cbc00) at /usr/src/sys/kern/kern_fork.c:108
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 13:17:35 UTC
(kgdb) fr 19
#19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737
2737    /usr/src/sys/net/iflib.c: No such file or directory.

(kgdb) i loc
sd = {ifsd_cl = 0xfffff80002d61a38, ifsd_m = 0xfffff80002d62a38, ifsd_fl = 0xfffff80002d93400}
m = 0xfffff80123211c00

(kgdb) p m->m_data
$1 = (caddr_t) 0xfffff80123211c58 ""

(kgdb) p sd.ifsd_cl
$2 = (caddr_t *) 0xfffff80002d61a38
(kgdb) p *sd.ifsd_cl
$3 = (caddr_t) 0x0
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 13:21:16 UTC
(kgdb) fr 20  
#20 iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2879
2879    in /usr/src/sys/net/iflib.c
(kgdb) i loc
ri = {iri_qsidx = 0, iri_vtag = 0, iri_len = 60, iri_cidx = 328, iri_ifp = 0xfffff80002d9e000, iri_frags = 0xfffffe00ea9f5180, iri_flowid = 0, iri_csum_flags = 0, iri_csum_data = 0, iri_flags = 0 '\000', iri_nfrags = 1 '\001',
  iri_rsstype = 0 '\000', iri_pad = 0 '\000'}
ctx = 0xfffff80002dd2000
lro_possible = <error reading variable lro_possible (Cannot access memory at address 0x0)>
v4_forwarding = <error reading variable v4_forwarding (Cannot access memory at address 0x0)>
v6_forwarding = <error reading variable v6_forwarding (Cannot access memory at address 0x0)>
retval = <error reading variable retval (Cannot access memory at address 0x0)>
scctx = <optimized out>
sctx = 0xffffffff810f1100 <vmxnet3_sctx_init>
rx_pkts = <error reading variable rx_pkts (Cannot access memory at address 0x0)>
rx_bytes = <error reading variable rx_bytes (Cannot access memory at address 0x0)>
mh = 0xfffff800b371d100
mt = 0xfffff800b371d100
ifp = 0xfffff80002d9e000
cidxp = 0xfffffe00ea9f5018
avail = 1
budget_left = 15
err = <optimized out>
m = <optimized out>
i = <optimized out>
fl = <optimized out>
mf = <optimized out>
lro_enabled = <optimized out>

(kgdb) p *cidxp
$4 = 328

(kgdb) p ri.iri_frags[0]
$5 = {irf_flid = 0 '\000', irf_idx = 327, irf_len = 60}

(kgdb) fr 19
#19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737
2737    /usr/src/sys/net/iflib.c: No such file or directory.
(kgdb) p *rxq
$6 = {ifr_ctx = 0xfffff80002dd2000, ifr_fl = 0xfffff80002d93400, ifr_rx_irq = 0, ifr_cq_cidx = 328, ifr_id = 0, ifr_nfl = 2 '\002', ifr_ntxqirq = 1 '\001', ifr_txqid = "\000\000\000", ifr_fl_offset = 1 '\001', ifr_lc = {
    ifp = 0xfffff80002d9e000, lro_mbuf_data = 0xfffffe00ea9f1000, lro_queued = 0, lro_flushed = 0, lro_bad_csum = 0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 512, lro_ackcnt_lim = 65535, lro_length_lim = 65535,
    lro_hashsz = 509, lro_hash = 0xfffff8000410d000, lro_active = {lh_first = 0x0}, lro_free = {lh_first = 0xfffffe00ea9f33f0}}, ifr_task = {gt_task = {ta_link = {stqe_next = 0x0}, ta_flags = 2, ta_priority = 0,
      ta_func = 0xffffffff8084cd90 <_task_fn_rx>, ta_context = 0xfffffe00ea9f5000}, gt_taskqueue = 0xfffff800020c7200, gt_list = {le_next = 0x0, le_prev = 0xfffffe00015f08a8}, gt_uniq = 0xfffffe00ea9f5000,
    gt_name = "rxq0", '\000' <repeats 27 times>, gt_irq = 257, gt_cpu = 0}, ifr_watchdog = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0,
    c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0, c_exec_time = 0, c_lines = {u128 = 1528, u16 = {1528, 0, 0, 0, 0, 0, 0, 0}}}, ifr_filter_info = {
    ifi_filter = 0xffffffff80a3c580 <vmxnet3_rxq_intr>, ifi_filter_arg = 0xfffff80004110000, ifi_task = 0xfffffe00ea9f5088, ifi_ctx = 0xfffffe00ea9f5000}, ifr_ifdi = 0xfffff80002d99400, ifr_frags = {{irf_flid = 0 '\000',
      irf_idx = 327, irf_len = 60}, {irf_flid = 0 '\000', irf_idx = 0, irf_len = 0} <repeats 63 times>}}


(kgdb) p rxq->ifr_fl[0]
$7 = {ifl_cidx = 328, ifl_pidx = 341, ifl_credits = 509, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002cb5ec0, ifl_fragidx = 142, ifl_size = 512, ifl_buf_size = 2048, ifl_cltype = 1,
  ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffff80002d5f000, ifsd_m = 0xfffff80002d62000, ifsd_cl = 0xfffff80002d61000, ifsd_ba = 0xfffff80002d60000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 0 '\000',
  ifl_buf_tag = 0xfffff80002d74400, ifl_ifdi = 0xfffff80002d99428, ifl_bus_addrs = {4884103168, 4884094976, 4887971840, 4887965696, 4898656256, 4898662400, 4898660352, 4898617344, 4753053696, 4753018880, 4753020928, 4883597312,
    4898639872, 4898646016, 4898643968, 4898650112, 4884144128, 4884150272, 4884148224, 4884154368, 4884152320, 4884158464, 4884156416, 4884162560, 4884160512, 4884166656, 4884111360, 4884117504, 4884115456, 4884121600,
    4884119552, 4884125696}, ifl_rxd_idxs = {141, 137, 120, 121, 323, 324, 325, 326, 0, 1, 2, 3, 315, 316, 317, 318, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511}}

(kgdb) p $7.ifl_sds.ifsd_cl[327]
$8 = (caddr_t) 0x0
(kgdb) p $7.ifl_sds.ifsd_cl[326]
$9 = (caddr_t) 0xfffff80123faf800 "\377\377\377\377\377\377"
(kgdb) p $7.ifl_sds.ifsd_cl[328]
$10 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377"
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 14:03:03 UTC
(kgdb) p *sctx
$16 = {isc_magic = 3405705229, isc_driver = 0xffffffff810f1280 <vmxnet3_iflib_driver>, isc_q_align = 512, isc_tx_maxsize = 65536, isc_tx_maxsegsize = 16383, isc_tso_maxsize = 65550, isc_tso_maxsegsize = 16383, 
  isc_rx_maxsize = 16383, isc_rx_maxsegsize = 16383, isc_rx_nsegments = 1, isc_admin_intrcnt = 1, isc_vendor_info = 0xffffffff810f12b0 <vmxnet3_vendor_info_array>, isc_driver_version = 0xffffffff80aead7c "2", 
  isc_parse_devinfo = 0x0, isc_nrxd_min = {32, 32, 32, 0, 0, 0, 0, 0}, isc_nrxd_default = {512, 512, 512, 0, 0, 0, 0, 0}, isc_nrxd_max = {2048, 2048, 2048, 0, 0, 0, 0, 0}, isc_ntxd_min = {32, 32, 0, 0, 0, 0, 0, 0}, 
  isc_ntxd_default = {512, 512, 0, 0, 0, 0, 0, 0}, isc_ntxd_max = {4096, 4096, 0, 0, 0, 0, 0, 0}, isc_nfl = 2, isc_ntxqs = 2, isc_nrxqs = 3, isc_rx_process_limit = 0, isc_tx_reclaim_thresh = 0, isc_flags = 262153, 
  isc_name = 0x0}


(kgdb) p rxq->ifr_ctx->ifc_softc_ctx
$22 = {isc_vectors = 9, isc_nrxqsets = 8, isc_ntxqsets = 8, isc_min_tx_latency = 0 '\000', isc_rx_mvec_enable = 0 '\000', isc_txrx_budget_bytes_max = 2097152, isc_msix_bar = 24, isc_tx_nsegments = 32, isc_ntxd = {512, 512, 0, 
    0, 0, 0, 0, 0}, isc_nrxd = {1024, 512, 512, 0, 0, 0, 0, 0}, isc_txqsizes = {8192, 8192, 0, 0, 0, 0, 0, 0}, isc_rxqsizes = {16384, 8192, 8192, 0, 0, 0, 0, 0}, isc_txd_size = "\000\000\000\000\000\000\000", 
  isc_rxd_size = "\000\000\000\000\000\000\000", isc_tx_tso_segments_max = 32, isc_tx_tso_size_max = 65532, isc_tx_tso_segsize_max = 16383, isc_tx_csum_flags = 5654, isc_capabilities = 6621115, isc_capenable = 6554555, 
  isc_rss_table_size = 128, isc_rss_table_mask = 127, isc_nrxqsets_max = 8, isc_ntxqsets_max = 8, isc_tx_qdepth = 2048, isc_intr = IFLIB_INTR_MSIX, isc_rxd_buf_size = {0, 0, 4096, 0, 0, 0, 0, 0}, isc_max_frame_size = 22, 
  isc_min_frame_size = 0, isc_pause_frames = 0, isc_vendor_info = {pvi_vendor_id = 0, pvi_device_id = 0, pvi_subvendor_id = 0, pvi_subdevice_id = 0, pvi_rev_id = 0, pvi_class_mask = 0, pvi_name = 0x0}, isc_disable_msix = 0, 
  isc_txrx = 0xffffffff810f1000 <vmxnet3_txrx>}
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 14:05:42 UTC
(kgdb) p ri
$26 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 60, iri_cidx = 328, iri_ifp = 0xfffff80002d9e000, iri_frags = 0xfffffe00ea9f5180, iri_flowid = 0, iri_csum_flags = 0, iri_csum_data = 0, iri_flags = 0 '\000', iri_nfrags = 1 '\001', iri_rsstype = 0 '\000', iri_pad = 0 '\000'}
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 14:48:07 UTC
(kgdb) p $26.iri_frags[0]
$33 = {irf_flid = 0 '\000', irf_idx = 327, irf_len = 60}

(kgdb) fr 19
#19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737
2737    in /usr/src/sys/net/iflib.c
(kgdb) i loc
sd = {ifsd_cl = 0xfffff80002d61a38, ifsd_m = 0xfffff80002d62a38, ifsd_fl = 0xfffff80002d93400}
m = 0xfffff80123211c00
(kgdb) p sd.ifsd_cl[0]
$27 = (caddr_t) 0x0
(kgdb) p sd.ifsd_cl[1]
$28 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377"
(kgdb) p sd.ifsd_cl[2]
$29 = (caddr_t) 0xfffff8012322a000 "\377\377\377\377\377\377"
(kgdb) p sd.ifsd_m[0]
$30 = (struct mbuf *) 0x0
(kgdb) p sd.ifsd_m[1]
$31 = (struct mbuf *) 0xfffff80123211b00
(kgdb) p sd.ifsd_m[2]
$32 = (struct mbuf *) 0xfffff80123211a00
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 15:12:16 UTC
I noticed a discrepancy between ifl_cidx / iri_cidx / ifr_cq_cidx that are equal to 328 and irf_idx that's set to 327.

Initially, I thought that this could be a come back of an older problem: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126#c2
But in this case I do not see any zero-length packets near the current index.
In fact, given that vxcr_zero_length is zero, there hadn't been any zero-length packets at all.

Looking at the code in vmxnet3_isc_rxd_pkt_get(), I think that iri_cidx != irf_idx is not a problem.  irf_idx is the last processed fragment's (packet's) descriptor ID, while iri_cidx is the next one.  So, everything is correct there.


(kgdb) p $20->ifc_softc 
$34 = (void *) 0xfffff80002d93800
(kgdb) p *(struct vmxnet3_softc *)$20->ifc_softc
$35 = {vmx_dev = 0xfffff80002db6300, vmx_ctx = 0xfffff80002dd2000, vmx_sctx = 0xffffffff810f1100 <vmxnet3_sctx_init>, vmx_scctx = 0xfffff80002dd2048, vmx_ifp = 0xfffff80002d9e000, vmx_ds = 0xfffff80002d91400, vmx_flags = 2, 
  vmx_rxq = 0xfffff80004110000, vmx_txq = 0xfffff80004110800, vmx_res0 = 0xfffff80002d98b00, vmx_iot0 = 1, vmx_ioh0 = 18446735281866391552, vmx_res1 = 0xfffff80002d98a00, vmx_iot1 = 1, vmx_ioh1 = 18446735281866395648, 
  vmx_link_active = 1, vmx_intr_mask_mode = 0, vmx_event_intr_idx = 8, vmx_event_intr_irq = {ii_res = 0xfffff80002d99d00, ii_rid = 9, ii_tag = 0xfffff80004413e80}, vmx_mcast = 0xfffff8000440bc00 "", 
  vmx_rss = 0xfffff8000440bb00, vmx_ds_dma = {idi_paddr = 47780864, idi_vaddr = 0xfffff80002d91400 "\341\376\276\272", idi_tag = 0xfffff8000440b900, idi_map = 0x0, idi_size = 720}, vmx_qs_dma = {idi_paddr = 68214784, 
    idi_vaddr = 0xfffff8000410e000 "", idi_tag = 0xfffff80002db4c00, idi_map = 0x0, idi_size = 4096}, vmx_mcast_dma = {idi_paddr = 71351296, idi_vaddr = 0xfffff8000440bc00 "", idi_tag = 0xfffff8000440b700, idi_map = 0x0, 
    idi_size = 192}, vmx_rss_dma = {idi_paddr = 71351040, idi_vaddr = 0xfffff8000440bb00 "\017", idi_tag = 0xfffff8000440b800, idi_map = 0x0, idi_size = 176}, vmx_media = 0xfffff80002dd22f0, vmx_vlan_filter = {
    0 <repeats 128 times>}, vmx_lladdr = "\000PV\246\237\""}


(kgdb) p $35.vmx_rxq[0]
$36 = {vxrxq_sc = 0xfffff80002d93800, vxrxq_id = 0, vxrxq_intr_idx = 0, vxrxq_irq = {ii_res = 0xfffff80002df8f00, ii_rid = 1, ii_tag = 0xfffff80002d99000}, vxrxq_cmd_ring = {{vxrxr_rxd = 0xfffffe00eaaf4000, vxrxr_ndesc = 512, 
      vxrxr_gen = 0, vxrxr_paddr = 57622528, vxrxr_desc_skips = 1017, vxrxr_refill_start = 142}, {vxrxr_rxd = 0xfffffe00eaaf6000, vxrxr_ndesc = 512, vxrxr_gen = 1, vxrxr_paddr = 57630720, vxrxr_desc_skips = 0, 
      vxrxr_refill_start = 511}}, vxrxq_comp_ring = {vxcr_u = {txcd = 0xfffffe00eaaf0000, rxcd = 0xfffffe00eaaf0000}, vxcr_next = 0, vxcr_ndesc = 1024, vxcr_gen = 1, vxcr_paddr = 57606144, vxcr_zero_length = 0, 
    vxcr_pkt_errors = 0}, vxrxq_rs = 0xfffff8000410e800, vxrxq_sysctl = 0xfffff80004415480, vxrxq_name = "vmx0-rx0\000\000\000\000\000\000\000"}
(kgdb) p $36.vxrxq_comp_ring

$37 = {vxcr_u = {txcd = 0xfffffe00eaaf0000, rxcd = 0xfffffe00eaaf0000}, vxcr_next = 0, vxcr_ndesc = 1024, vxcr_gen = 1, vxcr_paddr = 57606144, vxcr_zero_length = 0, vxcr_pkt_errors = 0}


(kgdb) p $37.vxcr_u.rxcd[325]
$38 = {rxd_idx = 325, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}
(kgdb) p $37.vxcr_u.rxcd[326]
$39 = {rxd_idx = 326, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}
(kgdb) p $37.vxcr_u.rxcd[327]
$40 = {rxd_idx = 327, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}
(kgdb) p $37.vxcr_u.rxcd[328]
$41 = {rxd_idx = 328, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, 
  fragment = 0, fcs = 0, type = 3, gen = 1}
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 15:30:16 UTC
In the end, it looks like somehow index 327 was not re-filled earlier.
Or it was somehow consumed twice in a row...


(kgdb) p/x *$2.ifl_rx_bitmap@8
$73 = {0xffffffffffffffff, 0xffffffffffffffff, 0x3fff, 0x0, 0x0, 0xffffffffffffff00, 0xffffffffffffffff, 0xffffffffffffffff}

(kgdb) p $2.ifl_sds.ifsd_m[327]
$74 = (struct mbuf *) 0x0
(kgdb) p $2.ifl_sds.ifsd_m[328]
$75 = (struct mbuf *) 0xfffff80123211b00
(kgdb) p $2.ifl_sds.ifsd_cl[327]
$76 = (caddr_t) 0x0
(kgdb) p $2.ifl_sds.ifsd_cl[328]
$77 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377"

ifl_rx_bitmap has bits [0, 141] set, bits [142, 327] clear and bits [328, 511] set again.
Given ifl_fragidx = 142 the latest refill went up to 141 which was what we actually see.
So, it was not the latest refill.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2021-10-26 15:33:16 UTC
Do you have allocation failures in the mbuf zone?  On 12.2 I think you can simply check the uz_fails field of the zone.
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 15:36:53 UTC
It seems that I got a little bit confused again with respect to ifl_rx_bitmap and ifl_sds.ifsd_m.  iflib_rxd_pkt_get() calls rxd_frag_to_sd() before the memcpy that crashed and the latter clears the current bit in ifl_rx_bitmap.  Also, *sd.ifsd_m was assigned NULL in iflib_rxd_pkt_get().  And originally I established that 'm' was not NULL, so originally ifsd_m was not NULL.

So, it looks that index 327 was actually filled when the call started.
The problem was only with ifsd_cl.  Somehow it was not refilled and was NULL.
Comment 10 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 15:55:55 UTC
(In reply to Mark Johnston from comment #8)
Not seeing any fails in either the mbuf or cluster zone:

(kgdb) p *$2.ifl_zone
$81 = {uz_lockptr = 0xfffff800029c70e0, uz_name = 0xffffffff80ab810f "mbuf_cluster", uz_domain = 0xfffff800029c6580, uz_flags = 264, uz_size = 2048, uz_ctor = 0xffffffff80721440 <mb_ctor_clust>, uz_dtor = 0x0, uz_init = 0x0, 
  uz_fini = 0x0, uz_import = 0xffffffff80981940 <zone_import>, uz_release = 0xffffffff8097dbc0 <zone_release>, uz_arg = 0xfffff800029c6000, uz_slab = 0xffffffff809818a0 <zone_fetch_slab>, uz_count = 12, uz_count_min = 5, 
  uz_link = {le_next = 0x0, le_prev = 0xfffff800029c7110}, uz_kegs = {lh_first = 0xfffff800029c60a0}, uz_lock = {lock_object = {lo_name = 0xffffffff80ab810f "mbuf_cluster", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, 
    mtx_lock = 0}, uz_klink = {kl_link = {le_next = 0x0, le_prev = 0xfffff800029c6078}, kl_keg = 0xfffff800029c70e0}, uz_warning = 0xffffffff80abda84 "kern.ipc.nmbclusters limit reached", uz_ratecheck = {tv_sec = 0, 
    tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0xffffffff807213f0 <mb_reclaim>, ta_context = 0xfffff800029c6000}, uz_allocs = 0, uz_fails = 0, uz_frees = 0, 
  uz_sleeps = 0, uz_cpu = 0xfffff800029c6180}

(kgdb) p *zone_mbuf
$82 = {uz_lockptr = 0xfffff800029c7000, uz_name = 0xffffffff80a8de52 "mbuf", uz_domain = 0xfffff800028e6b40, uz_flags = 0, uz_size = 256, uz_ctor = 0xffffffff80721320 <mb_ctor_mbuf>, uz_dtor = 0xffffffff807213d0 <mb_dtor_mbuf>, 
  uz_init = 0x0, uz_fini = 0x0, uz_import = 0xffffffff80981940 <zone_import>, uz_release = 0xffffffff8097dbc0 <zone_release>, uz_arg = 0xfffff800028e65c0, uz_slab = 0xffffffff809818a0 <zone_fetch_slab>, uz_count = 253, 
  uz_count_min = 253, uz_link = {le_next = 0xfffff800029c65c0, le_prev = 0xfffff800029c7030}, uz_kegs = {lh_first = 0xfffff800028e6660}, uz_lock = {lock_object = {lo_name = 0xffffffff80a8de52 "mbuf", lo_flags = 21168128, 
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, uz_klink = {kl_link = {le_next = 0x0, le_prev = 0xfffff800028e6638}, kl_keg = 0xfffff800029c7000}, uz_warning = 0xffffffff80acfe1a "kern.ipc.nmbufs limit reached", 
  uz_ratecheck = {tv_sec = 0, tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0xffffffff807213f0 <mb_reclaim>, ta_context = 0xfffff800028e65c0}, uz_allocs = 0, uz_fails = 0, 
  uz_frees = 0, uz_sleeps = 0, uz_cpu = 0xfffff800028e6740}
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 16:23:31 UTC
Also, got a different crash but quite possibly related:

<6>vmx0: link state changed to UP


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1ea09e000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a5dd07
stack pointer           = 0x0:0xfffffe00c85cb930
frame pointer           = 0x0:0xfffffe00c85cb960
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_0)
trap number             = 12
panic: page fault
cpuid = 0
time = 1635193519
KDB: stack backtrace:
 stack1 db_trace_self_wrapper+0x2b vpanic+0x182 panic+0x43 trap_fatal+0x391 trap_pfault+0x4f trap+0x286 calltrap+0x8
 stack2 bounce_bus_dmamap_sync+0x17 iflib_fl_refill+0x31b _task_fn_rx+0x84b gtaskqueue_run_locked+0xed gtaskqueue_thread_loop+0x7e fork_exit+0x6d fork_trampoline+0xe
KDB: enter: panic
Comment 12 Andriy Gapon freebsd_committer freebsd_triage 2021-10-26 16:37:44 UTC
I think that I need to try to rule out bug 253473 before anything else.
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2021-11-01 09:46:28 UTC
(In reply to Andriy Gapon from comment #12)
Applied the fix but still seeing variations of the crash.

The data I see makes me think that somehow concurrent calls to iflib_fl_refill() for the same iflib_fl_t (free list) are happening.

One hypothetical possibility is that iflib_stop/iflib_fl_setup are not sufficiently synchronized with the taskqueue threads (that call _task_fn_rx).  I am not familiar with grouptask / taskqgroup code, but I do not see any drain-like calls.

So, I wonder if there could be a race between an enqueued task and the stop/setup operations.
Comment 14 Andriy Gapon freebsd_committer freebsd_triage 2021-11-01 10:13:04 UTC
Some data from the latest crash I've got.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x15fc12000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a5dd07
stack pointer           = 0x28:0xfffffe00c85cb930
frame pointer           = 0x28:0xfffffe00c85cb960
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_0)
trap number             = 12
panic: page fault
cpuid = 0
time = 1635553138
KDB: stack backtrace:
 stack1 db_trace_self_wrapper+0x2b vpanic+0x182 panic+0x43 trap_fatal+0x391 trap_pfault+0x4f trap+0x286 calltrap+0x8
 stack2 bounce_bus_dmamap_sync+0x17 iflib_fl_refill+0x31b _task_fn_rx+0x84b gtaskqueue_run_locked+0xed gtaskqueue_thread_loop+0x7e fork_exit+0x6d fork_trampoline+0xe

(kgdb) bt
[snip]
#13 0xffffffff8073dc23 in panic (fmt=0xffffffff81178120 <cnputs_mtx+24> "") at /usr/src/sys/kern/kern_shutdown.c:909
#14 0xffffffff809d8b31 in trap_fatal (frame=0xfffffe00c85cb870, eva=5901459456) at /usr/src/sys/amd64/amd64/trap.c:921
#15 0xffffffff809d8b8f in trap_pfault (frame=0xfffffe00c85cb870, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739
#16 0xffffffff809d8256 in trap (frame=0xfffffe00c85cb870) at /usr/src/sys/amd64/amd64/trap.c:405
#17 <signal handler called>
#18 0xffffffff80a5dd07 in bounce_bus_dmamap_sync (dmat=0xfffff80002d83400, map=0x15fc12000, op=1) at /usr/src/sys/x86/x86/busdma_bounce.c:973
#19 0xffffffff8085104b in bus_dmamap_sync (dmat=0xfffff80002d83400, map=0x15fc12000, op=<error reading variable: Cannot access memory at address 0x1>) at /usr/src/sys/x86/include/bus_dma.h:125
#20 iflib_fl_refill (ctx=0xfffff80002dd7000, fl=<optimized out>, count=<optimized out>) at /usr/src/sys/net/iflib.c:2109
#21 0xffffffff8084d5db in iflib_fl_refill_all (ctx=0xfffff80002dd7000, fl=0xfffff80002d955c0) at /usr/src/sys/net/iflib.c:2188
#22 iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2899
#23 _task_fn_rx (context=<optimized out>) at /usr/src/sys/net/iflib.c:3868
#24 0xffffffff807808bd in gtaskqueue_run_locked (queue=0xfffff800020c7200) at /usr/src/sys/kern/subr_gtaskqueue.c:362
#25 0xffffffff8078068e in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537
#26 0xffffffff8070792d in fork_exit (callout=0xffffffff80780610 <gtaskqueue_thread_loop>, arg=0xfffffe00007f8008, frame=0xfffffe00c85cbc00) at /usr/src/sys/kern/kern_fork.c:1088


(kgdb) fr 21
#21 0xffffffff8084d5db in iflib_fl_refill_all (ctx=0xfffff80002dd7000, fl=0xfffff80002d955c0) at /usr/src/sys/net/iflib.c:2188
2188    in /usr/src/sys/net/iflib.c
(kgdb) p fl
$1 = (iflib_fl_t) 0xfffff80002d955c0

(kgdb) fr 20
#20 iflib_fl_refill (ctx=0xfffff80002dd7000, fl=<optimized out>, count=<optimized out>) at /usr/src/sys/net/iflib.c:2109
2109    /usr/src/sys/net/iflib.c: No such file or directory.
(kgdb) i loc
iru = {iru_paddrs = 0xfffff80002d95640, iru_idxs = 0xfffff80002d95740, iru_pidx = 1888, iru_qsidx = 0, iru_count = 32, iru_buf_size = 4096, iru_flidx = 1 '\001'}
cb_arg = {error = 0, seg = {ds_addr = 5901459456, ds_len = 4096}, nseg = 1}
sd_m = 0xfffffe00eabdc000
sd_map = 0xfffffe00eabe8000
sd_cl = 0xfffffe00eabe0000
sd_ba = 0xfffffe00eabe4000
idx = 1949
pidx = 1920
frag_idx = -1
n = <optimized out>
i = 29
credits = 1949
bus_addr = 18446735283517988864
cl = <optimized out>
err = <optimized out>
m = <optimized out>

(kgdb) p $1.ifl_size
$4 = 2048

(kgdb) p/x *$1.ifl_rx_bitmap@32
$7 = {0xffffffffffffffff <repeats 32 times>}

(kgdb)  p *$1
$8 = {ifl_cidx = 0, ifl_pidx = 1920, ifl_credits = 1920, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002d83200, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, ifl_cltype = 3, 
  ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eabe8000, ifsd_m = 0xfffffe00eabdc000, ifsd_cl = 0xfffffe00eabe0000, ifsd_ba = 0xfffffe00eabe4000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 1 '\001', 
  ifl_buf_tag = 0xfffff80002d83400, ifl_ifdi = 0xfffff80002d9b4d0, ifl_bus_addrs = {5901619200, 5901623296, 5901533184, 5901537280, 5901541376, 5901545472, 5901549568, 5901553664, 5901557760, 5901561856, 5901565952, 5901570048, 
    5901574144, 5901488128, 5901492224, 5901496320, 5901500416, 5901504512, 5901508608, 5901512704, 5901516800, 5901520896, 5901524992, 5901529088, 5901443072, 5901447168, 5901451264, 5901455360, 5901459456, 5901602816, 
    5901606912, 5901611008}, ifl_rxd_idxs = {1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 2047, 1917, 
    1918, 1919}}

Things of note:
- frag_idx = -1 and ifl_rx_bitmap is indeed full
- i = 29 and there is a jump from 1947 to 2047 (maximum index as ifl_size = 2048) in ifl_rxd_idxs at positions 27 and 28

This makes me suspect that a concurrent refill topped the free list while the refill in question was running.
Comment 15 Andriy Gapon freebsd_committer freebsd_triage 2021-11-01 10:17:52 UTC
Also of note is that in all cases I see
vmx0: link state changed to UP
shortly before the crash.
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2021-11-01 11:31:09 UTC
Actually, I am not sure that comment #0 and comment #14 (and comment #11) are the same problem.
It could very well be that the crash from comment #0 is really fixed by the fix from bug 253473.

The more I dig into it, the more the problem in comment #14 is starting to look like uninitialized memory in ifl_rx_bitmap rather than a race.
Comment 17 Andriy Gapon freebsd_committer freebsd_triage 2021-11-01 13:03:01 UTC
Some additional observations from another crash of exactly the same kind.

There are 8 receive queues with 2 free lists per each.
As far as I can tell, all free lists had been initialized and had initial 128 credits. Just a single packet had been received.  It was on rxq0 and its descriptor matched free list number 0.  So, a single credit was consumed from that free list and its cidx was advanced to 1.  After that the free list was topped up with credits.  Additionally the code was also topping up the other free list for rxq0 and that's when the problem happened.  iflib_fl_refill() was called with count of 1919 (2048 - 128 - 1) and it was able to fill 1247 credits before the free list's bitmap became full somehow...

Free lists for receive queues other than zero look like this:
(kgdb) p *ctx->ifc_rxqs[$i++].ifr_fl@2
$52 = {{ifl_cidx = 0, ifl_pidx = 128, ifl_credits = 128, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002fb0c00, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 2048, ifl_cltype = 1, 
    ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffffe00eac18000, ifsd_m = 0xfffffe00eabfc000, ifsd_cl = 0xfffffe00eac10000, ifsd_ba = 0xfffffe00eac14000}, ifl_rxq = 0xfffffe00ea9f5300, ifl_id = 0 '\000', 
    ifl_buf_tag = 0xfffff80002fb0e00, ifl_ifdi = 0xfffff80002fc56a8, ifl_bus_addrs = {6106355712, 6106349568, 6106351616, 6106345472, 6106347520, 6106361856, 6106363904, 6106357760, 6106359808, 6106353664, 6106376192, 
      6106370048, 6106372096, 6106365952, 6106368000, 6106382336, 6106384384, 6106378240, 6106380288, 6106374144, 6106396672, 6106390528, 6106392576, 6106386432, 6106388480, 6106402816, 6106404864, 6106398720, 6106400768, 
      6106394624, 6104950784, 6106415104}, ifl_rxd_idxs = {96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127}},
{ ifl_cidx = 0, ifl_pidx = 128, ifl_credits = 128, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002fb0b00, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, ifl_cltype = 3, 
    ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eac28000, ifsd_m = 0xfffffe00eac1c000, ifsd_cl = 0xfffffe00eac20000, ifsd_ba = 0xfffffe00eac24000}, ifl_rxq = 0xfffffe00ea9f5300, ifl_id = 1 '\001', 
    ifl_buf_tag = 0xfffff80002fb0d00, ifl_ifdi = 0xfffff80002fc56d0, ifl_bus_addrs = {8338677760, 8338681856, 8338685952, 8338690048, 8338620416, 8338624512, 8338628608, 8338632704, 6105874432, 6105878528, 6105882624, 
      6105886720, 6105890816, 6105894912, 6105899008, 6105903104, 6105907200, 6105911296, 6105915392, 6105919488, 6105923584, 6105927680, 6105931776, 6105935872, 6105939968, 6105944064, 6105948160, 6105952256, 6105956352, 
      6105960448, 6105964544, 6105968640}, ifl_rxd_idxs = {96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127}}}

And their bitmaps are all like this:
$62 = {0xffffffffffffffff, 0xffffffffffffffff, 0x0 <repeats 30 times>}

Here are the free lists of rxq0 at the time of the crash:
$51 = {{ifl_cidx = 1, ifl_pidx = 0, ifl_credits = 2047, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002faf400, ifl_fragidx = 0, ifl_size = 2048, ifl_buf_size = 2048, ifl_cltype = 1, 
    ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffffe00eabd8000, ifsd_m = 0xfffffe00eabcc000, ifsd_cl = 0xfffffe00eabd0000, ifsd_ba = 0xfffffe00eabd4000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 0 '\000', 
    ifl_buf_tag = 0xfffff80002faf600, ifl_ifdi = 0xfffff80002fc5728, ifl_bus_addrs = {6101612544, 6101651456, 6101649408, 6101639168, 6101641216, 6101626880, 6101628928, 6101635072, 6101637120, 6101659648, 6101661696, 
      6101647360, 6101624832, 6101618688, 6101620736, 6101614592, 6101616640, 6101671936, 6101673984, 6101667840, 6101669888, 6101622784, 6101678080, 6101680128, 6101682176, 6101655552, 6101657600, 6101786624, 6101788672, 
      6101766144, 6101768192, 6101676032}, ifl_rxd_idxs = {2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 
      2044, 2045, 2046, 2047}},
{ifl_cidx = 0, ifl_pidx = 1344, ifl_credits = 1344, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002faf300, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, 
    ifl_cltype = 3, ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eabe8000, ifsd_m = 0xfffffe00eabdc000, ifsd_cl = 0xfffffe00eabe0000, ifsd_ba = 0xfffffe00eabe4000}, ifl_rxq = 0xfffffe00ea9f5000, 
    ifl_id = 1 '\001', ifl_buf_tag = 0xfffff80002faf500, ifl_ifdi = 0xfffff80002fc5750, ifl_bus_addrs = {8347934720, 8347938816, 8347942912, 8347824128, 8347828224, 8347832320, 8347836416, 8347840512, 8347844608, 8347848704, 
      8347852800, 8347856896, 8347860992, 8347865088, 8347869184, 8347873280, 8347877376, 8347881472, 8347758592, 8347762688, 8347770880, 8347774976, 8347779072, 8347783168, 8347787264, 8347791360, 8347795456, 8347799552, 
      8347803648, 8347807744, 8347811840, 8347930624}, ifl_rxd_idxs = {1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 
      1370, 1371, 1372, 1373, 2047, 1343}}}

And their bitmaps:
(kgdb) set $i=0
(kgdb) p/x *ctx->ifc_rxqs[$i/2].ifr_fl[$i++%2].ifl_rx_bitmap@32
$60 = {0xfffffffffffffffe, 0xffffffffffffffff <repeats 31 times>}
(kgdb) 
$61 = {0xffffffffffffffff <repeats 32 times>}

I am out of ideas what could have caused the full bitmap for fl 1 of rxq0 after receiving just one packet.  All other fields in the free list do not appear to be corrupt or inconsistent.  It's only ifl_rx_bitmap and ifl_rxd_idxs at position 30.
Comment 18 Andriy Gapon freebsd_committer freebsd_triage 2021-11-08 09:14:46 UTC
I think that comment #13 could be on spot.
After applying the following patch the problem hasn't been seen for several days (whereas previously it happened at least once per day on one of test machines).

@@ -2572,7 +2572,8 @@
                        bzero((void *)di->idi_vaddr, di->idi_size);
        }
        for (i = 0; i < scctx->isc_nrxqsets; i++, rxq++) {
-               /* make sure all transmitters have completed before proceeding XXX */
+               gtaskqueue_drain(rxq->ifr_task.gt_taskqueue,
+                   &rxq->ifr_task.gt_task);
 
                rxq->ifr_cq_cidx = 0;
                for (j = 0, di = rxq->ifr_ifdi; j < sctx->isc_nrxqs; j++, di++)

I cannot come up with an exact interaction between threads that could lead to the symptoms, but there certainly was a data race.
I used anonymous dtrace to capture initial network configuration and I saw some interleaving between _task_fn_rx and iflib_if_ioctl -> iflib_stop / iflib_init_locked. One specific ioctl that I saw was SIOCSIFCAP.
Comment 19 Andriy Gapon freebsd_committer freebsd_triage 2021-11-08 09:26:03 UTC
iflib_stop has a couple of 1 ms delays in it (after stopping the hardware).  I think that those could be enough for the taskqueues to complete any queued work on real hardware.
But on VMs, especially if the hypervisor is very busy (oversubscribed), the taskqueue threads can get delayed arbitrarily.
I think that that may explain why the problem is rarely seen and why we see it with vmxnet3 specifically.
Comment 20 commit-hook freebsd_committer freebsd_triage 2021-11-19 08:02:23 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1bfdb812c786ac2607a82633f9c84a5d16f54079

commit 1bfdb812c786ac2607a82633f9c84a5d16f54079
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2021-11-19 07:56:30 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2021-11-19 08:00:38 +0000

    iflib_stop: drain rx tasks to prevent any data races

    iflib_stop modifies iflib data structures that are used by _task_fn_rx,
    most prominently the free lists.  So, iflib_stop has to ensure that the
    rx task threads are not active.

    This should help to fix a crash seen when iflib_if_ioctl (e.g.,
    SIOCSIFCAP) is called while there is already traffic flowing.

    The crash has been seen on VMWare guests with vmxnet3 driver.

    My guess is that on physical hardware the couple of 1ms delays that
    iflib_stop has after disabling interrupts are enough for the queued work
    to be completed before any iflib state is touched.

    But on busy hypervisors the guests might not get enough CPU time to
    complete the work, thus there can be a race between the taskqueue
    threads and the work done to handle an ioctl, specifically in iflib_stop
    and iflib_init_locked.

    PR:             259458
    Reviewed by:    markj
    MFC after:      3 weeks
    Differential Revision:  https://reviews.freebsd.org/D32926

 sys/net/iflib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 21 commit-hook freebsd_committer freebsd_triage 2021-12-10 12:33:43 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5f24d2a82c1fa6555e550dcda99d0c1aae1e44b3

commit 5f24d2a82c1fa6555e550dcda99d0c1aae1e44b3
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2021-11-19 07:56:30 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2021-12-10 12:32:37 +0000

    iflib_stop: drain rx tasks to prevent any data races

    iflib_stop modifies iflib data structures that are used by _task_fn_rx,
    most prominently the free lists.  So, iflib_stop has to ensure that the
    rx task threads are not active.

    This should help to fix a crash seen when iflib_if_ioctl (e.g.,
    SIOCSIFCAP) is called while there is already traffic flowing.

    The crash has been seen on VMWare guests with vmxnet3 driver.

    My guess is that on physical hardware the couple of 1ms delays that
    iflib_stop has after disabling interrupts are enough for the queued work
    to be completed before any iflib state is touched.

    But on busy hypervisors the guests might not get enough CPU time to
    complete the work, thus there can be a race between the taskqueue
    threads and the work done to handle an ioctl, specifically in iflib_stop
    and iflib_init_locked.

    PR:             259458

    (cherry picked from commit 1bfdb812c786ac2607a82633f9c84a5d16f54079)

 sys/net/iflib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 22 commit-hook freebsd_committer freebsd_triage 2021-12-10 12:33:44 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f43d2e1199b9fd265b72281d2779e2554fa7cb6d

commit f43d2e1199b9fd265b72281d2779e2554fa7cb6d
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2021-11-19 07:56:30 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2021-12-10 12:33:12 +0000

    iflib_stop: drain rx tasks to prevent any data races

    iflib_stop modifies iflib data structures that are used by _task_fn_rx,
    most prominently the free lists.  So, iflib_stop has to ensure that the
    rx task threads are not active.

    This should help to fix a crash seen when iflib_if_ioctl (e.g.,
    SIOCSIFCAP) is called while there is already traffic flowing.

    The crash has been seen on VMWare guests with vmxnet3 driver.

    My guess is that on physical hardware the couple of 1ms delays that
    iflib_stop has after disabling interrupts are enough for the queued work
    to be completed before any iflib state is touched.

    But on busy hypervisors the guests might not get enough CPU time to
    complete the work, thus there can be a race between the taskqueue
    threads and the work done to handle an ioctl, specifically in iflib_stop
    and iflib_init_locked.

    PR:             259458

    (cherry picked from commit 1bfdb812c786ac2607a82633f9c84a5d16f54079)

 sys/net/iflib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)