Summary: | iflib_rxeof NULL pointer crash with vmxnet3 driver | ||
---|---|---|---|
Product: | Base System | Reporter: | Andriy Gapon <avg> |
Component: | kern | Assignee: | Andriy Gapon <avg> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | markj, net |
Priority: | --- | Keywords: | crash |
Version: | 12.2-RELEASE | Flags: | koobs:
mfc-stable13?
koobs: mfc-stable12? |
Hardware: | Any | ||
OS: | Any |
Description
Andriy Gapon
2021-10-26 13:15:55 UTC
(kgdb) fr 19 #19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737 2737 /usr/src/sys/net/iflib.c: No such file or directory. (kgdb) i loc sd = {ifsd_cl = 0xfffff80002d61a38, ifsd_m = 0xfffff80002d62a38, ifsd_fl = 0xfffff80002d93400} m = 0xfffff80123211c00 (kgdb) p m->m_data $1 = (caddr_t) 0xfffff80123211c58 "" (kgdb) p sd.ifsd_cl $2 = (caddr_t *) 0xfffff80002d61a38 (kgdb) p *sd.ifsd_cl $3 = (caddr_t) 0x0 (kgdb) fr 20 #20 iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2879 2879 in /usr/src/sys/net/iflib.c (kgdb) i loc ri = {iri_qsidx = 0, iri_vtag = 0, iri_len = 60, iri_cidx = 328, iri_ifp = 0xfffff80002d9e000, iri_frags = 0xfffffe00ea9f5180, iri_flowid = 0, iri_csum_flags = 0, iri_csum_data = 0, iri_flags = 0 '\000', iri_nfrags = 1 '\001', iri_rsstype = 0 '\000', iri_pad = 0 '\000'} ctx = 0xfffff80002dd2000 lro_possible = <error reading variable lro_possible (Cannot access memory at address 0x0)> v4_forwarding = <error reading variable v4_forwarding (Cannot access memory at address 0x0)> v6_forwarding = <error reading variable v6_forwarding (Cannot access memory at address 0x0)> retval = <error reading variable retval (Cannot access memory at address 0x0)> scctx = <optimized out> sctx = 0xffffffff810f1100 <vmxnet3_sctx_init> rx_pkts = <error reading variable rx_pkts (Cannot access memory at address 0x0)> rx_bytes = <error reading variable rx_bytes (Cannot access memory at address 0x0)> mh = 0xfffff800b371d100 mt = 0xfffff800b371d100 ifp = 0xfffff80002d9e000 cidxp = 0xfffffe00ea9f5018 avail = 1 budget_left = 15 err = <optimized out> m = <optimized out> i = <optimized out> fl = <optimized out> mf = <optimized out> lro_enabled = <optimized out> (kgdb) p *cidxp $4 = 328 (kgdb) p ri.iri_frags[0] $5 = {irf_flid = 0 '\000', irf_idx = 327, irf_len = 60} (kgdb) fr 19 #19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737 2737 /usr/src/sys/net/iflib.c: No such file or directory. (kgdb) p *rxq $6 = {ifr_ctx = 0xfffff80002dd2000, ifr_fl = 0xfffff80002d93400, ifr_rx_irq = 0, ifr_cq_cidx = 328, ifr_id = 0, ifr_nfl = 2 '\002', ifr_ntxqirq = 1 '\001', ifr_txqid = "\000\000\000", ifr_fl_offset = 1 '\001', ifr_lc = { ifp = 0xfffff80002d9e000, lro_mbuf_data = 0xfffffe00ea9f1000, lro_queued = 0, lro_flushed = 0, lro_bad_csum = 0, lro_cnt = 8, lro_mbuf_count = 0, lro_mbuf_max = 512, lro_ackcnt_lim = 65535, lro_length_lim = 65535, lro_hashsz = 509, lro_hash = 0xfffff8000410d000, lro_active = {lh_first = 0x0}, lro_free = {lh_first = 0xfffffe00ea9f33f0}}, ifr_task = {gt_task = {ta_link = {stqe_next = 0x0}, ta_flags = 2, ta_priority = 0, ta_func = 0xffffffff8084cd90 <_task_fn_rx>, ta_context = 0xfffffe00ea9f5000}, gt_taskqueue = 0xfffff800020c7200, gt_list = {le_next = 0x0, le_prev = 0xfffffe00015f08a8}, gt_uniq = 0xfffffe00ea9f5000, gt_name = "rxq0", '\000' <repeats 27 times>, gt_irq = 257, gt_cpu = 0}, ifr_watchdog = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0x0, c_lock = 0x0, c_flags = 0, c_iflags = 16, c_cpu = 0, c_exec_time = 0, c_lines = {u128 = 1528, u16 = {1528, 0, 0, 0, 0, 0, 0, 0}}}, ifr_filter_info = { ifi_filter = 0xffffffff80a3c580 <vmxnet3_rxq_intr>, ifi_filter_arg = 0xfffff80004110000, ifi_task = 0xfffffe00ea9f5088, ifi_ctx = 0xfffffe00ea9f5000}, ifr_ifdi = 0xfffff80002d99400, ifr_frags = {{irf_flid = 0 '\000', irf_idx = 327, irf_len = 60}, {irf_flid = 0 '\000', irf_idx = 0, irf_len = 0} <repeats 63 times>}} (kgdb) p rxq->ifr_fl[0] $7 = {ifl_cidx = 328, ifl_pidx = 341, ifl_credits = 509, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002cb5ec0, ifl_fragidx = 142, ifl_size = 512, ifl_buf_size = 2048, ifl_cltype = 1, ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffff80002d5f000, ifsd_m = 0xfffff80002d62000, ifsd_cl = 0xfffff80002d61000, ifsd_ba = 0xfffff80002d60000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 0 '\000', ifl_buf_tag = 0xfffff80002d74400, ifl_ifdi = 0xfffff80002d99428, ifl_bus_addrs = {4884103168, 4884094976, 4887971840, 4887965696, 4898656256, 4898662400, 4898660352, 4898617344, 4753053696, 4753018880, 4753020928, 4883597312, 4898639872, 4898646016, 4898643968, 4898650112, 4884144128, 4884150272, 4884148224, 4884154368, 4884152320, 4884158464, 4884156416, 4884162560, 4884160512, 4884166656, 4884111360, 4884117504, 4884115456, 4884121600, 4884119552, 4884125696}, ifl_rxd_idxs = {141, 137, 120, 121, 323, 324, 325, 326, 0, 1, 2, 3, 315, 316, 317, 318, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511}} (kgdb) p $7.ifl_sds.ifsd_cl[327] $8 = (caddr_t) 0x0 (kgdb) p $7.ifl_sds.ifsd_cl[326] $9 = (caddr_t) 0xfffff80123faf800 "\377\377\377\377\377\377" (kgdb) p $7.ifl_sds.ifsd_cl[328] $10 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377" (kgdb) p *sctx $16 = {isc_magic = 3405705229, isc_driver = 0xffffffff810f1280 <vmxnet3_iflib_driver>, isc_q_align = 512, isc_tx_maxsize = 65536, isc_tx_maxsegsize = 16383, isc_tso_maxsize = 65550, isc_tso_maxsegsize = 16383, isc_rx_maxsize = 16383, isc_rx_maxsegsize = 16383, isc_rx_nsegments = 1, isc_admin_intrcnt = 1, isc_vendor_info = 0xffffffff810f12b0 <vmxnet3_vendor_info_array>, isc_driver_version = 0xffffffff80aead7c "2", isc_parse_devinfo = 0x0, isc_nrxd_min = {32, 32, 32, 0, 0, 0, 0, 0}, isc_nrxd_default = {512, 512, 512, 0, 0, 0, 0, 0}, isc_nrxd_max = {2048, 2048, 2048, 0, 0, 0, 0, 0}, isc_ntxd_min = {32, 32, 0, 0, 0, 0, 0, 0}, isc_ntxd_default = {512, 512, 0, 0, 0, 0, 0, 0}, isc_ntxd_max = {4096, 4096, 0, 0, 0, 0, 0, 0}, isc_nfl = 2, isc_ntxqs = 2, isc_nrxqs = 3, isc_rx_process_limit = 0, isc_tx_reclaim_thresh = 0, isc_flags = 262153, isc_name = 0x0} (kgdb) p rxq->ifr_ctx->ifc_softc_ctx $22 = {isc_vectors = 9, isc_nrxqsets = 8, isc_ntxqsets = 8, isc_min_tx_latency = 0 '\000', isc_rx_mvec_enable = 0 '\000', isc_txrx_budget_bytes_max = 2097152, isc_msix_bar = 24, isc_tx_nsegments = 32, isc_ntxd = {512, 512, 0, 0, 0, 0, 0, 0}, isc_nrxd = {1024, 512, 512, 0, 0, 0, 0, 0}, isc_txqsizes = {8192, 8192, 0, 0, 0, 0, 0, 0}, isc_rxqsizes = {16384, 8192, 8192, 0, 0, 0, 0, 0}, isc_txd_size = "\000\000\000\000\000\000\000", isc_rxd_size = "\000\000\000\000\000\000\000", isc_tx_tso_segments_max = 32, isc_tx_tso_size_max = 65532, isc_tx_tso_segsize_max = 16383, isc_tx_csum_flags = 5654, isc_capabilities = 6621115, isc_capenable = 6554555, isc_rss_table_size = 128, isc_rss_table_mask = 127, isc_nrxqsets_max = 8, isc_ntxqsets_max = 8, isc_tx_qdepth = 2048, isc_intr = IFLIB_INTR_MSIX, isc_rxd_buf_size = {0, 0, 4096, 0, 0, 0, 0, 0}, isc_max_frame_size = 22, isc_min_frame_size = 0, isc_pause_frames = 0, isc_vendor_info = {pvi_vendor_id = 0, pvi_device_id = 0, pvi_subvendor_id = 0, pvi_subdevice_id = 0, pvi_rev_id = 0, pvi_class_mask = 0, pvi_name = 0x0}, isc_disable_msix = 0, isc_txrx = 0xffffffff810f1000 <vmxnet3_txrx>} (kgdb) p ri $26 = {iri_qsidx = 0, iri_vtag = 0, iri_len = 60, iri_cidx = 328, iri_ifp = 0xfffff80002d9e000, iri_frags = 0xfffffe00ea9f5180, iri_flowid = 0, iri_csum_flags = 0, iri_csum_data = 0, iri_flags = 0 '\000', iri_nfrags = 1 '\001', iri_rsstype = 0 '\000', iri_pad = 0 '\000'} (kgdb) p $26.iri_frags[0] $33 = {irf_flid = 0 '\000', irf_idx = 327, irf_len = 60} (kgdb) fr 19 #19 0xffffffff8084d049 in iflib_rxd_pkt_get (rxq=0xfffffe00ea9f5000, ri=<optimized out>) at /usr/src/sys/net/iflib.c:2737 2737 in /usr/src/sys/net/iflib.c (kgdb) i loc sd = {ifsd_cl = 0xfffff80002d61a38, ifsd_m = 0xfffff80002d62a38, ifsd_fl = 0xfffff80002d93400} m = 0xfffff80123211c00 (kgdb) p sd.ifsd_cl[0] $27 = (caddr_t) 0x0 (kgdb) p sd.ifsd_cl[1] $28 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377" (kgdb) p sd.ifsd_cl[2] $29 = (caddr_t) 0xfffff8012322a000 "\377\377\377\377\377\377" (kgdb) p sd.ifsd_m[0] $30 = (struct mbuf *) 0x0 (kgdb) p sd.ifsd_m[1] $31 = (struct mbuf *) 0xfffff80123211b00 (kgdb) p sd.ifsd_m[2] $32 = (struct mbuf *) 0xfffff80123211a00 I noticed a discrepancy between ifl_cidx / iri_cidx / ifr_cq_cidx that are equal to 328 and irf_idx that's set to 327. Initially, I thought that this could be a come back of an older problem: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243126#c2 But in this case I do not see any zero-length packets near the current index. In fact, given that vxcr_zero_length is zero, there hadn't been any zero-length packets at all. Looking at the code in vmxnet3_isc_rxd_pkt_get(), I think that iri_cidx != irf_idx is not a problem. irf_idx is the last processed fragment's (packet's) descriptor ID, while iri_cidx is the next one. So, everything is correct there. (kgdb) p $20->ifc_softc $34 = (void *) 0xfffff80002d93800 (kgdb) p *(struct vmxnet3_softc *)$20->ifc_softc $35 = {vmx_dev = 0xfffff80002db6300, vmx_ctx = 0xfffff80002dd2000, vmx_sctx = 0xffffffff810f1100 <vmxnet3_sctx_init>, vmx_scctx = 0xfffff80002dd2048, vmx_ifp = 0xfffff80002d9e000, vmx_ds = 0xfffff80002d91400, vmx_flags = 2, vmx_rxq = 0xfffff80004110000, vmx_txq = 0xfffff80004110800, vmx_res0 = 0xfffff80002d98b00, vmx_iot0 = 1, vmx_ioh0 = 18446735281866391552, vmx_res1 = 0xfffff80002d98a00, vmx_iot1 = 1, vmx_ioh1 = 18446735281866395648, vmx_link_active = 1, vmx_intr_mask_mode = 0, vmx_event_intr_idx = 8, vmx_event_intr_irq = {ii_res = 0xfffff80002d99d00, ii_rid = 9, ii_tag = 0xfffff80004413e80}, vmx_mcast = 0xfffff8000440bc00 "", vmx_rss = 0xfffff8000440bb00, vmx_ds_dma = {idi_paddr = 47780864, idi_vaddr = 0xfffff80002d91400 "\341\376\276\272", idi_tag = 0xfffff8000440b900, idi_map = 0x0, idi_size = 720}, vmx_qs_dma = {idi_paddr = 68214784, idi_vaddr = 0xfffff8000410e000 "", idi_tag = 0xfffff80002db4c00, idi_map = 0x0, idi_size = 4096}, vmx_mcast_dma = {idi_paddr = 71351296, idi_vaddr = 0xfffff8000440bc00 "", idi_tag = 0xfffff8000440b700, idi_map = 0x0, idi_size = 192}, vmx_rss_dma = {idi_paddr = 71351040, idi_vaddr = 0xfffff8000440bb00 "\017", idi_tag = 0xfffff8000440b800, idi_map = 0x0, idi_size = 176}, vmx_media = 0xfffff80002dd22f0, vmx_vlan_filter = { 0 <repeats 128 times>}, vmx_lladdr = "\000PV\246\237\""} (kgdb) p $35.vmx_rxq[0] $36 = {vxrxq_sc = 0xfffff80002d93800, vxrxq_id = 0, vxrxq_intr_idx = 0, vxrxq_irq = {ii_res = 0xfffff80002df8f00, ii_rid = 1, ii_tag = 0xfffff80002d99000}, vxrxq_cmd_ring = {{vxrxr_rxd = 0xfffffe00eaaf4000, vxrxr_ndesc = 512, vxrxr_gen = 0, vxrxr_paddr = 57622528, vxrxr_desc_skips = 1017, vxrxr_refill_start = 142}, {vxrxr_rxd = 0xfffffe00eaaf6000, vxrxr_ndesc = 512, vxrxr_gen = 1, vxrxr_paddr = 57630720, vxrxr_desc_skips = 0, vxrxr_refill_start = 511}}, vxrxq_comp_ring = {vxcr_u = {txcd = 0xfffffe00eaaf0000, rxcd = 0xfffffe00eaaf0000}, vxcr_next = 0, vxcr_ndesc = 1024, vxcr_gen = 1, vxcr_paddr = 57606144, vxcr_zero_length = 0, vxcr_pkt_errors = 0}, vxrxq_rs = 0xfffff8000410e800, vxrxq_sysctl = 0xfffff80004415480, vxrxq_name = "vmx0-rx0\000\000\000\000\000\000\000"} (kgdb) p $36.vxrxq_comp_ring $37 = {vxcr_u = {txcd = 0xfffffe00eaaf0000, rxcd = 0xfffffe00eaaf0000}, vxcr_next = 0, vxcr_ndesc = 1024, vxcr_gen = 1, vxcr_paddr = 57606144, vxcr_zero_length = 0, vxcr_pkt_errors = 0} (kgdb) p $37.vxcr_u.rxcd[325] $38 = {rxd_idx = 325, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, fragment = 0, fcs = 0, type = 3, gen = 1} (kgdb) p $37.vxcr_u.rxcd[326] $39 = {rxd_idx = 326, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, fragment = 0, fcs = 0, type = 3, gen = 1} (kgdb) p $37.vxcr_u.rxcd[327] $40 = {rxd_idx = 327, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, fragment = 0, fcs = 0, type = 3, gen = 1} (kgdb) p $37.vxcr_u.rxcd[328] $41 = {rxd_idx = 328, pad1 = 0, eop = 1, sop = 1, qid = 0, rss_type = 0, no_csum = 1, pad2 = 0, rss_hash = 0, len = 60, error = 0, vlan = 0, vtag = 0, csum = 0, csum_ok = 0, udp = 0, tcp = 0, ipcsum_ok = 0, ipv6 = 0, ipv4 = 0, fragment = 0, fcs = 0, type = 3, gen = 1} In the end, it looks like somehow index 327 was not re-filled earlier. Or it was somehow consumed twice in a row... (kgdb) p/x *$2.ifl_rx_bitmap@8 $73 = {0xffffffffffffffff, 0xffffffffffffffff, 0x3fff, 0x0, 0x0, 0xffffffffffffff00, 0xffffffffffffffff, 0xffffffffffffffff} (kgdb) p $2.ifl_sds.ifsd_m[327] $74 = (struct mbuf *) 0x0 (kgdb) p $2.ifl_sds.ifsd_m[328] $75 = (struct mbuf *) 0xfffff80123211b00 (kgdb) p $2.ifl_sds.ifsd_cl[327] $76 = (caddr_t) 0x0 (kgdb) p $2.ifl_sds.ifsd_cl[328] $77 = (caddr_t) 0xfffff8012322b800 "\377\377\377\377\377\377" ifl_rx_bitmap has bits [0, 141] set, bits [142, 327] clear and bits [328, 511] set again. Given ifl_fragidx = 142 the latest refill went up to 141 which was what we actually see. So, it was not the latest refill. Do you have allocation failures in the mbuf zone? On 12.2 I think you can simply check the uz_fails field of the zone. It seems that I got a little bit confused again with respect to ifl_rx_bitmap and ifl_sds.ifsd_m. iflib_rxd_pkt_get() calls rxd_frag_to_sd() before the memcpy that crashed and the latter clears the current bit in ifl_rx_bitmap. Also, *sd.ifsd_m was assigned NULL in iflib_rxd_pkt_get(). And originally I established that 'm' was not NULL, so originally ifsd_m was not NULL. So, it looks that index 327 was actually filled when the call started. The problem was only with ifsd_cl. Somehow it was not refilled and was NULL. (In reply to Mark Johnston from comment #8) Not seeing any fails in either the mbuf or cluster zone: (kgdb) p *$2.ifl_zone $81 = {uz_lockptr = 0xfffff800029c70e0, uz_name = 0xffffffff80ab810f "mbuf_cluster", uz_domain = 0xfffff800029c6580, uz_flags = 264, uz_size = 2048, uz_ctor = 0xffffffff80721440 <mb_ctor_clust>, uz_dtor = 0x0, uz_init = 0x0, uz_fini = 0x0, uz_import = 0xffffffff80981940 <zone_import>, uz_release = 0xffffffff8097dbc0 <zone_release>, uz_arg = 0xfffff800029c6000, uz_slab = 0xffffffff809818a0 <zone_fetch_slab>, uz_count = 12, uz_count_min = 5, uz_link = {le_next = 0x0, le_prev = 0xfffff800029c7110}, uz_kegs = {lh_first = 0xfffff800029c60a0}, uz_lock = {lock_object = {lo_name = 0xffffffff80ab810f "mbuf_cluster", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, uz_klink = {kl_link = {le_next = 0x0, le_prev = 0xfffff800029c6078}, kl_keg = 0xfffff800029c70e0}, uz_warning = 0xffffffff80abda84 "kern.ipc.nmbclusters limit reached", uz_ratecheck = {tv_sec = 0, tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0xffffffff807213f0 <mb_reclaim>, ta_context = 0xfffff800029c6000}, uz_allocs = 0, uz_fails = 0, uz_frees = 0, uz_sleeps = 0, uz_cpu = 0xfffff800029c6180} (kgdb) p *zone_mbuf $82 = {uz_lockptr = 0xfffff800029c7000, uz_name = 0xffffffff80a8de52 "mbuf", uz_domain = 0xfffff800028e6b40, uz_flags = 0, uz_size = 256, uz_ctor = 0xffffffff80721320 <mb_ctor_mbuf>, uz_dtor = 0xffffffff807213d0 <mb_dtor_mbuf>, uz_init = 0x0, uz_fini = 0x0, uz_import = 0xffffffff80981940 <zone_import>, uz_release = 0xffffffff8097dbc0 <zone_release>, uz_arg = 0xfffff800028e65c0, uz_slab = 0xffffffff809818a0 <zone_fetch_slab>, uz_count = 253, uz_count_min = 253, uz_link = {le_next = 0xfffff800029c65c0, le_prev = 0xfffff800029c7030}, uz_kegs = {lh_first = 0xfffff800028e6660}, uz_lock = {lock_object = {lo_name = 0xffffffff80a8de52 "mbuf", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, uz_klink = {kl_link = {le_next = 0x0, le_prev = 0xfffff800028e6638}, kl_keg = 0xfffff800029c7000}, uz_warning = 0xffffffff80acfe1a "kern.ipc.nmbufs limit reached", uz_ratecheck = {tv_sec = 0, tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0xffffffff807213f0 <mb_reclaim>, ta_context = 0xfffff800028e65c0}, uz_allocs = 0, uz_fails = 0, uz_frees = 0, uz_sleeps = 0, uz_cpu = 0xfffff800028e6740} Also, got a different crash but quite possibly related: <6>vmx0: link state changed to UP Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x1ea09e000 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80a5dd07 stack pointer = 0x0:0xfffffe00c85cb930 frame pointer = 0x0:0xfffffe00c85cb960 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (if_io_tqg_0) trap number = 12 panic: page fault cpuid = 0 time = 1635193519 KDB: stack backtrace: stack1 db_trace_self_wrapper+0x2b vpanic+0x182 panic+0x43 trap_fatal+0x391 trap_pfault+0x4f trap+0x286 calltrap+0x8 stack2 bounce_bus_dmamap_sync+0x17 iflib_fl_refill+0x31b _task_fn_rx+0x84b gtaskqueue_run_locked+0xed gtaskqueue_thread_loop+0x7e fork_exit+0x6d fork_trampoline+0xe KDB: enter: panic I think that I need to try to rule out bug 253473 before anything else. (In reply to Andriy Gapon from comment #12) Applied the fix but still seeing variations of the crash. The data I see makes me think that somehow concurrent calls to iflib_fl_refill() for the same iflib_fl_t (free list) are happening. One hypothetical possibility is that iflib_stop/iflib_fl_setup are not sufficiently synchronized with the taskqueue threads (that call _task_fn_rx). I am not familiar with grouptask / taskqgroup code, but I do not see any drain-like calls. So, I wonder if there could be a race between an enqueued task and the stop/setup operations. Some data from the latest crash I've got. Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x15fc12000 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80a5dd07 stack pointer = 0x28:0xfffffe00c85cb930 frame pointer = 0x28:0xfffffe00c85cb960 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 0 (if_io_tqg_0) trap number = 12 panic: page fault cpuid = 0 time = 1635553138 KDB: stack backtrace: stack1 db_trace_self_wrapper+0x2b vpanic+0x182 panic+0x43 trap_fatal+0x391 trap_pfault+0x4f trap+0x286 calltrap+0x8 stack2 bounce_bus_dmamap_sync+0x17 iflib_fl_refill+0x31b _task_fn_rx+0x84b gtaskqueue_run_locked+0xed gtaskqueue_thread_loop+0x7e fork_exit+0x6d fork_trampoline+0xe (kgdb) bt [snip] #13 0xffffffff8073dc23 in panic (fmt=0xffffffff81178120 <cnputs_mtx+24> "") at /usr/src/sys/kern/kern_shutdown.c:909 #14 0xffffffff809d8b31 in trap_fatal (frame=0xfffffe00c85cb870, eva=5901459456) at /usr/src/sys/amd64/amd64/trap.c:921 #15 0xffffffff809d8b8f in trap_pfault (frame=0xfffffe00c85cb870, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739 #16 0xffffffff809d8256 in trap (frame=0xfffffe00c85cb870) at /usr/src/sys/amd64/amd64/trap.c:405 #17 <signal handler called> #18 0xffffffff80a5dd07 in bounce_bus_dmamap_sync (dmat=0xfffff80002d83400, map=0x15fc12000, op=1) at /usr/src/sys/x86/x86/busdma_bounce.c:973 #19 0xffffffff8085104b in bus_dmamap_sync (dmat=0xfffff80002d83400, map=0x15fc12000, op=<error reading variable: Cannot access memory at address 0x1>) at /usr/src/sys/x86/include/bus_dma.h:125 #20 iflib_fl_refill (ctx=0xfffff80002dd7000, fl=<optimized out>, count=<optimized out>) at /usr/src/sys/net/iflib.c:2109 #21 0xffffffff8084d5db in iflib_fl_refill_all (ctx=0xfffff80002dd7000, fl=0xfffff80002d955c0) at /usr/src/sys/net/iflib.c:2188 #22 iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2899 #23 _task_fn_rx (context=<optimized out>) at /usr/src/sys/net/iflib.c:3868 #24 0xffffffff807808bd in gtaskqueue_run_locked (queue=0xfffff800020c7200) at /usr/src/sys/kern/subr_gtaskqueue.c:362 #25 0xffffffff8078068e in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537 #26 0xffffffff8070792d in fork_exit (callout=0xffffffff80780610 <gtaskqueue_thread_loop>, arg=0xfffffe00007f8008, frame=0xfffffe00c85cbc00) at /usr/src/sys/kern/kern_fork.c:1088 (kgdb) fr 21 #21 0xffffffff8084d5db in iflib_fl_refill_all (ctx=0xfffff80002dd7000, fl=0xfffff80002d955c0) at /usr/src/sys/net/iflib.c:2188 2188 in /usr/src/sys/net/iflib.c (kgdb) p fl $1 = (iflib_fl_t) 0xfffff80002d955c0 (kgdb) fr 20 #20 iflib_fl_refill (ctx=0xfffff80002dd7000, fl=<optimized out>, count=<optimized out>) at /usr/src/sys/net/iflib.c:2109 2109 /usr/src/sys/net/iflib.c: No such file or directory. (kgdb) i loc iru = {iru_paddrs = 0xfffff80002d95640, iru_idxs = 0xfffff80002d95740, iru_pidx = 1888, iru_qsidx = 0, iru_count = 32, iru_buf_size = 4096, iru_flidx = 1 '\001'} cb_arg = {error = 0, seg = {ds_addr = 5901459456, ds_len = 4096}, nseg = 1} sd_m = 0xfffffe00eabdc000 sd_map = 0xfffffe00eabe8000 sd_cl = 0xfffffe00eabe0000 sd_ba = 0xfffffe00eabe4000 idx = 1949 pidx = 1920 frag_idx = -1 n = <optimized out> i = 29 credits = 1949 bus_addr = 18446735283517988864 cl = <optimized out> err = <optimized out> m = <optimized out> (kgdb) p $1.ifl_size $4 = 2048 (kgdb) p/x *$1.ifl_rx_bitmap@32 $7 = {0xffffffffffffffff <repeats 32 times>} (kgdb) p *$1 $8 = {ifl_cidx = 0, ifl_pidx = 1920, ifl_credits = 1920, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002d83200, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, ifl_cltype = 3, ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eabe8000, ifsd_m = 0xfffffe00eabdc000, ifsd_cl = 0xfffffe00eabe0000, ifsd_ba = 0xfffffe00eabe4000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 1 '\001', ifl_buf_tag = 0xfffff80002d83400, ifl_ifdi = 0xfffff80002d9b4d0, ifl_bus_addrs = {5901619200, 5901623296, 5901533184, 5901537280, 5901541376, 5901545472, 5901549568, 5901553664, 5901557760, 5901561856, 5901565952, 5901570048, 5901574144, 5901488128, 5901492224, 5901496320, 5901500416, 5901504512, 5901508608, 5901512704, 5901516800, 5901520896, 5901524992, 5901529088, 5901443072, 5901447168, 5901451264, 5901455360, 5901459456, 5901602816, 5901606912, 5901611008}, ifl_rxd_idxs = {1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 2047, 1917, 1918, 1919}} Things of note: - frag_idx = -1 and ifl_rx_bitmap is indeed full - i = 29 and there is a jump from 1947 to 2047 (maximum index as ifl_size = 2048) in ifl_rxd_idxs at positions 27 and 28 This makes me suspect that a concurrent refill topped the free list while the refill in question was running. Also of note is that in all cases I see vmx0: link state changed to UP shortly before the crash. Actually, I am not sure that comment #0 and comment #14 (and comment #11) are the same problem. It could very well be that the crash from comment #0 is really fixed by the fix from bug 253473. The more I dig into it, the more the problem in comment #14 is starting to look like uninitialized memory in ifl_rx_bitmap rather than a race. Some additional observations from another crash of exactly the same kind. There are 8 receive queues with 2 free lists per each. As far as I can tell, all free lists had been initialized and had initial 128 credits. Just a single packet had been received. It was on rxq0 and its descriptor matched free list number 0. So, a single credit was consumed from that free list and its cidx was advanced to 1. After that the free list was topped up with credits. Additionally the code was also topping up the other free list for rxq0 and that's when the problem happened. iflib_fl_refill() was called with count of 1919 (2048 - 128 - 1) and it was able to fill 1247 credits before the free list's bitmap became full somehow... Free lists for receive queues other than zero look like this: (kgdb) p *ctx->ifc_rxqs[$i++].ifr_fl@2 $52 = {{ifl_cidx = 0, ifl_pidx = 128, ifl_credits = 128, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002fb0c00, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 2048, ifl_cltype = 1, ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffffe00eac18000, ifsd_m = 0xfffffe00eabfc000, ifsd_cl = 0xfffffe00eac10000, ifsd_ba = 0xfffffe00eac14000}, ifl_rxq = 0xfffffe00ea9f5300, ifl_id = 0 '\000', ifl_buf_tag = 0xfffff80002fb0e00, ifl_ifdi = 0xfffff80002fc56a8, ifl_bus_addrs = {6106355712, 6106349568, 6106351616, 6106345472, 6106347520, 6106361856, 6106363904, 6106357760, 6106359808, 6106353664, 6106376192, 6106370048, 6106372096, 6106365952, 6106368000, 6106382336, 6106384384, 6106378240, 6106380288, 6106374144, 6106396672, 6106390528, 6106392576, 6106386432, 6106388480, 6106402816, 6106404864, 6106398720, 6106400768, 6106394624, 6104950784, 6106415104}, ifl_rxd_idxs = {96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127}}, { ifl_cidx = 0, ifl_pidx = 128, ifl_credits = 128, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002fb0b00, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, ifl_cltype = 3, ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eac28000, ifsd_m = 0xfffffe00eac1c000, ifsd_cl = 0xfffffe00eac20000, ifsd_ba = 0xfffffe00eac24000}, ifl_rxq = 0xfffffe00ea9f5300, ifl_id = 1 '\001', ifl_buf_tag = 0xfffff80002fb0d00, ifl_ifdi = 0xfffff80002fc56d0, ifl_bus_addrs = {8338677760, 8338681856, 8338685952, 8338690048, 8338620416, 8338624512, 8338628608, 8338632704, 6105874432, 6105878528, 6105882624, 6105886720, 6105890816, 6105894912, 6105899008, 6105903104, 6105907200, 6105911296, 6105915392, 6105919488, 6105923584, 6105927680, 6105931776, 6105935872, 6105939968, 6105944064, 6105948160, 6105952256, 6105956352, 6105960448, 6105964544, 6105968640}, ifl_rxd_idxs = {96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127}}} And their bitmaps are all like this: $62 = {0xffffffffffffffff, 0xffffffffffffffff, 0x0 <repeats 30 times>} Here are the free lists of rxq0 at the time of the crash: $51 = {{ifl_cidx = 1, ifl_pidx = 0, ifl_credits = 2047, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002faf400, ifl_fragidx = 0, ifl_size = 2048, ifl_buf_size = 2048, ifl_cltype = 1, ifl_zone = 0xfffff800029c6000, ifl_sds = {ifsd_map = 0xfffffe00eabd8000, ifsd_m = 0xfffffe00eabcc000, ifsd_cl = 0xfffffe00eabd0000, ifsd_ba = 0xfffffe00eabd4000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 0 '\000', ifl_buf_tag = 0xfffff80002faf600, ifl_ifdi = 0xfffff80002fc5728, ifl_bus_addrs = {6101612544, 6101651456, 6101649408, 6101639168, 6101641216, 6101626880, 6101628928, 6101635072, 6101637120, 6101659648, 6101661696, 6101647360, 6101624832, 6101618688, 6101620736, 6101614592, 6101616640, 6101671936, 6101673984, 6101667840, 6101669888, 6101622784, 6101678080, 6101680128, 6101682176, 6101655552, 6101657600, 6101786624, 6101788672, 6101766144, 6101768192, 6101676032}, ifl_rxd_idxs = {2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047}}, {ifl_cidx = 0, ifl_pidx = 1344, ifl_credits = 1344, ifl_gen = 0 '\000', ifl_rxd_size = 0 '\000', ifl_rx_bitmap = 0xfffff80002faf300, ifl_fragidx = 128, ifl_size = 2048, ifl_buf_size = 4096, ifl_cltype = 3, ifl_zone = 0xfffff800029c5000, ifl_sds = {ifsd_map = 0xfffffe00eabe8000, ifsd_m = 0xfffffe00eabdc000, ifsd_cl = 0xfffffe00eabe0000, ifsd_ba = 0xfffffe00eabe4000}, ifl_rxq = 0xfffffe00ea9f5000, ifl_id = 1 '\001', ifl_buf_tag = 0xfffff80002faf500, ifl_ifdi = 0xfffff80002fc5750, ifl_bus_addrs = {8347934720, 8347938816, 8347942912, 8347824128, 8347828224, 8347832320, 8347836416, 8347840512, 8347844608, 8347848704, 8347852800, 8347856896, 8347860992, 8347865088, 8347869184, 8347873280, 8347877376, 8347881472, 8347758592, 8347762688, 8347770880, 8347774976, 8347779072, 8347783168, 8347787264, 8347791360, 8347795456, 8347799552, 8347803648, 8347807744, 8347811840, 8347930624}, ifl_rxd_idxs = {1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369, 1370, 1371, 1372, 1373, 2047, 1343}}} And their bitmaps: (kgdb) set $i=0 (kgdb) p/x *ctx->ifc_rxqs[$i/2].ifr_fl[$i++%2].ifl_rx_bitmap@32 $60 = {0xfffffffffffffffe, 0xffffffffffffffff <repeats 31 times>} (kgdb) $61 = {0xffffffffffffffff <repeats 32 times>} I am out of ideas what could have caused the full bitmap for fl 1 of rxq0 after receiving just one packet. All other fields in the free list do not appear to be corrupt or inconsistent. It's only ifl_rx_bitmap and ifl_rxd_idxs at position 30. I think that comment #13 could be on spot. After applying the following patch the problem hasn't been seen for several days (whereas previously it happened at least once per day on one of test machines). @@ -2572,7 +2572,8 @@ bzero((void *)di->idi_vaddr, di->idi_size); } for (i = 0; i < scctx->isc_nrxqsets; i++, rxq++) { - /* make sure all transmitters have completed before proceeding XXX */ + gtaskqueue_drain(rxq->ifr_task.gt_taskqueue, + &rxq->ifr_task.gt_task); rxq->ifr_cq_cidx = 0; for (j = 0, di = rxq->ifr_ifdi; j < sctx->isc_nrxqs; j++, di++) I cannot come up with an exact interaction between threads that could lead to the symptoms, but there certainly was a data race. I used anonymous dtrace to capture initial network configuration and I saw some interleaving between _task_fn_rx and iflib_if_ioctl -> iflib_stop / iflib_init_locked. One specific ioctl that I saw was SIOCSIFCAP. iflib_stop has a couple of 1 ms delays in it (after stopping the hardware). I think that those could be enough for the taskqueues to complete any queued work on real hardware. But on VMs, especially if the hypervisor is very busy (oversubscribed), the taskqueue threads can get delayed arbitrarily. I think that that may explain why the problem is rarely seen and why we see it with vmxnet3 specifically. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1bfdb812c786ac2607a82633f9c84a5d16f54079 commit 1bfdb812c786ac2607a82633f9c84a5d16f54079 Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2021-11-19 07:56:30 +0000 Commit: Andriy Gapon <avg@FreeBSD.org> CommitDate: 2021-11-19 08:00:38 +0000 iflib_stop: drain rx tasks to prevent any data races iflib_stop modifies iflib data structures that are used by _task_fn_rx, most prominently the free lists. So, iflib_stop has to ensure that the rx task threads are not active. This should help to fix a crash seen when iflib_if_ioctl (e.g., SIOCSIFCAP) is called while there is already traffic flowing. The crash has been seen on VMWare guests with vmxnet3 driver. My guess is that on physical hardware the couple of 1ms delays that iflib_stop has after disabling interrupts are enough for the queued work to be completed before any iflib state is touched. But on busy hypervisors the guests might not get enough CPU time to complete the work, thus there can be a race between the taskqueue threads and the work done to handle an ioctl, specifically in iflib_stop and iflib_init_locked. PR: 259458 Reviewed by: markj MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D32926 sys/net/iflib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5f24d2a82c1fa6555e550dcda99d0c1aae1e44b3 commit 5f24d2a82c1fa6555e550dcda99d0c1aae1e44b3 Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2021-11-19 07:56:30 +0000 Commit: Andriy Gapon <avg@FreeBSD.org> CommitDate: 2021-12-10 12:32:37 +0000 iflib_stop: drain rx tasks to prevent any data races iflib_stop modifies iflib data structures that are used by _task_fn_rx, most prominently the free lists. So, iflib_stop has to ensure that the rx task threads are not active. This should help to fix a crash seen when iflib_if_ioctl (e.g., SIOCSIFCAP) is called while there is already traffic flowing. The crash has been seen on VMWare guests with vmxnet3 driver. My guess is that on physical hardware the couple of 1ms delays that iflib_stop has after disabling interrupts are enough for the queued work to be completed before any iflib state is touched. But on busy hypervisors the guests might not get enough CPU time to complete the work, thus there can be a race between the taskqueue threads and the work done to handle an ioctl, specifically in iflib_stop and iflib_init_locked. PR: 259458 (cherry picked from commit 1bfdb812c786ac2607a82633f9c84a5d16f54079) sys/net/iflib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f43d2e1199b9fd265b72281d2779e2554fa7cb6d commit f43d2e1199b9fd265b72281d2779e2554fa7cb6d Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2021-11-19 07:56:30 +0000 Commit: Andriy Gapon <avg@FreeBSD.org> CommitDate: 2021-12-10 12:33:12 +0000 iflib_stop: drain rx tasks to prevent any data races iflib_stop modifies iflib data structures that are used by _task_fn_rx, most prominently the free lists. So, iflib_stop has to ensure that the rx task threads are not active. This should help to fix a crash seen when iflib_if_ioctl (e.g., SIOCSIFCAP) is called while there is already traffic flowing. The crash has been seen on VMWare guests with vmxnet3 driver. My guess is that on physical hardware the couple of 1ms delays that iflib_stop has after disabling interrupts are enough for the queued work to be completed before any iflib state is touched. But on busy hypervisors the guests might not get enough CPU time to complete the work, thus there can be a race between the taskqueue threads and the work done to handle an ioctl, specifically in iflib_stop and iflib_init_locked. PR: 259458 (cherry picked from commit 1bfdb812c786ac2607a82633f9c84a5d16f54079) sys/net/iflib.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) |