Created attachment 247417 [details] info for a crash of 11na in an iwlwifi private development build Per discussion with Bojern, debug the firmware error "lkpi_ic_ampdu_rx_start: mo_ampdu_action returned -60" after enabling 11na in a private build for development. Attached log file contains reproduce method and the crash backtraces in detail. quote from Bjoern: "I would suggest you can go and start debugging the code path from the call from LinuxKPI to mo_ampdu_action -> iwlwifi -> downcall into the firmware and see if you find the problem that makes the firmware "stall"." quote of the firmware error messages: iwlwifi0: lkpi_ic_ampdu_rx_start: mo_ampdu_action returned -60. ni 0xfffffe00bdfbb000 rap 0xfffffe00bdfbbd58 iwlwifi0: HCMD_ACTIVE already clear for command UNKNOWN iwlwifi0: No beacon heard and the session protection is over already... iwlwifi0: linuxkpi_ieee80211_connection_loss: vif 0xfffffe00bdf1fc80 vap 0xfffffe00bdf1f010 state RUN iwlwifi0: Failed to send flush command (-5) iwlwifi0: flush request fail iwlwifi0: Couldn't send the SESSION_PROTECTION_CMD WARNING trans->state != IWL_TRANS_FW_ALIVE failed at /usr/src/sys/contrib/dev/iwlwifi/iwl-trans.h:1446 iwlwifi0: iwl_trans_wait_txq_empty bad state = 0 iwlwifi0: Failed to trigger RX queues sync (-5) iwlwifi0: Failed to synchronize multicast groups update iwlwifi0: Failed to send MAC_CONFIG_CMD (action:2): -5 iwlwifi0: failed to update MAC 0xfffffe00bdf1fc9eM iwlwifi0: Failed to send flush command (-5) iwlwifi0: Failed to send MAC_CONFIG_CMD (action:2): -5 iwlwifi0: failed to update MAC 0xfffffe00bdf1fc9eM iwlwifi0: Failed to send LINK_CONFIG_CMD (action:2): -5 iwlwifi0: PHY ctxt cmd error. ret=-5
For the iwlwifi0: lkpi_ic_ampdu_rx_start: mo_ampdu_action returned -60 error, "-60" means "-ETIMEDOUT". Here are the function call sequences: lkpi_ic_ampdu_rx_start lkpi_80211_mo_ampdu_action iwl_mvm_mac_ampdu_action iwl_mvm_sta_rx_agg iwl_mvm_fw_baid_op iwl_mvm_fw_baid_op_cmd iwl_mvm_send_cmd_pdu_status iwl_mvm_send_cmd_status iwl_trans_send_cmd iwl_trans_txq_send_hcmd iwl_trans_txq_send_hcmd_sync In iwl_trans_txq_send_hcmd_sync(), the code shows the event wait timeout: static int iwl_trans_txq_send_hcmd_sync(struct iwl_trans *trans, struct iwl_host_cmd *cmd) { ... cmd_idx = trans->ops->send_cmd(trans, cmd); << cmd_idx=20 in my print if (cmd_idx < 0) { ret = cmd_idx; clear_bit(STATUS_SYNC_HCMD_ACTIVE, &trans->status); IWL_ERR(trans, "Error sending %s: enqueue_hcmd failed: %d\n", cmd_str, ret); return ret; } ret = wait_event_timeout(trans->wait_command_queue, !test_bit(STATUS_SYNC_HCMD_ACTIVE, &trans->status), HOST_COMPLETE_TIMEOUT); if (!ret) { IWL_ERR(trans, "Error sending %s: time out after %dms.\n", cmd_str, jiffies_to_msecs(HOST_COMPLETE_TIMEOUT)); IWL_ERR(trans, "Current CMD queue read_ptr %d write_ptr %d\n", txq->read_ptr, txq->write_ptr); clear_bit(STATUS_SYNC_HCMD_ACTIVE, &trans->status); IWL_DEBUG_INFO(trans, "Clearing HCMD_ACTIVE for command %s\n", cmd_str); ret = -ETIMEDOUT; << -ETIMEDOUT is returned here iwl_trans_sync_nmi(trans); goto cancel; } ... Will dig further from here.
(In reply to Cheng Cui from comment #1) Also the cmd_str shows "UNKNOWN", I guess the cmd_idx=20 played something wrong or the idx is larger than the CMD queue size. based on error messages: iwlwifi0: Error sending UNKNOWN: time out after 2000ms. iwlwifi0: Current CMD queue read_ptr 16 write_ptr 17
(In reply to Cheng Cui from comment #2) In private build I triggered a panic to see backtraces: iwlwifi0: cc: iwl_get_cmd_string ret->cmd_id=0 ret_str=UNKNOWN iwlwifi0: cc: iwl_trans_txq_send_hcmd_sync cmd_idx=12 ret=0 iwlwifi0: Error sending UNKNOWN: time out after 2000ms. iwlwifi0: Current CMD queue read_ptr 140 write_ptr 141 panic: cc: iwl_trans_txq_send_hcmd_sync shall panic here (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57 #1 doadump (textdump=textdump@entry=0) at /usr/src/sys/kern/kern_shutdown.c:403 #2 0xffffffff804a4c1a in db_dump (dummy=<optimized out>, dummy2=<optimized out>, dummy3=<optimized out>, dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:590 #3 0xffffffff804a4a1d in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=true) at /usr/src/sys/ddb/db_command.c:503 #4 0xffffffff804a46dd in db_command_loop () at /usr/src/sys/ddb/db_command.c:550 #5 0xffffffff804a7db6 in db_trap (type=<optimized out>, code=<optimized out>) at /usr/src/sys/ddb/db_main.c:267 #6 0xffffffff80ba0a63 in kdb_trap (type=type@entry=3, code=code@entry=0, tf=tf@entry=0xfffffe007ebd5390) at /usr/src/sys/kern/subr_kdb.c:790 #7 0xffffffff81058839 in trap (frame=0xfffffe007ebd5390) at /usr/src/sys/amd64/amd64/trap.c:606 #8 <signal handler called> #9 kdb_enter (why=<optimized out>, msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:556 #10 0xffffffff80b51623 in vpanic (fmt=0xffffffff82680f62 "cc: %s shall panic here\n", ap=ap@entry=0xfffffe007ebd55c0) at /usr/src/sys/kern/kern_shutdown.c:953 #11 0xffffffff80b51403 in panic (fmt=0xffffffff81980420 <cnputs_mtx> "MW\025\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:889 #12 0xffffffff8267ecd7 in iwl_trans_txq_send_hcmd_sync (trans=0xfffffe00804d4040, cmd=0xfffffe007ebd56e8) at /usr/src/sys/contrib/dev/iwlwifi/queue/tx.c:1816 #13 iwl_trans_txq_send_hcmd (trans=0xfffffe00804d4040, cmd=0xfffffe007ebd56e8) at /usr/src/sys/contrib/dev/iwlwifi/queue/tx.c:1904 #14 0xffffffff8261b12c in iwl_trans_send_cmd (trans=0xfffffe00804d4040, cmd=cmd@entry=0xfffffe007ebd56e8) at /usr/src/sys/contrib/dev/iwlwifi/iwl-trans.c:191 #15 0xffffffff8265e054 in iwl_mvm_send_cmd_status (mvm=mvm@entry=0xfffffe00bde30448, cmd=cmd@entry=0xfffffe007ebd56e8, status=status@entry=0xfffffe007ebd57a0) at /usr/src/sys/contrib/dev/iwlwifi/mvm/utils.c:100 #16 0xffffffff8265e17f in iwl_mvm_send_cmd_pdu_status (mvm=0xffffffff81980420 <cnputs_mtx>, mvm@entry=0xfffffe00bde30448, id=id@entry=1302, len=len@entry=16, data=0xffffffff8121d172, data@entry=0xfffffe007ebd5748, status=0x0, status@entry=0xfffffe007ebd57a0) at /usr/src/sys/contrib/dev/iwlwifi/mvm/utils.c:139 #17 0xffffffff82650b05 in iwl_mvm_fw_baid_op_cmd (mvm=0xfffffe00bde30448, sta=0xfffff80003dc2880, start=true, tid=0, ssn=0, buf_size=64, baid=-5) at /usr/src/sys/contrib/dev/iwlwifi/mvm/sta.c:2849 #18 iwl_mvm_fw_baid_op (mvm=0xfffffe00bde30448, sta=0xfffff80003dc2880, start=true, tid=0, ssn=0, buf_size=64, baid=-1) at /usr/src/sys/contrib/dev/iwlwifi/mvm/sta.c:2876 #19 iwl_mvm_sta_rx_agg (mvm=mvm@entry=0xfffffe00bde30448, sta=sta@entry=0xfffff80003dc2880, tid=0, ssn=0, start=true, buf_size=64, timeout=0) at /usr/src/sys/contrib/dev/iwlwifi/mvm/sta.c:2949 #20 0xffffffff8262e640 in iwl_mvm_mac_ampdu_action (hw=<optimized out>, vif=0xfffffe00bdf67c80, params=<optimized out>) at /usr/src/sys/contrib/dev/iwlwifi/mvm/mac80211.c:983 #21 0xffffffff80dd9e31 in lkpi_ic_ampdu_rx_start (ni=0xfffffe00be003000, rap=0xfffffe00be003d58, baparamset=4099, batimeout=0, baseqctl=0) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:3788 #22 0xffffffff80ce1033 in ht_recv_action_ba_addba_request (ni=0xfffffe00be003000, wh=<optimized out>, frm=<optimized out>, efrm=<optimized out>) at /usr/src/sys/net80211/ieee80211_ht.c:2441 #23 0xffffffff80d0b5d3 in sta_input (ni=0xfffffe00be003000, m=0xfffff80007b01500, rxs=0xfffffe007ebd59f8, rssi=-110, nf=<optimized out>) at /usr/src/sys/net80211/ieee80211_sta.c:973 #24 0xffffffff80ce5816 in ieee80211_input_mimo (ni=ni@entry=0xfffffe00be003000, m=m@entry=0xfffff80007b01500) at /usr/src/sys/net80211/ieee80211_input.c:100 #25 0xffffffff80ddace9 in linuxkpi_ieee80211_rx (hw=0xfffffe00bde30380, skb=0xfffff80005770800, sta=sta@entry=0xfffff80003dc2880, napi=<optimized out>, list=list@entry=0x0) at /usr/src/sys/compat/linuxkpi/common/src/linux_80211.c:4845 #26 0xffffffff82643587 in ieee80211_rx_napi (hw=0xffffffff81980420 <cnputs_mtx>, sta=0xfffff80003dc2880, skb=0xfffff80005770800, napi=0xfffffe00bde3c098) at /usr/src/sys/compat/linuxkpi/common/include/net/mac80211.h:1618 #27 iwl_mvm_pass_packet_to_mac80211 (mvm=0xfffffe00bde30448, napi=0xfffffe00bde3c098, skb=0xfffff80005770800, queue=0, sta=0xfffff80003dc2880, link_sta=0xfffff80003dc2938) at /usr/src/sys/contrib/dev/iwlwifi/mvm/rxmq.c:257 #28 iwl_mvm_rx_mpdu_mq (mvm=0xfffffe00bde30448, napi=0xfffffe00bde3c098, rxb=<optimized out>, queue=0) at /usr/src/sys/contrib/dev/iwlwifi/mvm/rxmq.c:2645 #29 0xffffffff8266de5f in iwl_op_mode_rx (op_mode=0xffffffff81980420 <cnputs_mtx>, napi=0xfffffe00bde3c098, rxb=0xfffffe007ebd5d50) at /usr/src/sys/contrib/dev/iwlwifi/iwl-op-mode.h:139 #30 iwl_pcie_rx_handle_rb (rxq=0xfffffe00bde3c000, i=<optimized out>, trans=<optimized out>, rxb=<optimized out>, emergency=<optimized out>) at /usr/src/sys/contrib/dev/iwlwifi/pcie/rx.c:1381 #31 iwl_pcie_rx_handle (trans=trans@entry=0xfffffe00804d4040, queue=<optimized out>, budget=budget@entry=64) at /usr/src/sys/contrib/dev/iwlwifi/pcie/rx.c:1568 #32 0xffffffff8266d8fd in iwl_pcie_napi_poll_msix (napi=0xfffffe00bde3c098, budget=64) at /usr/src/sys/contrib/dev/iwlwifi/pcie/rx.c:1044 #33 0xffffffff80deb8ff in lkpi_napi_task (ctx=0xfffffe00bde3c098, pending=<optimized out>) at /usr/src/sys/compat/linuxkpi/common/src/linux_netdev.c:302 #34 0xffffffff80bb743b in taskqueue_run_locked (queue=queue@entry=0xfffff8000782e900) at /usr/src/sys/kern/subr_taskqueue.c:511 #35 0xffffffff80bb84f3 in taskqueue_thread_loop (arg=arg@entry=0xfffffe00804d6120) at /usr/src/sys/kern/subr_taskqueue.c:823 #36 0xffffffff80b06ec2 in fork_exit (callout=0xffffffff80bb8420 <taskqueue_thread_loop>, arg=0xfffffe00804d6120, frame=0xfffffe007ebd5f40) at /usr/src/sys/kern/kern_fork.c:1157 #37 <signal handler called> (kgdb) f 12 #12 0xffffffff8267ecd7 in iwl_trans_txq_send_hcmd_sync (trans=0xfffffe00804d4040, cmd=0xfffffe007ebd56e8) at /usr/src/sys/contrib/dev/iwlwifi/queue/tx.c:1816 1816 panic("cc: %s shall panic here\n", __func__); (kgdb) l iwl_trans_txq_send_hcmd_sync 1772 #define HOST_COMPLETE_TIMEOUT (2 * HZ) 1773 1774 static int iwl_trans_txq_send_hcmd_sync(struct iwl_trans *trans, 1775 struct iwl_host_cmd *cmd) 1776 { 1777 const char *cmd_str = iwl_get_cmd_string(trans, cmd->id); 1778 struct iwl_txq *txq = trans->txqs.txq[trans->txqs.cmd.q_id]; 1779 int cmd_idx; 1780 int ret; 1781 (kgdb) 1782 IWL_DEBUG_INFO(trans, "Attempting to send sync command %s\n", cmd_str); 1783 1784 if (WARN(test_and_set_bit(STATUS_SYNC_HCMD_ACTIVE, 1785 &trans->status), 1786 "Command %s: a command is already active!\n", cmd_str)) 1787 return -EIO; 1788 1789 IWL_DEBUG_INFO(trans, "Setting HCMD_ACTIVE for command %s\n", cmd_str); 1790 1791 cmd_idx = trans->ops->send_cmd(trans, cmd); (kgdb) 1792 if (cmd_idx < 0) { 1793 ret = cmd_idx; 1794 clear_bit(STATUS_SYNC_HCMD_ACTIVE, &trans->status); 1795 IWL_ERR(trans, "Error sending %s: enqueue_hcmd failed: %d\n", 1796 cmd_str, ret); 1797 return ret; 1798 } 1799 1800 ret = wait_event_timeout(trans->wait_command_queue, 1801 !test_bit(STATUS_SYNC_HCMD_ACTIVE, (kgdb) 1802 &trans->status), 1803 HOST_COMPLETE_TIMEOUT); 1804 if (!ret) { 1805 IWL_ERR(trans, "cc: %s cmd_idx=%d ret=%d\n", __func__, cmd_idx, ret); 1806 IWL_ERR(trans, "Error sending %s: time out after %dms.\n", 1807 cmd_str, jiffies_to_msecs(HOST_COMPLETE_TIMEOUT)); 1808 1809 IWL_ERR(trans, "Current CMD queue read_ptr %d write_ptr %d\n", 1810 txq->read_ptr, txq->write_ptr); 1811 (kgdb) 1812 clear_bit(STATUS_SYNC_HCMD_ACTIVE, &trans->status); 1813 IWL_DEBUG_INFO(trans, "Clearing HCMD_ACTIVE for command %s\n", 1814 cmd_str); 1815 1816 panic("cc: %s shall panic here\n", __func__); << hit here 1817 ret = -ETIMEDOUT; 1818 1819 iwl_trans_sync_nmi(trans); 1820 goto cancel; 1821 } (kgdb) p cmd_str $1 = 0xffffffff82682169 "UNKNOWN" (kgdb) p cmd $4 = (struct iwl_host_cmd *) 0xfffffe007ebd56e8 (kgdb) p *cmd $5 = {data = {0xfffffe007ebd5748, 0x0}, resp_pkt = 0x0, _page = 0x0, _rx_page_order = 0, flags = 2, id = 1302, len = {16, 0}, dataflags = "\000"} (kgdb) p cmd->id $6 = 1302 (kgdb) p (cmd->id & 0xFF00) >> 8 // iwl_cmd_groupid $7 = 5 << grp=5 (kgdb) p cmd->id & 0xFF // iwl_cmd_opcode $8 = 22 << cmd=22 (kgdb) p trans->command_groups $9 = (const struct iwl_hcmd_arr *) 0xffffffff8269a080 <iwl_mvm_groups> (kgdb) p trans->command_groups_size $10 = 13 (kgdb) p trans->command_groups[5].arr $11 = (const struct iwl_hcmd_names *) 0xffffffff8269a840 <iwl_mvm_data_path_names> (kgdb) p *trans->command_groups[5].arr $14 = {cmd_id = 0 '\000', cmd_name = 0xffffffff8268699b "DQA_ENABLE_CMD"} (kgdb) p *(trans->command_groups[5].arr+1) $15 = {cmd_id = 1 '\001', cmd_name = 0xffffffff82685561 "UPDATE_MU_GROUPS_CMD"} (kgdb) p *(trans->command_groups[5].arr+2) $16 = {cmd_id = 2 '\002', cmd_name = 0xffffffff82680a46 "TRIGGER_RX_QUEUES_NOTIF_CMD"} (kgdb) p *(trans->command_groups[5].arr+3) $17 = {cmd_id = 7 '\a', cmd_name = 0xffffffff8268d23b "STA_HE_CTXT_CMD"} (kgdb) p *(trans->command_groups[5].arr+4) $18 = {cmd_id = 8 '\b', cmd_name = 0xffffffff8268b4e1 "RLC_CONFIG_CMD"} (kgdb) p *(trans->command_groups[5].arr+5) $19 = {cmd_id = 13 '\r', cmd_name = 0xffffffff82681dc9 "RFH_QUEUE_CONFIG_CMD"} (kgdb) p *(trans->command_groups[5].arr+6) $20 = {cmd_id = 15 '\017', cmd_name = 0xffffffff82680030 "TLC_MNG_CONFIG_CMD"} (kgdb) p *(trans->command_groups[5].arr+7) $21 = {cmd_id = 20 '\024', cmd_name = 0xffffffff82685576 "CHEST_COLLECTOR_FILTER_CONFIG_CMD"} (kgdb) p *(trans->command_groups[5].arr+8) $22 = {cmd_id = 23 '\027', cmd_name = 0xffffffff8268887b "SCD_QUEUE_CONFIG_CMD"} (kgdb) p *(trans->command_groups[5].arr+9) $23 = {cmd_id = 24 '\030', cmd_name = 0xffffffff82691716 "SEC_KEY_CMD"} (kgdb) p *(trans->command_groups[5].arr+10) $24 = {cmd_id = 244 '\364', cmd_name = 0xffffffff82687db2 "MONITOR_NOTIF"} (kgdb) p *(trans->command_groups[5].arr+11) $25 = {cmd_id = 246 '\366', cmd_name = 0xffffffff8268a000 "THERMAL_DUAL_CHAIN_REQUEST"} (kgdb) p *(trans->command_groups[5].arr+12) $26 = {cmd_id = 253 '\375', cmd_name = 0xffffffff8268aa94 "STA_PM_NOTIF"} (kgdb) p *(trans->command_groups[5].arr+13) $27 = {cmd_id = 254 '\376', cmd_name = 0xffffffff826920ea "MU_GROUP_MGMT_NOTIF"} (kgdb) p *(trans->command_groups[5].arr+14) $28 = {cmd_id = 255 '\377', cmd_name = 0xffffffff8268e54f "RX_QUEUES_NOTIFICATION"} In iwl_get_cmd_string(), the ret was NULL because of no match found. ret = bsearch(&cmd, arr->arr, arr->size, size, iwl_hcmd_names_cmp); So there was no match of such "cmd=22" from the "trans->command_groups[5].arr". cmd=22== 0x16, which shall be requesting: "RX_BAID_ALLOCATION_CONFIG_CMD" according to "datapath.h". Stay tuned.
According to iwl_mvm_fw_baid_op_cmd(), the group is DATA_PATH_GROUP (grp=5) and the cmd is 22 which means "RX_BAID_ALLOCATION_CONFIG_CMD". I am wondering why trans->command_groups[5].arr did not have the "cmd_id = 22" entry. enum iwl_mvm_command_groups { LEGACY_GROUP = 0x0, LONG_GROUP = 0x1, SYSTEM_GROUP = 0x2, MAC_CONF_GROUP = 0x3, PHY_OPS_GROUP = 0x4, DATA_PATH_GROUP = 0x5, << grp=5 SCAN_GROUP = 0x6, NAN_GROUP = 0x7, LOCATION_GROUP = 0x8, PROT_OFFLOAD_GROUP = 0xb, REGULATORY_AND_NVM_GROUP = 0xc, DEBUG_GROUP = 0xf, }; enum iwl_data_path_subcmd_ids { ... /** * @RX_BAID_ALLOCATION_CONFIG_CMD: Allocate/deallocate a BAID for an RX * blockack session, uses &struct iwl_rx_baid_cfg_cmd for the * command, and &struct iwl_rx_baid_cfg_resp as a response. */ RX_BAID_ALLOCATION_CONFIG_CMD = 0x16, << 0x16==22 ... }
(In reply to Cheng Cui from comment #4) Here, "RX_BAID_ALLOCATION_CONFIG_CMD" is not added in iwl_mvm_data_path_names[]. And this is not just a problem in 11na, 11a also has this same problem. I wonder why iwl_mvm_fw_baid_op_cmd() was added in below Linux patch without adding "RX_BAID_ALLOCATION_CONFIG_CMD" in iwl_mvm_data_path_names[]. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=97f70c56e6ca1e9bad1cbf6d3b0592f150bb95b1 /* Please keep this array *SORTED* by hex value. * Access is done through binary search */ static const struct iwl_hcmd_names iwl_mvm_data_path_names[] = { HCMD_NAME(DQA_ENABLE_CMD), HCMD_NAME(UPDATE_MU_GROUPS_CMD), HCMD_NAME(TRIGGER_RX_QUEUES_NOTIF_CMD), HCMD_NAME(STA_HE_CTXT_CMD), HCMD_NAME(RLC_CONFIG_CMD), HCMD_NAME(RFH_QUEUE_CONFIG_CMD), HCMD_NAME(TLC_MNG_CONFIG_CMD), HCMD_NAME(CHEST_COLLECTOR_FILTER_CONFIG_CMD), HCMD_NAME(SCD_QUEUE_CONFIG_CMD), HCMD_NAME(SEC_KEY_CMD), HCMD_NAME(MONITOR_NOTIF), HCMD_NAME(THERMAL_DUAL_CHAIN_REQUEST), HCMD_NAME(STA_PM_NOTIF), HCMD_NAME(MU_GROUP_MGMT_NOTIF), HCMD_NAME(RX_QUEUES_NOTIFICATION), };
(In reply to Cheng Cui from comment #5) This problem grp=5 cmd=22 exists in 11na and 11ng association. on FreeBSD 11a, 11b and 11g, there is no grp=5 cmd=22 during association. Also, I noticed in 11a, the iwl_mvm_mld_free_sta_link() is called frequently after association, but not sure if it is related with the panic. Next I will test a custom Linux kernel to see the pattern of grp & cmd.
Created attachment 247579 [details] grp,cmd prints during 802.11n association of iwlwifi Linux
(In reply to Cheng Cui from comment #7) Built a custom Linux kernel in my Ubuntu22.04 VM with the same debug print added in iwl_get_cmd_string(). With 2.4Ghz and 5Ghz bands, tested iwlwifi of Linux 802.11n in both bands (11ng and 11na), but didn't find the combination of (grp=5 cmd=22). Full print info is attached in file named "linux_iwl_print.log". So there are some possibilities of the problem that makes the firmware "stall". [1] the firmware does not recognize (grp=5 cmd=22), so it freeze [2] iwl_mvm_fw_baid_op_cmd() is not used in Linux, but used in FreeBSD Will further explore the above speculations.
(In reply to Cheng Cui from comment #8) In Linux kernel v5.10, which is used in my Ubuntu Linux 22.04 VM (v5.15.0-91), iwlagn_mac_ampdu_action() is used instead of iwl_mvm_mac_ampdu_action() for option IEEE80211_AMPDU_RX_START. The following are my inspections of the calling traces that prove iwl_mvm_fw_baid_op_cmd() is no longer used to send out (grp=5 cmd=22 == RX_BAID_ALLOCATION_CONFIG_CMD) for RX_AMPDU. linux kernel v5.10: iwl_sta_rx_agg_start iwl_send_add_sta iwl_dvm_send_cmd iwl_trans_send_cmd iwl_trans_pcie_gen2_send_hcmd
(In reply to Cheng Cui from comment #9) In FreeBSD, once I disabled HT_RXAGG, the 802.11na association worked as follows: root@n1_iwl_vm:~ # sysctl -d compat.linuxkpi.iwlwifi_11n_disable compat.linuxkpi.iwlwifi_11n_disable: disable 11n functionality, bitmap: 1: full, 2: disable agg TX, 4: disable agg RX, 8 enable agg TX root@n1_iwl_vm:~ # sysctl compat.linuxkpi.iwlwifi_11n_disable compat.linuxkpi.iwlwifi_11n_disable: 4 root@n1_iwl_vm:~ # ifconfig wlan0 wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 groups: wlan ssid TP-LINK_5G channel 48 (5240 MHz 11a ht/20) bssid 30:b5:c2:aa:83:34 regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF TKIP 2:128-bit txpower 17 bmiss 7 mcastrate 6 mgmtrate 6 scanvalid 60 ampdulimit 64k ampdudensity 4 -amsdutx amsdurx shortgi -ldpctx ldpcrx -uapsd wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet MCS mode 11na status: associated nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> root@n1_iwl_vm:~ # dhclient wlan0 DHCPREQUEST on wlan0 to 255.255.255.255 port 67 DHCPACK from 192.168.0.1 Bogus Host Name option 12: n1_iwl_vm (n1_iwl_vm) bound to 192.168.0.190 -- renewal in 3600 seconds. root@n1_iwl_vm:~ # ifconfig wlan0 wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=0 ether e0:2e:0b:92:e5:82 inet 192.168.0.190 netmask 0xffffff00 broadcast 192.168.0.255 groups: wlan ssid TP-LINK_5G channel 48 (5240 MHz 11a ht/20) bssid 30:b5:c2:aa:83:34 regdomain FCC country US authmode WPA2/802.11i privacy ON deftxkey UNDEF TKIP 2:128-bit txpower 17 bmiss 7 mcastrate 6 mgmtrate 6 scanvalid 60 ampdulimit 64k ampdudensity 4 -amsdutx amsdurx shortgi -ldpctx ldpcrx -uapsd wme roaming MANUAL parent interface: iwlwifi0 media: IEEE 802.11 Wireless Ethernet MCS mode 11na status: associated nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
(In reply to Cheng Cui from comment #9) Well, comment #9 is not quite accurate. Linux kernel v5.10 does not call iwl_mvm_fw_baid_op_cmd() to send out (grp=5 cmd=22), but the call traces are not right as they are from dvm, not mvm. And the latest Linux kernel master branch has the call to iwl_mvm_fw_baid_op_cmd(), I need more time to figure out what's going on.
(In reply to Cheng Cui from comment #11) The latest Linux kernel v6.7.0 does send out the (grp=5 cmd=22) for RX_AMPDU. I can confirm this from my custom build with prints in iwl_get_cmd_string(). Although the return is also "UNKNOWN", there seems to be nothing wrong with the user experience on using 802.11na in Linux. And the firmware version is the same as it in the current FreeBSD. root@n2linuxvm:~ # lshw -C network | egrep firmware configuration: broadcast=yes driver=iwlwifi driverversion=6.7.0+ firmware=77.2df8986f.0 cc-a0-77.ucode latency=0 link=no multicast=yes wireless=IEEE 802.11 Need to dig deeper.
(In reply to Cheng Cui from comment #12) in fbsd: lkpi_ic_ampdu_rx_start lkpi_80211_mo_ampdu_action iwl_mvm_mac_ampdu_action iwl_mvm_sta_rx_agg iwl_mvm_fw_baid_op iwl_mvm_fw_baid_op_cmd << RX_BAID_ALLOCATION_CONFIG_CMD added here iwl_mvm_send_cmd_pdu_status iwl_mvm_send_cmd_status iwl_trans_send_cmd iwl_trans_txq_send_hcmd iwl_trans_txq_send_hcmd_sync iwl_pcie_gen2_enqueue_hcmd iwl_trans_txq_send_hcmd_sync called: iwl_trans_sync_nmi iwl_trans_pcie_sync_nmi iwl_trans_sync_nmi_with_addr iwl_trans_fw_error << set IWL_TRANS_NO_FW iwl_op_mode_nic_error iwl_mvm_nic_error << if fw crashes, nic will not be started in linux v6.7 iwl_mvm_mac_ampdu_action << returns 0 on (grp=5 cmd=22) iwl_mvm_sta_rx_agg iwl_mvm_fw_baid_op iwl_mvm_fw_baid_op_cmd << start=true, (grp=5 cmd=22==RX_BAID_ALLOCATION_CONFIG_CMD) added here iwl_mvm_send_cmd_pdu_status iwl_mvm_send_cmd_status iwl_trans_send_cmd iwl_trans_txq_send_hcmd iwl_trans_txq_send_hcmd_sync << from which iwl_get_cmd_string() prints cc: * iwl_pcie_gen2_enqueue_hcmd << linked by trans->ops->send_cmd
(In reply to Cheng Cui from comment #13) Also tried different versions of iwlwifi firmwares that could be found from the linux-firmware.git. For versions cc-a0-{77,74,73,72} the firmware crashes on (grp=5 cmd=22), but for older versions cc-a0-{66,59,50} the firmware crashes on (grp=1 cmd=24). So this problem is more likely from upper layer that sent unacceptable/disordered cmds or a missing command, or a flag, etc..
(In reply to Cheng Cui from comment #14) Played with trace-cmd in Linux kernel for the iwlwifi function traces for more than two days. It looks to be powerful, but too complex. At least for now, I still cannot figure out how to backtrack enough function traces from the iwl_trans_send_cmd() for various iwl cmds to the fw. e.g. "trace-cmd record -p function_graph -e iwlwifi nmcli radio wifi on" Meanwhile, I am dumping out all the iwl cmds to the fw (Linux kernel 6.7.0 with the same fw version) during authentication/association, and try to understand the orders/workflow just under a few pkts exchange from the AP.
(In reply to Cheng Cui from comment #15) Dumped hex values of the hcmd (grp=5 cmd=22) manually in freebsd together with comparison with the same hcmd in Linux. Turned out there is nothing wrong with the hcmd in values or size. I also tested on a second AP where the association went well until the Add Block Ack Request packet was received to trigger the A-MPDU processing, which caused fw timeout: iwlwifi0: 100162 2147469605 iwl_mvm_mac_sta_state_common IWL_DL_MAC80211 station 0xfffff8000d0ea928M state change 3->4 iwlwifi0: 100162 2147469606 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command REPLY_BEACON_FILTERING_CMD (01.d2), seq: 0x005F, 68 bytes at 95[31]:0 iwlwifi0: 100162 2147469606 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command MAC_CONFIG_CMD (03.08), seq: 0x0060, 60 bytes at 96[0]:0 iwlwifi0: 100162 2147469606 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command MAC_CONFIG_CMD (03.08), seq: 0x0061, 60 bytes at 97[1]:0 iwlwifi0: 100024 2147471399 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command STATISTICS_CMD (01.9c), seq: 0x0062, 12 bytes at 98[2]:0 iwlwifi0: 100024 2147471399 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command SCAN_CFG_CMD (01.0c), seq: 0x0063, 20 bytes at 99[3]:0 iwlwifi0: 100023 2147471725 iwl_pcie_gen2_enqueue_hcmd IWL_DL_HC Sending command SCAN_CFG_CMD (01.0c), seq: 0x0064, 20 bytes at 100[4]:0 iwlwifi0: 100160 2147471725 iwl_mvm_mac_ampdu_action IWL_DL_HT A-MPDU action on addr 0xfffff8000d0ea928M tid 6: action 0 iwlwifi0: Error sending SCAN_CFG_CMD: time out after 2000ms. <<< iwlwifi0: Current CMD queue read_ptr 100 write_ptr 101 iwlwifi0: Start IWL Error Log Dump: iwlwifi0: Transport status: 0x0000004A, valid: 6 iwlwifi0: Loaded firmware version: 77.2df8986f.0 cc-a0-77.ucode
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=759a996d610d9354aac5c48a6bdc9cedcba2f48b commit 759a996d610d9354aac5c48a6bdc9cedcba2f48b Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-12 16:03:13 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-02-22 22:46:48 +0000 LinuxKPI: 802.11: implement a deferred RX path Some calls, e.g., action frames cause us to call through all the way down to firmware from the RX path without any deferral in net80211. For LinuxKPI and iwlwifi this goes (with omissions) like this: lkpi_napi_task -> linuxkpi_ieee80211_rx -> ieee80211_input_mimo -> sta_input -> ht_recv_action_ba_addba_request -> lkpi_ic_ampdu_rx_start -> iwl_mvm_mac_ampdu_action -> iwl_trans_txq_send_hcmd. At that point we are waiting for an interrupt from the firmware but given the lkpi_napi_task has not finished (and may have more to dispatch based on budget and what was received) we will not see the new interrupt/fw response. With no answer from the firmware, the software timeout in the driver kills the command and the firmware and issues a complete restart. Implement the deferred RX path in LinuxKPI for the moment. At a later point we should carefully shift this into net80211. This fixes the hangs for (*ic_ampdu_rx_start)() calls with iwlwifi. MFC after: 3 days PR: 276083 Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43968 sys/compat/linuxkpi/common/src/linux_80211.c | 134 ++++++++++++++++++++++++--- sys/compat/linuxkpi/common/src/linux_80211.h | 31 +++++++ 2 files changed, 154 insertions(+), 11 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dd167e59079b65a53a45744055db6634f3c19a56 commit dd167e59079b65a53a45744055db6634f3c19a56 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-12 16:03:13 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-04-01 23:46:10 +0000 LinuxKPI: 802.11: implement a deferred RX path Some calls, e.g., action frames cause us to call through all the way down to firmware from the RX path without any deferral in net80211. For LinuxKPI and iwlwifi this goes (with omissions) like this: lkpi_napi_task -> linuxkpi_ieee80211_rx -> ieee80211_input_mimo -> sta_input -> ht_recv_action_ba_addba_request -> lkpi_ic_ampdu_rx_start -> iwl_mvm_mac_ampdu_action -> iwl_trans_txq_send_hcmd. At that point we are waiting for an interrupt from the firmware but given the lkpi_napi_task has not finished (and may have more to dispatch based on budget and what was received) we will not see the new interrupt/fw response. With no answer from the firmware, the software timeout in the driver kills the command and the firmware and issues a complete restart. Implement the deferred RX path in LinuxKPI for the moment. At a later point we should carefully shift this into net80211. This fixes the hangs for (*ic_ampdu_rx_start)() calls with iwlwifi. PR: 276083 Reviewed by: cc (cherry picked from commit 759a996d610d9354aac5c48a6bdc9cedcba2f48b) sys/compat/linuxkpi/common/src/linux_80211.c | 134 ++++++++++++++++++++++++--- sys/compat/linuxkpi/common/src/linux_80211.h | 31 +++++++ 2 files changed, 154 insertions(+), 11 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=2a12368e7ced1c103d5be14e545f3e0f25edf947 commit 2a12368e7ced1c103d5be14e545f3e0f25edf947 Author: Bjoern A. Zeeb <bz@FreeBSD.org> AuthorDate: 2024-02-12 16:03:13 +0000 Commit: Bjoern A. Zeeb <bz@FreeBSD.org> CommitDate: 2024-06-14 14:55:15 +0000 LinuxKPI: 802.11: implement a deferred RX path Some calls, e.g., action frames cause us to call through all the way down to firmware from the RX path without any deferral in net80211. For LinuxKPI and iwlwifi this goes (with omissions) like this: lkpi_napi_task -> linuxkpi_ieee80211_rx -> ieee80211_input_mimo -> sta_input -> ht_recv_action_ba_addba_request -> lkpi_ic_ampdu_rx_start -> iwl_mvm_mac_ampdu_action -> iwl_trans_txq_send_hcmd. At that point we are waiting for an interrupt from the firmware but given the lkpi_napi_task has not finished (and may have more to dispatch based on budget and what was received) we will not see the new interrupt/fw response. With no answer from the firmware, the software timeout in the driver kills the command and the firmware and issues a complete restart. Implement the deferred RX path in LinuxKPI for the moment. At a later point we should carefully shift this into net80211. This fixes the hangs for (*ic_ampdu_rx_start)() calls with iwlwifi. PR: 276083 Reviewed by: cc Differential Revision: https://reviews.freebsd.org/D43968 (cherry picked from commit 759a996d610d9354aac5c48a6bdc9cedcba2f48b) sys/compat/linuxkpi/common/src/linux_80211.c | 134 ++++++++++++++++++++++++--- sys/compat/linuxkpi/common/src/linux_80211.h | 31 +++++++ 2 files changed, 154 insertions(+), 11 deletions(-)
I believe this is fixed in all active branches now. Thanks a lot for reporting, tracking, all the debug information, ...