Bug 281560 - gve(4) UMA deadlock during high TCP throughput
Summary: gve(4) UMA deadlock during high TCP throughput
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: vendor
Depends on:
Blocks:
 
Reported: 2024-09-17 21:21 UTC by shailend
Modified: 2024-10-29 14:19 UTC (History)
CC List: 9 users

See Also:


Attachments
procstat_kka (98.19 KB, text/plain)
2024-09-26 17:03 UTC, shailend

Description shailend 2024-09-17 21:21:46 UTC
In https://reviews.freebsd.org/D46690 I am adding support for a newer queue format to gve that supports ~200Gbps of throughput.  

While stress testing these changes I ran into a deadlock that I'm unable to find a root cause for. It is reproduced by having two VMs send bidirectional TCP traffic to each other; the deadlock sets in after a few hours.

Below are the stack traces of the driver taskqueue threads after the deadlock sets in, from two different occurrences:

```
[root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done
    0 100416 kernel              gve0 rxq 0          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100417 kernel              gve0 rxq 1          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100418 kernel              gve0 rxq 2          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100419 kernel              gve0 rxq 3          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100420 kernel              gve0 rxq 4          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100421 kernel              gve0 rxq 5          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_active 
    0 100422 kernel              gve0 rxq 6          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100423 kernel              gve0 rxq 7          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100424 kernel              gve0 rxq 8          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100425 kernel              gve0 rxq 9          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_active 
    0 100426 kernel              gve0 rxq 10         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100427 kernel              gve0 rxq 11         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100428 kernel              gve0 rxq 12         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100429 kernel              gve0 rxq 13         mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100430 kernel              gve0 rxq 14         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 
    0 100431 kernel              gve0 rxq 15         mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100434 kernel              gve0 txq 1          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100448 kernel              gve0 txq 8          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100456 kernel              gve0 txq 12         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100458 kernel              gve0 txq 13         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
```  

```
[root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done
    0 100433 kernel              gve0 rxq 0          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100434 kernel              gve0 rxq 1          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100435 kernel              gve0 rxq 2          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100436 kernel              gve0 rxq 3          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100437 kernel              gve0 rxq 4          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100438 kernel              gve0 rxq 5          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100439 kernel              gve0 rxq 6          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100440 kernel              gve0 rxq 7          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100441 kernel              gve0 rxq 8          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 
    0 100442 kernel              gve0 rxq 9          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 
    0 100443 kernel              gve0 rxq 10         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100444 kernel              gve0 rxq 11         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100445 kernel              gve0 rxq 12         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100446 kernel              gve0 rxq 13         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100447 kernel              gve0 rxq 14         mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 
    0 100448 kernel              gve0 rxq 15         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 
    0 100457 kernel              gve0 txq 4          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100463 kernel              gve0 txq 7          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100465 kernel              gve0 txq 8          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100467 kernel              gve0 txq 9          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100469 kernel              gve0 txq 10         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
```  

I looked for incorrect KPI use in the driver (https://reviews.freebsd.org/D46690) but couldn't find any offenses. The driver is rather straightforward: it inputs cluster mbufs in the Rx path and m_freem()s transmitted mbufs in the Tx path. Given the traces, I hesitantly suspect some race condition in the UMA stack.
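
To make that concrete, here is a minimal sketch of the Rx/Tx mbuf lifecycle just described. Everything here (function names, the copy into the cluster) is illustrative only, not the actual gve code, which lives in the review linked above:

```
/*
 * Hypothetical sketch of the datapath pattern described above;
 * the function names and structure are illustrative, not gve's.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>
#include <netinet/tcp_lro.h>

/* Rx taskqueue: wrap the received frame in a cluster mbuf and input it. */
static void
rx_input_sketch(if_t ifp, struct lro_ctrl *lro, const char *frame, int len)
{
	struct mbuf *m;

	m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);	/* cluster mbuf */
	if (m == NULL)
		return;					/* drop on alloc failure */
	m_copyback(m, 0, len, frame);
	m->m_len = m->m_pkthdr.len = len;
	m->m_pkthdr.rcvif = ifp;
	if (tcp_lro_rx(lro, m, 0) != 0)			/* LRO rejected it */
		if_input(ifp, m);			/* input directly */
}

/* Tx taskqueue: free the chain once the NIC reports it transmitted. */
static void
tx_complete_sketch(struct mbuf *m)
{
	m_freem(m);	/* hands the mbufs back to the UMA zones */
}
```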

I reproduced the issue with INVARIANTS and WITNESS but they didn't catch anything. 

The issue also reproduces with a slightly different datapath mode in the driver (https://reviews.freebsd.org/D46691). In this mode, the Rx path usually inputs mbufs with external storage attached rather than cluster mbufs (a sketch of that pattern follows the traces below). The stack traces of the driver taskqueues after the deadlock look like so:

```
[root@FreeBSD13 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done
    0 100451 kernel              gve0 rxq 1          mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100452 kernel              gve0 rxq 2          mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100453 kernel              gve0 rxq 3          mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100454 kernel              gve0 rxq 4          mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100455 kernel              gve0 rxq 5          mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100457 kernel              gve0 rxq 7          mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100458 kernel              gve0 rxq 8          mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100459 kernel              gve0 rxq 9          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush 
    0 100460 kernel              gve0 rxq 10         mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100461 kernel              gve0 rxq 11         mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100462 kernel              gve0 rxq 12         mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry tcp_m_copym tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100463 kernel              gve0 rxq 13         mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100464 kernel              gve0 rxq 14         mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 
    0 100465 kernel              gve0 rxq 15         mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100466 kernel              gve0 txq 0          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100468 kernel              gve0 txq 1          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100470 kernel              gve0 txq 2          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100472 kernel              gve0 txq 3          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100474 kernel              gve0 txq 4          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100480 kernel              gve0 txq 7          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100482 kernel              gve0 txq 8          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100484 kernel              gve0 txq 9          mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100486 kernel              gve0 txq 10         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100488 kernel              gve0 txq 11         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100490 kernel              gve0 txq 12         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100492 kernel              gve0 txq 13         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100494 kernel              gve0 txq 14         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100496 kernel              gve0 txq 15         mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
```
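
As for the sketch promised above: in this mode the driver hands the stack an mbuf header whose data is the driver's own receive buffer, attached as external storage, instead of data copied into a cluster. The rx_buf structure and rx_buf_recycle() helper below are hypothetical stand-ins for the driver's buffer recycling, not the actual gve code:

```
/*
 * Hypothetical sketch of the external-storage Rx mode described above.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/systm.h>

struct rx_buf {
	char	*addr;		/* DMA-mapped buffer the NIC wrote into */
	u_int	size;
};

static void rx_buf_recycle(struct rx_buf *);	/* hypothetical helper */

/* m_ext free callback: runs when the stack finally frees the mbuf. */
static void
rx_buf_free(struct mbuf *m)
{
	struct rx_buf *buf = m->m_ext.ext_arg1;

	/* Recycle the buffer back to the ring instead of freeing pages. */
	rx_buf_recycle(buf);
}

static struct mbuf *
rx_wrap_extbuf(struct rx_buf *buf, int pktlen)
{
	struct mbuf *m;

	m = m_gethdr(M_NOWAIT, MT_DATA);
	if (m == NULL)
		return (NULL);
	/* Attach the receive buffer as external storage: no data copy. */
	m_extadd(m, buf->addr, buf->size, rx_buf_free, buf, NULL, 0,
	    EXT_NET_DRV);
	m->m_len = m->m_pkthdr.len = pktlen;
	return (m);	/* caller sets rcvif and inputs it to the stack */
}
```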

I've reproduced this on 14.1 and 13.1.

Any guidance or help in uncovering the source of this deadlock would be very helpful!
Comment 1 Konstantin Belousov 2024-09-18 16:55:47 UTC
If you have the deadlock, there must be a resource acquisition loop, which
you should be able to unwind by looking at threads waiting for a resource,
then finding the owner of the resource, then seeing what it is doing, etc.

That said, do you run with INVARIANTS and WITNESS enabled?  If not, you must.
For instance, only an INVARIANTS-enabled kernel would detect an attempt to
take a blockable lock inside an SMR section.  Similarly, WITNESS might
directly point to the resource loop for locks.
Comment 2 shailend 2024-09-18 17:19:31 UTC
Actually, I did run it with INVARIANTS, WITNESS, and the other options listed in https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks; the deadlock reproduces without any findings from either.

`procstat -k ${tid?}` is the only tool I know for looking at the deadlocked threads, and it hints at some locking issue in the UMA zone cache (as per the per-thread procstat -k output in comment #0).

Are there some other ways I can look at the locked threads and more definitively figure out what resource they are blocked on and who owns it?
Comment 3 Konstantin Belousov 2024-09-18 19:07:36 UTC
If you have WITNESS configured, then you can get an overview of lock
ownership on the system using the 'show alllocks' ddb command.  This should
allow you to see lock owners, including the shared locks (like read-locked
rwlocks).  Another hint is that in the ddb 'ps' command output, the wchan is
the address of the lock (or something very close to it) on which the thread
is blocked.

This should be enough to see the loops.  Basically, take any blocked thread,
see which lock it needs ('ps'), then find a thread that owns the lock.  Now
look at the backtrace of the lock owner and see why it does not make progress,
e.g. typically waiting for yet another lock, and so on.
Comment 4 Mark Johnston 2024-09-20 15:26:00 UTC
(In reply to Konstantin Belousov from comment #3)
It'd also be useful to see output from "show page" and "show uma" from DDB.  Perhaps there's a kernel memory leak somewhere?
Comment 5 Andrew Gallatin 2024-09-20 15:41:42 UTC
(In reply to Mark Johnston from comment #4)
To expand on this: the pattern of stack traces looks very much like a degenerate behavior in the UMA memory allocator where the per-CPU caches are empty, every allocation must go back to the zone, and you wind up taking a lock on every allocation.
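
For intuition, here is a simplified sketch of the bucket/zone split being described; zone_sketch, bucket_sketch, and refill_bucket() are made-up stand-ins, not the real uma_core.c structures:

```
/*
 * Simplified illustration of the degenerate UMA pattern described above;
 * not the actual uma_core.c implementation.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/pcpu.h>
#include <sys/smp.h>

struct bucket_sketch {
	int	count;
	void	*items[256];
};

struct zone_sketch {
	struct mtx		lock;		/* the contended zone lock */
	struct bucket_sketch	*cache[MAXCPU];	/* per-CPU caches */
};

static bool refill_bucket(struct zone_sketch *, struct bucket_sketch *);

static void *
zalloc_sketch(struct zone_sketch *z)
{
	struct bucket_sketch *b;
	void *item = NULL;

	/* Fast path: pop from this CPU's bucket, no lock taken. */
	critical_enter();
	b = z->cache[curcpu];
	if (b->count > 0) {
		item = b->items[--b->count];
		critical_exit();
		return (item);
	}
	critical_exit();

	/*
	 * Slow path: the per-CPU cache is empty, so the thread takes the
	 * zone lock to refill it (the real allocator re-enters the critical
	 * section here; simplified).  If the caches never manage to refill,
	 * e.g. because items are leaked or hoarded, every CPU hits this lock
	 * on every allocation, which matches the cache_alloc /
	 * cache_alloc_retry backtraces above.
	 */
	mtx_lock(&z->lock);
	if (refill_bucket(z, b))
		item = b->items[--b->count];
	mtx_unlock(&z->lock);
	return (item);
}
```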

I'd check to see if there is an mbuf leak.  The output of vmstat -z would be helpful here.
Comment 6 shailend 2024-09-26 00:38:27 UTC
I reproduced this with invariants+witness+ddb; it takes much longer to hit the deadlock because INVARIANTS and WITNESS lower the throughput. Backtraces of the locked driver threads:

```
[root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done
    0 100413 kernel              gve0 rxq 0          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100414 kernel              gve0 rxq 1          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100415 kernel              gve0 rxq 2          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100416 kernel              gve0 rxq 3          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100417 kernel              gve0 rxq 4          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 
    0 100418 kernel              gve0 rxq 5          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 
    0 100419 kernel              gve0 rxq 6          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100420 kernel              gve0 rxq 7          mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100421 kernel              gve0 rxq 8          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 
    0 100422 kernel              gve0 rxq 9          mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 
    0 100423 kernel              gve0 rxq 10         mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 
    0 100424 kernel              gve0 rxq 11         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 
    0 100425 kernel              gve0 rxq 12         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 
    0 100426 kernel              gve0 rxq 13         mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_push_and_replace 
    0 100427 kernel              gve0 rxq 14         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 
    0 100433 kernel              gve0 txq 2          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100437 kernel              gve0 txq 4          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100443 kernel              gve0 txq 7          mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100453 kernel              gve0 txq 12         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 
    0 100459 kernel              gve0 txq 15         mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline
```

show alllocks output:

```
db> show alllocks
Process 857 (iperf) thread 0xfffff800168c7000 (100673)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4d2c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800168c8740 (100674)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a020140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800168c8000 (100675)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e288c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800168cf740 (100676)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8ae9140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800168cf000 (100677)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e56dc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8845000 (100678)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85708c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8770000 (100679)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a0cb8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8716000 (100680)
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dbe8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e5e2020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dbe8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8730000 (100681)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14bc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87f5000 (100682)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e13ec80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ab000 (100683)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e25e140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8827000 (100684)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b88b28c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b874e000 (100685)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8750000 (100686)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e56d140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8764000 (100687)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b875d000 (100688)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dbe500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b880c000 (100689)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e044500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8789000 (100690)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e2b0140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87e9740 (100691)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d3ac80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87f6740 (100692)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e769c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8855000 (100693)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a05a500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ca740 (100694)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e045c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87a0000 (100695)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85b78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b880c740 (100696)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4ad140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff80016da6000 (100697)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6e3c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ff740 (100698)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016984c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b873b740 (100699)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b875b000 (100700)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8963c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ff000 (100701)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcfc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8772740 (100702)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b89168c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8877740 (100703)
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dcf8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8d53020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8787740 (100704)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e044140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b881a740 (100705)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e39e140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b873b000 (100706)
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800169848e0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b86f8560) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800169848c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8731740 (100707)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e28500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87dd000 (100708)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b883b740 (100709)
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4aeca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e2fc560) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4aec80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8818740 (100710)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e28140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8845740 (100711)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14b8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b878a000 (100712)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e2b1c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8764740 (100713)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b870a740 (100714)
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4ae8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e2fc020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4ae8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b878a740 (100715)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a05a140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff80013a2d000 (100716)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85b7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8716740 (100717)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e844500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87f6000 (100718)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a021c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ca000 (100719)
exclusive sleep mutex mbuf (UMA zone) r = 0 (0xfffffe00df57a3d0) locked @ /usr/src/sys/vm/uma_core.c:701
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016ff7ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8ad0aa0) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff7c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8797740 (100720)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d578c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8857000 (100721)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e45bc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8766740 (100722)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e7698c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87bb740 (100723)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c7140) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8878740 (100724)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14b500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8751000 (100725)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8789740 (100726)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e29c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b880a000 (100727)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d57c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87a0740 (100728)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8916500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b873d000 (100729)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e769500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87ac740 (100730)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a0218c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87e7740 (100731)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e45b8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8837740 (100732)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b88b2500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b8743000 (100733)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8aeac80) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87db000 (100734)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4d28c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b875d740 (100735)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8570500) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 857 (iperf) thread 0xfffff800b87dc740 (100736)
exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a2bf8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044
Process 0 (kernel) thread 0xfffff80016a34000 (100413)  **gve rxq 0**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e56d160) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff80016f3caa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b626d000 (100415)  **gve rxq 2**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800b8963ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8960020) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b626e740 (100416)  **gve rxq 3**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dcfca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002eab9560) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b626e000 (100417)  **gve rxq 4**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e56dca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002ee5a020) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b626f740 (100418)  **gve rxq 5**  
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016984ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b86f8aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b6270000 (100421)  **gve rxq 8**  
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b88af560) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b74c5740 (100422)  **gve rxq 9**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e6e3ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8ae5aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b74c6740 (100424)  **gve rxq 11**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800b8916520) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8914560) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b74c6000 (100425)  **gve rxq 12**
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e9f2560) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b74c7740 (100426)  **gve rxq 13**
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4d28e0) locked @ /usr/src/sys/netinet/tcp_output.c:350
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8556aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437
Process 0 (kernel) thread 0xfffff800b74c7000 (100427)  **gve rxq 14**
exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e6bfaa0) locked @ /usr/src/sys/netinet/in_pcb.c:1421
```

and the ps output of driver threads:

```
100413                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 0]
100414                   L      *tcpinp  0xfffff80016ab20c0  [gve0 rxq 1]
100415                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 2]
100416                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 3]
100417                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 4]
100418                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 5]
100419                   L      *tcpinp  0xfffff80016ab3480  [gve0 rxq 6]
100420                   L      *tcpinp  0xfffff800b87c9840  [gve0 rxq 7]
100421                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 8]
100422                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 9]
100423                   L      *tcpinp  0xfffff80016ab2c00  [gve0 rxq 10]
100424                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 11]
100425                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 12]
100426                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 13]
100427                   L      *mbuf    0xfffff800b877e9c0  [gve0 rxq 14]
100428                   D       -       0xfffff800b82c7400  [gve0 rxq 15]
100429                   D       -       0xfffff800b82c7200  [gve0 txq 0]
100430                   D       -       0xfffff800b82c7000  [gve0 txq 0 xmit]
100431                   D       -       0xfffff800b828ad00  [gve0 txq 1]
100432                   D       -       0xfffff800b828ab00  [gve0 txq 1 xmit]
100433                   L      *mbuf    0xfffff800b877e9c0  [gve0 txq 2]
100434                   D       -       0xfffff800b828a700  [gve0 txq 2 xmit]
100435                   D       -       0xfffff800b828a500  [gve0 txq 3]
100436                   D       -       0xfffff800b828a300  [gve0 txq 3 xmit]
100437                   L      *mbuf    0xfffff800b877e9c0  [gve0 txq 4]
100438                   Run     CPU 11                      [gve0 txq 4 xmit]
100439                   D       -       0xfffff800b828bc00  [gve0 txq 5]
100440                   D       -       0xfffff800b828ba00  [gve0 txq 5 xmit]
100441                   D       -       0xfffff800b828b800  [gve0 txq 6]
100442                   D       -       0xfffff800b828b600  [gve0 txq 6 xmit]
100443                   L      *mbuf    0xfffff800b877e9c0  [gve0 txq 7]
100444                   D       -       0xfffff800b828b200  [gve0 txq 7 xmit]
100445                   D       -       0xfffff800b828b000  [gve0 txq 8]
100446                   D       -       0xfffff800b828cd00  [gve0 txq 8 xmit]
100447                   D       -       0xfffff800b828cb00  [gve0 txq 9]
100448                   D       -       0xfffff800b828c900  [gve0 txq 9 xmit]
100449                   D       -       0xfffff800b828c700  [gve0 txq 10]
100450                   D       -       0xfffff800b828c500  [gve0 txq 10 xmit]
100451                   D       -       0xfffff800b828c300  [gve0 txq 11]
100452                   D       -       0xfffff800b828c100  [gve0 txq 11 xmit]
100453                   L      *mbuf    0xfffff800b877e9c0  [gve0 txq 12]
100454                   D       -       0xfffff8002e54dc00  [gve0 txq 12 xmit]
100455                   D       -       0xfffff8002e54da00  [gve0 txq 13]
100456                   D       -       0xfffff8002e54d800  [gve0 txq 13 xmit]
100457                   D       -       0xfffff8002e54d600  [gve0 txq 14]
100458                   D       -       0xfffff8002e54d400  [gve0 txq 14 xmit]
100459                   L      *mbuf    0xfffff800b877e9c0  [gve0 txq 15]
100460                   D       -       0xfffff800b84fc400  [gve0 txq 15 xmit]
```

ddb backtrace of one of the locked driver threads:

```
db> trace 100413
Tracing pid 0 tid 100413 td 0xfffff80016a34000
sched_switch() at sched_switch+0x5d2/frame 0xfffffe018a1fe4d0
mi_switch() at mi_switch+0x171/frame 0xfffffe018a1fe4f0
__mtx_lock_sleep() at __mtx_lock_sleep+0x1ba/frame 0xfffffe018a1fe580
__mtx_lock_flags() at __mtx_lock_flags+0xe1/frame 0xfffffe018a1fe5d0
cache_alloc() at cache_alloc+0x237/frame 0xfffffe018a1fe630
cache_alloc_retry() at cache_alloc_retry+0x23/frame 0xfffffe018a1fe670
tcp_m_copym() at tcp_m_copym+0x227/frame 0xfffffe018a1fe710
tcp_default_output() at tcp_default_output+0x146c/frame 0xfffffe018a1fe8e0
tcp_do_segment() at tcp_do_segment+0x24c9/frame 0xfffffe018a1fe9c0
tcp_input_with_port() at tcp_input_with_port+0x1135/frame 0xfffffe018a1feb10
tcp_input() at tcp_input+0xb/frame 0xfffffe018a1feb20
ip_input() at ip_input+0x2ae/frame 0xfffffe018a1feb80
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe018a1febe0
ether_demux() at ether_demux+0x17a/frame 0xfffffe018a1fec10
ether_nh_input() at ether_nh_input+0x3f8/frame 0xfffffe018a1fec60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe018a1fecc0
ether_input() at ether_input+0xe5/frame 0xfffffe018a1fed20
tcp_lro_flush() at tcp_lro_flush+0x34f/frame 0xfffffe018a1fed50
tcp_lro_rx_done() at tcp_lro_rx_done+0x37/frame 0xfffffe018a1fed70
tcp_lro_flush_all() at tcp_lro_flush_all+0x16e/frame 0xfffffe018a1fedc0
gve_rx_cleanup_tq_dqo() at gve_rx_cleanup_tq_dqo+0x854/frame 0xfffffe018a1fee40
```

Now, in trying to find the resource loop, I initially failed: most driver threads are waiting on the UMA zone lock (the lock that cache_alloc presumably needs), and its address in the wchan column is 0xfffff800b877e9c0, but that address does not show up in the `show alllocks` output at all. However, there _is_ an owner for the UMA zone lock there:

```
Process 857 (iperf) thread 0xfffff800b87ca000 (100719)
exclusive sleep mutex mbuf (UMA zone) r = 0 (0xfffffe00df57a3d0) locked @ /usr/src/sys/vm/uma_core.c:701
exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016ff7ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350
```

In holding this lock, this iperf thread is probably holding up the driver threads. I'm not sure what to make of the fact that the addresses don't match: 0xfffffe00df57a3d0 (iperf) vs 0xfffff800b877e9c0 (gve), especially given that we know both iperf and gve are getting at the mbuf zone.

As advised in comment #4 and comment #5, I looked at the `show uma`, `show page`, and vmstat outputs:

```
[root@FreeBSD14 ~]# vmstat -z | grep -E "^ITEM|mbuf"
ITEM                   SIZE   LIMIT     USED     FREE      REQ FAIL SLEEP XDOM
mbuf_packet:            256,      0,   16768,    9902,9837509268,   0,   0,   0
mbuf:                   256,6513896,   24752,   56286,224742586944,   0,   0,   0
mbuf_cluster:          2048,1017796,   26670,     254,   26670,   0,   0,   0
mbuf_jumbo_page:       4096, 508898,   21827,   25925,104428955976,   0,   0,   0
mbuf_jumbo_9k:         9216, 150784,       0,       0,       0,   0,   0,   0
mbuf_jumbo_16k:       16384,  84816,       0,       0,       0,   0,   0,   0
```

```
[root@FreeBSD14 ~]# netstat -m
41520/66188/107708 mbufs in use (current/cache/total)
16768/10156/26924/1017796 mbuf clusters in use (current/cache/total/max)
16768/9902 mbuf+clusters out of packet secondary zone in use (current/cache)
21827/25925/47752/508898 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150784 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84816 16k jumbo clusters in use (current/cache/total/max)
131224K/140559K/271783K bytes allocated to network (current/cache/total)
...
```

```
db> show page
vm_cnt.v_free_count: 3261805
vm_cnt.v_inactive_count: 156959
vm_cnt.v_active_count: 36342
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 649216
vm_cnt.v_free_reserved: 5367
vm_cnt.v_free_min: 25722
vm_cnt.v_free_target: 86787
vm_cnt.v_inactive_target: 130180


db> show uma
              Zone   Size    Used    Free    Requests  Sleeps  Bucket  Total Mem    XFree
         FFS inode   1168  349865     331      864424       0       8  409028928        0
             VNODE    600  349906     665      864497       0      16  210342600        0
   mbuf_jumbo_page   4096   21827   25925 104428955976       0     254  195592192        0
       FFS2 dinode    256  349865    1120      864424       0      62   89852160        0
      mbuf_cluster   2048   26670     254       26670       0     254   55140352        0
       S VFS Cache    104  356473    7124      870718       0     126   37814088        0
         VM OBJECT    264  124446    1314      174315       0      30   33200640        0
        256 Bucket   2048    1149   10785   522000068       0     254   24440832        0
              mbuf    256   24752   56286 224742586945       0     254   20745728        0
          BUF TRIE    144    6267   99041       12431       0      62   15164352        0
       UMA Slabs 0    112  120097      29      156673       0     126   13454112        0
        RADIX NODE    144   59629    5103      154477       0      62    9321408        0
         vmem btag     56  135753    3691      135883       0     254    7808864        0
            socket    960     112    8016        2267       0     254    7802880        0
              pbuf   2512       0    2962           0       0       0    7440544        0
       mbuf_packet    256   16768    9902  9837509269       0     254    6827520        0
      malloc-65536  65536      44      33         700       0       1    5046272        0
       L VFS Cache    320    6359    4237       19741       0      30    3390720        0
           DIRHASH   1024    2408     280        2408       0      16    2752512        0
       malloc-4096   4096     547      67        4542       0       2    2514944        0
     FPU_save_area   2696     692      85        2026       0       4    2094792        0
      malloc-16384  16384      19     105         304       0       1    2031616        0
        malloc-128    128   11601    3031       83012       0     126    1872896        0
             unpcb    256      45    6825        1881       0     254    1758720        0
        128 Bucket   1024     466    1097      755733       0      18    1600512        0
            THREAD   1824     690     142         931       0       8    1517568        0
      malloc-32768  32768      36      10         157       0       1    1507328        0
        malloc-256    256    2141    1579       15213       0      62     952320        0
         64 Bucket    512     624    1056      206913       0      30     860160        0
         MAP ENTRY     96    3269    5425      163701       0     126     834624        0
         malloc-64     64    5034    7692      306248       0     254     814464        0
         32 Bucket    256     616    2324       55277       0      62     752640        0
       malloc-1024   1024     206     466       12113       0      16     688128        0
             NAMEI   1024       0     608     1314258       0      16     622592        0
         UMA Zones   4608     127       3         127       0       1     599040        0
       malloc-2048   2048      49     215         528       0       8     540672        0
              PROC   1376     113     272        2881       0       8     529760        0
          lkpicurr    168       2    3118           2       0      62     524160        0
         filedesc0   1072     114     362        2882       0       8     510272        0
          2 Bucket     32   10694    4804      139800       0     254     495936        0
        malloc-512    512     494     466         887       0      30     491520        0
              PGRP    120      21    3906         582       0     126     471240        0
          ksiginfo    112     267    3909        1064       0     126     467712        0
             g_bio    400       0    1140      570886       0      30     456000        0
        malloc-384    384     561     549        1297       0      30     426240        0
           VMSPACE    616      98     592        2866       0      16     425040        0
    ertt_txseginfo     40     989    9515 12981364553       0     254     420160        0
         malloc-32     32    5719    7007       14479       0     254     407232        0
              pipe    728      78     472        1557       0      16     400400        0
          4 Bucket     48     113    8035         570       0     254     391104        0
        SLEEPQUEUE     88     833    3583         833       0     126     388608        0
         tcp_inpcb   1304      65     232         265       0       8     387288        0
               PWD     40       8    8981      249907       0     254     359560        0
         16 Bucket    144     376    2032      100944       0      62     346752        0
         TURNSTILE    136     833    1603         833       0      62     331296        0
          8 Bucket     80     334    3766      109633       0     126     328000        0
          rl_entry     40     225    7956         283       0     254     327240        0
             Files     80     308    3742      335446       0     126     324000        0
            clpbuf   2512       0     128        2197       0       4     321536        0
          sackhole     32      10    9440    14771020       0     254     302400        0
              ertt     72      65    3967         265       0     126     290304        0
            DEVCTL   1024       0     276         187       0       0     282624        0
         malloc-16     16   10380    6620       91137       0     254     272000        0
   tcp_inpcb ports r32      65    8125         265       0     254     262080        0
       malloc-8192   8192      27       4         195       0       1     253952        0
           ttyoutq    256     105     825         423       0      62     238080        0
         udp_inpcb    424       2     538          87       0      30     228960        0
              kenv    258       9     681        1932       0      30     178020        0
            lkpimm     56       1    3095           1       0     254     173376        0
            cpuset    200      67     749          67       0      62     163200        0
            ttyinq    160     199     801         810       0      62     160000        0
           netlink   2048       0      64          39       0       8     131072        0
       Mountpoints   2944       9      23           9       0       4      94208        0
           pcpu-64     64     459     821         459       0     254      81920        0
     routing nhops    256       7     308          12       0      62      80640        0
            pcpu-8      8    3823    5905        3825       0     254      77824        0
           rtentry    168      10     302          12       0      62      52416        0
          UMA Kegs    384     101       2         101       0      30      39552        0
           pcpu-16     16      32    2016          32       0     254      32768        0
             ripcb    392       0      63           3       0      30      24696        0
           SMR CPU     32       7     760           7       0     254      24544        0
        SMR SHARED     24       7     760           7       0     254      18408        0
 epoch_record pcpu    256       4      60           4       0      62      16384        0
        TMPFS node    232       1      67           1       0      62      15776        0
              vmem   1856       1       7           1       0       8      14848        0
   udp_inpcb ports r32       1     377           1       0     254      12096        0
         hostcache     64       0     126           1       0     120       8064        0
      vtnet_tx_hdr     24       0     167           0       0     254       4008        0
       UMA Slabs 1    176      18       4          18       0      62       3872        0
        KMAP ENTRY     96      31       8          34       0       0       3744        0
       FFS1 dinode    128       0       0           0       0     126          0        0
             swblk    136       0       0           0       0      62          0        0
          swpctrie    144       0       0           0       0      62          0        0
            da_ccb    536       0       0           0       0      16          0        0
           ada_ccb    272       0       0           0       0      30          0        0
tfo_ccache_entries     80       0       0           0       0     126          0        0
               tfo      4       0       0           0       0     254          0        0
          syncache    168       0       0           0       0     120          0        0
               ipq     56       0       0           0       0     248          0        0
   tcp_log_id_node    120       0       0           0       0     126          0        0
 tcp_log_id_bucket    176       0       0           0       0      62          0        0
           tcp_log    416       0       0           0       0     254          0        0
          tcpreass     48       0       0           0       0     254          0        0
udplite_inpcb ports r32       0       0           0       0     254          0        0
     udplite_inpcb    424       0       0           0       0      30          0        0
       ripcb ports r32       0       0           0       0     254          0        0
    IPsec SA lft_c     16       0       0           0       0     254          0        0
             KNOTE    160       0       0           0       0      62          0        0
            itimer    352       0       0           0       0      30          0        0
            AIOLIO    272       0       0           0       0      30          0        0
             AIOCB    552       0       0           0       0      16          0        0
               AIO    208       0       0           0       0      62          0        0
           NCLNODE    760       0       0           0       0      16          0        0
     LTS VFS Cache    360       0       0           0       0      30          0        0
     STS VFS Cache    144       0       0           0       0      62          0        0
           cryptop    280       0       0           0       0      30          0        0
  linux_dma_object     32       0       0           0       0     254          0        0
  linux_dma_pctrie    144       0       0           0       0      62          0        0
   IOMMU_MAP_ENTRY    104       0       0           0       0     126          0        0
    mbuf_jumbo_16k  16384       0       0           0       0     254          0        0
     mbuf_jumbo_9k   9216       0       0           0       0     254          0        0
      audit_record   1280       0       0           0       0       8          0        0
         domainset     40       0       0           0       0     254          0        0
        MAC labels     40       0       0           0       0     254          0        0
            vnpbuf   2512       0       0           0       0      16          0        0
           nfspbuf   2512       0       0           0       0       4          0        0
            swwbuf   2512       0       0           0       0       2          0        0
            swrbuf   2512       0       0           0       0       4          0        0
          umtx_shm     88       0       0           0       0     126          0        0
           umtx pi     96       0       0           0       0     126          0        0
rangeset pctrie nodes    144       0       0           0       0      62          0        0
           pcpu-32     32       0       0           0       0     254          0        0
            pcpu-4      4       0       0           0       0     254          0        0
            fakepg    104       0       0           0       0     126          0        0
          UMA Hash    256       0       0           0       0      62          0        0
```

But I'm not sure whether the counts hint at a leak or whether they simply reflect the memory footprint of the blocked threads.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2024-09-26 14:10:40 UTC
(In reply to shailend from comment #6)
The memory utilization is low, so this is not a low-memory deadlock.

We have an iperf thread that is holding a UMA zone lock and an inpcb lock, and it looks like this is blocking almost everything else.  So, what is the iperf thread (100719 in the last paste) doing?  Could you please attach the full procstat -kka output here, as a separate file?
Comment 8 shailend 2024-09-26 17:03:56 UTC
Created attachment 253834 [details]
procstat_kka

This is the output of procstat -kka after the onset of a deadlock, with a single iperf thread holding onto the UMA zone lock.
Comment 9 shailend 2024-09-26 17:14:06 UTC
(In reply to Mark Johnston from comment #7)

Here is also the trace for the iperf thread holding the UMA zone lock:

```
db> trace 100719
Tracing pid 857 tid 100719 td 0xfffff800b87ca000
sched_switch() at sched_switch+0x5d2/frame 0xfffffe01a4e8c720
mi_switch() at mi_switch+0x171/frame 0xfffffe01a4e8c740
ipi_bitmap_handler() at ipi_bitmap_handler+0x86/frame 0xfffffe01a4e8c770
Xipi_intr_bitmap_handler() at Xipi_intr_bitmap_handler+0xb3/frame 0xfffffe01a4e8c770
--- interrupt, rip = 0xffffffff80ed8bed, rsp = 0xfffffe01a4e8c840, rbp = 0xfffffe01a4e8c870 ---
zone_fetch_bucket() at zone_fetch_bucket+0x8d/frame 0xfffffe01a4e8c870
cache_alloc() at cache_alloc+0x261/frame 0xfffffe01a4e8c8d0
cache_alloc_retry() at cache_alloc_retry+0x23/frame 0xfffffe01a4e8c910
tcp_m_copym() at tcp_m_copym+0x227/frame 0xfffffe01a4e8c9b0
tcp_default_output() at tcp_default_output+0x146c/frame 0xfffffe01a4e8cb80
tcp_usr_send() at tcp_usr_send+0x5a2/frame 0xfffffe01a4e8cc30
sosend_generic() at sosend_generic+0x5fe/frame 0xfffffe01a4e8cce0
sousrsend() at sousrsend+0x79/frame 0xfffffe01a4e8cd40
dofilewrite() at dofilewrite+0x7f/frame 0xfffffe01a4e8cd90
sys_write() at sys_write+0xb7/frame 0xfffffe01a4e8ce00
amd64_syscall() at amd64_syscall+0x147/frame 0xfffffe01a4e8cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01a4e8cf30
```
Comment 10 shailend 2024-09-26 20:48:51 UTC
Superficially, it looks like iperf thread 100719 was interrupted by an IPI while it held the UMA zone lock. It is the only iperf thread in the "run" state; the rest are in "stop".
Comment 11 shailend 2024-09-26 22:37:22 UTC
Just a more succinct view of iperf thread 100719's central role in this deadlock:

```
db> show lockchain 100413
thread 100413 (pid 0, gve0 rxq 0) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100719 (pid 857, iperf) is on a run queue

db> show lockchain 100414
thread 100414 (pid 0, gve0 rxq 1) is blocked on lock 0xfffff800b86f8560 (rw) "tcpinp"
thread 100706 (pid 857, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100719 (pid 857, iperf) is on a run queue

db> show lockchain 100433
thread 100433 (pid 0, gve0 txq 2) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100719 (pid 857, iperf) is on a run queue

db> show lockchain 100419
thread 100419 (pid 0, gve0 rxq 6) is blocked on lock 0xfffff800b8d53020 (rw) "tcpinp"
thread 100703 (pid 857, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100719 (pid 857, iperf) is on a run queue
```
Comment 12 Andrew Gallatin freebsd_committer freebsd_triage 2024-09-26 22:37:43 UTC
Comment on attachment 253834 [details]
procstat_kka

Are we absolutely certain that this is a deadlock and not a livelock?  If you look at network counters, etc., a few seconds apart, do you see any movement at all?  Is that iperf killable?

I'm asking this because, while most of the threads blocked on UMA locks are in the allocation path, some are in the free path (0 100421 kernel              gve0 rxq 8          mi_switch+0x171 __mtx_lock_sleep+0x1ba __mtx_lock_flags+0xe1 zone_put_bucket+0x75 cache_free+0x2da uma_zfree_arg+0x21c mb_free_ext+0x126 m_free+0xda m_freem+0x48 tcp_do_segment+0x1e6b tcp_input_with_port+0x1135 tcp_input+0xb ip_input+0x2ae netisr_dispatch_src+0xad ether_demux+0x17a ether_nh_input+0x3f8 netisr_dispatch_src+0xad ether_input+0xe5), and it appears that the iperf thread that holds that zone lock is actually doing something.

I've seen degenerate, live-locking behavior in the past when allocations and frees overwhelm the per-CPU caches and there is contention for the zone lock.  Throughput drops by 80% or more, the system becomes incredibly sluggish, and it's easy to think it is a deadlock.
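One quick way to tell the two apart with stock tools (sampling commands only; the interface name gve0 is assumed here): watch whether the counters move at all between samples.

```
# netstat -w 1 -I gve0         # per-second input/output packet counts
# vmstat -z | grep -i mbuf     # run twice, a few seconds apart, and compare
```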
Comment 13 shailend 2024-09-26 23:04:18 UTC
(In reply to Andrew Gallatin from comment #12)

Hmmm, interesting. In this case, though, I'm sure nothing is traversing the networking stack, and no CPU is being consumed. The offending thread seems to be in a weird state, almost as if it went AWOL after it got that IPI.

```
db> show thread 100719
Thread 100719 at 0xfffff800b87ca000:
 proc (pid 857): 0xfffffe018a340580
 name: iperf
 pcb: 0xfffff800b87ca520
 stack: 0xfffffe01a4e89000-0xfffffe01a4e8cfff
 flags: 0x5  pflags: 0x100
 state: RUNQ
 priority: 4
 container lock: sched lock 11 (0xfffffe001fd73440)
 last voluntary switch: 55821.710 s ago
 last involuntary switch: 55821.710 s ago
```
Comment 14 Konstantin Belousov freebsd_committer freebsd_triage 2024-09-26 23:16:17 UTC
(In reply to shailend from comment #13)
What does 'show pcpu 11' show?
Comment 15 shailend 2024-09-27 03:14:48 UTC
(In reply to Konstantin Belousov from comment #14)
Unfortunately I have lost access to the VMs in this repro and will need to make a fresh repro. I'll post the "show pcpu" for the new repro, hopefully it too enters a deadlock this way, with one uma-zone-lock-holding thread playing spoilsport. 

Forgot to answer one question on comment #12: No I wasn't able to kill iperf, it stuck around even after a `kill -9`.
Comment 16 Mark Johnston freebsd_committer freebsd_triage 2024-09-27 10:36:38 UTC
This smells a bit like a thread disabled interrupts and then went off-CPU somehow.  The iperf thread is stuck in the runqueue of a CPU and nothing gets scheduled there, so it doesn't run.

If this is not enough of a hint, we could try enabling scheduler tracing:
- add the following to your kernel config:
options KTR
options KTR_COMPILE=(KTR_SCHED)
options KTR_ENTRIES=(256*1024)
- before starting your test, enable KTR tracing:
# sysctl debug.ktr.enable=0x20000000
- once the hang occurs, use ktrdump (or "show ktr" in DDB) to dump scheduler traces
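For example, once the hang occurs (the output path is illustrative; see ktrdump(8) for the exact flags):

```
# ktrdump -ct > /var/tmp/ktr_sched.out    # -c: CPU numbers, -t: timestamps
```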
Comment 17 Konstantin Belousov freebsd_committer freebsd_triage 2024-09-27 14:59:19 UTC
(In reply to Mark Johnston from comment #16)
I doubt that the system would stay silent about a CPU with interrupts disabled;
our IPI code does not tolerate such a condition.

In fact, I asked about pcpu because to me it sounds more as if an idle thread
is not exiting the idle loop.  Note that all of this is occurring in a VM,
where e.g. MWAIT emulation might be tricky.  If the CPU in question is sitting
in the idle thread, I suggest changing the idling method to either hlt or loop.
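For reference, the idle method on amd64 can be inspected and changed at runtime via sysctl; "loop" corresponds to the "spin" method:

```
# sysctl machdep.idle_available    # list the supported idle methods
# sysctl machdep.idle=hlt          # or machdep.idle=spin for a busy loop
```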
Comment 18 shailend 2024-10-02 23:17:37 UTC
(In reply to Konstantin Belousov from comment #14)

Although I do not have access to the VMs to do `show pcpu`, I checked my notes to find this `ps` entry:

100438                   Run     CPU 11                      [gve0 txq 4 xmit]

The packet-transmitting thread is hogging the CPU and preventing iperf from ever running to release the UMA lock. The "gve0 txq 4 xmit" thread runs forever because it is waiting on the TX cleanup thread to make room on the ring, and that thread is doing nothing because it, in turn, is waiting on the UMA zone lock.

I did another repro, and the situation is similar:

```
db> show lockchain 100416
thread 100416 (pid 0, gve0 rxq 0) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100423
thread 100423 (pid 0, gve0 rxq 7) is blocked on lock 0xfffff8010447daa0 (rw) "tcpinp"
thread 100736 (pid 860, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```

Here, 100708 is the offending iperf thread. Let's look at its state:

```
db> show thread 100708
Thread 100708 at 0xfffff800a86bd000:
 proc (pid 860): 0xfffffe01a439bac0
 name: iperf
 pcb: 0xfffff800a86bd520
 stack: 0xfffffe01a4dc1000-0xfffffe01a4dc4fff
 flags: 0x5  pflags: 0x100
 state: RUNQ
 priority: 4
 container lock: sched lock 31 (0xfffffe001bee8440)
 last voluntary switch: 11510.470 s ago
 last involuntary switch: 11510.470 s ago
```

And now let's see what's happening on CPU 31:

```
db> show pcpu 31
cpuid        = 31
dynamic pcpu = 0xfffffe009a579d80
curthread    = 0xfffff800a8501740: pid 0 tid 100453 critnest 0 "gve0 txq 10 xmit"
curpcb       = 0xfffff800a8501c60
fpcurthread  = none
idlethread   = 0xfffff80003b04000: tid 100034 "idle: cpu31"
self         = 0xffffffff8242f000
curpmap      = 0xffffffff81b79c50
tssp         = 0xffffffff8242f384
rsp0         = 0xfffffe01a4ca8000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff8242f404
ldt          = 0xffffffff8242f444
tss          = 0xffffffff8242f434
curvnet      = 0
spin locks held:
```

Sure enough, a driver transmit thread is hogging the CPU. And to close the loop, let's see what this queue's cleanup thread is doing:

```
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```

In summary, this is the same loop as before:

iperf thread (with uma zone lock) ---sched--->  gve tx xmit thread ---for room---> gve tx cleanup thread -----uma zone lock----> iperf thread  

There is clearly problematic behavior in the driver transmit thread (gve_xmit_br): this taskqueue should not re-enqueue itself; it should instead let the cleanup taskqueue wake it up once room has been made in the ring. I'll work on that.
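Schematically, the pattern I am describing is something like this (a sketch only, not the actual driver code; the `tx->br`, `tx->ifp`, `tx->xmit_tq`, and `tx->xmit_task` names are illustrative):

```
/*
 * Livelock-prone pattern: when the ring is full, the xmit task
 * re-enqueues itself, so it never leaves the CPU and never lets the
 * thread holding the UMA zone lock run to free up ring space.
 */
static void
gve_xmit_br(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;

	while (!drbr_empty(tx->ifp, tx->br)) {
		if (gve_xmit(tx) == ENOBUFS) {
			/* No room in the ring: retry immediately. */
			taskqueue_enqueue(tx->xmit_tq, &tx->xmit_task);
			return;
		}
	}
}
```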

But I also want to confirm that it is not problematic for an iperf thread to be knocked off the CPU while holding the zone lock: is that lock not critical enough to disallow this? (I am not familiar enough with the scheduler to know whether this is a naive question.)
Comment 19 Konstantin Belousov freebsd_committer freebsd_triage 2024-10-03 01:56:11 UTC
(In reply to shailend from comment #18)
Locks (except spinlocks) do not have any magic properties WRT disabling
scheduling.  So it is absolutely fine for a thread owning a lock to be
taken off the CPU if there is another thread with a higher priority.  OTOH, for
locks like mutexes and rwlocks (which use turnstiles), the system propagates
priorities from lock contenders to lock owners to avoid priority inversion.

Which raises the question: what is the priority of that running tx thread
vs. the priority of the lock owner?  Could it be that the tx thread was
assigned too high a priority, causing this live-lock?
Comment 20 shailend 2024-10-03 03:33:54 UTC
(In reply to Konstantin Belousov from comment #19)

Thanks for the explanation. The iperf thread owning the lock and the driver thread looping on the CPU both have priority 4. The driver thread was created with `taskqueue_create_fast`.
Comment 21 Konstantin Belousov freebsd_committer freebsd_triage 2024-10-03 04:08:02 UTC
(In reply to shailend from comment #20)
Then this looks especially like a live-lock.

A user thread should not have priority 4; that is in the range of interrupt
thread priorities.  Since the iperf thread owns the lock and the driver thread
contends for it, this is most likely the result of priority propagation.
In other words, the iperf thread must drop the lock to allow the system to
make progress.  Perhaps it should do so after processing some fixed amount of work.
Comment 22 Mark Johnston freebsd_committer freebsd_triage 2024-10-04 15:27:20 UTC
To be clear, the problem is that gve_xmit_br() requeues itself when the ring is full (i.e., gve_xmit() returns ENOBUFS)?  Shouldn't it be queuing a cleanup task instead?
Comment 23 shailend 2024-10-04 16:37:06 UTC
(In reply to Mark Johnston from comment #22)

Yup, gve_xmit_br enqueueing itself is the problem. Since the cleanup task gve_tx_cleanup_tq already runs off of interrupts, I am thinking of fixing this by having gve_xmit_br mark itself as stopped when it sees ENOBUFS, and having the cleanup thread wake it up when room is made.
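Roughly, the shape of that fix would be as follows (a sketch under assumed names; `tx->stopped`, `GVE_TX_LOCK`, `gve_tx_has_room()`, and `gve_tx_reclaim_descs()` are illustrative, and this is not the actual code in D47138):

```
/* xmit task: on ENOBUFS, park instead of re-enqueueing itself. */
static void
gve_xmit_br(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;

	GVE_TX_LOCK(tx);
	while (!drbr_empty(tx->ifp, tx->br)) {
		if (gve_xmit(tx) == ENOBUFS) {
			/* Wait for the cleanup task to wake us up. */
			tx->stopped = true;
			break;
		}
	}
	GVE_TX_UNLOCK(tx);
}

/* interrupt-driven cleanup task: free ring slots, then wake xmit. */
static void
gve_tx_cleanup_tq(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;

	gve_tx_reclaim_descs(tx);

	GVE_TX_LOCK(tx);
	if (tx->stopped && gve_tx_has_room(tx)) {
		tx->stopped = false;
		taskqueue_enqueue(tx->xmit_tq, &tx->xmit_task);
	}
	GVE_TX_UNLOCK(tx);
}
```

This way the xmit task goes idle when the ring is full, and forward progress depends on the completion interrupt rather than on out-scheduling the thread that holds the zone lock.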
Comment 24 shailend 2024-10-15 20:49:15 UTC
Fixed in https://reviews.freebsd.org/D47138  

Thanks a lot for all the help @markj, @kib, and @gallatin!
Comment 25 Mark Johnston freebsd_committer freebsd_triage 2024-10-15 21:07:52 UTC
(In reply to shailend from comment #24)
Does that diff depend on the other two gve diffs which had been posted previously?  That is, in what order should they be reviewed?
Comment 26 shailend 2024-10-15 21:50:22 UTC
(In reply to Mark Johnston from comment #25)

It does; I found this bug only on top of the two prior changes:

https://reviews.freebsd.org/D46690
https://reviews.freebsd.org/D46691

and I figured fixing it once makes for less churn.  

So https://reviews.freebsd.org/D47138 needs to be reviewed after the two patches above.
Comment 27 Mark Johnston freebsd_committer freebsd_triage 2024-10-29 14:19:45 UTC
Closing this since the problem is now understood and only affects uncommitted code.