In https://reviews.freebsd.org/D46690 I am adding support for a newer queue format to gve that supports ~200Gbps of throughput. While stress testing these changes I ran into a deadlock that I'm unable to find a root cause for. This is reproduced by having two VMs sending bidirectional tcp traffic to each other, and the deadlock sets in after a few hours. I am listing the stack traces of the driver taskqueues after the deadlock sets in for two different occurrences of the deadlock: ``` [root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done 0 100416 kernel gve0 rxq 0 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100417 kernel gve0 rxq 1 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100418 kernel gve0 rxq 2 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100419 kernel gve0 rxq 3 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100420 kernel gve0 rxq 4 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100421 kernel gve0 rxq 5 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_active 0 100422 kernel gve0 rxq 6 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100423 kernel gve0 rxq 7 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100424 kernel gve0 rxq 8 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100425 kernel gve0 rxq 9 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src 
ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_active 0 100426 kernel gve0 rxq 10 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100427 kernel gve0 rxq 11 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100428 kernel gve0 rxq 12 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100429 kernel gve0 rxq 13 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100430 kernel gve0 rxq 14 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 0 100431 kernel gve0 rxq 15 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100434 kernel gve0 txq 1 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100448 kernel gve0 txq 8 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100456 kernel gve0 txq 12 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100458 kernel gve0 txq 13 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline ``` ``` [root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done 0 100433 kernel gve0 rxq 0 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100434 kernel gve0 rxq 1 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src 
ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100435 kernel gve0 rxq 2 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100436 kernel gve0 rxq 3 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100437 kernel gve0 rxq 4 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100438 kernel gve0 rxq 5 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100439 kernel gve0 rxq 6 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100440 kernel gve0 rxq 7 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100441 kernel gve0 rxq 8 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 0 100442 kernel gve0 rxq 9 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 0 100443 kernel gve0 rxq 10 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100444 kernel gve0 rxq 11 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100445 kernel gve0 rxq 12 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100446 kernel gve0 rxq 13 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym 
tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100447 kernel gve0 rxq 14 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked 0 100448 kernel gve0 rxq 15 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all 0 100457 kernel gve0 txq 4 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100463 kernel gve0 txq 7 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100465 kernel gve0 txq 8 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100467 kernel gve0 txq 9 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100469 kernel gve0 txq 10 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem gve_handle_packet_completion gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline ``` I tried to look into any incorrect kpi use in the driver (https://reviews.freebsd.org/D46690) but couldn't find any offenses. It is rather straightforward: in the Rx paths it inputs cluster mbufs and m_freems transmitted mbufs in the Tx path. Thus I'm hesitantly suspecting some race condition in the uma stack given the traces. I reproduced the issue with INVARIANTS and WITNESS but they didn't catch anything. The issue also reproduces with a slightly different datapath mode in the driver (https://reviews.freebsd.org/D46691). In this mode, it is not cluster mbufs that are input in the Rx path usually, but rather mbufs with external storage attached. 
The stack traces of the driver taskqueues after the deadlock look like so: ``` [root@FreeBSD13 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done 0 100451 kernel gve0 rxq 1 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100452 kernel gve0 rxq 2 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100453 kernel gve0 rxq 3 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100454 kernel gve0 rxq 4 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100455 kernel gve0 rxq 5 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100457 kernel gve0 rxq 7 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100458 kernel gve0 rxq 8 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100459 kernel gve0 rxq 9 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush 0 100460 kernel gve0 rxq 10 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100461 kernel gve0 rxq 11 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100462 kernel gve0 rxq 12 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry tcp_m_copym tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100463 kernel gve0 rxq 13 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100464 kernel gve0 rxq 14 mi_switch __rw_wlock_hard in_pcblookup_hash in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src 
ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_flush_all gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit 0 100465 kernel gve0 rxq 15 mi_switch __mtx_lock_sleep cache_alloc cache_alloc_retry m_get2 gve_rx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100466 kernel gve0 txq 0 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100468 kernel gve0 txq 1 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100470 kernel gve0 txq 2 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100472 kernel gve0 txq 3 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100474 kernel gve0 txq 4 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100480 kernel gve0 txq 7 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100482 kernel gve0 txq 8 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100484 kernel gve0 txq 9 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100486 kernel gve0 txq 10 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100488 kernel gve0 txq 11 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100490 kernel gve0 txq 12 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100492 kernel gve0 txq 13 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100494 kernel gve0 txq 14 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100496 kernel gve0 txq 15 mi_switch __mtx_lock_sleep zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline ``` I've reproduced this in 14-1 and 13-1. Any guidance or help in uncovering the source of this deadlock would be very helpful!
If you have a deadlock, there must be a resource acquisition loop, which you should be able to unwind by looking at the threads waiting for a resource, finding the owner of that resource, seeing what it is doing, and so on. That said, do you run with INVARIANTS and WITNESS enabled? If not, you must. For instance, only an INVARIANTS-enabled kernel would detect an attempt to take a blockable lock inside an SMR section. Similarly, WITNESS might directly point to the resource loop for locks.
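The usual set of options for this kind of debugging looks roughly like the following (a sketch from memory; the developers-handbook chapter on deadlock debugging has the authoritative list): ``` options KDB options DDB options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN options DEBUG_LOCKS options DEBUG_VFS_LOCKS options DIAGNOSTIC ```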
Actually I did run it with INVARIANTS and WITNESS and the other options listed on https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks, and the deadlock reproduces without any findings from INVARIANTS or WITNESS. `procstat -k ${tid?}` is the only tool I know of for looking at the deadlocked threads, and it hints at some locking issue in the uma zone cache (as per the per-thread procstat -k in comment #0). Are there some other ways I can look at the locked threads and more definitively figure out what resource they are blocked on and who owns it?
If you have WITNESS configured, then you can get an overview of lock ownership on the system using the 'show alllocks' ddb command. This lets you see lock owners, including shared locks (like read-locked rwlocks). Another hint is that in the ddb 'ps' output, the wchan column is the address of the lock (or something very close to it) on which the thread is blocked. This should be enough to see the loops. Basically, take any blocked thread, see which lock it needs ('ps'), then find the thread that owns that lock. Now look at the backtrace of the lock owner and see why it does not make progress, typically because it is waiting for yet another lock, and so on.
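Roughly, the loop-hunting session from the ddb prompt looks like this (commands only; 'show lockchain <tid>' automates most of the chain-following): ``` db> show alllocks          (lock owners, including read-locked rwlocks) db> ps                     (the wchan column is the address a blocked thread waits on) db> show lockchain <tid>   (follow the blocked-on chain starting from one thread) db> trace <tid>            (backtrace of the suspected lock owner) ```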
(In reply to Konstantin Belousov from comment #3) It'd also be useful to see output from "show page" and "show uma" from DDB. Perhaps there's a kernel memory leak somewhere?
(In reply to Mark Johnston from comment #4) To expand on this, the pattern of stack traces looks very much like degenerate behavior in the UMA memory allocator: when the per-cpu caches are empty, allocations must go back to the zone, and you wind up taking the zone lock on every allocation. I'd check to see if there is an mbuf leak. The output of vmstat -z would be helpful here.
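Something along these lines, sampled while the test is running and again after stopping traffic, should make a leak obvious (a sketch; any interval works): ``` # A USED count that keeps growing, or that never drains once traffic is # stopped, suggests an mbuf leak; stable counts point at cache/zone-lock # contention instead. while sleep 5; do     date     vmstat -z | grep -E '^ITEM|mbuf'     netstat -m | head -n 4 done ```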
I reproduced this with invariants+witness+ddb, it takes much longer to hit the deadlock due to lowered throughput due to invariants and witness. Backtraces of locked driver threads: ``` [root@FreeBSD14 ~]# for locktid in $(procstat -ta | grep gve | grep lock | cut -d" " -f6); do procstat -k $locktid | grep $locktid; done 0 100413 kernel gve0 rxq 0 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100414 kernel gve0 rxq 1 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100415 kernel gve0 rxq 2 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100416 kernel gve0 rxq 3 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100417 kernel gve0 rxq 4 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 0 100418 kernel gve0 rxq 5 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done 0 100419 kernel gve0 rxq 6 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100420 kernel gve0 rxq 7 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100421 kernel gve0 rxq 8 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 0 100422 kernel gve0 rxq 9 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 0 100423 kernel gve0 rxq 10 mi_switch __rw_wlock_hard _rw_wlock_cookie _inp_smr_lock in_pcblookup_hash_smr in_pcblookup_mbuf tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src 
ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all gve_rx_cleanup_tq_dqo 0 100424 kernel gve0 rxq 11 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_default_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_lro_flush tcp_lro_rx_done tcp_lro_flush_all 0 100425 kernel gve0 rxq 12 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 0 100426 kernel gve0 rxq 13 mi_switch __mtx_lock_sleep __mtx_lock_flags cache_alloc cache_alloc_retry tcp_m_copym tcp_default_output tcp_output tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input tcp_push_and_replace 0 100427 kernel gve0 rxq 14 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg mb_free_ext m_free m_freem tcp_do_segment tcp_input_with_port tcp_input ip_input netisr_dispatch_src ether_demux ether_nh_input netisr_dispatch_src ether_input 0 100433 kernel gve0 txq 2 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100437 kernel gve0 txq 4 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100443 kernel gve0 txq 7 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100453 kernel gve0 txq 12 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline 0 100459 kernel gve0 txq 15 mi_switch __mtx_lock_sleep __mtx_lock_flags zone_put_bucket cache_free uma_zfree_arg m_free m_freem gve_tx_cleanup_tq_dqo taskqueue_run_locked taskqueue_thread_loop fork_exit fork_trampoline ``` show alllocks output: ``` db> show alllocks Process 857 (iperf) thread 0xfffff800168c7000 (100673) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4d2c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800168c8740 (100674) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a020140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800168c8000 (100675) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e288c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800168cf740 (100676) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8ae9140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800168cf000 (100677) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e56dc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8845000 (100678) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85708c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8770000 (100679) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a0cb8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8716000 
(100680) exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dbe8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e5e2020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dbe8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8730000 (100681) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14bc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87f5000 (100682) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e13ec80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ab000 (100683) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e25e140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8827000 (100684) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b88b28c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b874e000 (100685) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8750000 (100686) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e56d140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8764000 (100687) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b875d000 (100688) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dbe500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b880c000 (100689) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e044500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8789000 (100690) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e2b0140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87e9740 (100691) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d3ac80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87f6740 (100692) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e769c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8855000 (100693) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a05a500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ca740 (100694) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e045c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87a0000 (100695) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85b78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b880c740 (100696) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4ad140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff80016da6000 (100697) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6e3c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ff740 (100698) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016984c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b873b740 (100699) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff78c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b875b000 (100700) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8963c80) locked @ 
/usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ff000 (100701) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcfc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8772740 (100702) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b89168c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8877740 (100703) exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dcf8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8d53020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8787740 (100704) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e044140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b881a740 (100705) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e39e140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b873b000 (100706) exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800169848e0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b86f8560) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800169848c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8731740 (100707) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e28500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87dd000 (100708) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b883b740 (100709) exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4aeca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e2fc560) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4aec80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8818740 (100710) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e28140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8845740 (100711) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14b8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b878a000 (100712) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e2b1c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8764740 (100713) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016dcf140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b870a740 (100714) exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4ae8e0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e2fc020) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4ae8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b878a740 (100715) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a05a140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff80013a2d000 (100716) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b85b7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8716740 
(100717) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e844500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87f6000 (100718) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a021c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ca000 (100719) exclusive sleep mutex mbuf (UMA zone) r = 0 (0xfffffe00df57a3d0) locked @ /usr/src/sys/vm/uma_core.c:701 exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016ff7ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8ad0aa0) locked @ /usr/src/sys/netinet/tcp_usrreq.c:925 exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff7c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8797740 (100720) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d578c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8857000 (100721) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e45bc80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8766740 (100722) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e7698c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87bb740 (100723) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e6c7140) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8878740 (100724) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a14b500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8751000 (100725) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016ff7500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8789740 (100726) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff80016e29c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b880a000 (100727) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8d57c80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87a0740 (100728) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8916500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b873d000 (100729) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e769500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87ac740 (100730) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a0218c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87e7740 (100731) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e45b8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8837740 (100732) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b88b2500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b8743000 (100733) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8aeac80) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87db000 (100734) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8002e4d28c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b875d740 (100735) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff800b8570500) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 857 (iperf) thread 0xfffff800b87dc740 (100736) exclusive sx so_snd_sx (so_snd_sx) r = 0 (0xfffff8011a2bf8c0) locked @ /usr/src/sys/kern/uipc_socket.c:4044 Process 0 (kernel) 
thread 0xfffff80016a34000 (100413) **gve rxq 0** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e56d160) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff80016f3caa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b626d000 (100415) **gve rxq 2** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800b8963ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8960020) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b626e740 (100416) **gve rxq 3** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016dcfca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002eab9560) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b626e000 (100417) **gve rxq 4** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e56dca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002ee5a020) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b626f740 (100418) **gve rxq 5** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016984ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b86f8aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b6270000 (100421) **gve rxq 8** exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b88af560) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b74c5740 (100422) **gve rxq 9** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e6e3ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8ae5aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b74c6740 (100424) **gve rxq 11** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff800b8916520) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8914560) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b74c6000 (100425) **gve rxq 12** exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e9f2560) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b74c7740 (100426) **gve rxq 13** exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff8002e4d28e0) locked @ /usr/src/sys/netinet/tcp_output.c:350 exclusive rw tcpinp (tcpinp) r = 0 (0xfffff800b8556aa0) locked @ /usr/src/sys/netinet/in_pcb.c:1437 Process 0 (kernel) thread 0xfffff800b74c7000 (100427) **gve rxq 14** exclusive rw tcpinp (tcpinp) r = 0 (0xfffff8002e6bfaa0) locked @ /usr/src/sys/netinet/in_pcb.c:1421 ``` and the ps output of driver threads: ``` 100413 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 0] 100414 L *tcpinp 0xfffff80016ab20c0 [gve0 rxq 1] 100415 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 2] 100416 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 3] 100417 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 4] 100418 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 5] 100419 L *tcpinp 0xfffff80016ab3480 [gve0 rxq 6] 100420 L *tcpinp 0xfffff800b87c9840 [gve0 rxq 7] 100421 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 8] 100422 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 9] 100423 L *tcpinp 0xfffff80016ab2c00 [gve0 rxq 10] 100424 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 11] 100425 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 12] 100426 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 13] 100427 L *mbuf 0xfffff800b877e9c0 [gve0 rxq 14] 100428 D - 0xfffff800b82c7400 [gve0 rxq 15] 100429 D - 
0xfffff800b82c7200 [gve0 txq 0] 100430 D - 0xfffff800b82c7000 [gve0 txq 0 xmit] 100431 D - 0xfffff800b828ad00 [gve0 txq 1] 100432 D - 0xfffff800b828ab00 [gve0 txq 1 xmit] 100433 L *mbuf 0xfffff800b877e9c0 [gve0 txq 2] 100434 D - 0xfffff800b828a700 [gve0 txq 2 xmit] 100435 D - 0xfffff800b828a500 [gve0 txq 3] 100436 D - 0xfffff800b828a300 [gve0 txq 3 xmit] 100437 L *mbuf 0xfffff800b877e9c0 [gve0 txq 4] 100438 Run CPU 11 [gve0 txq 4 xmit] 100439 D - 0xfffff800b828bc00 [gve0 txq 5] 100440 D - 0xfffff800b828ba00 [gve0 txq 5 xmit] 100441 D - 0xfffff800b828b800 [gve0 txq 6] 100442 D - 0xfffff800b828b600 [gve0 txq 6 xmit] 100443 L *mbuf 0xfffff800b877e9c0 [gve0 txq 7] 100444 D - 0xfffff800b828b200 [gve0 txq 7 xmit] 100445 D - 0xfffff800b828b000 [gve0 txq 8] 100446 D - 0xfffff800b828cd00 [gve0 txq 8 xmit] 100447 D - 0xfffff800b828cb00 [gve0 txq 9] 100448 D - 0xfffff800b828c900 [gve0 txq 9 xmit] 100449 D - 0xfffff800b828c700 [gve0 txq 10] 100450 D - 0xfffff800b828c500 [gve0 txq 10 xmit] 100451 D - 0xfffff800b828c300 [gve0 txq 11] 100452 D - 0xfffff800b828c100 [gve0 txq 11 xmit] 100453 L *mbuf 0xfffff800b877e9c0 [gve0 txq 12] 100454 D - 0xfffff8002e54dc00 [gve0 txq 12 xmit] 100455 D - 0xfffff8002e54da00 [gve0 txq 13] 100456 D - 0xfffff8002e54d800 [gve0 txq 13 xmit] 100457 D - 0xfffff8002e54d600 [gve0 txq 14] 100458 D - 0xfffff8002e54d400 [gve0 txq 14 xmit] 100459 L *mbuf 0xfffff800b877e9c0 [gve0 txq 15] 100460 D - 0xfffff800b84fc400 [gve0 txq 15 xmit] ``` ddb backtrace of one of the locked driver threads: ``` db> trace 100413 Tracing pid 0 tid 100413 td 0xfffff80016a34000 sched_switch() at sched_switch+0x5d2/frame 0xfffffe018a1fe4d0 mi_switch() at mi_switch+0x171/frame 0xfffffe018a1fe4f0 __mtx_lock_sleep() at __mtx_lock_sleep+0x1ba/frame 0xfffffe018a1fe580 __mtx_lock_flags() at __mtx_lock_flags+0xe1/frame 0xfffffe018a1fe5d0 cache_alloc() at cache_alloc+0x237/frame 0xfffffe018a1fe630 cache_alloc_retry() at cache_alloc_retry+0x23/frame 0xfffffe018a1fe670 tcp_m_copym() at tcp_m_copym+0x227/frame 0xfffffe018a1fe710 tcp_default_output() at tcp_default_output+0x146c/frame 0xfffffe018a1fe8e0 tcp_do_segment() at tcp_do_segment+0x24c9/frame 0xfffffe018a1fe9c0 tcp_input_with_port() at tcp_input_with_port+0x1135/frame 0xfffffe018a1feb10 tcp_input() at tcp_input+0xb/frame 0xfffffe018a1feb20 ip_input() at ip_input+0x2ae/frame 0xfffffe018a1feb80 netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe018a1febe0 ether_demux() at ether_demux+0x17a/frame 0xfffffe018a1fec10 ether_nh_input() at ether_nh_input+0x3f8/frame 0xfffffe018a1fec60 netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfffffe018a1fecc0 ether_input() at ether_input+0xe5/frame 0xfffffe018a1fed20 tcp_lro_flush() at tcp_lro_flush+0x34f/frame 0xfffffe018a1fed50 tcp_lro_rx_done() at tcp_lro_rx_done+0x37/frame 0xfffffe018a1fed70 tcp_lro_flush_all() at tcp_lro_flush_all+0x16e/frame 0xfffffe018a1fedc0 gve_rx_cleanup_tq_dqo() at gve_rx_cleanup_tq_dqo+0x854/frame 0xfffffe018a1fee40 ``` Now in trying to find the resource loop, I initially failed because most driver threads are waiting on the uma zone lock (the lock that cache_alloc presumably needs) and this lock's address in the wchan column is 0xfffff800b877e9c0. But 0xfffff800b877e9c0 does not show up in the `show alllocks` output at all. 
However, there _is_ an owner for the uma zone lock there: ``` Process 857 (iperf) thread 0xfffff800b87ca000 (100719) exclusive sleep mutex mbuf (UMA zone) r = 0 (0xfffffe00df57a3d0) locked @ /usr/src/sys/vm/uma_core.c:701 exclusive sleep mutex so_snd (so_snd) r = 0 (0xfffff80016ff7ca0) locked @ /usr/src/sys/netinet/tcp_output.c:350 ``` In holding this lock, this iperf thread is probably holding up the driver threads. I'm not sure what to make of the fact that the addresses dont match: 0xfffffe00df57a3d0 (iperf) vs 0xfffff800b877e9c0 (gve). Especially given that we know both iperf and gve are getting at the mbuf zone. As advised in comment #4 and comment #5, I looked at the uma, page and vmstat outputs: ``` [root@FreeBSD14 ~]# vmstat -z | grep -E "^ITEM|mbuf" ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP XDOM mbuf_packet: 256, 0, 16768, 9902,9837509268, 0, 0, 0 mbuf: 256,6513896, 24752, 56286,224742586944, 0, 0, 0 mbuf_cluster: 2048,1017796, 26670, 254, 26670, 0, 0, 0 mbuf_jumbo_page: 4096, 508898, 21827, 25925,104428955976, 0, 0, 0 mbuf_jumbo_9k: 9216, 150784, 0, 0, 0, 0, 0, 0 mbuf_jumbo_16k: 16384, 84816, 0, 0, 0, 0, 0, 0 ``` ``` [root@FreeBSD14 ~]# netstat -m 41520/66188/107708 mbufs in use (current/cache/total) 16768/10156/26924/1017796 mbuf clusters in use (current/cache/total/max) 16768/9902 mbuf+clusters out of packet secondary zone in use (current/cache) 21827/25925/47752/508898 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/150784 9k jumbo clusters in use (current/cache/total/max) 0/0/0/84816 16k jumbo clusters in use (current/cache/total/max) 131224K/140559K/271783K bytes allocated to network (current/cache/total) ... ``` ``` db> show page vm_cnt.v_free_count: 3261805 vm_cnt.v_inactive_count: 156959 vm_cnt.v_active_count: 36342 vm_cnt.v_laundry_count: 0 vm_cnt.v_wire_count: 649216 vm_cnt.v_free_reserved: 5367 vm_cnt.v_free_min: 25722 vm_cnt.v_free_target: 86787 vm_cnt.v_inactive_target: 130180 db> show uma Zone Size Used Free Requests Sleeps Bucket Total Mem XFree FFS inode 1168 349865 331 864424 0 8 409028928 0 VNODE 600 349906 665 864497 0 16 210342600 0 mbuf_jumbo_page 4096 21827 25925 104428955976 0 254 195592192 0 FFS2 dinode 256 349865 1120 864424 0 62 89852160 0 mbuf_cluster 2048 26670 254 26670 0 254 55140352 0 S VFS Cache 104 356473 7124 870718 0 126 37814088 0 VM OBJECT 264 124446 1314 174315 0 30 33200640 0 256 Bucket 2048 1149 10785 522000068 0 254 24440832 0 mbuf 256 24752 56286 224742586945 0 254 20745728 0 BUF TRIE 144 6267 99041 12431 0 62 15164352 0 UMA Slabs 0 112 120097 29 156673 0 126 13454112 0 RADIX NODE 144 59629 5103 154477 0 62 9321408 0 vmem btag 56 135753 3691 135883 0 254 7808864 0 socket 960 112 8016 2267 0 254 7802880 0 pbuf 2512 0 2962 0 0 0 7440544 0 mbuf_packet 256 16768 9902 9837509269 0 254 6827520 0 malloc-65536 65536 44 33 700 0 1 5046272 0 L VFS Cache 320 6359 4237 19741 0 30 3390720 0 DIRHASH 1024 2408 280 2408 0 16 2752512 0 malloc-4096 4096 547 67 4542 0 2 2514944 0 FPU_save_area 2696 692 85 2026 0 4 2094792 0 malloc-16384 16384 19 105 304 0 1 2031616 0 malloc-128 128 11601 3031 83012 0 126 1872896 0 unpcb 256 45 6825 1881 0 254 1758720 0 128 Bucket 1024 466 1097 755733 0 18 1600512 0 THREAD 1824 690 142 931 0 8 1517568 0 malloc-32768 32768 36 10 157 0 1 1507328 0 malloc-256 256 2141 1579 15213 0 62 952320 0 64 Bucket 512 624 1056 206913 0 30 860160 0 MAP ENTRY 96 3269 5425 163701 0 126 834624 0 malloc-64 64 5034 7692 306248 0 254 814464 0 32 Bucket 256 616 2324 55277 0 62 752640 0 malloc-1024 1024 206 466 12113 0 16 688128 
0 NAMEI 1024 0 608 1314258 0 16 622592 0 UMA Zones 4608 127 3 127 0 1 599040 0 malloc-2048 2048 49 215 528 0 8 540672 0 PROC 1376 113 272 2881 0 8 529760 0 lkpicurr 168 2 3118 2 0 62 524160 0 filedesc0 1072 114 362 2882 0 8 510272 0 2 Bucket 32 10694 4804 139800 0 254 495936 0 malloc-512 512 494 466 887 0 30 491520 0 PGRP 120 21 3906 582 0 126 471240 0 ksiginfo 112 267 3909 1064 0 126 467712 0 g_bio 400 0 1140 570886 0 30 456000 0 malloc-384 384 561 549 1297 0 30 426240 0 VMSPACE 616 98 592 2866 0 16 425040 0 ertt_txseginfo 40 989 9515 12981364553 0 254 420160 0 malloc-32 32 5719 7007 14479 0 254 407232 0 pipe 728 78 472 1557 0 16 400400 0 4 Bucket 48 113 8035 570 0 254 391104 0 SLEEPQUEUE 88 833 3583 833 0 126 388608 0 tcp_inpcb 1304 65 232 265 0 8 387288 0 PWD 40 8 8981 249907 0 254 359560 0 16 Bucket 144 376 2032 100944 0 62 346752 0 TURNSTILE 136 833 1603 833 0 62 331296 0 8 Bucket 80 334 3766 109633 0 126 328000 0 rl_entry 40 225 7956 283 0 254 327240 0 Files 80 308 3742 335446 0 126 324000 0 clpbuf 2512 0 128 2197 0 4 321536 0 sackhole 32 10 9440 14771020 0 254 302400 0 ertt 72 65 3967 265 0 126 290304 0 DEVCTL 1024 0 276 187 0 0 282624 0 malloc-16 16 10380 6620 91137 0 254 272000 0 tcp_inpcb ports r32 65 8125 265 0 254 262080 0 malloc-8192 8192 27 4 195 0 1 253952 0 ttyoutq 256 105 825 423 0 62 238080 0 udp_inpcb 424 2 538 87 0 30 228960 0 kenv 258 9 681 1932 0 30 178020 0 lkpimm 56 1 3095 1 0 254 173376 0 cpuset 200 67 749 67 0 62 163200 0 ttyinq 160 199 801 810 0 62 160000 0 netlink 2048 0 64 39 0 8 131072 0 Mountpoints 2944 9 23 9 0 4 94208 0 pcpu-64 64 459 821 459 0 254 81920 0 routing nhops 256 7 308 12 0 62 80640 0 pcpu-8 8 3823 5905 3825 0 254 77824 0 rtentry 168 10 302 12 0 62 52416 0 UMA Kegs 384 101 2 101 0 30 39552 0 pcpu-16 16 32 2016 32 0 254 32768 0 ripcb 392 0 63 3 0 30 24696 0 SMR CPU 32 7 760 7 0 254 24544 0 SMR SHARED 24 7 760 7 0 254 18408 0 epoch_record pcpu 256 4 60 4 0 62 16384 0 TMPFS node 232 1 67 1 0 62 15776 0 vmem 1856 1 7 1 0 8 14848 0 udp_inpcb ports r32 1 377 1 0 254 12096 0 hostcache 64 0 126 1 0 120 8064 0 vtnet_tx_hdr 24 0 167 0 0 254 4008 0 UMA Slabs 1 176 18 4 18 0 62 3872 0 KMAP ENTRY 96 31 8 34 0 0 3744 0 FFS1 dinode 128 0 0 0 0 126 0 0 swblk 136 0 0 0 0 62 0 0 swpctrie 144 0 0 0 0 62 0 0 da_ccb 536 0 0 0 0 16 0 0 ada_ccb 272 0 0 0 0 30 0 0 tfo_ccache_entries 80 0 0 0 0 126 0 0 tfo 4 0 0 0 0 254 0 0 syncache 168 0 0 0 0 120 0 0 ipq 56 0 0 0 0 248 0 0 tcp_log_id_node 120 0 0 0 0 126 0 0 tcp_log_id_bucket 176 0 0 0 0 62 0 0 tcp_log 416 0 0 0 0 254 0 0 tcpreass 48 0 0 0 0 254 0 0 udplite_inpcb ports r32 0 0 0 0 254 0 0 udplite_inpcb 424 0 0 0 0 30 0 0 ripcb ports r32 0 0 0 0 254 0 0 IPsec SA lft_c 16 0 0 0 0 254 0 0 KNOTE 160 0 0 0 0 62 0 0 itimer 352 0 0 0 0 30 0 0 AIOLIO 272 0 0 0 0 30 0 0 AIOCB 552 0 0 0 0 16 0 0 AIO 208 0 0 0 0 62 0 0 NCLNODE 760 0 0 0 0 16 0 0 LTS VFS Cache 360 0 0 0 0 30 0 0 STS VFS Cache 144 0 0 0 0 62 0 0 cryptop 280 0 0 0 0 30 0 0 linux_dma_object 32 0 0 0 0 254 0 0 linux_dma_pctrie 144 0 0 0 0 62 0 0 IOMMU_MAP_ENTRY 104 0 0 0 0 126 0 0 mbuf_jumbo_16k 16384 0 0 0 0 254 0 0 mbuf_jumbo_9k 9216 0 0 0 0 254 0 0 audit_record 1280 0 0 0 0 8 0 0 domainset 40 0 0 0 0 254 0 0 MAC labels 40 0 0 0 0 254 0 0 vnpbuf 2512 0 0 0 0 16 0 0 nfspbuf 2512 0 0 0 0 4 0 0 swwbuf 2512 0 0 0 0 2 0 0 swrbuf 2512 0 0 0 0 4 0 0 umtx_shm 88 0 0 0 0 126 0 0 umtx pi 96 0 0 0 0 126 0 0 rangeset pctrie nodes 144 0 0 0 0 62 0 0 pcpu-32 32 0 0 0 0 254 0 0 pcpu-4 4 0 0 0 0 254 0 0 fakepg 104 0 0 0 0 126 0 0 UMA Hash 256 0 0 0 0 62 0 0 ``` But I'm not sure if 
the counts hint at a leak or if they just represent the memory footprint of the blocked threads.
(In reply to shailend from comment #6) The memory utilization is low, so this is not a low memory deadlock. We have an iperf thread which is holding a UMA zone lock and an inpcb lock, and it looks like this is blocking most everything else. So, what is the iperf thread (100719 in the last paste) doing? Could you please attach full procstat -kka output here, as a separate file?
Created attachment 253834 (procstat_kka). This is the output of procstat -kka after the onset of a deadlock, with a single iperf thread holding onto the uma zone lock.
(In reply to Mark Johnston from comment #7) Also the trace for the uma zone lock holding iperf thread: ``` db> trace 100719 Tracing pid 857 tid 100719 td 0xfffff800b87ca000 sched_switch() at sched_switch+0x5d2/frame 0xfffffe01a4e8c720 mi_switch() at mi_switch+0x171/frame 0xfffffe01a4e8c740 ipi_bitmap_handler() at ipi_bitmap_handler+0x86/frame 0xfffffe01a4e8c770 Xipi_intr_bitmap_handler() at Xipi_intr_bitmap_handler+0xb3/frame 0xfffffe01a4e8c770 --- interrupt, rip = 0xffffffff80ed8bed, rsp = 0xfffffe01a4e8c840, rbp = 0xfffffe01a4e8c870 --- zone_fetch_bucket() at zone_fetch_bucket+0x8d/frame 0xfffffe01a4e8c870 cache_alloc() at cache_alloc+0x261/frame 0xfffffe01a4e8c8d0 cache_alloc_retry() at cache_alloc_retry+0x23/frame 0xfffffe01a4e8c910 tcp_m_copym() at tcp_m_copym+0x227/frame 0xfffffe01a4e8c9b0 tcp_default_output() at tcp_default_output+0x146c/frame 0xfffffe01a4e8cb80 tcp_usr_send() at tcp_usr_send+0x5a2/frame 0xfffffe01a4e8cc30 sosend_generic() at sosend_generic+0x5fe/frame 0xfffffe01a4e8cce0 sousrsend() at sousrsend+0x79/frame 0xfffffe01a4e8cd40 dofilewrite() at dofilewrite+0x7f/frame 0xfffffe01a4e8cd90 sys_write() at sys_write+0xb7/frame 0xfffffe01a4e8ce00 amd64_syscall() at amd64_syscall+0x147/frame 0xfffffe01a4e8cf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01a4e8cf30 ```
Superficially it looks as if iperf thread 100719 was interrupted by an IPI while it held the uma zone lock. It is the only iperf thread in the "run" state; the rest are in "stop".
Just a more succinct view of iperf thread 100719's central role in this deadlock: ``` db> show lockchain 100413 thread 100413 (pid 0, gve0 rxq 0) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf" thread 100719 (pid 857, iperf) is on a run queue db> show lockchain 100414 thread 100414 (pid 0, gve0 rxq 1) is blocked on lock 0xfffff800b86f8560 (rw) "tcpinp" thread 100706 (pid 857, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf" thread 100719 (pid 857, iperf) is on a run queue db> show lockchain 100433 thread 100433 (pid 0, gve0 txq 2) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf" thread 100719 (pid 857, iperf) is on a run queue db> show lockchain 100419 thread 100419 (pid 0, gve0 rxq 6) is blocked on lock 0xfffff800b8d53020 (rw) "tcpinp" thread 100703 (pid 857, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf" thread 100719 (pid 857, iperf) is on a run queue ```
Comment on attachment 253834 (procstat_kka): Are we absolutely certain that this is a deadlock and not a livelock? If you look at network counters, etc., seconds apart, do you see any movement at all? Is that iperf killable? I'm asking because, while most of the UMA locks are in the allocation path, some are in the free path (0 100421 kernel gve0 rxq 8 mi_switch+0x171 __mtx_lock_sleep+0x1ba __mtx_lock_flags+0xe1 zone_put_bucket+0x75 cache_free+0x2da uma_zfree_arg+0x21c mb_free_ext+0x126 m_free+0xda m_freem+0x48 tcp_do_segment+0x1e6b tcp_input_with_port+0x1135 tcp_input+0xb ip_input+0x2ae netisr_dispatch_src+0xad ether_demux+0x17a ether_nh_input+0x3f8 netisr_dispatch_src+0xad ether_input+0xe5), and it appears that the iperf thread that holds that zone lock is actually doing something. I've seen degenerate, live-locking behavior in the past when allocations and frees overwhelm the per-CPU caches and there is contention for the zone lock. Throughput drops by 80% or more, and the system becomes so sluggish that it's easy to think it is a deadlock.
(In reply to Andrew Gallatin from comment #12) Hmmm, interesting. In this case though, I'm sure nothing is traversing the networking stack, and no CPU is being consumed. The offending thread seems to be in a weird state, almost like it went AWOL after it got that IPI.
```
db> show thread 100719
Thread 100719 at 0xfffff800b87ca000:
 proc (pid 857): 0xfffffe018a340580
 name: iperf
 pcb: 0xfffff800b87ca520
 stack: 0xfffffe01a4e89000-0xfffffe01a4e8cfff
 flags: 0x5 pflags: 0x100
 state: RUNQ
 priority: 4
 container lock: sched lock 11 (0xfffffe001fd73440)
 last voluntary switch: 55821.710 s ago
 last involuntary switch: 55821.710 s ago
```
(In reply to shailend from comment #13) What does 'show pcpu 11' show?
(In reply to Konstantin Belousov from comment #14) Unfortunately I have lost access to the VMs from this repro and will need to make a fresh one. I'll post the "show pcpu" output for the new repro; hopefully it too enters a deadlock this way, with one UMA-zone-lock-holding thread playing spoilsport. I also forgot to answer one question from comment #12: no, I wasn't able to kill iperf; it stuck around even after a `kill -9`.
This smells a bit like a thread disabled interrupts and then went off-CPU somehow. The iperf thread is stuck in the runqueue of a CPU and nothing gets scheduled there, so it doesn't run. If this is not enough of a hint, we could try enabling scheduler tracing:
- add the following to your kernel config:
  options KTR
  options KTR_COMPILE=(KTR_SCHED)
  options KTR_ENTRIES=(256*1024)
- before starting your test, enable KTR tracing: # sysctl debug.ktr.enable=0x20000000
- once the hang occurs, use ktrdump (or "show ktr" in DDB) to dump scheduler traces
(In reply to Mark Johnston from comment #16) I doubt the system would stay silent about a CPU with disabled interrupts; our IPI code does not tolerate such a condition. In fact, I asked about pcpu because to me it sounds more as if the idle thread does not exit its idle loop. Note that this is occurring in a VM, where e.g. MWAIT emulation might be tricky. If the CPU in question is sitting in the idle thread, I suggest changing the idling method to either hlt or loop.
(In reply to Konstantin Belousov from comment #14) Although I no longer have access to the VMs to run `show pcpu`, I checked my notes and found this `ps` entry:

100438 Run CPU 11 [gve0 txq 4 xmit]

The packet-transmitting thread is hogging the CPU and preventing iperf from ever running to release the UMA lock. The "gve0 txq 4 xmit" task runs forever because it is waiting on the tx cleanup thread to make room on the ring, and that thread is not doing anything because it is waiting on the UMA zone lock.

I did another repro, and the situation is similar:
```
db> show lockchain 100416
thread 100416 (pid 0, gve0 rxq 0) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100423
thread 100423 (pid 0, gve0 rxq 7) is blocked on lock 0xfffff8010447daa0 (rw) "tcpinp"
thread 100736 (pid 860, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```
Here 100708 is the offending iperf thread. Let's look at its state:
```
db> show thread 100708
Thread 100708 at 0xfffff800a86bd000:
 proc (pid 860): 0xfffffe01a439bac0
 name: iperf
 pcb: 0xfffff800a86bd520
 stack: 0xfffffe01a4dc1000-0xfffffe01a4dc4fff
 flags: 0x5 pflags: 0x100
 state: RUNQ
 priority: 4
 container lock: sched lock 31 (0xfffffe001bee8440)
 last voluntary switch: 11510.470 s ago
 last involuntary switch: 11510.470 s ago
```
And now let's see what is happening on CPU 31:
```
db> show pcpu 31
cpuid        = 31
dynamic pcpu = 0xfffffe009a579d80
curthread    = 0xfffff800a8501740: pid 0 tid 100453 critnest 0 "gve0 txq 10 xmit"
curpcb       = 0xfffff800a8501c60
fpcurthread  = none
idlethread   = 0xfffff80003b04000: tid 100034 "idle: cpu31"
self         = 0xffffffff8242f000
curpmap      = 0xffffffff81b79c50
tssp         = 0xffffffff8242f384
rsp0         = 0xfffffe01a4ca8000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff8242f404
ldt          = 0xffffffff8242f444
tss          = 0xffffffff8242f434
curvnet      = 0
spin locks held:
```
Sure enough, a driver transmit thread is hogging the CPU. And to close the loop, let's see what this queue's cleanup thread is doing:
```
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```
In summary, this is the usual loop: iperf thread (holding the UMA zone lock) ---sched---> gve tx xmit thread ---for room---> gve tx cleanup thread ---UMA zone lock---> iperf thread.

There is clearly problematic behavior in the driver transmit thread (gve_xmit_br): this taskqueue should not enqueue itself, and should instead let the cleanup taskqueue wake it up when room is made in the ring, so I'll work on that. But I also want to confirm that it is not problematic for an iperf thread to be knocked off the CPU with the zone lock held: is it not a critical enough lock to disallow that? (I am not familiar enough with schedulers to know whether this is a naive question.)
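For illustration only, the problematic xmit-task pattern looks roughly like the sketch below. This is not the actual gve code: `struct my_txq`, `ring_has_room()`, and `encap_and_post()` are hypothetical stand-ins, used just to show why the task keeps itself runnable at taskqueue priority instead of letting the preempted lock owner back onto the CPU.
```c
/*
 * Hypothetical sketch of a self-requeueing transmit task; not the actual
 * gve code. ring_has_room() and encap_and_post() are invented helpers.
 */
#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <sys/taskqueue.h>
#include <net/if.h>
#include <net/if_var.h>

struct my_txq {
	struct ifnet		*ifp;
	struct buf_ring		*br;	/* software transmit ring */
	struct taskqueue	*tq;
	struct task		 xmit_task;
};

static bool ring_has_room(struct my_txq *txq);			/* hypothetical */
static void encap_and_post(struct my_txq *txq, struct mbuf *m);	/* hypothetical */

static void
xmit_task_sketch(void *arg, int pending)
{
	struct my_txq *txq = arg;
	struct mbuf *m;

	while ((m = drbr_peek(txq->ifp, txq->br)) != NULL) {
		if (!ring_has_room(txq)) {
			/*
			 * Descriptor ring is full: put the mbuf back and
			 * re-enqueue this same task. If the cleanup thread
			 * that would reclaim descriptors is blocked (here, on
			 * the UMA zone lock owned by the preempted iperf
			 * thread), the task just runs again and again and the
			 * lock owner never gets the CPU back.
			 */
			drbr_putback(txq->ifp, txq->br, m);
			taskqueue_enqueue(txq->tq, &txq->xmit_task);
			return;
		}
		encap_and_post(txq, m);
		drbr_advance(txq->ifp, txq->br);
	}
}
```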
(In reply to shailend from comment #18) Locks (except spinlocks) do not have any magic properties with respect to disabling scheduling, so it is absolutely fine for a thread owning a lock to be put off the CPU if there is another thread with higher priority. OTOH, for locks like mutexes and rwlocks (which use turnstiles), the system propagates priorities from lock contenders to lock owners to avoid priority inversion. Which raises the question: what is the priority of that running tx thread vs. the priority of the lock owner? Could it be that the tx thread was assigned too high a priority, causing this live-lock?
(In reply to Konstantin Belousov from comment #19) Thanks for the explanation. The iperf thread owning the lock and the driver thread looping on the CPU both have priority 4. The driver taskqueue was created with `taskqueue_create_fast`.
(In reply to shailend from comment #20) Then this especially looks like a live-lock. A user thread should not have priority 4; that is in the range of priorities of the interrupt threads. Since the iperf thread owns the lock, and the driver thread contends for the lock, most likely this is the result of priority propagation. In other words, the iperf thread must drop the lock to allow the system to make progress. Perhaps it should do so after processing some fixed amount of work.
To be clear, the problem is that gve_xmit_br() requeues itself when gve_xmit() finds the ring full (i.e., returns ENOBUFS)? Shouldn't it be queuing a cleanup task?
(In reply to Mark Johnston from comment #22) Yup, gve_xmit_br enqueueing itself is the problem. Since the cleanup task gve_tx_cleanup_tq already runs off of interrupts, I am thinking of fixing this by having gve_xmit_br mark itself as stopped when it sees ENOBUFS and having the cleanup task wake it up when room is made.
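As a sketch only, reusing the hypothetical `struct my_txq` names from the earlier snippet (the real change is the one in D47138 and differs in detail), the idea is for the xmit task to park itself when the ring is full and for the interrupt-driven cleanup path to re-enqueue it after reclaiming descriptors:
```c
/*
 * Hypothetical stop/wake sketch, not the code in D47138. Assumes the
 * struct my_txq from the earlier sketch gains a "volatile uint32_t stopped"
 * field; the atomics come from <machine/atomic.h>.
 */
static void
xmit_task_sketch(void *arg, int pending)
{
	struct my_txq *txq = arg;
	struct mbuf *m;

	while ((m = drbr_peek(txq->ifp, txq->br)) != NULL) {
		if (!ring_has_room(txq)) {
			drbr_putback(txq->ifp, txq->br, m);
			/* Park instead of re-enqueueing ourselves. */
			atomic_store_rel_32(&txq->stopped, 1);
			/*
			 * Re-check for room so a cleanup pass that finished
			 * before "stopped" became visible cannot leave us
			 * parked forever (lost wakeup).
			 */
			if (ring_has_room(txq) &&
			    atomic_cmpset_32(&txq->stopped, 1, 0) != 0)
				continue;
			return;
		}
		encap_and_post(txq, m);
		drbr_advance(txq->ifp, txq->br);
	}
}

static void
cleanup_made_room_sketch(struct my_txq *txq)
{
	/*
	 * Called from the tx cleanup task once completed descriptors have
	 * been reclaimed: wake the parked xmit task, if any.
	 */
	if (atomic_cmpset_32(&txq->stopped, 1, 0) != 0)
		taskqueue_enqueue(txq->tq, &txq->xmit_task);
}
```
The design point is the same one discussed above: when there is no room, the xmit task goes idle instead of competing for the CPU, so the thread holding the UMA zone lock can run and the cleanup path can make progress.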
Fixed in https://reviews.freebsd.org/D47138 Thanks a lot for all the help @markj, @kib, and @gallatin!
(In reply to shailend from comment #24) Does that diff depend on the other two gve diffs which had been posted previously? That is, in what order should they be reviewed?
(In reply to Mark Johnston from comment #25) It does; I found this bug only on top of the two prior changes (https://reviews.freebsd.org/D46690 and https://reviews.freebsd.org/D46691), and I figured fixing it there once makes for less churn. So https://reviews.freebsd.org/D47138 needs to be reviewed after those two patches.
Closing this since the problem is now understood and only affects uncommitted code.