I have been stress testing with SPECSFS on AWS, using the SR-IOV feature with the ixv driver, and ran into problems. Further debugging traced them to the driver.

When the mbuf chain in the ixgbe_xmit() path contains a large number of segments, the DMA mapping through bus_dmamap_load_mbuf_sg() fails with EFBIG, and the driver retries the mapping after m_defrag(). If the total size of the data is more than 64K, the mapping fails again and ixgbe_xmit() returns an error. The failed packet is put back on the queue, and the xmit failure and retry continue indefinitely.

The proposed fix is to drop the packet when the mapping fails with EFBIG after defragmentation.

diff --git a/sys/dev/ixgbe/ix_txrx.c b/sys/dev/ixgbe/ix_txrx.c
index 35c1ddd..d9254e2 100644
--- a/sys/dev/ixgbe/ix_txrx.c
+++ b/sys/dev/ixgbe/ix_txrx.c
@@ -389,8 +389,11 @@ retry:
 			}
 			*m_headp = m;
 			goto retry;
-		} else
-			return (error);
+		}
+		txr->no_tx_dma_setup++;
+		m_freem(*m_headp);
+		*m_headp = NULL;
+		return (error);
 	case ENOMEM:
 		txr->no_tx_dma_setup++;
 		return (error);
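To illustrate the intended behavior outside the kernel, here is a minimal user-space sketch of the error path: one defragmentation attempt, then a drop on a second EFBIG. Everything in it is a hypothetical stand-in and not driver code: struct pkt, struct txstats, and the model_* functions are invented for illustration, and the limits MAX_SEGS (32), MAX_BYTES (64K), and the 2048-byte segment size are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed limits for illustration only; the real ones come from the DMA tag. */
#define MAX_SEGS	32
#define MAX_BYTES	65536

/* Hypothetical stand-in for an mbuf chain. */
struct pkt {
	size_t nsegs;	/* number of segments in the chain */
	size_t len;	/* total data length in bytes */
};

/* Hypothetical stand-in for the per-ring counters. */
struct txstats {
	unsigned long no_tx_dma_setup;
};

static struct pkt *
model_pkt(size_t nsegs, size_t len)
{
	struct pkt *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->nsegs = nsegs;
	p->len = len;
	return (p);
}

/* Models the DMA load: EFBIG when the chain does not fit the limits. */
static int
model_dma_load(const struct pkt *p)
{
	if (p->nsegs > MAX_SEGS || p->len > MAX_BYTES)
		return (EFBIG);
	return (0);
}

/* Models defragmentation: repack the data into as few segments as possible. */
static void
model_defrag(struct pkt *p, size_t seg_size)
{
	p->nsegs = (p->len + seg_size - 1) / seg_size;
	if (p->nsegs == 0)
		p->nsegs = 1;
}

/*
 * Models the patched transmit path. On a second EFBIG the packet is
 * freed and *pp is set to NULL, so the caller has nothing to requeue.
 */
static int
model_xmit(struct pkt **pp, struct txstats *st)
{
	bool remap = true;
	int error;

	assert(pp != NULL && *pp != NULL);
retry:
	error = model_dma_load(*pp);
	if (error == EFBIG) {
		if (remap) {
			remap = false;
			model_defrag(*pp, 2048);
			goto retry;
		}
		st->no_tx_dma_setup++;
		free(*pp);
		*pp = NULL;
	}
	return (error);
}
```

In this model, a 10000-byte packet spread over 100 segments maps successfully after defragmentation, while a 70000-byte packet still fails and is dropped instead of being handed back to the caller for another retry.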
Do you know what sort of packets are causing this? We shouldn't be running into problems with TCP -- the TSO code is theoretically aware of the device's limitations. If it's UDP packets which are getting stuck then yes, the answer is to drop them.
Yes, they were UDP packets.
The proposed patch no longer applies, so I assume the reported issue has already been fixed. Closing; if this is still relevant, please reopen.