Bug 252835 - ice(4) driver queue timeout is too short
Summary: ice(4) driver queue timeout is too short
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2021-01-19 17:27 UTC by Brian Poole
Modified: 2021-01-19 20:48 UTC

Description Brian Poole 2021-01-19 17:27:24 UTC
Hello,

I have been testing an Intel E810-CQDA 2x100GbE network card using FreeBSD-12.2 (GENERIC+INVARIANTS). When using both ports in netmap mode with 16 queues, I have observed a panic after messages like the following are printed to the console:

ice1: Rx queue 13 disable timeout

These messages appear on both ice0 and ice1, with different queue numbers. Digging into the cause, I discovered that the message is printed when ice_is_rxq_ready() returns ETIMEDOUT because the queue has not reached a consistent state after looping ICE_Q_WAIT_RETRY_LIMIT (5) times with a delay of 10us per loop.

For testing, I increased ICE_Q_WAIT_RETRY_LIMIT to 500 (total delay of 5ms possible) and returned the index from ice_is_rxq_ready() when the state became consistent so I could view the actual delays required.
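
To make the mechanism concrete, here is a minimal, self-contained sketch of a bounded polling loop that returns the iteration index on success, in the spirit of the instrumented test described above. It is not the driver's code: the sim_queue structure, sim_queue_consistent(), sim_delay() and poll_queue() are hypothetical stand-ins that simulate time instead of reading hardware registers, and only the retry limit of 5 and the 10us step are taken from this report.

```c
#include <assert.h>
#include <stddef.h>

#define Q_WAIT_RETRY_LIMIT 5   /* default retry limit quoted in the report */
#define Q_WAIT_DELAY_US    10  /* per-iteration delay quoted in the report */

/*
 * Hypothetical stand-in for the hardware: the queue reports a consistent
 * state once ready_after_us microseconds of simulated time have elapsed.
 */
struct sim_queue {
	unsigned ready_after_us;
	unsigned elapsed_us;
};

static int
sim_queue_consistent(const struct sim_queue *q)
{
	return (q->elapsed_us >= q->ready_after_us);
}

/* Stand-in for a busy-wait delay: advances simulated time only. */
static void
sim_delay(struct sim_queue *q, unsigned us)
{
	q->elapsed_us += us;
}

/*
 * Poll up to limit times. Returns the zero-based iteration on which the
 * queue was found consistent, or -1 if the limit was reached first.
 */
static int
poll_queue(struct sim_queue *q, int limit)
{
	assert(q != NULL && limit > 0);

	for (int i = 0; i < limit; i++) {
		if (sim_queue_consistent(q))
			return (i);
		sim_delay(q, Q_WAIT_DELAY_US);
	}
	return (-1);
}
```

In this sketch, the returned index multiplied by Q_WAIT_DELAY_US gives the delay that was needed. A simulated queue that needs 190us times out with a limit of 5 but succeeds with a limit of 500.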

A test lasting a few minutes, starting and stopping pkt-gen instances on both ice0 and ice1, gave me the following data. The first column is the number of occurrences; the second column is the delay in us.

4952 0
2561 10
 186 20
  92 40
  78 30
  22 50
   5 60
   2 130
   1 70
   1 90
   1 100
   1 120
   1 150
   1 190

So the default timeout of 50us does cover the majority of queue commands, but in just a few minutes I observed the driver needing up to 190us for a queue to become consistent. This testing was performed on an AMD Threadripper 3990X supporting PCIe 4.0.
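
For anyone who wants to check how many samples a given timeout budget would cover, the table above can be summarised with a small helper. This is only a sketch: the delay_bucket type and the function names are hypothetical, and the data is copied from the table in this report. Whether a sample at exactly 50us passes under the default limit depends on where the last check falls relative to the last delay, so the helper takes the budget as a parameter rather than assuming an answer.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table: number of occurrences and delay in microseconds. */
struct delay_bucket {
	unsigned long count;
	unsigned delay_us;
};

/* Data copied from the report, in the order it was listed. */
static const struct delay_bucket observed[] = {
	{4952, 0}, {2561, 10}, {186, 20}, {92, 40}, {78, 30},
	{22, 50}, {5, 60}, {2, 130}, {1, 70}, {1, 90},
	{1, 100}, {1, 120}, {1, 150}, {1, 190},
};
#define N_OBSERVED (sizeof(observed) / sizeof(observed[0]))

/* Total number of samples across all rows. */
static unsigned long
total_samples(const struct delay_bucket *b, size_t n)
{
	unsigned long total = 0;

	assert(b != NULL);
	for (size_t i = 0; i < n; i++)
		total += b[i].count;
	return (total);
}

/* Number of samples whose delay is at most budget_us. */
static unsigned long
samples_within(const struct delay_bucket *b, size_t n, unsigned budget_us)
{
	unsigned long covered = 0;

	assert(b != NULL);
	for (size_t i = 0; i < n; i++)
		if (b[i].delay_us <= budget_us)
			covered += b[i].count;
	return (covered);
}

/* Largest delay seen in any row. */
static unsigned
max_delay_us(const struct delay_bucket *b, size_t n)
{
	unsigned max = 0;

	assert(b != NULL);
	for (size_t i = 0; i < n; i++)
		if (b[i].delay_us > max)
			max = b[i].delay_us;
	return (max);
}
```

With this data, total_samples() gives 7904, samples_within() gives 7869 for a 40us budget and 7891 for a 50us budget, and max_delay_us() gives 190.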