Bug 252835 - ice(4) driver queue timeout is too short
Summary: ice(4) driver queue timeout is too short
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Intel FreeBSD
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2021-01-19 17:27 UTC by Brian Poole
Modified: 2023-02-09 19:28 UTC
2 users

See Also:


Description Brian Poole 2021-01-19 17:27:24 UTC
Hello,

I have been testing an Intel E810-CQDA 2x100GbE network card using FreeBSD-12.2 (GENERIC+INVARIANTS). When using both ports in netmap mode with 16 queues, I have observed a panic after seeing messages like:

ice1: Rx queue 13 disable timeout

(seen on both ice0 and ice1, with different queue numbers) printed to the console. Digging into the cause of that error message, I discovered it is printed when ice_is_rxq_ready() returns ETIMEDOUT because the queue is still not in a consistent state after looping ICE_Q_WAIT_RETRY_LIMIT (5) times with a 10us delay per iteration.

For testing, I increased ICE_Q_WAIT_RETRY_LIMIT to 500 (a total possible delay of 5ms) and changed ice_is_rxq_ready() to return the loop index at which the state became consistent, so I could observe the actual delays required.

A few-minute test of starting and stopping pkt-gen instances on both ice0 and ice1 gave me the following data. The second column is the delay in us; the first column is the number of occurrences.

4952 0
2561 10
 186 20
  92 40
  78 30
  22 50
   5 60
   2 130
   1 70
   1 90
   1 100
   1 120
   1 150
   1 190

So the default timeout of 50us does cover the majority of queue commands, but in just a few minutes of testing I observed the driver needing up to 190us for a queue to become consistent. This testing was performed on an AMD Threadripper 3990X system with PCIe 4.0 support.
Comment 1 Piotr Kubaj freebsd_committer freebsd_triage 2023-02-06 17:48:13 UTC
Can you provide a detailed procedure for reproduction?
Comment 2 Brian Poole 2023-02-09 19:09:56 UTC
(In reply to Piotr Kubaj from comment #1)

Hello and sorry for the delay.

I have attempted to reproduce this issue, but I no longer have access to the hardware from two years ago and have since moved to newer versions of FreeBSD. This week I tested FreeBSD 12.3, 12.4, and 13.1 with the default limit of 5. I ran both ports of the card with 16 netmap queues per port. I restarted each pkt-gen multiple times, often while under a load of incoming packets, but could not trigger a panic.

Besides the OS version, the card's NVM firmware version has also changed: when I originally hit the issue I was running 2.15, while my card now has 2.50.
Comment 3 Eric Joyner freebsd_committer freebsd_triage 2023-02-09 19:25:32 UTC
It's possible this was a firmware issue that has since been fixed; the hardware spec indicates that the delay in the hardware should never exceed ~10us.
Comment 4 Piotr Kubaj freebsd_committer freebsd_triage 2023-02-09 19:28:43 UTC
Nice, then it could be that your issue was fixed in some version upgrade. Newer FreeBSD versions have always brought newer ice(4) driver versions.

BTW, right now the firmware is at version 4.0 or 4.1 (I'm not sure).

Since you can't reproduce it, I'm closing this issue for now. Reopen it if it happens again.