Hello, I have been testing an Intel E810-CQDA 2x100GbE network card on FreeBSD 12.2 (GENERIC+INVARIANTS). When using both ports in netmap mode with 16 queues, I have observed a panic after messages like the following are printed to the console (seen on both ice0 and ice1, with different queue numbers):
    ice1: Rx queue 13 disable timeout
Digging into the cause of that error message, I discovered it is printed when ice_is_rxq_ready() returns ETIMEDOUT because the queue has not reached a consistent state after looping ICE_Q_WAIT_RETRY_LIMIT (5) times with a delay of 10us per loop.
For testing, I increased ICE_Q_WAIT_RETRY_LIMIT to 500 (a maximum total delay of 5ms) and returned the loop index from ice_is_rxq_ready() when the state became consistent, so I could see the delays actually required. A few minutes of starting and stopping pkt-gen instances on both ice0 and ice1 gave me the following data. The first column is the number of occurrences; the second column is the delay in us.
    count  delay (us)
     4952    0
     2561   10
      186   20
       92   40
       78   30
       22   50
        5   60
        2  130
        1   70
        1   90
        1  100
        1  120
        1  150
        1  190
So the default timeout of 50us does cover the majority of queue commands, but in just a few minutes I observed the driver needing up to 190us for a queue to become consistent. This testing was performed on an AMD Threadripper 3990X with PCIe 4.0 support.
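The instrumentation described above can be sketched roughly as follows. This is a user-space illustration only, not the ice(4) driver's code: poll_queue_ready(), the ready_fn/delay_fn callbacks, struct fake_queue and the Q_WAIT_* constants are hypothetical stand-ins for ice_is_rxq_ready(), its register read, the kernel delay call and ICE_Q_WAIT_RETRY_LIMIT. It assumes the loop checks the state first and delays afterwards, so the returned index times 10us gives the delay that was needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the ice(4) driver's real API. */
typedef bool (*ready_fn)(void *ctx);         /* "is the queue state consistent?" */
typedef void (*delay_fn)(unsigned int usec); /* busy-wait for usec microseconds */

#define Q_WAIT_RETRY_LIMIT 500 /* raised from 5 for the measurement */
#define Q_WAIT_DELAY_US    10  /* delay per loop iteration */

/*
 * Poll until the queue reports a consistent state.
 * Returns the loop index at which it became ready (index * 10us is the
 * delay that was needed), or -1 if the retry limit was exhausted.
 */
static int
poll_queue_ready(ready_fn is_ready, delay_fn delay, void *ctx)
{
	for (int i = 0; i < Q_WAIT_RETRY_LIMIT; i++) {
		if (is_ready(ctx))
			return (i);
		delay(Q_WAIT_DELAY_US);
	}
	return (-1);
}

/* Simulated queue that becomes ready after a fixed number of polls. */
struct fake_queue {
	int polls_until_ready;
};

static bool
fake_is_ready(void *ctx)
{
	struct fake_queue *q = ctx;

	if (q->polls_until_ready == 0)
		return (true);
	q->polls_until_ready--;
	return (false);
}

/* Records the requested delay instead of actually waiting. */
static unsigned int total_delay_us;

static void
fake_delay(unsigned int usec)
{
	total_delay_us += usec;
}
```

With a simulated queue that needs 19 polls, poll_queue_ready(fake_is_ready, fake_delay, &q) returns 19, which corresponds to the 190us worst case in the table. A queue that never settles within 500 polls yields -1, the analogue of the ETIMEDOUT path.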
Can you provide a detailed procedure for reproduction?
(In reply to Piotr Kubaj from comment #1) Hello and sorry for the delay. I have attempted to reproduce this issue, but I no longer have access to the hardware from two years ago and have moved to newer versions of FreeBSD. This week I tested FreeBSD 12.3, 12.4, and 13.1 with the default limit of 5. I ran both ports of the card with 16 netmap queues per port. I restarted each pkt-gen multiple times, often while under a load of incoming packets, but could not cause a panic. Besides the OS version, the card's NVM firmware version has also changed. When I originally hit the issue I was using 2.15, while my card now has 2.50.
It's possible then that this may have been a firmware issue that got fixed; the hardware spec indicates that the delay in the hardware should never exceed ~10us.
Nice, then it could be that your issue was fixed by one of the version upgrades. Newer FreeBSD versions have always brought newer ice driver versions. BTW, right now the firmware is at version 4.0 or 4.1 (I'm not sure which). Since you can't reproduce it, I'm closing this issue for now. Reopen it if it happens again.