Bug 225408 - re_rxeof erroneously returns EAGAIN, resulting in unnecessary processing
Summary: re_rxeof erroneously returns EAGAIN, resulting in unnecessary processing
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-23 18:14 UTC by aaron.styx
Modified: 2018-02-07 20:53 UTC (History)
2 users (show)

See Also:


Attachments
if (maxpkt == 0) return EAGAIN; (250 bytes, patch)
2018-01-23 18:14 UTC, aaron.styx
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description aaron.styx 2018-01-23 18:14:44 UTC
Created attachment 190006 [details]
if (maxpkt == 0) return EAGAIN;

In trying to diagnose a separate issue, I discovered that the interrupt processing of re_int_task was leaving interrupts disabled far more often than seemed appropriate. The return value from re_rxeof was returning EAGAIN when handling stray packets on the network, and returning 0 when under heavy traffic load.

If my understanding of the re_rxeof function is correct, it is supposed to process at most 16 packets before returning. The variable maxpkt starts at 16 and is decremented each time a packet is processed. If maxpkt gets to zero, the routine returns and *should* tell the caller whether or not there is more data to process. However, the check of 'if (maxpkt) return EAGAIN;' will return EAGAIN when the loop has cleared the entire receive ring before processing 16 packets; i.e.: when it's done processing everything available. I believe it should be returning EAGAIN only when it has processed 16 packets and there is still more to do. 

The attached patch changes the check to return EAGAIN when maxpkt reaches 0. If there were exactly 16 packets to process, this could mean it would also erroneously return EAGAIN, so there may be a better way to check if the ring is empty. 

The driver still works as is, but it does spend more time processing than is necessary when there is really nothing to do. Under load it may not enqueue the processing task when it should, but further interrupts should generally keep things moving.
Comment 1 Mark Johnston freebsd_committer 2018-02-07 20:53:26 UTC
I also noticed this while adding netdump hooks to re(4). The code seems broken even with your patch applied.

In the legacy interrupt task handler we have:

2580         status = CSR_READ_2(sc, RL_ISR);                                                                                                                 
2581         CSR_WRITE_2(sc, RL_ISR, status);
...
2596         if (status & (RL_ISR_RX_OK|RL_ISR_RX_ERR|RL_ISR_FIFO_OFLOW))                                                                                     
2597                 rval = re_rxeof(sc, NULL);
...
2629         if ((CSR_READ_2(sc, RL_ISR) & RL_INTRS_CPLUS) || rval) {                                                                                         
2630                 taskqueue_enqueue(taskqueue_fast, &sc->rl_inttask);                                                                                      
2631                 return;                                                                                                                                  
2632         }                                                                                                                                                
2633                                                                                                                                                          
2634         CSR_WRITE_2(sc, RL_IMR, RL_INTRS_CPLUS);

with rval initialized to 0.

So, we initially clear the bits in the ISR by writing status back, and pull packets off the rx ring if we had received a packet. If another interrupt came in while processing, or rx_rxeof() returned EAGAIN, we'll reschedule the task handler, else we're done and we unmask interrupts. In the case where rx_rxeof() hit the processing limit, though, we'll only call rx_rxeof() again if another rx interrupt was raised. So it seems that this EAGAIN check really isn't accomplishing much.