Bug 227100 - [epair] epair interface stops working when it reaches the hardware queue limit
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2018-03-30 04:21 UTC by Reshad Patuck
Modified: 2018-03-30 07:24 UTC

See Also:


Attachments
the output of netstat and dtrace (1.83 KB, text/plain)
2018-03-30 04:21 UTC, Reshad Patuck
add SDT in epair (3.62 KB, patch)
2018-03-30 04:25 UTC, Reshad Patuck
source of patched if_epair (28.86 KB, text/plain)
2018-03-30 04:26 UTC, Reshad Patuck
dtrace script to get enqueued error (182 bytes, text/plain)
2018-03-30 04:27 UTC, Reshad Patuck

Description Reshad Patuck 2018-03-30 04:21:56 UTC
Created attachment 191964
the output of netstat and dtrace

When the epair interface reaches the hardware queue limit, epairs stop transferring data.

This bug report relates to the following freebsd-net mailing list thread: https://lists.freebsd.org/pipermail/freebsd-net/2018-March/050077.html

So far, using the patch and the patched if_epair source file attached to this bug, we can tell that the error occurs in this block of code:

```
	if ((epair_dpcpu->epair_drv_flags & IFF_DRV_OACTIVE) != 0) {
		/*
		 * Our hardware queue is full, try to fall back
		 * queuing to the ifq but do not call ifp->if_start.
		 * Either we are lucky or the packet is gone.
		 */
		IFQ_ENQUEUE(&ifp->if_snd, m, error);
		if (!error)
			(void)epair_add_ifp_for_draining(ifp);
		
		SDT_PROBE3(if_epair, transmit, epair_transmit_locked, enqueued,
				ifp, m, error);
		return (error);
	}
```

Here the value of 'error' is 55 (ENOBUFS on FreeBSD).
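
To illustrate the failure mode: as far as I understand it, a full send queue makes IFQ_ENQUEUE drop the packet and report ENOBUFS, which is errno 55 on FreeBSD. Below is a minimal user-space sketch of that behaviour, assuming a plain counter-based queue. The names toy_queue, toy_enqueue and toy_dequeue are hypothetical and are not part of the kernel or of if_epair. The point is only that once nothing drains the queue, every further enqueue fails the same way.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a bounded send queue.
 * This is NOT the kernel's struct ifqueue; it only counts packets.
 */
struct toy_queue {
	size_t len;	/* packets currently queued */
	size_t maxlen;	/* queue limit */
	size_t drops;	/* packets rejected because the queue was full */
};

/*
 * Try to queue one packet.
 * Returns 0 on success, or ENOBUFS if the queue is full (the packet is
 * counted as dropped).
 */
static int
toy_enqueue(struct toy_queue *q)
{
	if (q->len >= q->maxlen) {
		q->drops++;
		return (ENOBUFS);
	}
	q->len++;
	return (0);
}

/*
 * Drain one packet.
 * Returns 0 on success, or -1 if the queue is already empty.
 */
static int
toy_dequeue(struct toy_queue *q)
{
	if (q->len == 0)
		return (-1);
	q->len--;
	return (0);
}
```

With maxlen set to 2, the third toy_enqueue() call returns ENOBUFS and keeps doing so until toy_dequeue() frees a slot. In the real driver the draining side is what appears to stall, so the queue never empties again.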

Setting 'net.link.epair.netisr_maxqlen' to a very small value makes this occur faster.

This issue seems to be happening in the wild only on one of my servers.
Other servers under more load in different environments do not seem to exhibit this behaviour.

@Kristof please chime in if I have missed something out.

Attached:
- commands.txt
- epair-sdt-diff.patch 
- epair_transmit_locked:enqueued-error-code.d
- if_epair.c
Comment 1 Reshad Patuck 2018-03-30 04:25:28 UTC
Created attachment 191965
add SDT in epair

Kristof's patch to add DTrace SDT probes to if_epair
Comment 2 Reshad Patuck 2018-03-30 04:26:03 UTC
Created attachment 191966
source of patched if_epair
Comment 3 Reshad Patuck 2018-03-30 04:27:12 UTC
Created attachment 191967
dtrace script to get enqueued error