Bug 253172 - Intel e1000 - Interface Stalls After Media Type is Changed
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2021-02-01 22:10 UTC by jcaplan
Modified: 2021-02-11 15:35 UTC (History)

See Also:


Attachments
proposed patch (396 bytes, patch)
2021-02-11 15:35 UTC, jcaplan

Description jcaplan 2021-02-01 22:10:06 UTC
# Overview

The e1000 interface occasionally stalls after the media type is changed. The only way to recover is to flap the interface (ifconfig down, then up).

# Steps to Reproduce

1. Bring up an e1000 interface on a QNX target.

2. Run a continuous ping to the target e1000 interface from a directly connected neighbor.

3. Change the media type: "ifconfig em0 media 100baseTX mediaopt full-duplex".

4. Sometimes you need to change the media type more than once, e.g. "ifconfig em0 media 1000baseT mediaopt full-duplex".

# Actual Results

Ping stops reaching target.

64 bytes from 10.242.241.213: icmp_seq=1 ttl=64 time=0.707 ms
64 bytes from 10.242.241.213: icmp_seq=2 ttl=64 time=0.721 ms
64 bytes from 10.242.241.213: icmp_seq=3 ttl=64 time=0.649 ms
64 bytes from 10.242.241.213: icmp_seq=4 ttl=64 time=0.768 ms
64 bytes from 10.242.241.213: icmp_seq=1 ttl=64 time=8268 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=3 ttl=64 time=6219 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=9 ttl=64 time=76.1 ms
64 bytes from 10.242.241.213: icmp_seq=10 ttl=64 time=0.541 ms
64 bytes from 10.242.241.213: icmp_seq=11 ttl=64 time=0.604 ms
64 bytes from 10.242.241.213: icmp_seq=12 ttl=64 time=0.754 ms
64 bytes from 10.242.241.213: icmp_seq=13 ttl=64 time=1.02 ms
64 bytes from 10.242.241.213: icmp_seq=14 ttl=64 time=0.676 ms
64 bytes from 10.242.241.213: icmp_seq=15 ttl=64 time=0.609 ms
64 bytes from 10.242.241.213: icmp_seq=16 ttl=64 time=0.591 ms
64 bytes from 10.242.241.213: icmp_seq=1 ttl=64 time=20397 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=3 ttl=64 time=18349 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=10 ttl=64 time=11204 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=12 ttl=64 time=9165 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=13 ttl=64 time=8141 ms (DUP!)
64 bytes from 10.242.241.213: icmp_seq=16 ttl=64 time=5101 ms (DUP!)
From 10.242.241.212 icmp_seq=50 Destination Host Unreachable
From 10.242.241.212 icmp_seq=51 Destination Host Unreachable
From 10.242.241.212 icmp_seq=52 Destination Host Unreachable
From 10.242.241.212 icmp_seq=53 Destination Host Unreachable
From 10.242.241.212 icmp_seq=54 Destination Host Unreachable


# Expected Results

Ping continues to reach the target (possibly with some packets dropped). For instance, in the log above, the first two DUPs appear when the interface media is changed, and service then resumes. After I changed the media a second time, there were many more DUPs and the interface never started working again.


# Build Date
FreeBSD bsd-vbox 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r368820: Tue Jan  5 17:30:19 EST 2021     jcaplan@bsd-vbox:/usr/obj/usr/src-head/amd64.amd64/sys/GENERIC  amd64
Comment 1 jcaplan 2021-02-01 22:12:42 UTC
(In reply to jcaplan from comment #0)

Note: "Bring up an e1000 interface on a QNX target" should read "FreeBSD target". I copy-pasted from our internal bug tracker.
Comment 2 jcaplan 2021-02-10 17:44:08 UTC
I have observed an underflow in iflib_completed_tx_reclaim where

txq->ift_in_use < reclaim

and as a result

txq->ift_in_use -= reclaim

wraps around to a very large number. Once you bring the interface down and up, txq->ift_in_use is reset and everything is fine again.
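
To illustrate the wraparound described above, here is a minimal standalone sketch. It is not iflib code: struct txq_sketch and both helper functions are hypothetical, and the 16-bit width of the counter is an assumption made for the example (the report does not state the width of ift_in_use). The guarded variant is only one possible way to avoid the wrap, not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the TX queue; only the counter matters here. */
struct txq_sketch {
	uint16_t in_use;	/* assumed 16-bit unsigned, like a descriptor count */
};

/* Unguarded update, mirroring "txq->ift_in_use -= reclaim". */
static void
reclaim_unguarded(struct txq_sketch *txq, uint16_t reclaim)
{
	assert(txq != NULL);
	/* If reclaim > in_use, the result wraps modulo 65536. */
	txq->in_use -= reclaim;
}

/*
 * Guarded variant: clamp to zero instead of wrapping.
 * Returns 1 when the inconsistent state (reclaim > in_use) was seen.
 */
static int
reclaim_guarded(struct txq_sketch *txq, uint16_t reclaim)
{
	assert(txq != NULL);
	if (reclaim > txq->in_use) {
		txq->in_use = 0;
		return (1);
	}
	txq->in_use -= reclaim;
	return (0);
}
```

With in_use = 2 and reclaim = 5, the unguarded version leaves in_use at 65533, so the queue looks almost completely full even though nothing is in flight. That matches the symptom of a transmit path that never recovers until the counter is reset.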

For e1000, changing the media type appears to reset the hardware, so it might be best to avoid the situation in the first place with something like:

static int
em_if_media_change(if_ctx_t ctx)
{
	struct adapter *adapter = iflib_get_softc(ctx);
	struct ifmedia *ifm = iflib_get_media(ctx);
	if_t ifp = iflib_get_ifp(ctx);

	/* Refuse the media change while the interface is marked up. */
	if (ifp->if_flags & IFF_UP) {
		device_printf(iflib_get_dev(ctx),
		    "%s: cannot change media type while link state is up\n",
		    __func__);
		return (EINVAL);
	}
	/* ... rest of em_if_media_change() as before ... */
}
Comment 3 jcaplan 2021-02-11 15:35:43 UTC
Created attachment 222368 [details]
proposed patch

After a bit more investigation, it looks like iflib_media_change should call iflib_stop() before changing the media type. Then it is not necessary to force the user to bring the interface down first. The attached diff resolves the issue for me.
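
The ordering suggested above (stop first, then change media) can be pictured with a small toy model. This is only a sketch and does not reproduce the attached patch: struct toy_ctx and the toy_* functions are hypothetical names, and the assumption that stopping clears the software TX accounting is taken from the observation in comment 2 that a down/up cycle resets txq->ift_in_use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy context; none of these names come from iflib. */
struct toy_ctx {
	bool	 running;	/* interface is passing traffic */
	uint16_t tx_in_use;	/* software count of in-flight TX descriptors */
	int	 media;		/* currently selected media, opaque here */
};

static void
toy_stop(struct toy_ctx *ctx)
{
	ctx->running = false;
	ctx->tx_in_use = 0;	/* assumed: stopping discards stale TX accounting */
}

static void
toy_init(struct toy_ctx *ctx)
{
	ctx->running = true;
}

/* Quiesce first, reconfigure, then restart only if it was running. */
static void
toy_media_change(struct toy_ctx *ctx, int new_media)
{
	bool was_running;

	assert(ctx != NULL);
	was_running = ctx->running;
	if (was_running)
		toy_stop(ctx);
	ctx->media = new_media;
	if (was_running)
		toy_init(ctx);
}
```

The point of the ordering is that the software counters are cleared before the hardware is reconfigured, so no stale in-flight count survives the change, and the user never has to bring the interface down by hand.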