Bug 252518

Summary: iflib/ice panics when netmap/pkt-gen exits
Product: Base System Reporter: Brian Poole <brian90013>
Component: kernAssignee: Vincenzo Maffione <vmaffione>
Status: Closed FIXED    
Severity: Affects Only Me CC: emaste, vmaffione
Priority: --- Flags: vmaffione: maintainer-feedback+
vmaffione: mfc-stable12+
Version: 12.2-RELEASE   
Hardware: amd64   
OS: Any   
Bug Depends on:    
Bug Blocks: 252453    

Description Brian Poole 2021-01-08 14:33:27 UTC
Hello,

I am trying to use the Intel E810 ice driver in netmap mode on FreeBSD-12.2. I can successfully use pkt-gen to receive packets with: 'pkt-gen -i ice0 -f rx'.

However, if I try to run pkt-gen with multiple threads '-p 2', iflib panics when pkt-gen is shutting down. I see two different panics depending on whether there are packets streaming into the E810 when I stop pkt-gen or whether I have stopped all incoming packets before stopping pkt-gen.


I instrumented iflib.c with additional debugging:

Index: iflib.c
===================================================================
--- iflib.c	(revision 368515)
+++ iflib.c	(working copy)
@@ -875,6 +875,11 @@
 	iru_init(&iru, rxq, 0 /* flid */);
 	map = fl->ifl_sds.ifsd_map;
 	nic_i = fl->ifl_pidx;
+
+	device_t dev = ctx->ifc_dev;
+	device_printf(dev, "nic_i: %i, netmap_idx_k2n: %i, nm_i: %i, ring: %i\n",
+		nic_i, netmap_idx_k2n(kring, nm_i), nm_i, ring->ringid);
+
 	MPASS(nic_i == netmap_idx_k2n(kring, nm_i));
 	DBG_COUNTER_INC(fl_refills);
 	while (n > 0) {
@@ -915,6 +920,9 @@
 		ctx->isc_rxd_refill(ctx->ifc_softc, &iru);
 	}
 	fl->ifl_pidx = nic_i;
+
+	device_printf(dev, "nm_i: %i\n", nm_i);
+
 	MPASS(!init || nm_i == 0);
 	MPASS(nm_i == kring->rhead);
 	kring->nr_hwcur = nm_i;


## Case 1: No packets streaming into E810 when pkt-gen is terminated
Started transmitter
Started receiver: pkt-gen -i ice0 -f rx -p 2
Terminated transmitter
Gave pkt-gen CTRL-C

#### Run 1
ice0: nic_i: 720, netmap_idx_k2n: 720, nm_i: 720, ring: 0
ice0: nm_i: 723
ice0: nic_i: 176, netmap_idx_k2n: 176176 nm_i: 0, ring: 1
ice0: nm_i: 256
ice0: nic_i: 256, netmap_idx_k2n: 256, nm_i: 256, ring: 1
ice0: nm_i: 260
ice0: nic_i: 260, netmap_idx_k2n: 260, nm_i: 260, ring: 1
ice0: nm_i: 261
ice0: nic_i: 0, netmap_idx_k2n: 0, nm_i: 261, ring: 1
panic: Assertion !init || nm_i == 0 failed at /usr/src/sys/net/iflib.c:926

#### Run #2
ice0: nic_i: 184, netmap_idx_k2n: 184, nm_i: 184, ring: 1
ice0: nm_i: 230
ice0: nic_i: 980, netmap_idx_k2n: 980, nm_i: 980, ring: 0
ice0: nm_i: 984
ice0: nic_i: 984, netmap_idx_k2n: 984, nm_i: 984, ring: 0
ice0: nm_i: 985
ice0: nic_i: 0, netmap_idx_k2n: 0, nm_i: 230, ring: 1
ice0: nm_i: 230
panic: Assertion !init || nm_i == 0 failed at /usr/src/sys/net/iflib.c:926

#### Run #3
ice0: nic_i: 844, netmap_idx_k2n: 844, nm_i: 884, ring: 0
ice0: nm_i: 852
ice0: nic_i: 188, netmap_idx_k2n: 188, nm_i: 188, ring: 1
ice0: nm_i: 248
ice0: nic_i: 248, netmap_idx_k2n: 248, nm_i: 248, ring: 1
ice0: nm_i: 312
ice0: nic_i: 312, netmap_idx_k2n: 312, nm_i: 312, ring: 1
ice0: nm_i: 399
ice0: nic_i: 0, netmap_idx_k2n: 0, nm_i: 399, ring: 1
ice0: nm_i: 399
panic: Assertion !init || nm_i == 0 failed at /usr/src/sys/net/iflib.c: 926

The panic is iflib.c:918 in the FreeBSD-12.2 source (my debugging adjusted the line numbers). This code is in netmap_fl_refill().


## Case 2: Packets are still hitting the E810 when pkt-gen is terminated
Started transmitter
Started receiver: pkt-gen -i ice0 -f rx -p 2
Gave pkt-gen CTRL-C

## Run #1
ice0: nm_i: 160
ice0: nic_i: 0, netmap_idx_k2n: 960, nm_i: 160, ring: 1
panic: Assertion nic_i == netmap_idx_k2n(kring, nm_i) failed at /usr/src/sys/net/iflib:883

## Run #2
ice0: nm_i: 704
ice0: nic_i: 0, netmap_idx_k2n: 936, nm_i: 704, ring: 1
panic: Assertion nic_i == netmap_idx_k2n(kring, nm_i) failed at /usr/src/sys/net/iflib:883

## Run #3
ice0: nic_i: 264, netmap_idx_k2n: 264, nm_i: 264, ring: 1
ice0: nm_i: 360
ice0: nic_i: 0, netmap_idx_k2n: 948, nm_i: 360, ring: 1
ice0: nm_i: 0
panic: Assertion nic_i == netmap_idx_k2n(kring, nm_i) failed at /usr/src/sys/net/iflib:883

This panic is iflib:878 in the 12.2 sources. The stack trace I saw was:
netmap_fl_refill()
iflib_init_locked()
iflib_netmap_register()
netmap_hw_reg()
netmap_do_unregif()
netmap_dtor()
Comment 1 commit-hook freebsd_committer freebsd_triage 2021-01-09 21:43:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3189ba61673af5bd3ed2ee9e33be92bc0b9995ba

commit 3189ba61673af5bd3ed2ee9e33be92bc0b9995ba
Author:     Vincenzo Maffione <vmaffione@FreeBSD.org>
AuthorDate: 2021-01-09 21:35:07 +0000
Commit:     Vincenzo Maffione <vmaffione@FreeBSD.org>
CommitDate: 2021-01-09 21:35:07 +0000

    netmap: iflib: fix asserts in netmap_fl_refill()

    When netmap_fl_refill() is called at initialization time (e.g.,
    during netmap_iflib_register()), nic_i must be 0, since the
    free list is reinitialized. At the end of the refill cycle, nic_i
    must still be zero, because exactly N descriptors (N is the ring size)
    are refilled.
    This patch therefore fixes the assertions to check on nic_i rather
    than on nm_i. The current netmap_reset() may in fact cause nm_i
    to be != 0 while the device is resetting: this may happen when
    multiple non-cooperating processes open different subsets of the
    available netmap rings.

    PR:         252518
    MFC after:  1 week

 sys/net/iflib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 2 commit-hook freebsd_committer freebsd_triage 2021-01-09 22:18:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7ba6ecf216fb15e8b147db268b91d9b82c5a0682

commit 7ba6ecf216fb15e8b147db268b91d9b82c5a0682
Author:     Vincenzo Maffione <vmaffione@FreeBSD.org>
AuthorDate: 2021-01-09 22:07:24 +0000
Commit:     Vincenzo Maffione <vmaffione@FreeBSD.org>
CommitDate: 2021-01-09 22:07:24 +0000

    netmap: refactor netmap_reset

    The netmap_reset() function is meant to be called by the driver
    when they initialize (or re-initialize) a hardware ring.
    However, since the introduction of support for opening (in
    netmap mode) a subset of the available rings, netmap_reset()
    may be called multiple times on actively used rings, causing
    both kring and netmap ring to transition to an inconsistent
    state.
    This changes improves the situation by resetting all the
    indices fields of the kring to 0, as expected after the
    reinitialization of a hardware ring.

    PR:         252518
    MFC after:  1 week

 sys/dev/netmap/netmap.c | 65 +++++++++++++++----------------------------------
 1 file changed, 20 insertions(+), 45 deletions(-)
Comment 3 Vincenzo Maffione freebsd_committer freebsd_triage 2021-01-09 22:22:30 UTC
Thanks for reporting.
Could you please retry with the current HEAD?
(or maybe on 12.x after applying 1d238b07d5d4d9660ae0e08daede6da7e91c7853, 3189ba61673af5bd3ed2ee9e33be92bc0b9995ba and 7ba6ecf216fb15e8b147db268b91d9b82c5a0682)?
Comment 4 Brian Poole 2021-01-11 14:27:31 UTC
Thank you for the commits this weekend!

After applying the three patches above, I can verify that running pkt-gen with multiple queues no longer panics in iflib on exit. I tested with 2-16 netmap threads and observed successful processing and termination in each case.

I believe this issue is now fixed.
Comment 5 Vincenzo Maffione freebsd_committer freebsd_triage 2021-01-11 21:08:06 UTC
Thanks for the confirmation.
Comment 6 Vincenzo Maffione freebsd_committer freebsd_triage 2021-01-11 21:11:09 UTC
MFC needed
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-01-17 13:35:10 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9882b83132007acc0e6cef6d248fb12cdebba278

commit 9882b83132007acc0e6cef6d248fb12cdebba278
Author:     Vincenzo Maffione <vmaffione@FreeBSD.org>
AuthorDate: 2021-01-09 20:54:11 +0000
Commit:     Vincenzo Maffione <vmaffione@FreeBSD.org>
CommitDate: 2021-01-17 12:02:44 +0000

    netmap: iflib: stop krings during interface reset

    When different processes open separate subsets of the
    available rings of a same netmap interface, a device
    reset may be performed while one of the processes
    is actively using some rings (e.g., caused by another
    process executing a nmport_open()).
    With this patch, such situation will cause the
    active process to get a POLLERR, so that it can
    have a chance to detect the situation.
    We also guarantee that no process is running a txsync
    or rxsync (ioctl or poll) while an iflib device reset
    is in progress.

    PR:         252453
    MFC after:  1 week

    (cherry picked from commit 1d238b07d5d4d9660ae0e08daede6da7e91c7853)

    netmap: iflib: fix asserts in netmap_fl_refill()

    When netmap_fl_refill() is called at initialization time (e.g.,
    during netmap_iflib_register()), nic_i must be 0, since the
    free list is reinitialized. At the end of the refill cycle, nic_i
    must still be zero, because exactly N descriptors (N is the ring size)
    are refilled.
    This patch therefore fixes the assertions to check on nic_i rather
    than on nm_i. The current netmap_reset() may in fact cause nm_i
    to be != 0 while the device is resetting: this may happen when
    multiple non-cooperating processes open different subsets of the
    available netmap rings.

    PR:         252518
    MFC after:  1 week

    (cherry picked from commit 3189ba61673af5bd3ed2ee9e33be92bc0b9995ba)

    netmap: refactor netmap_reset

    The netmap_reset() function is meant to be called by the driver
    when they initialize (or re-initialize) a hardware ring.
    However, since the introduction of support for opening (in
    netmap mode) a subset of the available rings, netmap_reset()
    may be called multiple times on actively used rings, causing
    both kring and netmap ring to transition to an inconsistent
    state.
    This changes improves the situation by resetting all the
    indices fields of the kring to 0, as expected after the
    reinitialization of a hardware ring.

    PR:         252518
    MFC after:  1 week

    (cherry picked from commit 7ba6ecf216fb15e8b147db268b91d9b82c5a0682)

    netmap: iflib: enable/disable krings on any interface reinit

    Since 1d238b07d5d4d9660ae0e0, krings are disabled before
    a reinit cycle triggered by iflib_netmap_register.
    However, this operation is actually necessary also for
    any interface reinit triggered by other causes (i.e.,
    ifconfig commands).
    We achieve this goal by moving the krings enable/disable
    operation inside iflib_stop() and iflib_init_locked().

    Once here, this change also removes some redundant operations
    from iflib_netmap_register(), that are already performed by
    iflib_stop().

    PR:             252453
    MFC after:      1 week

    (cherry picked from commit 3d65fd97e85ab807f3baa62aa979ae527914a3ea)

    iflib: fix build failure in case DEV_NETMAP is not defined

    This addresses the build failure introduced by
    3d65fd97e85ab807f3baa62.

    MFC with: 3d65fd97e85ab807f3baa62

    (cherry picked from commit 8aa8484cbf06bb26ad87177fe4aebcaf1519f138)

    netmap: restore hwofs and support it in iflib

    Restore the hwofs functionality temporarily disabled by
    7ba6ecf216fb15e8b147db2 to prevent issues with iflib.
    This patch brings the necessary changes to iflib to
    enable howfs to allow interface restarts without
    disrupting netmap applications actively using its
    rings.
    After this change, it becomes possible for multiple
    non-cooperating netmap applications to use non-overlapping
    subsets of the available netmap rings without clashing
    with each other.

    PR:             252453
    MFC after:      1 week

    (cherry picked from commit 55f0ad5fdee1a779d6889481ba591a819081b9ca)

 sys/dev/netmap/netmap.c | 94 +++++++++++++++++++++++--------------------------
 sys/net/iflib.c         | 67 +++++++++++++++++++++++++----------
 2 files changed, 94 insertions(+), 67 deletions(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2021-01-17 13:35:11 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9882b83132007acc0e6cef6d248fb12cdebba278

commit 9882b83132007acc0e6cef6d248fb12cdebba278
Author:     Vincenzo Maffione <vmaffione@FreeBSD.org>
AuthorDate: 2021-01-09 20:54:11 +0000
Commit:     Vincenzo Maffione <vmaffione@FreeBSD.org>
CommitDate: 2021-01-17 12:02:44 +0000

    netmap: iflib: stop krings during interface reset

    When different processes open separate subsets of the
    available rings of a same netmap interface, a device
    reset may be performed while one of the processes
    is actively using some rings (e.g., caused by another
    process executing a nmport_open()).
    With this patch, such situation will cause the
    active process to get a POLLERR, so that it can
    have a chance to detect the situation.
    We also guarantee that no process is running a txsync
    or rxsync (ioctl or poll) while an iflib device reset
    is in progress.

    PR:         252453
    MFC after:  1 week

    (cherry picked from commit 1d238b07d5d4d9660ae0e08daede6da7e91c7853)

    netmap: iflib: fix asserts in netmap_fl_refill()

    When netmap_fl_refill() is called at initialization time (e.g.,
    during netmap_iflib_register()), nic_i must be 0, since the
    free list is reinitialized. At the end of the refill cycle, nic_i
    must still be zero, because exactly N descriptors (N is the ring size)
    are refilled.
    This patch therefore fixes the assertions to check on nic_i rather
    than on nm_i. The current netmap_reset() may in fact cause nm_i
    to be != 0 while the device is resetting: this may happen when
    multiple non-cooperating processes open different subsets of the
    available netmap rings.

    PR:         252518
    MFC after:  1 week

    (cherry picked from commit 3189ba61673af5bd3ed2ee9e33be92bc0b9995ba)

    netmap: refactor netmap_reset

    The netmap_reset() function is meant to be called by the driver
    when they initialize (or re-initialize) a hardware ring.
    However, since the introduction of support for opening (in
    netmap mode) a subset of the available rings, netmap_reset()
    may be called multiple times on actively used rings, causing
    both kring and netmap ring to transition to an inconsistent
    state.
    This changes improves the situation by resetting all the
    indices fields of the kring to 0, as expected after the
    reinitialization of a hardware ring.

    PR:         252518
    MFC after:  1 week

    (cherry picked from commit 7ba6ecf216fb15e8b147db268b91d9b82c5a0682)

    netmap: iflib: enable/disable krings on any interface reinit

    Since 1d238b07d5d4d9660ae0e0, krings are disabled before
    a reinit cycle triggered by iflib_netmap_register.
    However, this operation is actually necessary also for
    any interface reinit triggered by other causes (i.e.,
    ifconfig commands).
    We achieve this goal by moving the krings enable/disable
    operation inside iflib_stop() and iflib_init_locked().

    Once here, this change also removes some redundant operations
    from iflib_netmap_register(), that are already performed by
    iflib_stop().

    PR:             252453
    MFC after:      1 week

    (cherry picked from commit 3d65fd97e85ab807f3baa62aa979ae527914a3ea)

    iflib: fix build failure in case DEV_NETMAP is not defined

    This addresses the build failure introduced by
    3d65fd97e85ab807f3baa62.

    MFC with: 3d65fd97e85ab807f3baa62

    (cherry picked from commit 8aa8484cbf06bb26ad87177fe4aebcaf1519f138)

    netmap: restore hwofs and support it in iflib

    Restore the hwofs functionality temporarily disabled by
    7ba6ecf216fb15e8b147db2 to prevent issues with iflib.
    This patch brings the necessary changes to iflib to
    enable howfs to allow interface restarts without
    disrupting netmap applications actively using its
    rings.
    After this change, it becomes possible for multiple
    non-cooperating netmap applications to use non-overlapping
    subsets of the available netmap rings without clashing
    with each other.

    PR:             252453
    MFC after:      1 week

    (cherry picked from commit 55f0ad5fdee1a779d6889481ba591a819081b9ca)

 sys/dev/netmap/netmap.c | 94 +++++++++++++++++++++++--------------------------
 sys/net/iflib.c         | 67 +++++++++++++++++++++++++----------
 2 files changed, 94 insertions(+), 67 deletions(-)