Bug 253473

Summary: Possible bug in iflib, observed with vmxnet3 driver on ESXI servers
Product: Base System Reporter: Shrikanth R K <shrikanth07>
Component: kernAssignee: Allan Jude <allanjude>
Status: Closed FIXED    
Severity: Affects Some People CC: dpetrov67, net
Priority: --- Flags: koobs: mfc-stable13+
koobs: mfc-stable12+
Version: 12.1-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
Boot log from Juniper VSRX box none

Description Shrikanth R K 2021-02-13 02:49:55 UTC
Created attachment 222401 [details]
Boot log from Juniper VSRX box

Possible bug with iflib code observed with VMXNET3 driver. On a 12.1 RELEASE FreeBSD VM running on ESXi server, doing a "ifconfig vmx0 media autoselect" renders the interface unusable. You can have a ping going to the interface from outside of VM, it stops when you do a media autoselect using ifconfig. 
My real objective was to track the way iflib processes media change so the ifconfig command wasn't a specific need for me. The original instance of this problem was on a Juniper VSRX box running Junos running FreeBSD stable/12 (12.1) underneath. I actually reached out to bryanv@freebsd.org on the problem before I verified it on pristine FreeBSD stable/12 (Junos customizes FreeBSD drivers to work with it's network stack). 

I guess when iflib layer is processing iflib_media_change() it calls iflib_init_locked() while the driver is running and does not bother to check the state while it attempts some of the initialization again which leaves the interface in an inconsistent state. 

root@freebsd:~ #ifconfig vmx0 media autoselect
vmx0: device enable command failed!

To recover you have to flap the interface, ifconfig vmx0 down and then up recovers the interface.

original observation of the issue and the email I sent to Bryan Venteicher
--------------------------------------------------------------------------

Hello Bryan,
 
I need to clarify a certain behavior with vmxnet3 driver and iflib on FreeBSD stable/12.
 
The background on this is during vmxnet3 driver initialization probe/attach happens fine and interface initialization is all good. A subsequent reconfigure on the interface does the following call flow
iflib_if_init → iflib_stop → vmxnet3_stop → iflib_init_locked 
What that does underneath is a proper VMXNET3_CMD_DISABLE, VMXNET3_CMD_RESET and eventually VMXNET3_CMD_ENABLE. No problem there.
 
But on a subsequent ioctl (SIOCSIFMEDIA) it does a VMXNET3_CMD_ENABLE again on an already enabled device which fails and renders the interface in a non working state.  The call flow seems to be 
 
Ifmedia_ioctl → ifm->ifm_change → iflib_media_change → vmxnet3_media_change.
 
vmxnet3_media_change return 0→ iflib_init_locked → vmxnet3_init →vmxnet3_enable_device
 
The problem seems iflib_init_locked() does *not* care to do a iflib_stop() and vmxnet3_media_change() being a dummy function just returns 0 indicating success to iflib_media_change so it proceeds to call iflib_init_locked() without calling iflib_stop(). 
Herein lies the problem and my query, iflib_init_locked ends up calling vmxnet3_init() → vmxnet3_enable_device() which means doing VMXNET3_CMD_ENABLE a second time, the subsequent one resulting in a failure as indicated by the log “device enable command failed!”. (Refer the logs below captured from a boot).
 
The workaround to recover the interface is to flap it i.e ifconfig fxp0 down / up to recover.
 
Another experiment was to do a VMXNET3_CMD_DISABLE in vmxnet3_media_change to “support” the subsequent VMXNET3_CMD_ENABLE from iflib_init_locked. I tried looking around for documentation/datasheet to confirm this behavior but sadly no direct hits. Possibly look at the Linux side of things, not sure but reaching out to you for help.
 
Note that the environment is Junos (Juniper OS) based on FreeBSD stable/12 for one of our products which is an instance running on ESXI servers.
 
Below are the logs from the boot of a VSRX device. The ones marked with ### are some device_printfs added to capture the call flow along with IFMEDIA_DEBUG enabled. 
 
My question is should there be a sequence of VMXNET3_CMD_DISABLE / VMXNET3_CMD_RESET before calling vmxnet3_media_change?
 

Vmxnet3 logs from boot (complete boot log is attached)

------------------------------------------------------

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x4000-0x400f mem 0xfd5fc000-0xfd5fcfff,0xfd5fd000-0xfd5fdfff,0xfd5fe000-0xfd5fffff irq 18 at device 0.0 on pci3
 
vmx0: Using 512 TX descriptors and 512 RX descriptors
vmx0: Using 4 RX queues 4 TX queues
vmx0: failed to allocate 5 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: LRO is disabled
 
Adding entry for Ethernet autoselect
ifmedia_set: target Ethernet autoselect
ifmedia_set: setting to Ethernet autoselect
 
Reconfiguring vmx0 to fxp0 ...
vmx0: ### iflib_if_init: called
vmx0: ### iflib_if_init_locked: called
vmx0: ### iflib_stop: called
vmx0: ### vmxnet3_stop: called!
vmx0: ### iflib_init_locked: called
vmx0: ### vmxnet3_init: called!
vmx0: ### vmxnet3_reinit_shared_data: mtu: 1514.
vmx0: ### vmxnet3_reinit_queues: called!
vmx0: ### vmxnet3_txinit: called!
vmx0: ### vmxnet3_rxinit: called!
vmx0: ### vmxnet3_enable_device: called!
vmx0: ### vmxnet3_reinit_rxfilters: called!
vmx0: ### iflib_if_ioctl: called: command: 2152753457
vmx0: ### vmxnet3_multi_set: called!
vmx0: ### iflib_if_ioctl: called: command: 3226495287
ifmedia_ioctl: switching fxp0 to Ethernet autoselect
vmx0: ### iflib_media_change: called
vmx0: ### vmxnet3_media_change: called!
vmx0: ### iflib_init_locked: called
vmx0: ### vmxnet3_init: called!
vmx0: ### vmxnet3_reinit_shared_data: mtu: 1514.
vmx0: ### vmxnet3_reinit_queues: called!
vmx0: ### vmxnet3_txinit: called!
vmx0: ### vmxnet3_rxinit: called!
vmx0: ### vmxnet3_reinit_rxfilters: called!
vmx0: device enable command failed!
vmx0: ### iflib_if_ioctl: called: command: 3227543864
vmx0: ### vmxnet3_update_admin_status: called!
vmx0: ### vmxnet3_media_status: called!
vmx0: ### iflib_if_ioctl: called: command: 2152753424
vmx0: ### iflib_if_ioctl: called: command: 2152753457
vmx0: ### vmxnet3_multi_set: called!
Comment 1 commit-hook freebsd_committer freebsd_triage 2021-02-16 19:02:49 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=922cf8ac43adc9983f9a9e05cfd838306c1ef483

commit 922cf8ac43adc9983f9a9e05cfd838306c1ef483
Author:     Allan Jude <allanjude@FreeBSD.org>
AuthorDate: 2021-02-14 18:39:09 +0000
Commit:     Allan Jude <allanjude@FreeBSD.org>
CommitDate: 2021-02-16 19:02:00 +0000

    Use iflib_if_init_locked() during media change instead of iflib_init_locked().

    iflib_init_locked() assumes that iflib_stop() has been called, however,
    it is not called for media changes.
    iflib_if_init_locked() calls stop then init, so fixes the problem.

    PR:     253473
    MFC after:      3 days
    Reviewed by:    markj
    Sponsored by:   Juniper Networks, Inc., Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D28667

 sys/net/iflib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 2 commit-hook freebsd_committer freebsd_triage 2021-02-19 20:45:01 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=18f552e647f556dcbbebc627f534d15172e7b6a3

commit 18f552e647f556dcbbebc627f534d15172e7b6a3
Author:     Allan Jude <allanjude@FreeBSD.org>
AuthorDate: 2021-02-14 18:39:09 +0000
Commit:     Allan Jude <allanjude@FreeBSD.org>
CommitDate: 2021-02-19 20:39:16 +0000

    Use iflib_if_init_locked() during media change instead of iflib_init_locked().

    iflib_init_locked() assumes that iflib_stop() has been called, however,
    it is not called for media changes.
    iflib_if_init_locked() calls stop then init, so fixes the problem.

    PR:     253473
    Sponsored by:   Juniper Networks, Inc., Klara, Inc.

    (cherry picked from commit 922cf8ac43adc9983f9a9e05cfd838306c1ef483)

 sys/net/iflib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 3 commit-hook freebsd_committer freebsd_triage 2021-02-19 21:07:05 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=57d2eb8896c2dd9ece7d99415893e2e8a8bfce82

commit 57d2eb8896c2dd9ece7d99415893e2e8a8bfce82
Author:     Allan Jude <allanjude@FreeBSD.org>
AuthorDate: 2021-02-14 18:39:09 +0000
Commit:     Allan Jude <allanjude@FreeBSD.org>
CommitDate: 2021-02-19 21:06:15 +0000

    Use iflib_if_init_locked() during media change instead of iflib_init_locked().

    iflib_init_locked() assumes that iflib_stop() has been called, however,
    it is not called for media changes.
    iflib_if_init_locked() calls stop then init, so fixes the problem.

    PR:     253473
    Sponsored by:   Juniper Networks, Inc., Klara, Inc.

    (cherry picked from commit 922cf8ac43adc9983f9a9e05cfd838306c1ef483)

 sys/net/iflib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 4 commit-hook freebsd_committer freebsd_triage 2021-02-25 20:42:07 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1eb60112b22ea0d61f0d1ba4ac0cca56a0c034c8

commit 1eb60112b22ea0d61f0d1ba4ac0cca56a0c034c8
Author:     Allan Jude <allanjude@FreeBSD.org>
AuthorDate: 2021-02-14 18:39:09 +0000
Commit:     Allan Jude <allanjude@FreeBSD.org>
CommitDate: 2021-02-25 20:41:10 +0000

    Use iflib_if_init_locked() during media change instead of iflib_init_locked().

    iflib_init_locked() assumes that iflib_stop() has been called, however,
    it is not called for media changes.
    iflib_if_init_locked() calls stop then init, so fixes the problem.

    PR:     253473
    Sponsored by:   Juniper Networks, Inc., Klara, Inc.
    Approved by:    re (gjb)

    (cherry picked from commit 922cf8ac43adc9983f9a9e05cfd838306c1ef483)

 sys/net/iflib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2021-10-25 15:37:38 UTC
(In reply to commit-hook from comment #4)
Allan, should this actually be marked as resolved?
Comment 6 Allan Jude freebsd_committer freebsd_triage 2021-12-04 01:21:00 UTC
Yes, this is resolved, and shipped in 13.0 and 12.3