Bug 239118 - iflib: Panic in ether_output_frame on ESXi
Summary: iflib: Panic in ether_output_frame on ESXi
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.0-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Mark Johnston
URL: https://reviews.freebsd.org/D21831
Keywords: crash
Depends on:
Blocks: 240700
  Show dependency treegraph
 
Reported: 2019-07-10 19:00 UTC by Juraj Lutter
Modified: 2019-10-04 02:26 UTC (History)
11 users (show)

See Also:
koobs: mfc-stable12+
koobs: mfc-stable11-


Attachments
savecore crash description (263.75 KB, text/plain)
2019-07-10 20:15 UTC, Juraj Lutter
no flags Details
proposed patch (606 bytes, patch)
2019-07-18 20:19 UTC, Mark Johnston
no flags Details | Diff
patch (2.16 KB, patch)
2019-08-28 17:57 UTC, Marius Strobl
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Juraj Lutter freebsd_committer freebsd_triage 2019-07-10 19:00:43 UTC
There is a reocurring panic with 12-STABLE (at least r349857) running as an ESXi guest:

Jul 10 19:11:45 svc1 kernel: fault code         = supervisor read data, page not present
Jul 10 19:11:45 svc1 kernel: instruction pointer        = 0x20:0xffffffff811a7f0c
Jul 10 19:11:45 svc1 kernel: stack pointer              = 0x28:0xfffffe004c4628f0
Jul 10 19:11:45 svc1 kernel: frame pointer              = 0x28:0xfffffe004c4628f0
Jul 10 19:11:45 svc1 kernel: code segment               = base rx0, limit 0xfffff, type 0x1b
Jul 10 19:11:45 svc1 kernel:                    = DPL 0, pres 1, long 1, def32 0, gran 1
Jul 10 19:11:45 svc1 kernel: processor eflags   = resume, IOPL = 0
Jul 10 19:11:45 svc1 kernel: current process            = 17606 (imap-login)
Jul 10 19:11:45 svc1 kernel: trap number                = 12
Jul 10 19:11:45 svc1 kernel: panic: page fault
Jul 10 19:11:45 svc1 kernel: cpuid = 0
Jul 10 19:11:45 svc1 kernel: time = 1562778547
Jul 10 19:11:45 svc1 kernel: KDB: stack backtrace:
Jul 10 19:11:45 svc1 kernel: #0 0xffffffff80c18257 at kdb_backtrace+0x67
Jul 10 19:11:45 svc1 kernel: #1 0xffffffff80bcbbdd at vpanic+0x19d
Jul 10 19:11:45 svc1 kernel: #2 0xffffffff80bcba33 at panic+0x43
Jul 10 19:11:45 svc1 kernel: #3 0xffffffff810a04bc at trap_fatal+0x39c
Jul 10 19:11:45 svc1 kernel: #4 0xffffffff810a0509 at trap_pfault+0x49
Jul 10 19:11:45 svc1 kernel: #5 0xffffffff8109faff at trap+0x29f
Jul 10 19:11:45 svc1 kernel: #6 0xffffffff8107a7e5 at calltrap+0x8
Jul 10 19:11:45 svc1 kernel: #7 0xffffffff80ce48a3 at iflib_fast_intr_rxtx+0xc3
Jul 10 19:11:45 svc1 kernel: #8 0xffffffff80b8f63a at intr_event_handle+0xba
Jul 10 19:11:45 svc1 kernel: #9 0xffffffff8120bff7 at intr_execute_handlers+0x57
Jul 10 19:11:45 svc1 kernel: #10 0xffffffff81212133 at lapic_handle_intr+0x43
Jul 10 19:11:45 svc1 kernel: #11 0xffffffff8107baa9 at Xapic_isr1+0xd9
Jul 10 19:11:45 svc1 kernel: #12 0xffffffff80ce67c2 at iflib_completed_tx_reclaim+0x62
Jul 10 19:11:45 svc1 kernel: #13 0xffffffff80ce5e2c at iflib_txq_drain+0x6c
Jul 10 19:11:45 svc1 kernel: #14 0xffffffff80cecafe at drain_ring_lockless+0x6e
Jul 10 19:11:45 svc1 kernel: #15 0xffffffff80cec9e1 at ifmp_ring_enqueue+0x2b1
Jul 10 19:11:45 svc1 kernel: #16 0xffffffff80ceb2e0 at iflib_if_transmit+0x90
Jul 10 19:11:45 svc1 kernel: #17 0xffffffff80cd21e4 at ether_output_frame+0xb4
Jul 10 19:11:45 svc1 kernel: Uptime: 1d7h30m15s
Jul 10 19:11:45 svc1 kernel: Dumping 2212 out of 8153 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Jul 10 19:11:45 svc1 kernel: Dump complete

Hardware is:

FreeBSD 12.0-STABLE r349857 GENERIC amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
VT(vga): text 80x25
CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (2294.61-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50654  Family=0x6  Model=0x55  Stepping=4
  Features=0xf83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS>
  Features2=0xfffa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0xd19f6ffb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,AVX512CD,AVX512BW,AVX512VL>
  Structured Extended Features2=0x8<PKU>
  Structured Extended Features3=0x2c000000<IBPB,STIBP,ARCH_CAP>
  XSAVE Features=0xb<XSAVEOPT,XSAVEC,XSAVES>
  IA32_ARCH_CAPS=0x4<RSBA>
  TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
real memory  = 8589934592 (8192 MB)
avail memory = 8274690048 (7891 MB)

ACPI APIC Table: <PTLTD          APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s)

acpi0: <INTEL 440BX> on motherboard
acpi0: Power Button (fixed)
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
cpu0: <ACPI CPU> numa-domain 0 on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0

vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci4
vmx0: Using 512 TX descriptors and 256 RX descriptors
vmx0: Using 2 RX queues 2 TX queues
vmx0: failed to allocate 3 MSI-X vectors, err: 6
vmx0: Using an MSI interrupt
vmx0: Ethernet address: 00:50:56:a6:9e:f2
vmx0: netmap queues/slots: TX 1/512, RX 1/512

I can provide further information if needed.
Comment 1 Juraj Lutter freebsd_committer freebsd_triage 2019-07-10 20:15:07 UTC
Created attachment 205671 [details]
savecore crash description
Comment 2 Juraj Lutter freebsd_committer freebsd_triage 2019-07-18 19:50:14 UTC
Please! Has anyone had any chance to look at this? Thanks.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2019-07-18 20:19:45 UTC
Created attachment 205876 [details]
proposed patch

Based on the line numbers, vmxnet3_isc_txd_credits_update() is being reentered.  I think there is a window where vxcr_next is an invalid queue index; if we take an interrupt during that window, we may reference invalid memory.

Please try the attached patch, which attempts to bandage the problem.
Comment 4 Patrick Kelsey freebsd_committer freebsd_triage 2019-07-18 22:41:28 UTC
(In reply to Mark Johnston from comment #3)
> Created attachment 205876 [details]
> proposed patch
> 
> Based on the line numbers, vmxnet3_isc_txd_credits_update() is being
> reentered.  I think there is a window where vxcr_next is an invalid queue
> index; if we take an interrupt during that window, we may reference invalid
> memory.
> 
> Please try the attached patch, which attempts to bandage the problem.

The problem isn't that vmxnet3_isc_txd_credits_update() needs to be re-enterable, the problem is that r347221 (merged to stable/12 in r349112) changed iflib_legacy_setup() in a way that assumes that all iflib drivers operate their interrupts in combined RXTX mode.  The vmx driver is one such driver that does not.  For vmx, this change results in access to the tx state from multiple contexts (interrupt and group task context) whereas the tx state management is designed to only be accessed from a single context (group task context).

iflib_legacy_setup() needs to be fixed.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2019-07-19 00:04:02 UTC
(In reply to Patrick Kelsey from comment #4)
Marius, could you take a look at this?
Comment 6 Juraj Lutter freebsd_committer freebsd_triage 2019-07-24 20:29:09 UTC
Just for the record, the "proposed patch" from Mark did help, panics are now gone.

If iflib should be fixed, I'm willing to help.
Comment 7 Marius Strobl freebsd_committer freebsd_triage 2019-07-31 15:27:41 UTC
I'd say that with an INTx-type interrupt or a single MSI vector, i. e. all the "legacy" interrupt support iflib is about, there's just no other way than to operate in combined RXTX mode (as opposed to multiple MSI-X vectors which can be associated to either RX or TX on a per-vector basis).
Thus, - as actually already written in my private e-mail to pkelsey@ and the submitter on June 25th - I fully agree with markj@ analysis that vmxnet3_isc_txd_credits_update() currently isn't reentrant as well as with his fix (I'd probably code it to be more in line with the index updating in vmxnet3_isc_rxd_available(), though).
However, as I also already wrote in said e-mail, on top of that it isn't obvious to me that it's safe to update txc->vxcr_next and txc->vxcr_gen non-atomically/lockless since r343291 dropped the locking around these operations.
Comment 8 Marius Strobl freebsd_committer freebsd_triage 2019-07-31 15:36:54 UTC
That said, if a driver has a way to further distinguish the interrupt cause, i. e. RX or TX, of a combined RXTX interrupt source for a particular device and wishes to deal with RX and TX separately it certainly may do so on its own, but shouldn't rely on common/generic iflib code for that.
Comment 9 Patrick Kelsey freebsd_committer freebsd_triage 2019-07-31 15:50:24 UTC
(In reply to Marius Strobl from comment #7)
> I'd say that with an INTx-type interrupt or a single MSI vector, i. e. all
> the "legacy" interrupt support iflib is about, there's just no other way
> than to operate in combined RXTX mode (as opposed to multiple MSI-X vectors
> which can be associated to either RX or TX on a per-vector basis).
> Thus, - as actually already written in my private e-mail to pkelsey@ and the
> submitter on June 25th - I fully agree with markj@ analysis that
> vmxnet3_isc_txd_credits_update() currently isn't reentrant as well as with
> his fix (I'd probably code it to be more in line with the index updating in
> vmxnet3_isc_rxd_available(), though).
> However, as I also already wrote in said e-mail, on top of that it isn't
> obvious to me that it's safe to update txc->vxcr_next and txc->vxcr_gen
> non-atomically/lockless since r343291 dropped the locking around these
> operations.

Marius, you are missing the point, which is that you have failed to consider the case of drivers that don't use TX interrupts at all, of which the vmx driver is an example.  Drivers that don't use TX interrupts still only have one queue-related interrupt in legacy mode.

By assuming that all drivers want to (i.e., are designed to) do TX processing during their RX interrupts, even when those drivers have clearly indicated they are not implementing TX mode or RXTX mode hardware interrupts in their interface to iflib, you are breaking drivers such as vmx.

I fully agree that vmxnet3_isc_txd_credits_update() is not re-entrant - because it was explicitly designed on the basis that it does not have to be.  Anyone of course is welcome to redesign the vmx driver under changed design assumptions, and I would caution anyone going down that path that the whole vmx driver should be re-examined in this case.

I think it would be better to fix the iflib legacy mode interrupt setup so it does not assume it can invoke TX processing from the RX interrupt unless the driver has indicated it is using TXRX mode, or perhaps indicated that it is using hardware-based TX interrupts.
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2019-08-13 00:30:34 UTC
Are either of you planning to work on this?
Comment 11 Juraj Lutter freebsd_committer freebsd_triage 2019-08-25 20:52:54 UTC
Do you happen to know if r351276 can fix this?
Comment 12 Marius Strobl freebsd_committer freebsd_triage 2019-08-28 17:57:37 UTC
Created attachment 206985 [details]
patch
Comment 13 Marius Strobl freebsd_committer freebsd_triage 2019-08-28 18:00:44 UTC
Could you please test whether using the patch from attachment #206985 [details] instead of #205876 solves the problem?
Comment 14 Juraj Lutter freebsd_committer freebsd_triage 2019-09-03 08:05:52 UTC
The patch provided was, obviously, against HEAD. I adopted it for stable/12, applied and so far the system is stable.

I will allow 2 more days (as the vacations are now over and that system will be under high load within next days) for final conclusion.
Comment 15 Juraj Lutter freebsd_committer freebsd_triage 2019-09-07 11:03:16 UTC
I guess we are good. System with this patch survived almost one week of normal operation.
Comment 16 Mark Johnston freebsd_committer freebsd_triage 2019-09-23 21:54:33 UTC
Is there any reason not to commit this?  It would need to be MFCed soon for 12.1.
Comment 17 Kubilay Kocak freebsd_committer freebsd_triage 2019-09-24 03:41:40 UTC
^Triage:

- Set to more appropriate default Assignee, keep virtualization CC'd
- Needs a (real) Assignee to be In Progress

Once committed to head and merged, please set maintainer-approval on the attachment to ? re@f.o for an explicit approval request
Comment 18 Mark Johnston freebsd_committer freebsd_triage 2019-09-28 17:16:53 UTC
https://reviews.freebsd.org/D21831
Comment 19 commit-hook freebsd_committer freebsd_triage 2019-09-30 15:59:40 UTC
A commit references this bug:

Author: markj
Date: Mon Sep 30 15:59:08 UTC 2019
New revision: 352906
URL: https://svnweb.freebsd.org/changeset/base/352906

Log:
  Add IFLIB_SINGLE_IRQ_RX_ONLY.

  As of r347221 the iflib legacy interrupt mode setup assumes that drivers
  perform both receive and transmit processing from the interrupt handler.
  This assumption is invalid in the vmxnet3 driver, so introduce the
  IFLIB_SINGLE_IRQ_RX_ONLY flag to make iflib avoid tx processing in the
  interrupt handler.

  PR:		239118
  Reported and tested by:	Juraj Lutter <otis@sk.freebsd.org>
  Obtained from:	marius
  Reviewed by:	gallatin
  MFC after:	3 days
  Differential Revision:	https://reviews.freebsd.org/D21831

Changes:
  head/sys/dev/vmware/vmxnet3/if_vmx.c
  head/sys/net/iflib.c
  head/sys/net/iflib.h
Comment 20 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-01 02:22:14 UTC
Comment on attachment 206985 [details]
patch

^Triage: Obsoleted by https://reviews.freebsd.org/D21831
Comment 21 commit-hook freebsd_committer freebsd_triage 2019-10-03 14:55:43 UTC
A commit references this bug:

Author: markj
Date: Thu Oct  3 14:55:08 UTC 2019
New revision: 353051
URL: https://svnweb.freebsd.org/changeset/base/353051

Log:
  MFC r352906:
  Add IFLIB_SINGLE_IRQ_RX_ONLY.

  PR:	239118

Changes:
_U  stable/12/
  stable/12/sys/dev/vmware/vmxnet3/if_vmx.c
  stable/12/sys/net/iflib.c
  stable/12/sys/net/iflib.h
Comment 22 commit-hook freebsd_committer freebsd_triage 2019-10-03 15:23:48 UTC
A commit references this bug:

Author: markj
Date: Thu Oct  3 15:23:39 UTC 2019
New revision: 353052
URL: https://svnweb.freebsd.org/changeset/base/353052

Log:
  MFS r353051:
  Add IFLIB_SINGLE_IRQ_RX_ONLY.

  PR:		239118
  Approved by:	re (gjb)

Changes:
_U  releng/12.1/
  releng/12.1/sys/dev/vmware/vmxnet3/if_vmx.c
  releng/12.1/sys/net/iflib.c
  releng/12.1/sys/net/iflib.h
Comment 23 Mark Johnston freebsd_committer freebsd_triage 2019-10-03 15:24:31 UTC
(In reply to Juraj Lutter from comment #15)
Thanks for your help in testing this.  The commit will be in 12.1.