Bug 240608 - if_vmx(4): iflib - Panic with INVARIANTS: Memory modified after free (12.1-pre-QA)
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net mailing list
Keywords: crash, needs-qa
Depends on:
Blocks: 240700
Reported: 2019-09-16 07:45 UTC by Harald Schmalzbauer
Modified: 2019-10-14 18:44 UTC

See Also:
koobs: mfc-stable11?
koobs: mfc-stable12?


Description Harald Schmalzbauer 2019-09-16 07:45:31 UTC

Testing 12.1-PRERELEASE updates with a debug kernel on cold-standby hardware revealed some unexpected panics related to iflib.
I'm not sure whether I should file individual bug reports or collect them here in one report.
I need to collect the others one after another, so let's start here with the most unexpected one, which happened during a traffic test using if_vmx(4):

panic: Memory modified after free 0xfffff801381d0000(2048) val=0 @ 0xfffff801381d0000

cpuid = 0
time = 1568618749
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0041352670
vpanic() at vpanic+0x19d/frame 0xfffffe00413526c0
panic() at panic+0x43/frame 0xfffffe0041352720
trash_ctor() at trash_ctor+0x49/frame 0xfffffe0041352730
mb_ctor_clust() at mb_ctor_clust+0x18/frame 0xfffffe0041352760
uma_zalloc_arg() at uma_zalloc_arg+0x8a0/frame 0xfffffe00413527e0
m_cljget() at m_cljget+0x8a/frame 0xfffffe0041352810
_iflib_fl_refill() at _iflib_fl_refill+0x2f1/frame 0xfffffe0041352900
_task_fn_rx() at _task_fn_rx+0xb29/frame 0xfffffe00413529f0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xf9/frame 0xfffffe0041352a40
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x88/frame 0xfffffe0041352a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0041352ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0041352ab0

#9  0xffffffff805cf4ca in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_shutdown.c:866
#10 0xffffffff805cf273 in panic (fmt=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_shutdown.c:804
#11 0xffffffff808da039 in trash_ctor (mem=<value optimized out>, size=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/vm/uma_dbg.c:82
#12 0xffffffff805b2b08 in mb_ctor_clust (mem=0xfffff801381d0000, size=2048, arg=0x0, how=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_mbuf.c:702
#13 0xffffffff808d5030 in uma_zalloc_arg (zone=<value optimized out>, udata=0x0, flags=1)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/vm/uma_core.c:2506
#14 0xffffffff805b18fa in m_cljget (m=0x0, how=1, size=2048)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_mbuf.c:956
#15 0xffffffff80703e41 in _iflib_fl_refill (ctx=0xfffff800028ec800, fl=0xfffff8000293eac0, count=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/iflib.c:2025
#16 0xffffffff806fea59 in _task_fn_rx (context=0xfffff8000293d000)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/iflib.c:2117
#17 0xffffffff80616539 in gtaskqueue_run_locked (queue=0xfffff80002360a00)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/subr_gtaskqueue.c:378
#18 0xffffffff806162f8 in gtaskqueue_thread_loop (arg=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/subr_gtaskqueue.c:559
#19 0xffffffff80596274 in fork_exit (callout=0xffffffff80616270 <gtaskqueue_thread_loop>, arg=0xfffffe000029b008, 
    frame=0xfffffe0041352ac0) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_fork.c:1065
#20 0xffffffff80912c6e in fork_trampoline () at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/exception.S:1077
#21 0x0000000000000000 in ?? ()

Hope someone can use that information.  Happily providing more info on request.
Guess I'd better open individual bug reports...
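
For readers unfamiliar with the check that fired in frame #11 (trash_ctor), here is a minimal user-space sketch of the idea: a freed buffer is filled with a junk pattern, and the next allocation verifies that the pattern is still intact. This is an illustration only, not the kernel's code. The helper names (trash_fill, trash_check, simulate_stale_write) are hypothetical, and the 0xdeadc0de pattern is assumed to match what uma_dbg.c uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Junk pattern written into freed memory (assumed value, see lead-in). */
#define JUNK 0xdeadc0deU

/* On free: overwrite the whole buffer with the junk pattern. */
static void trash_fill(void *mem, size_t size)
{
    uint32_t *p = mem;
    for (size_t i = 0; i < size / sizeof(*p); i++)
        p[i] = JUNK;
}

/*
 * On the next allocation: return the byte offset of the first word that
 * no longer holds the junk pattern, or -1 if the buffer is intact.
 */
static long trash_check(const void *mem, size_t size)
{
    const uint32_t *p = mem;
    for (size_t i = 0; i < size / sizeof(*p); i++)
        if (p[i] != JUNK)
            return (long)(i * sizeof(*p));
    return -1;
}

/*
 * Simulate a stale writer (for example a late DMA or a driver holding an
 * old pointer) that stores 0 at byte offset 'off' after the free.
 * Returns what trash_check() reports, or -2 if malloc() fails.
 */
static long simulate_stale_write(size_t size, size_t off)
{
    uint32_t *buf = malloc(size);
    if (buf == NULL)
        return -2;
    trash_fill(buf, size);
    buf[off / sizeof(*buf)] = 0;
    long where = trash_check(buf, size);
    free(buf);
    return where;
}
```

In the panic above, the reported address equals the start of the 2048-byte cluster and val=0, which corresponds to simulate_stale_write(2048, 0) reporting offset 0: something wrote a zero to the first word of the cluster after it had been freed.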

Comment 1 Harald Schmalzbauer 2019-09-16 09:43:04 UTC
I guess this panic with a non-INVARIANTS kernel (without any other debug options) is related.
It happens quite early after starting some traffic tests using if_vmx(4).
12.1-PRERELEASE without debug options:

Fatal trap 12: page fault while in kernel mode                                                                                         
cpuid = 1; apic id = 01                                                                                                                
fault virtual address   = 0x0                                                                                                          
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff806ffdc2                                                                                      
stack pointer           = 0x28:0xfffffe0040557900                                                                                      
frame pointer           = 0x28:0xfffffe00405579e0                                                                                      
code segment            = base rx0, limit 0xfffff, type 0x1b                                                                           
                        = DPL 0, pres 1, long 1, def32 0, gran 1                                                                       
processor eflags        = interrupt enabled, resume, IOPL = 0                                                                          
current process         = 0 (if_io_tqg_1)                                                                                              
trap number             = 12                                                                                                           
panic: page fault                                                                                                                      
cpuid = 1                                                                                                                              
time = 1568626299                                                                                                                      
KDB: stack backtrace:                                                                                                                  
#0 0xffffffff8061f047 at kdb_backtrace+0x67                                                                                            
#1 0xffffffff805d33bd at vpanic+0x19d                                                                                                  
#2 0xffffffff805d3213 at panic+0x43                                                                                                    
#3 0xffffffff80941d2c at trap_fatal+0x39c                                                                                              
#4 0xffffffff80941d79 at trap_pfault+0x49                                                                                              
#5 0xffffffff8094136f at trap+0x29f                                                                                                    
#6 0xffffffff8091b92c at calltrap+0x8                                                                                                  
#7 0xffffffff8061d904 at gtaskqueue_run_locked+0x144
#8 0xffffffff8061d568 at gtaskqueue_thread_loop+0x98
#9 0xffffffff8059b103 at fork_exit+0x83
#10 0xffffffff8091c96e at fork_trampoline+0xe

#3  0xffffffff805d3213 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_shutdown.c:804
#4  0xffffffff80941d2c in trap_fatal (frame=<value optimized out>, eva=<value optimized out>)                                          
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/trap.c:943                                                          
#5  0xffffffff80941d79 in trap_pfault (frame=0xfffffe0040557840, usermode=0) at RELENG_12/src/sys/amd64/include/pcpu.h:234             
#6  0xffffffff8094136f in trap (frame=0xfffffe0040557840) at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/trap.c:443    
#7  0xffffffff8091b92c in calltrap () at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/exception.S:289                   
#8  0xffffffff806ffdc2 in _task_fn_rx (context=<value optimized out>)                                                                  
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/iflib.c:2614                                                                
#9  0xffffffff8061d904 in gtaskqueue_run_locked (queue=0xfffff8000206c100)                                                             
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/subr_gtaskqueue.c:378                                                      
#10 0xffffffff8061d568 in gtaskqueue_thread_loop (arg=<value optimized out>)                                                           
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/subr_gtaskqueue.c:559                                                      
#11 0xffffffff8059b103 in fork_exit (callout=0xffffffff8061d4d0 <gtaskqueue_thread_loop>, arg=0xfffffe00025fe020,                      
    frame=0xfffffe0040557ac0) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_fork.c:1065                                 
#12 0xffffffff8091c96e in fork_trampoline () at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/exception.S:1077           
#13 0x0000000000000000 in ?? ()

Thanks for any help,

Comment 2 Eric Joyner freebsd_committer 2019-10-09 19:30:53 UTC
Maybe your issue was similar to the one found in this bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118, which got fixed.

Could you retest?

Comment 3 Harald Schmalzbauer 2019-10-10 04:37:27 UTC
(In reply to Eric Joyner from comment #2)

Thanks for the hint. I also thought that fix could solve or influence the panic, but I won't find time to test before the weekend. I also found that it only happens when I use an MTU of 9000 bytes, if I remember correctly.
Will update soon.

Comment 4 Harald Schmalzbauer 2019-10-12 13:28:59 UTC
Tested with RC1:
Still the same panic as soon as if_vmx(4) gets load _and_ jumbo frames are in use (MTU 9000).

Unfortunately, solid vmxnet3 support is crucial to my setups.
I managed to patch the VMware vmxnet3 guest driver to work with FreeBSD 12, but it also suffers from panics…
The VMware vmxnet3 driver provides doubled transfer rates compared to if_vmx(4), but my skills and time don't suffice to fix either of them myself.

In case anyone is interested in the vmxnet3 patch, I'll provide it on request of course, but I won't pollute this bug report with something completely different.

Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-13 22:08:16 UTC
(In reply to Harald Schmalzbauer from comment #4)

@Harald Can you confirm (explicitly) that the panic does *not* occur with jumbo frames *disabled/not configured*?

Comment 6 Harald Schmalzbauer 2019-10-14 18:44:45 UTC
(In reply to Kubilay Kocak from comment #5)

As far as I could briefly test (some iperf3 streams), using if_vmx(4) does _not_ lead to a panic if frames are <= 4k (with the interface MTU set to 9000).

Also, using if_igb(4) with an interface MTU of 9000 and frames as big as 9k works fine with FreeBSD-12_RC1.

Maybe the block on 240700 can be released if the man page mentions the 4k frame size limit?
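
As a side note on the 4k boundary: it coincides with the page-sized mbuf cluster on amd64, and the backtrace in the description shows the receive free list being refilled with 2048-byte clusters (m_cljget with size=2048). The sketch below only does the arithmetic for the standard FreeBSD cluster sizes. It is not iflib's or if_vmx's actual buffer selection logic, the function names are hypothetical, and whether this boundary is related to the panic is an open assumption, not a diagnosis.

```c
#include <assert.h>
#include <stddef.h>

/* Standard FreeBSD mbuf cluster sizes on amd64 (values from sys/param.h). */
#define CL_2K   2048u           /* MCLBYTES */
#define CL_PAGE 4096u           /* MJUMPAGESIZE (PAGE_SIZE on amd64) */
#define CL_9K   (9u * 1024u)    /* MJUM9BYTES */
#define CL_16K  (16u * 1024u)   /* MJUM16BYTES */

/* Smallest single cluster that can hold a frame, or 0 if none fits. */
static size_t smallest_cluster_for(size_t frame_len)
{
    static const size_t sizes[] = { CL_2K, CL_PAGE, CL_9K, CL_16K };
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        if (frame_len <= sizes[i])
            return sizes[i];
    return 0;
}

/* Number of fixed-size receive buffers a frame spans (rounded up). */
static size_t clusters_needed(size_t frame_len, size_t cluster_size)
{
    if (cluster_size == 0)
        return 0;
    return (frame_len + cluster_size - 1) / cluster_size;
}
```

With 2048-byte receive buffers, a 4096-byte frame spans clusters_needed(4096, 2048) = 2 buffers, while a 9000-byte frame spans 5. A frame of up to 4096 bytes also still fits a single page-sized cluster, whereas a 9000-byte frame needs the 9k cluster size.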