Bug 240837

Summary: crash with 12.1-BETA1
Product: Base System
Reporter: Christos Chatzaras <chris>
Component: kern
Assignee: Michael Tuexen <tuexen>
Status: Closed FIXED
Severity: Affects Only Me
CC: emaste, rrs, tuexen
Priority: ---
Keywords: crash
Version: 12.1-RELEASE   
Hardware: amd64   
OS: Any   
Bug Depends on:    
Bug Blocks: 240700    
Attachments:
Description  Flags
core.txt     none

Description Christos Chatzaras 2019-09-26 12:07:30 UTC
After upgrading to 12.1-BETA1, the system ran for a few hours and then panicked:

kgdb /boot/kernel/kernel vmcore.0
GNU gdb (GDB) 8.3 [GDB v8.3 for FreeBSD]
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 7; apic id = 07
instruction pointer     = 0x20:0xffffffff80def678
stack pointer           = 0x28:0xfffffe0075de8190
frame pointer           = 0x28:0xfffffe0075de8190
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_7)
trap number             = 9
panic: general protection fault
cpuid = 7
time = 1569498322
KDB: stack backtrace:
#0 0xffffffff80c1bd47 at kdb_backtrace+0x67
#1 0xffffffff80bcf07d at vpanic+0x19d
#2 0xffffffff80bceed3 at panic+0x43
#3 0xffffffff810a6d2c at trap_fatal+0x39c
#4 0xffffffff810a613c at trap+0x6c
#5 0xffffffff8107fdcc at calltrap+0x8
#6 0xffffffff80dea791 at tcp_output+0x271
#7 0xffffffff80de47a1 at tcp_do_segment+0x31f1
#8 0xffffffff80de09a1 at tcp_input+0xdc1
#9 0xffffffff80d59dfb at ip_input+0x13b
#10 0xffffffff80cf194f at netisr_dispatch_src+0xcf
#11 0xffffffff80cd6239 at ether_demux+0x139
#12 0xffffffff80cd74b6 at ether_nh_input+0x346
#13 0xffffffff80cf194f at netisr_dispatch_src+0xcf
#14 0xffffffff80cd664b at ether_input+0x4b
#15 0xffffffff80cee1dd at iflib_rxeof+0xa6d
#16 0xffffffff80ce8635 at _task_fn_rx+0x75
#17 0xffffffff80c1a604 at gtaskqueue_run_locked+0x144
Uptime: 13h31m54s
Dumping 2387 out of 32511 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
234     /usr/src/sys/amd64/include/pcpu.h: No such file or directory.
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bcec78 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bcf0d9 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bceed3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff810a6d2c in trap_fatal (frame=0xfffffe0075de80d0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:943
#6  0xffffffff810a613c in trap (frame=0xfffffe0075de80d0) at /usr/src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff80def678 in tcp_sack_output (tp=0xfffff802ae5407a0, sack_bytes_rexmt=0xfffffe0075de8260) at /usr/src/sys/netinet/tcp_sack.c:846
#9  0xffffffff80dea791 in tcp_output (tp=0xfffff802ae5407a0) at /usr/src/sys/netinet/tcp_output.c:289
#10 0xffffffff80de47a1 in tcp_do_segment (m=0xfffff8075e689c00, th=0xfffff8075e689c7a, so=0xfffff8018577ba38, tp=0xfffff802ae5407a0, drop_hdrlen=52, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:2622
#11 0xffffffff80de09a1 in tcp_input (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet/tcp_input.c:1395
#12 0xffffffff80d59dfb in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:828
#13 0xffffffff80cf194f in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x4) at /usr/src/sys/net/netisr.c:1122
#14 0xffffffff80cd6239 in ether_demux (ifp=0xfffff80003f89000, m=0xfffffe0075de8260) at /usr/src/sys/net/if_ethersubr.c:879
#15 0xffffffff80cd74b6 in ether_input_internal (ifp=0xfffff80003f89000, m=0xfffffe0075de8260) at /usr/src/sys/net/if_ethersubr.c:667
#16 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:697
#17 0xffffffff80cf194f in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x4) at /usr/src/sys/net/netisr.c:1122
#18 0xffffffff80cd664b in ether_input (ifp=0xfffff80003f89000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:787
#19 0xffffffff80cee1dd in iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2835
#20 0xffffffff80ce8635 in _task_fn_rx (context=0xfffff80003f87000) at /usr/src/sys/net/iflib.c:3775
#21 0xffffffff80c1a604 in gtaskqueue_run_locked (queue=0xfffff80003690c00) at /usr/src/sys/kern/subr_gtaskqueue.c:378
#22 0xffffffff80c1a268 in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:559
#23 0xffffffff80b8f6d3 in fork_exit (callout=0xffffffff80c1a1d0 <gtaskqueue_thread_loop>, arg=0xfffffe00041fd0b0, frame=0xfffffe0075de89c0) at /usr/src/sys/kern/kern_fork.c:1065
#24 <signal handler called>
(kgdb)
Comment 1 Christos Chatzaras 2019-09-26 15:36:44 UTC
Had the same crash on another server.

For now I have disabled SACK to see whether the crashes stop:

sysctl net.inet.tcp.sack.enable=0
Comment 2 Christos Chatzaras 2019-09-26 15:46:11 UTC
Created attachment 207861
core.txt
Comment 3 Michael Tuexen 2019-09-27 15:48:50 UTC
The problem was fixed for head in https://svnweb.freebsd.org/changeset/base/352386, which was MFCed to stable/12 in https://svnweb.freebsd.org/changeset/base/352508. I missed MFSing the fix to releng/12.1, which was branched at r352480.

What happened is that overflowing the sackblks[] array overwrote sackhint.nexthole with an invalid, non-NULL value. From the core provided:

  sackblks = {{
      start = 0xc1f54a52, 
      end = 0xc1f54ffe
    }, {
      start = 0xc1f5229e, 
      end = 0xc1f5284a
    }, {
      start = 0xc1f5229e, 
      end = 0xc1f5284a
    }, {
      start = 0xc1f5229e, 
      end = 0xc1f5284a
    }, {
      start = 0xc1f5229e, 
      end = 0xc1f5284a
    }, {
      start = 0xc1f51746, 
      end = 0xc1f51cf2
    }}, 
  sackhint = {
    nexthole = 0xc1f5119ac1f50bee, 
    sack_bytes_rexmit = 0x0, 
    last_sack_ack = 0x3fe9f863, 
    ispare = 0x0, 
    sacked_bytes = 0xb65, 
    _pad1 = {0x0}, 
    _pad = {0x0}
  },
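
The corrupted nexthole value can be read as one more SACK block written past the end of sackblks[]. The following Python sketch splits the 64-bit value into two 32-bit words and compares the result with the blocks in the dump. It assumes that sackhint directly follows sackblks[] in memory, as the dump order suggests, and that the machine is little-endian (amd64). The helper names are made up for illustration and are not kernel functions.

```python
# Values copied from the kgdb dump in this comment.
NEXTHOLE = 0xC1F5119AC1F50BEE
SACKBLKS = [
    (0xC1F54A52, 0xC1F54FFE),
    (0xC1F5229E, 0xC1F5284A),
    (0xC1F5229E, 0xC1F5284A),
    (0xC1F5229E, 0xC1F5284A),
    (0xC1F5229E, 0xC1F5284A),
    (0xC1F51746, 0xC1F51CF2),
]


def split_pointer(value):
    """Split a 64-bit value into (low, high) 32-bit words.

    Assumption: on little-endian amd64 the low word sits at the lower
    address, so it would overlay a block's 'start' field and the high
    word its 'end' field.
    """
    return value & 0xFFFFFFFF, value >> 32


def block_len(start, end):
    """Length of a SACK block using 32-bit sequence-number arithmetic."""
    return (end - start) & 0xFFFFFFFF


start, end = split_pointer(NEXTHOLE)
print(f"start={start:#x} end={end:#x} len={block_len(start, end)}")

# Lengths of the blocks that are legitimately inside the array.
print(sorted({block_len(s, e) for s, e in SACKBLKS}))
```

The decoded words are start = 0xc1f50bee and end = 0xc1f5119a, a 1452-byte range, which is the same length as every block in sackblks[]. That is consistent with a seventh block having been stored past the six-entry array.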

Since I can't get any more changes into BETA2, the fix will be in BETA3 or RC1.
Comment 4 commit-hook 2019-09-30 04:54:37 UTC
A commit references this bug:

Author: tuexen
Date: Mon Sep 30 04:54:02 UTC 2019
New revision: 352886
URL: https://svnweb.freebsd.org/changeset/base/352886

Log:
  MFS r352508:
  Don't write to memory outside of the allocated array for SACK blocks.

  PR:			240837
  Approved by:		re (delphij@)
  Obtained from:		rrs@
  Sponsored by:		Netflix, Inc.

Changes:
_U  releng/12.1/
  releng/12.1/sys/netinet/tcp_sack.c