Bug 206904 - tailq crash/nd inet6
Summary: tailq crash/nd inet6
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords: crash, needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2016-02-04 01:37 UTC by Larry Rosenman
Modified: 2016-10-15 23:28 UTC

See Also:
koobs: mfc-stable10?
koobs: mfc-stable9?


Attachments
full core.txt (202.37 KB, text/plain)
2016-02-04 01:37 UTC, Larry Rosenman
another one (186.47 KB, text/plain)
2016-02-05 01:02 UTC, Larry Rosenman
and a 3rd (234.82 KB, text/plain)
2016-02-05 01:02 UTC, Larry Rosenman
Another one (218.33 KB, text/plain)
2016-02-07 02:38 UTC, Larry Rosenman

Description Larry Rosenman freebsd_committer freebsd_triage 2016-02-04 01:37:48 UTC
Created attachment 166529
full core.txt

Got the following panic:

borg.lerctr.org dumped core - see /var/crash/vmcore.20

Tue Feb  2 20:59:14 CST 2016

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r294926: Wed Jan 27 12:37:06 CST 2016     root@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: Bad tailq NEXT(0xffffffff81e8b5f8->tqh_last) != NULL

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: Bad tailq NEXT(0xffffffff81e8b5f8->tqh_last) != NULL
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2e025122c0
vpanic() at vpanic+0x182/frame 0xfffffe2e02512340
panic() at panic+0x43/frame 0xfffffe2e025123a0
nd6_ra_input() at nd6_ra_input+0x13da/frame 0xfffffe2e02512680
icmp6_input() at icmp6_input+0x97e/frame 0xfffffe2e02512820
ip6_input() at ip6_input+0xc3c/frame 0xfffffe2e02512900
netisr_dispatch_src() at netisr_dispatch_src+0x81/frame 0xfffffe2e02512960
ether_demux() at ether_demux+0x15e/frame 0xfffffe2e02512990
ether_nh_input() at ether_nh_input+0x344/frame 0xfffffe2e025129d0
netisr_dispatch_src() at netisr_dispatch_src+0x81/frame 0xfffffe2e02512a30
ether_input() at ether_input+0x4f/frame 0xfffffe2e02512a60
if_input() at if_input+0xa/frame 0xfffffe2e02512a70
em_rxeof() at em_rxeof+0x2f5/frame 0xfffffe2e02512ae0
em_handle_que() at em_handle_que+0x40/frame 0xfffffe2e02512b20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfffffe2e02512b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe2e02512bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe2e02512bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2e02512bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 8h40m34s
Dumping 3340 out of 64467 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

core *IS* available.
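
For readers unfamiliar with the panic string: it appears to come from the tail sanity check in the queue(3) debugging macros, which requires that the pointer referenced by tqh_last be NULL. The userland sketch below shows that invariant and how a tail insert that has only partly completed violates it, which is the kind of state a second CPU could observe if the list is modified without a lock. The struct and function names are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a default router entry; not the kernel struct. */
struct router {
	int id;
	TAILQ_ENTRY(router) link;
};
TAILQ_HEAD(router_list, router);

/*
 * The invariant behind the panic: tqh_last points at the "next" pointer of
 * the last element (or at tqh_first when the list is empty), so the pointer
 * it refers to must be NULL.
 */
int
tail_is_sane(const struct router_list *head)
{
	return (*head->tqh_last == NULL);
}

/* First three stores of a tail insert; the list is inconsistent afterwards. */
void
half_insert_tail(struct router_list *head, struct router *elm)
{
	assert(elm != NULL);
	elm->link.tqe_next = NULL;
	elm->link.tqe_prev = head->tqh_last;
	*head->tqh_last = elm;
}

/* Final store of a tail insert; restores the invariant. */
void
finish_insert_tail(struct router_list *head, struct router *elm)
{
	head->tqh_last = &elm->link.tqe_next;
}
```

Between half_insert_tail() and finish_insert_tail() the check fails, so any other thread that inserts at the tail in that window would trip the same assertion.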
Comment 1 Larry Rosenman freebsd_committer freebsd_triage 2016-02-04 01:38:29 UTC
I can reliably reproduce this by rebooting my pfSense router, which is the one sending the router advertisements (rtadv).
Comment 2 Larry Rosenman freebsd_committer freebsd_triage 2016-02-05 01:02:31 UTC
Created attachment 166582
another one
Comment 3 Larry Rosenman freebsd_committer freebsd_triage 2016-02-05 01:02:49 UTC
Created attachment 166583
and a 3rd
Comment 4 Larry Rosenman freebsd_committer freebsd_triage 2016-02-05 01:03:26 UTC
The vmcores are ALL available, and I can give a @FreeBSD.org developer access.
Comment 5 Larry Rosenman freebsd_committer freebsd_triage 2016-02-07 02:38:48 UTC
Created attachment 166686
Another one
Comment 6 Larry Rosenman freebsd_committer freebsd_triage 2016-02-10 10:16:08 UTC
This seems to be at the root of my tcp6 issues. I've put several more core.txt files at:

http://www.lerctr.org/~ler/FreeBSD/

I've also put dmesg, loader.conf, rc.conf, sysctl.conf there. 

I'd really like to get to the bottom of this.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2016-02-11 18:18:28 UTC
I'm working on this.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2016-02-11 20:29:11 UTC
Larry's confirmed that the patch here fixes the crash:

https://people.freebsd.org/~markj/patches/defrouter_locking.diff

I'm going to commit some trivial cleanup portions of that patch and put the rest
up for review.

The patch is still a bit incomplete. In particular, defrouter_reset() is not locked.
Comment 9 commit-hook freebsd_committer freebsd_triage 2016-02-25 20:12:39 UTC
A commit references this bug:

Author: markj
Date: Thu Feb 25 20:12:05 UTC 2016
New revision: 296063
URL: https://svnweb.freebsd.org/changeset/base/296063

Log:
  Lock the NDP default router list and count defrouter references.

  This addresses a number of race conditions that can cause crashes as a
  result of unsynchronized access to the list.

  PR:		206904
  Tested by:	Larry Rosenman <ler@lerctr.org>,
  		Kevin Bowling <kevin.bowling@kev009.com>
  MFC after:	2 months
  Differential Revision: https://reviews.freebsd.org/D5315

Changes:
  head/sys/netinet6/nd6.c
  head/sys/netinet6/nd6.h
  head/sys/netinet6/nd6_nbr.c
  head/sys/netinet6/nd6_rtr.c
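
As an illustration of the approach named in the commit log (a lock around the list plus per-entry reference counts), here is a minimal userland sketch. It is not the committed code: it uses a pthread mutex rather than a kernel lock, and every name in it (struct drouter, dr_add, dr_lookup, dr_rele, dr_remove) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical router entry; refcnt is protected by dr_lock. */
struct drouter {
	int id;
	unsigned refcnt;
	TAILQ_ENTRY(drouter) link;
};
TAILQ_HEAD(drouter_list, drouter);

static struct drouter_list dr_list = TAILQ_HEAD_INITIALIZER(dr_list);
static pthread_mutex_t dr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an entry; the list itself holds one reference. Returns 0 or -1. */
int
dr_add(int id)
{
	struct drouter *dr = calloc(1, sizeof(*dr));

	if (dr == NULL)
		return (-1);
	dr->id = id;
	dr->refcnt = 1;
	pthread_mutex_lock(&dr_lock);
	TAILQ_INSERT_TAIL(&dr_list, dr, link);
	pthread_mutex_unlock(&dr_lock);
	return (0);
}

/* Find an entry and return it with an extra reference, or NULL. */
struct drouter *
dr_lookup(int id)
{
	struct drouter *dr;

	pthread_mutex_lock(&dr_lock);
	TAILQ_FOREACH(dr, &dr_list, link)
		if (dr->id == id)
			break;
	if (dr != NULL)
		dr->refcnt++;
	pthread_mutex_unlock(&dr_lock);
	return (dr);
}

/* Drop a reference obtained from dr_lookup(); free on the last one. */
void
dr_rele(struct drouter *dr)
{
	int last;

	pthread_mutex_lock(&dr_lock);
	assert(dr->refcnt > 0);
	last = (--dr->refcnt == 0);
	pthread_mutex_unlock(&dr_lock);
	if (last)
		free(dr);
}

/* Unlink an entry and drop the list's reference. Returns 1 if found. */
int
dr_remove(int id)
{
	struct drouter *dr;
	int last = 0;

	pthread_mutex_lock(&dr_lock);
	TAILQ_FOREACH(dr, &dr_list, link)
		if (dr->id == id)
			break;
	if (dr != NULL) {
		TAILQ_REMOVE(&dr_list, dr, link);
		assert(dr->refcnt > 0);
		last = (--dr->refcnt == 0);
	}
	pthread_mutex_unlock(&dr_lock);
	if (dr == NULL)
		return (0);
	if (last)
		free(dr);
	return (1);
}
```

With this shape, a caller that looked up an entry can keep using it after another thread removes it from the list; the memory is only freed once the last holder calls dr_rele().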
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2016-10-15 23:28:43 UTC
This was MFCed to stable/10 in r303458.