Bug 231428 - 12.0-ALPHA6 crashes with gif (IPv4 in IPv4) over vtnet
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Andrey V. Elsukov
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2018-09-17 13:06 UTC by Lev A. Serebryakov
Modified: 2019-06-26 18:43 UTC

Description Lev A. Serebryakov freebsd_committer freebsd_triage 2018-09-17 13:06:00 UTC
I have a very fresh 12.0-ALPHA6/r338707 installed as a VirtualBox guest with a vtnet(4) virtual NIC, which is bridged with the host NIC.

I'm creating a gif-based tunnel to another FreeBSD system with the standard mantra:

#ifconfig gif0 create <my-int> <his-int>
#ifconfig gif0 tunnel <my-ext> <his-ext>

After that, "ping <his-int>" works, but when I run "iperf3 -c <his-int>" on this ALPHA6 system (with "iperf -s" running on the other host), the system crashes immediately:

panic: Assertion !in_epoch(net_epoch_preempt) && !mtx_owned(&(&(tcbinfo))->ipi_lock) failed at /data/src/sys/netinet/tcp_input.c:803
cpuid = 0
time = 1537187018
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000044a310
vpanic() at vpanic+0x1a3/frame 0xfffffe000044a370
panic() at panic+0x43/frame 0xfffffe000044a3d0
tcp_input() at tcp_input+0x16a9/frame 0xfffffe000044a520
ip_input() at ip_input+0x126/frame 0xfffffe000044a5a0
netisr_dispatch_src() at netisr_dispatch_src+0x83/frame 0xfffffe000044a600
gif_input() at gif_input+0x2db/frame 0xfffffe000044a640
in_gif_input() at in_gif_input+0x73/frame 0xfffffe000044a680
encap_input() at encap_input+0x1cf/frame 0xfffffe000044a6f0
encap4_input() at encap4_input+0x28/frame 0xfffffe000044a720
ip_input() at ip_input+0x126/frame 0xfffffe000044a7a0
netisr_dispatch_src() at netisr_dispatch_src+0x83/frame 0xfffffe000044a800
ether_demux() at ether_demux+0x15e/frame 0xfffffe000044a830
ether_nh_input() at ether_nh_input+0x373/frame 0xfffffe000044a880
netisr_dispatch_src() at netisr_dispatch_src+0x83/frame 0xfffffe000044a8e0
ether_input() at ether_input+0x42/frame 0xfffffe000044a900
vtnet_rxq_eof() at vtnet_rxq_eof+0x736/frame 0xfffffe000044a9a0
vtnet_rx_vq_intr() at vtnet_rx_vq_intr+0x58/frame 0xfffffe000044a9d0
vtpci_legacy_intr() at vtpci_legacy_intr+0xb0/frame 0xfffffe000044aa10
ithread_loop() at ithread_loop+0x140/frame 0xfffffe000044aa70
fork_exit() at fork_exit+0x84/frame 0xfffffe000044aab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000044aab0

It is 100% reproducible for me.
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-09-17 14:02:59 UTC
Encapsulated inbound traffic is handled by the ip_encap subsystem, which invokes gif_input while it is inside a net_epoch_preempt section. This is why INP_INFO_UNLOCK_ASSERT() triggers. I think we can solve this issue by relaxing this KASSERT to require only !mtx_owned(), but this could hide some locking problems. Alternatively, we can use another epoch for the ip_encap subsystem. Matt, what do you think?
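
To make the difference between the two checks concrete, here is a small userland sketch (not kernel code). It assumes the strict assertion is exactly the predicate printed in the panic message, and that the relaxed variant checks only mutex ownership. The struct and function names (ctx, strict_unlocked, relaxed_unlocked, encap_case_differs) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-thread state the kernel assertion inspects. */
struct ctx {
	bool in_net_epoch;	/* models in_epoch(net_epoch_preempt) */
	bool owns_ipi_lock;	/* models mtx_owned(&ipi->ipi_lock) */
};

/* Strict form: neither inside the epoch nor holding the mutex. */
static bool
strict_unlocked(const struct ctx *c)
{
	return (!c->in_net_epoch && !c->owns_ipi_lock);
}

/* Relaxed form: only the mutex must not be held. */
static bool
relaxed_unlocked(const struct ctx *c)
{
	return (!c->owns_ipi_lock);
}

/*
 * Models a packet that reached tcp_input() after decapsulation:
 * the thread is inside the epoch but does not own the mutex.
 * Returns true when only the strict form rejects this state.
 */
static bool
encap_case_differs(void)
{
	struct ctx encap = { .in_net_epoch = true, .owns_ipi_lock = false };

	assert(relaxed_unlocked(&encap));	/* relaxed check passes */
	return (!strict_unlocked(&encap));	/* strict check would fire */
}
```

In this model, the two predicates agree for every state except "inside the epoch, mutex not held", which is the state the encapsulated path produces.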
Comment 2 Matthew Macy 2018-09-19 07:20:45 UTC
It can probably be relaxed. I will look tomorrow.
Comment 3 Matthew Macy 2018-09-24 01:30:21 UTC
(In reply to Andrey V. Elsukov from comment #1)

I think the assertion is generally useful. Andrey - is there any way to infer that we're in encap context? In which case we could just set TI_RLOCKED to true and everything would work.


-M
Comment 4 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-09-24 07:57:12 UTC
(In reply to Matthew Macy from comment #3)
> I think the assertion is generally useful. Andrey - is there any way to
> infer that we're in encap context? In which case we could just set
> TI_RLOCKED to true and everything would work.

I think currently there is no way to infer that we are in encap context.
The call path is like this:

ip_input()
  encap_input()
    gif_input()
      netisr_dispatch()
        ip_input()
          tcp_input()

Starting from encap_input we are in encap context.
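
The call path above can be sketched as a toy userland model to show why tcp_input() has no direct way to tell that it was reached through encapsulation, other than observing the epoch state itself. This is only an illustration under stated assumptions: the model_* functions and the epoch_depth counter are hypothetical stand-ins, and the model assumes netisr dispatches directly (synchronously), as the backtrace in the description suggests.

```c
#include <assert.h>
#include <stdbool.h>

static int epoch_depth;		/* > 0 models "inside net_epoch_preempt" */
static bool tcp_saw_epoch;	/* what the model tcp_input observed */

static void model_ip_input(int encap_layers);

static void
model_tcp_input(void)
{
	/* The only visible trace of the encap context is the epoch state. */
	tcp_saw_epoch = (epoch_depth > 0);
}

static void
model_gif_input(int encap_layers)
{
	/* gif_input -> netisr_dispatch -> ip_input, dispatched directly. */
	model_ip_input(encap_layers - 1);
}

static void
model_encap_input(int encap_layers)
{
	epoch_depth++;		/* encap handling runs inside the epoch */
	model_gif_input(encap_layers);
	epoch_depth--;
	assert(epoch_depth >= 0);
}

static void
model_ip_input(int encap_layers)
{
	if (encap_layers > 0)
		model_encap_input(encap_layers);	/* tunnelled packet */
	else
		model_tcp_input();			/* plain TCP segment */
}

/* Deliver one packet wrapped in encap_layers IPv4-in-IPv4 headers. */
static bool
tcp_input_runs_in_epoch(int encap_layers)
{
	epoch_depth = 0;
	model_ip_input(encap_layers);
	return (tcp_saw_epoch);
}
```

Calling tcp_input_runs_in_epoch(0) models an ordinary TCP segment, and tcp_input_runs_in_epoch(1) models the gif case from this report, where the inner ip_input() and tcp_input() inherit the epoch section entered in encap_input().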
Comment 5 Matt Macy freebsd_committer freebsd_triage 2018-09-24 21:34:37 UTC
Alright. Let's at least be specific then. Add a:
 #define INP_INFO_WUNLOCK_ASSERT(ipi)		mtx_assert(&(ipi)->ipi_lock, MA_NOTOWNED)


Replace INP_INFO_UNLOCK_ASSERT(&V_tcbinfo) on line 803 with INP_INFO_WUNLOCK_ASSERT(...)
Comment 6 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-09-25 10:32:12 UTC
(In reply to Matt Macy from comment #5)
> Alright. Let's at least be specific then. Add a:
>  #define INP_INFO_WUNLOCK_ASSERT(ipi)		mtx_assert(&(ipi)->ipi_lock,
> MA_NOTOWNED)
> 
> 
> Replace INP_INFO_UNLOCK_ASSERT(&V_tcbinfo) on line 803 with
> INP_INFO_WUNLOCK_ASSERT(...)

tcp_input(), tcp_input_data() and siftr_findinpcb() use INP_INFO_UNLOCK_ASSERT() in several places. I think these places also should be revised.
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-10-01 10:46:17 UTC
A commit references this bug:

Author: ae
Date: Mon Oct  1 10:46:01 UTC 2018
New revision: 339039
URL: https://svnweb.freebsd.org/changeset/base/339039

Log:
  Add INP_INFO_WUNLOCK_ASSERT() macro and use it instead of
  INP_INFO_UNLOCK_ASSERT() in TCP-related code. For encapsulated traffic
  it is possible, that the code is running in net_epoch_preempt section,
  and INP_INFO_UNLOCK_ASSERT() is very strict assertion for such case.

  PR:		231428
  Reviewed by:	mmacy, tuexen
  Approved by:	re (kib)
  Differential Revision:	https://reviews.freebsd.org/D17335

Changes:
  head/sys/netinet/in_pcb.h
  head/sys/netinet/siftr.c
  head/sys/netinet/tcp_hpts.c
  head/sys/netinet/tcp_input.c