Bug 248958

Summary: pptpd + vlan panic
Product: Base System
Reporter: Emmanuel Vadot <manu>
Component: kern
Assignee: Kristof Provost <kp>
Status: Closed FIXED
Severity: Affects Some People
CC: emaste, glebius, kp, markj
Priority: ---
Keywords: panic
Version: CURRENT
Hardware: Any
OS: Any
Attachments:
Description: patch
Flags: none

Description Emmanuel Vadot freebsd_committer 2020-08-28 08:15:17 UTC
Tested on:
FreeBSD 13.0-CURRENT #0 r364846: Thu Aug 27 05:10:55 UTC 2020
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

No modifications; the system was installed from the latest snapshot (FreeBSD-13.0-CURRENT-amd64-20200827-r364846-memstick.img).

Steps to reproduce (no pptp peer needed):
root@test:~ # sysrc network_interfaces=ix0
network_interfaces: auto -> ix0
root@test:~ # sysrc cloned_interfaces="vlan666"
cloned_interfaces:  -> vlan666
root@test:~ # sysrc ifconfig_ix0="up polling"
ifconfig_ix0: DHCP -> up polling
root@test:~ # sysrc ifconfig_vlan666="vlan 666 vlandev ix0"
ifconfig_vlan666:  -> vlan 666 vlandev ix0
root@test:~ # sysrc ppp_enable="YES"
ppp_enable: NO -> YES
root@test:~ # sysrc ppp_mode="ddial"
ppp_mode: auto -> ddial
root@test:~ # sysrc ppp_nat="NO"
ppp_nat: YES -> NO
root@test:~ # sysrc ppp_profile="crash"
ppp_profile: papchap -> crash
root@test:~ # cat >> /etc/ppp/ppp.conf
crash: 
     set device PPPoE:vlan666
     set authname tutu@titi
     set authkey zerosec
     set dial
     set login
     set log phase tun command
     add default HISADDR
^D
root@test:~ # service netif restart
root@test:~ # service ppp restart
Stopping PPP profile: crash.
Starting PPP profile: crash.
root@test:~ # WARNING: attempt to domain_add(netgraph) after domainfinalize()
panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/net/if_vlan.c:1156
cpuid = 2
time = 1598602352
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0066d41400
vpanic() at vpanic+0x182/frame 0xfffffe0066d41450
panic() at panic+0x43/frame 0xfffffe0066d414b0
vlan_transmit() at vlan_transmit+0x165/frame 0xfffffe0066d41500
ether_output_frame() at ether_output_frame+0xa2/frame 0xfffffe0066d41530
ng_apply_item() at ng_apply_item+0xa8/frame 0xfffffe0066d415b0
ng_snd_item() at ng_snd_item+0x2cf/frame 0xfffffe0066d415f0
ng_pppoe_rcvmsg() at ng_pppoe_rcvmsg+0xc9c/frame 0xfffffe0066d41660
ng_apply_item() at ng_apply_item+0x3e3/frame 0xfffffe0066d416e0
ng_snd_item() at ng_snd_item+0x2cf/frame 0xfffffe0066d41720
ngc_send() at ngc_send+0x19b/frame 0xfffffe0066d417c0
sosend_generic() at sosend_generic+0x49f/frame 0xfffffe0066d41870
sosend() at sosend+0x66/frame 0xfffffe0066d418a0
kern_sendit() at kern_sendit+0x1ec/frame 0xfffffe0066d41930
sendit() at sendit+0x1d8/frame 0xfffffe0066d41980
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe0066d419d0
amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe0066d41af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0066d41af0
--- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x8007f291a, rsp = 0x7fffffffd8a8, rbp = 0x7fffffffd8f0 ---
KDB: enter: panic
[ thread pid 2314 tid 100146 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10b7ad6(%rip)
db>
Comment 1 Mark Johnston freebsd_committer 2020-08-28 12:55:40 UTC
Created attachment 217588
patch

There is a general problem: netgraph is not fully VNET-aware and in some cases does not set the current VNET when calling into the network stack. See r363735 for a recent example. I think this patch will fix it, though it's really just a band-aid.
Comment 2 Kristof Provost freebsd_committer 2020-08-29 08:58:35 UTC
I have this: https://reviews.freebsd.org/D26226

I wouldn't expect setting the VNET (though that may also be required) to fix an assertion failure caused by not being in NET_EPOCH.
Comment 3 Gleb Smirnoff freebsd_committer 2020-08-29 14:49:26 UTC
> https://reviews.freebsd.org/D26226

Right now reviews.freebsd.org is returning 502 for me, so I can't comment there.

I believe the patch there is incorrect. In the new epoch-synchronized network stack, the entire netgraph packet flow should be covered by the epoch, and eventually netgraph's internal synchronization should be converted to epoch as well. Thus it is the responsibility of the entry nodes, not the exit nodes, to enter the epoch.

ng_socket does that. However, there is one peculiarity, see r358193.

In this particular case, what happens in ng_pppoe is that a control message call (not covered by the epoch) sends a packet down the node, and that send should be covered by the epoch. So it is ng_pppoe's ng_pppoe_rcvmsg() that needs to enter the epoch, until some better solution is devised for the problem described in r358193.
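
As a rough illustration of that pattern (not the committed change), here is a userland model in C. The kernel does provide struct epoch_tracker, NET_EPOCH_ENTER(), NET_EPOCH_EXIT() and NET_EPOCH_ASSERT(); the definitions below are simplified stand-ins so the sketch compiles outside the kernel, and model_transmit(), model_rcvmsg(), epoch_depth and frames_sent are hypothetical names invented for this example.

```c
#include <assert.h>

/*
 * Userland stand-ins (assumptions for illustration only). In the kernel
 * these names are backed by the real epoch implementation.
 */
struct epoch_tracker { int unused; };

static int epoch_depth;   /* models "is this thread inside the net epoch?" */
static int frames_sent;   /* counts frames that reached the transmit path */

#define NET_EPOCH_ENTER(et) do { (void)&(et); epoch_depth++; } while (0)
#define NET_EPOCH_EXIT(et)  do { (void)&(et); epoch_depth--; } while (0)
#define NET_EPOCH_ASSERT()  assert(epoch_depth > 0)

/*
 * Hypothetical model of a transmit routine such as vlan_transmit():
 * it asserts that the caller is already inside the epoch.
 */
static int
model_transmit(void)
{
	NET_EPOCH_ASSERT();
	frames_sent++;
	return (0);
}

/*
 * Hypothetical model of a control-message handler that sends a packet
 * down the node. The control path is not covered by the epoch, so the
 * handler enters it around the send and leaves it afterwards.
 */
static int
model_rcvmsg(void)
{
	struct epoch_tracker et;
	int error;

	NET_EPOCH_ENTER(et);
	error = model_transmit();
	NET_EPOCH_EXIT(et);
	return (error);
}
```

Calling model_transmit() directly from outside the epoch trips the assertion in this model, which mirrors the in_epoch(net_epoch_preempt) panic in the backtrace; calling it through model_rcvmsg() does not.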
Comment 4 Kristof Provost freebsd_committer 2020-08-29 22:08:50 UTC
(In reply to Gleb Smirnoff from comment #3)
Thanks. I've updated the patch.
Comment 5 commit-hook freebsd_committer 2020-09-02 11:49:40 UTC
A commit references this bug:

Author: kp
Date: Wed Sep  2 11:49:23 UTC 2020
New revision: 365246
URL: https://svnweb.freebsd.org/changeset/base/365246

Log:
  ng_ether: Enter NET_EPOCH where required

  We must enter NET_EPOCH before calling ether_output_frame(). Several of the
  functions it calls (pfil_run_hooks, if_transmit) expect to be running in the
  NET_EPOCH.

  While here remove an unneeded EPOCH entry (which wasn't wide enough to cover
  BRIDGE_INPUT).

  PR:		248958
  Reviewed by:	glebius, bz (previous version), melifaro (previous version)
  Tested by:	manu
  Differential Revision:	https://reviews.freebsd.org/D26226

Changes:
  head/sys/netgraph/ng_ether.c
  head/sys/netgraph/ng_pppoe.c
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2020-09-05 01:29:11 UTC
^Triage: assign to committer for possible MFC consideration.