Bug 276862

Summary: vnet doesn't release iface after running openvpn
Product: Base System Reporter: Peter Much <pmc>
Component: kern Assignee: freebsd-net (Nobody) <net>
Status: New ---    
Severity: Affects Only Me CC: eugen, grahamperrin, kib, wigneddoom, zlei
Priority: ---    
Version: 13.3-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273418

Description Peter Much 2024-02-07 04:03:44 UTC
This problem appeared with 13.3-BETA1, and I could reproduce it with
stable/13-n257197-0efd4b792290 GENERIC

Reproduce:
 - create a vnet jail and a pair of ng_eiface
 - configure one of the eiface as vnet.interface
 - start the jail
 - install openvpn (2.6.8)
 - start openvpn with some config, and terminate it again
 - stop the jail

Normally the vnet.interface should now be back on the host, but it isn't.

It is possible to move the interface back to the host manually, with ifconfig -vnet, before stopping the jail.
In either case the jail will never fully terminate, and stay in 'dying' state.
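
For readers who want to set up a comparable jail, here is a minimal sketch of a jail.conf(5) entry with a vnet.interface, printed by a small shell helper. The jail name, the path and the exec lines are assumptions for illustration only; the actual configuration used in this report is not shown anywhere in the thread.

```shell
#!/bin/sh
# Print a minimal jail.conf(5) entry for a vnet jail that is handed one
# interface. Jail name, path and exec.* lines are illustrative assumptions,
# not the reporter's configuration.
emit_jail_conf() {
    name=$1
    iface=$2
    cat <<EOF
$name {
    path = "/jails/$name";
    host.hostname = "$name";
    vnet;
    vnet.interface = "$iface";
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown jail";
}
EOF
}
```

For example, `emit_jail_conf vpnjail ntele1l` prints an entry that moves ntele1l into the jail at start; on a working system the interface returns to the host when the jail stops.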
Comment 1 Alexander Fedorov 2024-02-07 16:11:24 UTC
Could you provide more information?

1. ngctl
2. ifconfig in jail and host
Comment 2 Peter Much 2024-02-07 16:41:55 UTC
(In reply to Alexander Fedorov from comment #1)

host # ngctl l -l
There are 3 total nodes:
  Name: ntele1u         Type: eiface          ID: 00000004   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  ether           ntele1l         eiface       00000008        ether          

  Name: ntele1l         Type: eiface          ID: 00000008   Num hooks: 1
  Local hook      Peer name       Peer type    Peer ID         Peer hook      
  ----------      ---------       ---------    -------         ---------      
  ether           ntele1u         eiface       00000004        ether          

  Name: ngctl10456      Type: socket          ID: 0000000c   Num hooks: 0

host # ifconfig
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether 06:1d:92:01:04:01
        inet 192.168.**.** netmask 0xffffffe0 broadcast 192.168.**.**
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vtnet1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether 00:a0:98:19:3b:56
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ntele1u: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=28<VLAN_MTU,JUMBO_MTU>
        ether 06:1d:92:09:02:05
        hwaddr 58:9c:fc:00:79:5d
        inet 192.168.99.17 netmask 0xfffffffc broadcast 192.168.99.19
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

jail# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ntele1l: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=28<VLAN_MTU,JUMBO_MTU>
        ether 06:1d:92:09:02:06
        hwaddr 58:9c:fc:10:ff:b2
        inet 192.168.99.18 netmask 0xfffffffc broadcast 192.168.99.19
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
---------------------------------------------

I tried to set 
   vnet.interface=vtnet1 
and this also got lost.
Comment 3 Alexander Fedorov 2024-02-07 17:34:15 UTC
Show kldstat
Comment 4 Peter Much 2024-02-07 17:45:50 UTC
I got a step further: it has nothing to do with openvpn, nor with netgraph.

Just do this:
  - service jail start
  - jexec cat /dev/tun42
    ^C
  - jexec ifconfig tun42 destroy
  - service jail stop
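
The steps above can be wrapped into a script, roughly as follows. This is a sketch under assumptions: the jail name and the interface to look for are parameters (the steps above do not name the jail), timeout(1) stands in for the manual ^C, and the final checks are an addition that only reports what the host sees afterwards.

```shell
#!/bin/sh
# Sketch of the four-step reproducer. The jail name and the interface to
# check are parameters (hypothetical; the report does not name the jail).

# Succeeds if interface $1 occurs in the space-separated list $2,
# which is the format printed by `ifconfig -l`.
iface_in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

reproduce() {
    jail=$1
    iface=$2

    service jail start "$jail"
    # Opening /dev/tun42 creates tun42 inside the jail via device cloning.
    # timeout(1) stands in for the manual ^C.
    timeout 2 jexec "$jail" cat /dev/tun42 || true
    jexec "$jail" ifconfig tun42 destroy
    service jail stop "$jail"

    if iface_in_list "$iface" "$(ifconfig -l)"; then
        echo "$iface is back on the host"
    else
        echo "$iface did NOT return to the host"
    fi

    # After a stop, a jail that jls still finds with -d has not fully gone away.
    if jls -d -j "$jail" >/dev/null 2>&1; then
        echo "jail $jail is still present:"
        jls -dv -j "$jail"
    fi
}
```

It would be run as root on the host, for example `reproduce vpnjail ntele1l`, where both names are placeholders for the local setup.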
Comment 5 Peter Much 2024-02-07 17:52:08 UTC
(In reply to Alexander Fedorov from comment #3)

# kldstat
Id Refs Address                Size Name
 1   10 0xffffffff80200000  2142e30 kernel
 2    1 0xffffffff82510000     39c0 ng_socket.ko
 3    2 0xffffffff82514000     aac8 netgraph.ko
 4    1 0xffffffff8251f000     2210 ng_eiface.ko
 5    1 0xffffffff82522000     2a08 mac_ntpd.ko
Comment 6 Peter Much 2024-02-07 18:06:31 UTC
I have a suitable workaround for now: adding the tunX in question to cloned_interfaces.
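
As a sketch, the workaround is a single rc.conf(5) line. It is assumed here that it goes into the jail's own /etc/rc.conf (the comment does not say which file), and tun42 is only the example unit from the reproducer.

```shell
# /etc/rc.conf inside the jail (assumed location)
# Create tun42 explicitly at startup, so that it is not first created
# as a side effect of a process opening /dev/tun42.
cloned_interfaces="tun42"
```

Several devices can be listed space-separated in the same variable.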
Comment 7 Peter Much 2024-02-07 23:56:07 UTC
The issue appears because of change e900c81ede851f52
which fixes PR 273418
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2024-02-08 04:46:57 UTC
If you destroy the vnet-scoped tunX interface with `ifconfig tunX destroy`,
does the vnet go away?
Comment 9 Peter Much 2024-02-08 05:41:14 UTC
(In reply to Konstantin Belousov from comment #8)

I'm not sure I understand your question, but I think the correct answer is No.

I always destroy my tun devices before terminating the jail, because openvpn runs in chroot and fails to do so. And I've learned that one has to be 100% tidy for vnet to work smoothly.


I just looked at what exactly your code does here in my test suite:

ifconfig tun42 create
     runs tun_clone_create(), clone_create() returns 1.
  -> no problem.

cat /dev/tun43
     runs tunclone(), clone_create() returns 1;
     runs tun_clone_create(), clone_create() returns 0;
     dev_ref() sets dev->si_refcount to 3.
  -> jid is now indestructible.
Comment 10 Peter Much 2024-02-23 20:50:30 UTC
So, what do we do with this?

Shall we declare it the recommended workaround to explicitly create the tun devices before starting openvpn instances,
or should we try and figure out what went wrong with the former bugfix?

Sorry to interrupt, but I have a bit too many dangling issues currently...