Created attachment 268484 [details]
core.1.1.txt - first halt

uname -sri
FreeBSD 15.0-RELEASE-p2 GENERIC

swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/da0p3        4194264        0  4194264     0%

Diagnostic details are in the attached reports: two panics, two files.

My configuration: two ocserv servers, each with its own pid file and its own config, one listening on ${INET_IF}:4444 and the other on ${INET_IF}:5555.

Once somewhat more than 10 users have connected with client software, the kernel panics in tuncreate().

Short analysis:

Basic info
Panic: page fault
Fault address: 0x8 (read through a NULL pointer plus an offset)
Process: ocserv (PID 4055)
Location: /usr/src/sys/net/if_tuntap.c:1013, function tuncreate()
Call stack: tuncreate() → tun_clone_create() → if_clone_create() → tunclone() → devfs_lookup() → namei() → vn_open_cred() → sys_openat()

The problem appears when ocserv tries to create a TUN interface (/dev/tun34): tuncreate() dereferences a pointer that does not exist.

First crash (core.1.1.txt)
Location: __mtx_lock_sleep() in tuncreate():1043
Address: 0x488
Type: mutex lock

Second crash (core.2.txt)
Location: tuncreate():1013
Address: 0x8
Type: page fault (NULL dereference)

Common factor: creation of tun interfaces via ocserv.

Both crashes are in the same TUN device driver, but at different places in the code. IMHO this is a systemic problem in tuntap in FreeBSD 15.0-RELEASE-p2.
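To illustrate why a fault address of 0x8 is usually read as "NULL pointer plus offset", here is a minimal userspace sketch. The struct layout is hypothetical and is not taken from if_tuntap.c; it only shows that loading the second pointer-sized field through a NULL base touches the address equal to that field's offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only, NOT the real tun softc. */
struct example_softc {
	void *first;	/* offset 0 */
	void *second;	/* offset 8 on LP64 platforms such as amd64 */
};

/* Two pointer members have no padding between them. */
static_assert(offsetof(struct example_softc, second) == sizeof(void *),
    "second field directly follows the first pointer");

/* Address that a load of `second` would touch for a given base pointer. */
static uintptr_t
fault_address(uintptr_t base)
{
	return (base + offsetof(struct example_softc, second));
}
```

With this layout, fault_address(0) is 0x8 on amd64, which matches the shape of the reported fault. Which field of which structure was actually read at line 1013 is not established here.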
Created attachment 268485 [details] second fault file diagnostics
If you need anything from the server (configs, core dumps, or other information), feel free to write to me here or by email. You can also reach me on Telegram at @McUrex. Thank you!
Without the second ocserv process the system ran for a day. The next day I started another process, this time an OpenVPN client whose config uses a tun1024 interface. The result was another page fault and a reboot after a panic.

--- start cut ---
#8 __mtx_lock_sleep (c=c@entry=0xfffff800ae4bfb20) at kern_mutex.c:614
#9 tunoutput (ifp=<optimized out>, m0=0xfffff800ae24f200) at if_tuntap.c:1482
#10 ip_output (m=0xfffff800ae24f200) at ip_output.c:814
#11 udp_send (...) at udp_usrreq.c:1520
--- stop cut ---

VM faults: 22,894,682 (critically large?)
Copy-on-write faults: 9,252,252
Network interrupts: ~1.3 million/second
Active TUN interfaces: 15+ (tun0, tun1, tun2, tun16, tun34, tun48, tun50...)

Too many for tun?
Created attachment 268522 [details] another panic and reboot
Can you tell me a little more about how this system is set up? Any vnets or anything unusual?
This is a VMware virtual machine with the following configuration:

CPU: 2
Cores per socket: 1
Number of sockets: 2
Memory: 16 GB
HDD: 350 GB, VM default policy, Paravirtual (SCSI)

Please see the attached files for more general information. If you need something specific, just ask.
Created attachment 268531 [details] Kyle ask. ifconfig, netstat -rn, vmstat -z, sockstat -4, pkg info, ipfw, 2 ocserv configs,sysctl,openvpn config.
(In reply to mcurex from comment #4) I think we're dealing with two separate issues in this PR, and I don't see offhand how we could have hit this one. On the other hand, thinking about it: we only destroy the mutex after doing an if_detach + if_free, but is it the case (CC glebius@) that the ifnet isn't fully detached until an epoch elapses, so we very well could have some packets lingering and should defer-reclaim those last bits? For the other one: I don't really know how ocserv works, but I think I could see a massive oversight in that one might be able to destroy a tun device that's not yet fully constructed.
Created attachment 268562 [details]
Check ifnet's flag first

(In reply to Kyle Evans from comment #8)
> On the other hand, thinking about it: we only destroy the mutex after doing an
> if_detach + if_free, but is it the case (CC glebius@) that the ifnet isn't fully
> detached until an epoch elapses, so we very well could have some packets lingering
> and should defer-reclaim those last bits?

I see one problem with the driver. For outbound traffic, the path is `net stack` -> `drivers`, so the ifnet's `if_flags` should be checked first, and only then the driver's own flags.
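The check order described above can be sketched in userspace C. This is only a model under stated assumptions: the `model_` types, flag names, and values are hypothetical stand-ins and are not the code from the attached patch or from if_tuntap.c. The point is that the stack-level flag is tested before any driver state is touched, so a down interface never causes a dereference of driver data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; names and values are illustrative only. */
#define MODEL_IFF_UP	0x1	/* interface-level "up" flag */
#define MODEL_DRV_OPEN	0x1	/* driver-level "device is open" flag */

struct model_softc {
	int drv_flags;
};

struct model_ifnet {
	int if_flags;
	struct model_softc *softc;	/* driver state, only valid while up */
};

/* Returns true if a packet may be handed to the driver. */
static bool
model_output_allowed(const struct model_ifnet *ifp)
{
	assert(ifp != NULL);
	/* 1. Stack-level flag first: do not touch driver state when down. */
	if ((ifp->if_flags & MODEL_IFF_UP) == 0)
		return (false);
	/* 2. Only now is it safe to look at the driver's own flags. */
	return ((ifp->softc->drv_flags & MODEL_DRV_OPEN) != 0);
}
```

In this model an interface with if_flags cleared and softc set to NULL is rejected without dereferencing softc, whereas checking the driver flags first would fault.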
(In reply to Zhenlei Huang from comment #9)
The above patch is not sufficient. tun_destroy() calls if_detach() without first clearing IFF_UP (if_flags &= ~IFF_UP). In if_detach(), if_down() is called after NET_EPOCH_WAIT(), but it should actually be called before it. See also https://reviews.freebsd.org/D49359 .
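The ordering argument (mark the interface down, then wait for in-flight users, then destroy) can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not kernel code: the `model_` names are hypothetical, the reader counter merely stands in for threads inside a net epoch section, and the non-blocking model_try_destroy() stands in for a blocking wait such as NET_EPOCH_WAIT() followed by teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_if {
	atomic_bool up;		/* stands in for IFF_UP */
	atomic_int readers;	/* stands in for threads in an epoch section */
	bool lock_valid;	/* stands in for the softc mutex being usable */
};

/* Output path: register as a reader, then check the flag. */
static bool
model_enter(struct model_if *ifp)
{
	atomic_fetch_add(&ifp->readers, 1);
	if (!atomic_load(&ifp->up)) {
		atomic_fetch_sub(&ifp->readers, 1);
		return (false);
	}
	return (true);
}

static void
model_exit(struct model_if *ifp)
{
	atomic_fetch_sub(&ifp->readers, 1);
}

/* Teardown step 1: stop new users from entering. */
static void
model_mark_down(struct model_if *ifp)
{
	atomic_store(&ifp->up, false);
}

/* Teardown step 2: destroy only once no reader is left. */
static bool
model_try_destroy(struct model_if *ifp)
{
	assert(!atomic_load(&ifp->up));	/* step 1 must come first */
	if (atomic_load(&ifp->readers) != 0)
		return (false);
	ifp->lock_valid = false;
	return (true);
}
```

If the wait happened before the interface was marked down, a new reader could enter after the wait finished and then use the lock after it had been destroyed. That is the kind of window this comment is concerned with.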