A 13.2-STABLE/amd64 server sometimes runs flawlessly for several weeks, but sometimes panices with same backtrace at midnight after newsyslog rotates and compresses logs then sends SIGHUP to the syslogd that closes and reopens all channels including /dev/console. The kernel panices on open(/dev/console) system call sometimes. The system was source-updated from 12.4-STABLE/amd64 to the commit https://cgit.freebsd.org/src/commit/?h=stable/13&id=8711fd210 This is regression since 12.4-STABLE. # conscontrol Configured: ttyv0 Available: uart,ttyv0,gdb Muting: off # sysctl kern.vty kern.vty: vt # last | grep boot | head -5 boot time Mon Aug 28 00:08 boot time Sun Aug 27 00:07 boot time Sat Aug 26 00:09 boot time Tue Aug 22 00:08 boot time Mon Aug 21 00:09 And I have 5 crashdumps with same backtrace. Custom kernel has debugging options that point to use-after-free (0xdeadc0dedeadc0de, see below). options KDB # Enable kernel debugger support. options KDB_UNATTENDED options KDB_TRACE options DDB # Support DDB. options GDB # Support remote GDB. options INVARIANTS # Enable calls of extra sanity checkin options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by IN options WITNESS # Enable checks to detect deadlocks and cycles options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speedoptions The backtrace: #0 __curthread () at /data/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /data/src/sys/kern/kern_shutdown.c:396 #2 0xffffffff80c0dd43 in kern_reboot (howto=260) at /data/src/sys/kern/kern_shutdown.c:484 #3 0xffffffff80c0e1af in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0152e8d600) at /data/src/sys/kern/kern_shutdown.c:923 #4 0xffffffff80c0df33 in panic (fmt=<unavailable>) at /data/src/sys/kern/kern_shutdown.c:847 #5 0xffffffff811178b7 in trap_fatal (frame=0xfffffe0152e8d690, eva=0) at /data/src/sys/amd64/amd64/trap.c:942 #6 <signal handler called> #7 devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:533 #8 0xffffffff80a9a0fa in devfs_populate (dm=dm@entry=0xfffff8044020a000) at /data/src/sys/fs/devfs/devfs_devs.c:677 #9 0xffffffff80a9f318 in devfs_populate_vp (vp=0xfffff804401d9988) at /data/src/sys/fs/devfs/devfs_vnops.c:359 #10 0xffffffff80a9d61b in devfs_lookup (ap=0xfffffe0152e8da30) at /data/src/sys/fs/devfs/devfs_vnops.c:1187 #11 0xffffffff80cecbb1 in VOP_LOOKUP (dvp=0xfffff804401d9988, vpp=0xfffffe0152e8dd10, cnp=0xfffffe0152e8dd38) at ./vnode_if.h:69 #12 lookup (ndp=ndp@entry=0xfffffe0152e8dcb8) at /data/src/sys/kern/vfs_lookup.c:1092 #13 0xffffffff80cebba2 in namei (ndp=ndp@entry=0xfffffe0152e8dcb8) at /data/src/sys/kern/vfs_lookup.c:617 #14 0xffffffff80d11f90 in vn_open_cred (ndp=ndp@entry=0xfffffe0152e8dcb8, flagp=flagp@entry=0xfffffe0152e8ddd4, cmode=cmode@entry=0, vn_open_flags=vn_open_flags@entry=16, cred=0xfffff80440282500, fp=0xfffff8005a296af0) at /data/src/sys/kern/vfs_vnops.c:328 #15 0xffffffff80d08c58 in kern_openat (td=0xfffffe003753b000, fd=-100, path=0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>, pathseg=UIO_USERSPACE, flags=6, mode=<optimized out>) at /data/src/sys/kern/vfs_syscalls.c:1158 #16 0xffffffff81118283 in syscallenter (td=<optimized out>) at /data/src/sys/amd64/amd64/../../kern/subr_syscall.c:190 #17 amd64_syscall (td=0xfffffe003753b000, traced=0) at /data/src/sys/amd64/amd64/trap.c:1183 #18 <signal handler called> #19 0x00001fc80cc4504a in ?? () Backtrace stopped: Cannot access memory at address 0x1fc80b443d58 (kgdb) frame 15 #15 0xffffffff80d08c58 in kern_openat (td=0xfffffe003753b000, fd=-100, path=0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>, pathseg=UIO_USERSPACE, flags=6, mode=<optimized out>) at /data/src/sys/kern/vfs_syscalls.c:1158 1158 error = vn_open_cred(&nd, &flags, cmode, VN_OPEN_WANTIOCTLCAPS, (kgdb) p nd $4 = {ni_dirp = 0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe0152e8ddb0, ni_startdir = 0x0, ni_rootdir = 0xfffff80003f36000, ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = { fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, ni_vp = 0x0, ni_dvp = 0xfffff804401d9988, ni_resflags = 1, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next = 0xfffff8039aa8200c "", ni_cnd = {cn_origflags = 8683588, cn_flags = 344227908, cn_thread = 0xfffffe003753b000, cn_cred = 0xfffff80440282500, cn_nameiop = LOOKUP, cn_lkflags = 532480, cn_pnbuf = 0xfffff8039aa82000 "/dev/console", cn_nameptr = 0xfffff8039aa82005 "console", cn_namelen = 7}, ni_cap_tracker = { tqh_first = 0x0, tqh_last = 0xfffffe0152e8dd78}, ni_dvp_seqc = 928231424, ni_vp_seqc = 4294966784} (kgdb) frame 7 #7 devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:533 533 cdp->cdp_dirents[dm->dm_idx] != NULL) { (kgdb) p cdp->cdp_dirents[dm->dm_idx] Cannot access memory at address 0xdeadc0dedeadc0de
Unread portion of the kernel message buffer: Fatal trap 9: general protection fault while in kernel mode cpuid = 3; apic id = 14 instruction pointer = 0x20:0xffffffff80a9a1a1 stack pointer = 0x28:0xfffffe0152e8d750 frame pointer = 0x28:0xfffffe0152e8d7b0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2031 (syslogd) trap number = 9 panic: general protection fault cpuid = 3 time = 1693170002 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0152e8d570 vpanic() at vpanic+0x152/frame 0xfffffe0152e8d5c0 panic() at panic+0x43/frame 0xfffffe0152e8d620 trap_fatal() at trap_fatal+0x387/frame 0xfffffe0152e8d680 calltrap() at calltrap+0x8/frame 0xfffffe0152e8d680 --- trap 0x9, rip = 0xffffffff80a9a1a1, rsp = 0xfffffe0152e8d750, rbp = 0xfffffe0152e8d7b0 --- devfs_populate_loop() at devfs_populate_loop+0x91/frame 0xfffffe0152e8d7b0 devfs_populate() at devfs_populate+0x4a/frame 0xfffffe0152e8d7d0 devfs_populate_vp() at devfs_populate_vp+0x88/frame 0xfffffe0152e8d810 devfs_lookup() at devfs_lookup+0x2b/frame 0xfffffe0152e8d9e0 lookup() at lookup+0x491/frame 0xfffffe0152e8da80 namei() at namei+0x332/frame 0xfffffe0152e8db30 vn_open_cred() at vn_open_cred+0x4a0/frame 0xfffffe0152e8dca0 kern_openat() at kern_openat+0x298/frame 0xfffffe0152e8de00 amd64_syscall() at amd64_syscall+0x143/frame 0xfffffe0152e8df30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0152e8df30 --- syscall (499, FreeBSD ELF64, openat), rip = 0x1fc80cc4504a, rsp = 0x1fc80b443d58, rbp = 0x1fc80b443e40 --- Uptime: 23h58m28s Dumping 9723 out of 32715 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Adding several people that might have a clue to CC:
What exactly is freed here? Since this is in a multi-line if statement I would not trust the line number. As in can you print cdp itself and then cdp->cdp_dirent if the former is not bogus? cdp in fact should be dumped to check the refcount on it
(In reply to Mateusz Guzik from comment #3) cdp has lots of 0xdeadc0dedeadc0de : (kgdb) frame 7 #7 devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:533 533 cdp->cdp_dirents[dm->dm_idx] != NULL) { (kgdb) p *cdp $1 = {cdp_c = {si_spare0 = 0xdeadc0dedeadc0de, si_flags = 3735929054, si_atime = { tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_ctime = { tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_mtime = { tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_uid = 3735929054, si_gid = 3735929054, si_mode = 49374, si_cred = 0xdeadc0dedeadc0de, si_drv0 = -559038242, si_refcount = -559038242, si_list = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, si_clone = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, si_children = {lh_first = 0xdeadc0dedeadc0de}, si_siblings = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de}, si_parent = 0xdeadc0dedeadc0de, si_mountpt = 0xdeadc0dedeadc0de, si_drv1 = 0xdeadc0dedeadc0de, si_drv2 = 0xdeadc0dedeadc0de, si_devsw = 0xdeadc0dedeadc0de, si_iosize_max = -559038242, si_usecount = 16045693110842147038, si_threadcount = 16045693110842147038, __si_u = {__sid_snapdata = 0xdeadc0dedeadc0de}, si_name = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336"...}, cdp_list = {tqe_next = 0xdeadc0dedeadc0de, tqe_prev = 0xdeadc0dedeadc0de}, cdp_inode = 3735929054, cdp_flags = 3735929054, cdp_inuse = 3735929053, cdp_maxdirent = 3735929054, cdp_dirents = 0xdeadc0dedeadc0de, cdp_dirent0 = 0xdeadc0dedeadc0de, cdp_dtr_list = {tqe_next = 0xdeadc0dedeadc0de, tqe_prev = 0xdeadc0dedeadc0de}, cdp_dtr_cb = 0xdeadc0dedeadc0de, cdp_dtr_cb_arg = 0xdeadc0dedeadc0de, cdp_fdpriv = {lh_first = 0xdeadc0dedeadc0de}, cdp_threadlock = {lock_object = { lo_name = 0xdeadc0dedeadc0de <error: Cannot access memory at address 0xdeadc0dedeadc0de>, lo_flags = 3735929054, lo_data = 3735929054, lo_witness = 0xdeadc0dedeadc0de}, mtx_lock = 16045693110842147038}} (kgdb) p cdp->cdp_dirents $2 = (struct devfs_dirent **) 0xdeadc0dedeadc0de
So cdp is backed by dev, but freeing dev is decoupled from unlinking it from the global cdp list and it may be some of the frees happen when they should not. There is very funky refcount scheme in there which pins cdp in a way which does not pin dev(???). That said I would start with adding a bool to dev indicating that cdp is hanging out on the list (or not). Then assert it is not when freeing.
I'm not familiar with the code and locking schema. But I'm ready to test patches :-) Could you provide some?
Do you have any third-party modules loaded? Are there any changes in src comparing to stock git? Were there unmounts of devfs mounts?
(In reply to Konstantin Belousov from comment #7) > Do you have any third-party modules loaded? No third-party, but lots of ones from base system: Id Refs Address Size Name 1 109 0xffffffff80200000 2165030 kernel 2 1 0xffffffff82366000 3000 alias_nbt.ko 3 9 0xffffffff82369000 175c8 libalias.ko 4 1 0xffffffff82382000 2b10 alias_dummy.ko 5 1 0xffffffff82385000 6e5a88 zfs.ko 6 1 0xffffffff82a6b000 2fe8 alias_irc.ko 7 1 0xffffffff82a6e000 3130 alias_pptp.ko 8 1 0xffffffff82a72000 6d70 ng_mppc.ko 9 2 0xffffffff82a79000 27b8 rc4.ko 10 12 0xffffffff82a7c000 18de8 netgraph.ko 11 1 0xffffffff82a95000 22458 geom_mirror.ko 12 1 0xffffffff82ab8000 7d38 ipfw_nat.ko 13 1 0xffffffff82ac0000 3498 alias_smedia.ko 14 1 0xffffffff82ac4000 2c08 alias_skinny.ko 15 1 0xffffffff82ac7000 3598 alias_ftp.ko 16 1 0xffffffff82d21000 3530 fdescfs.ko 17 1 0xffffffff82d25000 105c0 ipsec.ko 18 1 0xffffffff82d36000 4250 ichsmb.ko 19 1 0xffffffff82d3b000 2180 smbus.ko 20 1 0xffffffff82d3e000 8db8 ioat.ko 21 1 0xffffffff82d47000 3340 uhid.ko 22 1 0xffffffff82d4b000 3380 usbhid.ko 23 1 0xffffffff82d4f000 32b0 hidbus.ko 24 1 0xffffffff82d53000 3320 wmt.ko 25 1 0xffffffff82d57000 4350 ums.ko 26 1 0xffffffff82d5c000 12520 dummynet.ko 27 1 0xffffffff82d6f000 4700 nullfs.ko 28 1 0xffffffff82d74000 2a08 mac_ntpd.ko 29 1 0xffffffff82d77000 31c8 ng_ether.ko 30 1 0xffffffff82d7b000 3148 ng_ipacct.ko 31 1 0xffffffff82d7f000 39c0 ng_socket.ko 32 1 0xffffffff82d83000 2138 ng_tee.ko 33 1 0xffffffff82d86000 2398 ng_iface.ko 34 1 0xffffffff82d89000 61e8 ng_ppp.ko 35 1 0xffffffff82d90000 51e8 ng_l2tp.ko 36 1 0xffffffff82d96000 31e8 ng_ksocket.ko 37 1 0xffffffff82d9a000 2138 ng_tcpmss.ko 38 1 0xffffffff82d9d000 3138 ng_vjc.ko Custom kernel config is GENERIC plus debugging options cited above plus: include GENERIC ident Base13 options IPDIVERT options IPFIREWALL options IPFIREWALL_VERBOSE options IPFIREWALL_VERBOSE_LIMIT=100 options IPFIREWALL_DEFAULT_TO_ACCEPT device enc > Are there any changes in src comparing to stock git? Only one you committed against another panic: https://cgit.freebsd.org/src/commit/?id=7b335e9f690e77841e3eb7dbf3403429b10fe222 It was applied as single patch to the stable/13 tree updated to https://cgit.freebsd.org/src/commit/?h=stable/13&id=8711fd210 because I suspected that problem here, too. I was wrong. > Were there unmounts of devfs mounts? No. All crashes occured just after midnight and all backtraces point to syslogd opening /dev/console, so I'm confident that panic is triggered by syslogd closing/reopening /dev/console after receiving SIGHUP from newsyslogd.
I can upload compressed kernel.debug and crashdump, if needed. Smallest of crashdumps is 5,4G in size, uncompressed. And 483M compressed with xz -9.
It paniced again today generating another crashdump with same backtrace.
It seems I've got same panic today on different server running system built from same sources but completely different hardware: AMD Athlon(tm) II X2 245 Processor vs. Intel(R) Xeon(R) CPU E5620 etc. Its FreeBSD configuration is pretty same. Can't tell for sure, though, because this system ran without any watchod enabled and it hang hard at midnight. It has no console attached and was power-cycled after weekend (it serves small office). Nevertheless, I suspect same problem but this time it panices and failer to reboot all by itself. I've just enabled on-board watchdog and watchdogd: amdsbwd0: <AMD SB600/SB7xx Watchdog Timer> at iomem 0xfec000f0-0xfec000f3,0xfec000f4-0xfec000f7 on isa0 And will restart it with INVARIANTS-enabled kernel.
This AMD machine was source-upgraded from 12.4-STABLE/amd64 to 13.2-STABLE/amd64 on July 2 and since then it suffered a hang one or two times a month: July 27, August 7, August 29 and today September 11.
Per mjg's comment above, would something like this be useful as a starting point for debugging? (WARNING: compile-tested only) diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c index 6d8ce5cc3a63..245dcdc22307 100644 --- a/sys/fs/devfs/devfs_devs.c +++ b/sys/fs/devfs/devfs_devs.c @@ -175,6 +175,8 @@ devfs_free(struct cdev *cdev) struct cdev_priv *cdp; cdp = cdev2priv(cdev); + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0, + ("cdp %p still on active list", cdp)); if (cdev->si_cred != NULL) crfree(cdev->si_cred); devfs_free_cdp_inode(cdp->cdp_inode); @@ -521,6 +523,8 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup) dev_lock(); TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) { KASSERT(cdp->cdp_dirents != NULL, ("NULL cdp_dirents")); + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0, + ("cdp %p should not be on active list", cdp)); /* * If we are unmounting, or the device has been destroyed, @@ -552,6 +556,7 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup) if (!(cdp->cdp_flags & CDP_ACTIVE)) { if (cdp->cdp_inuse > 0) continue; + cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST; TAILQ_REMOVE(&cdevp_list, cdp, cdp_list); dev_unlock(); dev_rel(&cdp->cdp_c); @@ -703,7 +708,9 @@ devfs_create(struct cdev *dev) dev_lock_assert_locked(); cdp = cdev2priv(dev); - cdp->cdp_flags |= CDP_ACTIVE; + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0, + ("cdp %p already on active list", cdp)); + cdp->cdp_flags |= (CDP_ACTIVE | CDP_ON_ACTIVE_LIST); cdp->cdp_inode = alloc_unrl(devfs_inos); dev_refl(dev); TAILQ_INSERT_TAIL(&cdevp_list, cdp, cdp_list); diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h index 26589e0bded6..9cf50c58018d 100644 --- a/sys/fs/devfs/devfs_int.h +++ b/sys/fs/devfs/devfs_int.h @@ -57,6 +57,7 @@ struct cdev_priv { #define CDP_ACTIVE (1 << 0) #define CDP_SCHED_DTR (1 << 1) #define CDP_UNREF_DTR (1 << 2) +#define CDP_ON_ACTIVE_LIST (1 << 3) u_int cdp_inuse; u_int cdp_maxdirent; diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c index e8c8956d36fd..a8c3c2a36db5 100644 --- a/sys/fs/devfs/devfs_vnops.c +++ b/sys/fs/devfs/devfs_vnops.c @@ -1676,6 +1676,9 @@ devfs_revoke(struct vop_revoke_args *ap) dev_lock(); cdp->cdp_inuse--; if (!(cdp->cdp_flags & CDP_ACTIVE) && cdp->cdp_inuse == 0) { + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0, + ("cdp %p already not on active list", cdp)); + cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST; TAILQ_REMOVE(&cdevp_list, cdp, cdp_list); dev_unlock(); dev_rel(&cdp->cdp_c);
every spot which modifies the list has to come with an assert on the flag being appropriately set or unset, apart from the spot which does the actual free i would even go as far as making it a non-assert check, panicking very loudly and *commit* for the time being
No, checking at list addition or removal is to late. Problem is clearly that device is dereferenced while active, i.e. it sounds as if somebody did dev_rel() manually. So IMO the following assert is due, and it is indeed should be added to the tree: diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c index c6efd0d421b1..f91e6e29b35d 100644 --- a/sys/fs/devfs/devfs_devs.c +++ b/sys/fs/devfs/devfs_devs.c @@ -175,6 +175,8 @@ devfs_free(struct cdev *cdev) struct cdev_priv *cdp; cdp = cdev2priv(cdev); + KASSERT((cdp->cdp_flags & CDP_ACTIVE) == 0, + ("cdev %p %s active and freed", cdp, cdev->si_name)); if (cdev->si_cred != NULL) crfree(cdev->si_cred); devfs_free_cdp_inode(cdp->cdp_inode);
i am saying the current scheme is already buggy and needs to be reworked, but before that happens it would best to find the current bug, if only to make sure the rewrite does not just hide it to that end a bunch of asserts i proposed should go a long way in narrowing it down
In which way the current scheme is buggy? Asserting at active list manipulations would not help, the problem seems to be that cdp was freed without removing from the list.
Shouldn't the asserts in my proposed patch already accomplish what we need here? It may not be truly necessary to add a new flag solely tracking active list membership (vs. CDP_ACTIVE), but it seemed worth doing given that a devnode can be marked inactive while still on cdevp_list. Beyond that, the patch already has: 1) An assert that the node isn't on cdevp_list when devfs_free() is called, which is probably the critical case here due to incorrect (un)referencing. 2) Asserts in the other places that add or remove from cdevp_list, to catch other incorrect behavior e.g. trying add a node to the tail of the list when it's already elsewhere on the list.
Well, the new flag CDP_ON_ACTIVE_LIST mostly duplicates the CDP_ACTIVE. Crucial part is assert that CDP_ACTIVE is not set when freeing. Adding yet another flag would not hurt but it is purely debugging flag. If you add CDP_ACTIVE into devfs_free() assert, then the patch can be committed.
Expanded version that spits out devnode names, adds CDP_ACTIVE to devfs_free() assert, and asserts that we don't try to add a node to the freelist while still on the active list: diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c index 6d8ce5cc3a63..1e28db5966a7 100644 --- a/sys/fs/devfs/devfs_devs.c +++ b/sys/fs/devfs/devfs_devs.c @@ -175,6 +175,9 @@ devfs_free(struct cdev *cdev) struct cdev_priv *cdp; cdp = cdev2priv(cdev); + KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0, + ("%s: cdp %p (%s) still on active list", + __func__, cdp, cdev->si_name)); if (cdev->si_cred != NULL) crfree(cdev->si_cred); devfs_free_cdp_inode(cdp->cdp_inode); @@ -521,6 +524,9 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup) dev_lock(); TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) { KASSERT(cdp->cdp_dirents != NULL, ("NULL cdp_dirents")); + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0, + ("%s: cdp %p (%s) should not be on active list", + __func__, cdp, cdp->cdp_c.si_name)); /* * If we are unmounting, or the device has been destroyed, @@ -552,6 +558,7 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup) if (!(cdp->cdp_flags & CDP_ACTIVE)) { if (cdp->cdp_inuse > 0) continue; + cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST; TAILQ_REMOVE(&cdevp_list, cdp, cdp_list); dev_unlock(); dev_rel(&cdp->cdp_c); @@ -703,7 +710,10 @@ devfs_create(struct cdev *dev) dev_lock_assert_locked(); cdp = cdev2priv(dev); - cdp->cdp_flags |= CDP_ACTIVE; + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0, + ("%s: cdp %p (%s) already on active list", + __func__, cdp, dev->si_name)); + cdp->cdp_flags |= (CDP_ACTIVE | CDP_ON_ACTIVE_LIST); cdp->cdp_inode = alloc_unrl(devfs_inos); dev_refl(dev); TAILQ_INSERT_TAIL(&cdevp_list, cdp, cdp_list); diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h index 26589e0bded6..9cf50c58018d 100644 --- a/sys/fs/devfs/devfs_int.h +++ b/sys/fs/devfs/devfs_int.h @@ -57,6 +57,7 @@ struct cdev_priv { #define CDP_ACTIVE (1 << 0) #define CDP_SCHED_DTR (1 << 1) #define CDP_UNREF_DTR (1 << 2) +#define CDP_ON_ACTIVE_LIST (1 << 3) u_int cdp_inuse; u_int cdp_maxdirent; diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c index e8c8956d36fd..a71cfda9fa9a 100644 --- a/sys/fs/devfs/devfs_vnops.c +++ b/sys/fs/devfs/devfs_vnops.c @@ -1676,6 +1676,10 @@ devfs_revoke(struct vop_revoke_args *ap) dev_lock(); cdp->cdp_inuse--; if (!(cdp->cdp_flags & CDP_ACTIVE) && cdp->cdp_inuse == 0) { + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0, + ("%s: cdp %p (%s) not on active list", + __func__, cdp, dev->si_name)); + cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST; TAILQ_REMOVE(&cdevp_list, cdp, cdp_list); dev_unlock(); dev_rel(&cdp->cdp_c); diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c index 866788530e7f..a3af24a43b61 100644 --- a/sys/kern/kern_conf.c +++ b/sys/kern/kern_conf.c @@ -119,6 +119,8 @@ dev_free_devlocked(struct cdev *cdev) cdp = cdev2priv(cdev); KASSERT((cdp->cdp_flags & CDP_UNREF_DTR) == 0, ("destroy_dev() was not called after delist_dev(%p)", cdev)); + KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0, + ("%s: cdp %p (%s) on active list", __func__, cdp, cdev->si_name)); TAILQ_INSERT_HEAD(&cdevp_free_list, cdp, cdp_list); }
(In reply to Jason A. Harmening from comment #20) Please go ahead.
Will do. Testing on -current looks good so far, at least nothing blows up. Eugene, can you apply the patch on your machine and see if the new asserts catch the bad actor? It should apply cleanly against stable/13.
(In reply to Jason A. Harmening from comment #22) Yesterday I grabbed 13.2-RELEASE/amd64 stock installation DVD image, created bhyve virtual machine installing 13.2 from the image, built custom kernel as GENERIC plus DIAGNOSTIC/INVARIANTS/WITNESS but without any patches. I tried to reproduce the problem with naive shell script: #!/bin/sh i=0 while : do i=$(($i+1)) case "$i" in *000) printf "%u\r" $i ;; esac killall -1 syslogd done 997000 iterations did not reproduce the problem. Either it does not present in the release but only in stable/13, or the script is too haive. All I got is a line printed to the console after its termination with Ctrl-C: Expensive timeout(9) function: 0xffffffff80c6a980(0xfffffe00093e6560) 0.417209440s
(In reply to Jason A. Harmening from comment #20) The patch applied cleanly to stable/13. However, panics are irregular. Current uptime is over 7 days but it paniced more often earlier this month: # | fgrep boot boot time Wed Sep 13 00:08 boot time Tue Sep 12 00:09 boot time Sun Sep 10 00:07 boot time Sat Sep 9 00:08 boot time Wed Sep 6 00:08 boot time Tue Sep 5 00:09 So, no estimated time for next one.
(In reply to Eugene Grosbein from comment #23) Also, noted virtual machine had Uptime: 1d2h59m35s when I decided to power it down with "shutdown -p now" and it printed devfs-related LOR to its console: Waiting (max 60 seconds) for system process `vnlru' to stop... done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining... 0 0 done All buffers synced. lock order reversal: 1st 0xfffff80003cb2810 ufs (ufs, lockmgr) @ /usr/src/sys/kern/vfs_mount.c:1806 2nd 0xfffff800037db628 devfs (devfs, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:3047 lock order devfs -> ufs established at: #0 0xffffffff80c7d76d at witness_checkorder+0x32d #1 0xffffffff80bdaaf2 at lockmgr_lock_flags+0x182 #2 0xffffffff80f5449c at ffs_lock+0x8c #3 0xffffffff80d12364 at _vn_lock+0x54 #4 0xffffffff80cf34ef at vfs_domount_first+0x5af #5 0xffffffff80cefe39 at vfs_domount+0x309 #6 0xffffffff80cee9a5 at vfs_donmount+0x895 #7 0xffffffff80cf24f3 at kernel_mount+0x63 #8 0xffffffff80cf61c4 at parse_mount+0x4d4 #9 0xffffffff80cf4803 at vfs_mountroot+0x7c3 #10 0xffffffff80b96ed3 at start_init+0x23 #11 0xffffffff80bc4fc0 at fork_exit+0x80 #12 0xffffffff810bc94e at fork_trampoline+0xe lock order ufs -> devfs attempted at: #0 0xffffffff80c7e03d at witness_checkorder+0xbfd #1 0xffffffff80bdc525 at lockmgr_xlock+0x55 #2 0xffffffff80d12364 at _vn_lock+0x54 #3 0xffffffff80cfaa4f at vget_finish+0x4f #4 0xffffffff80a9bfff at devfs_allocv+0xbf #5 0xffffffff80a9b7c3 at devfs_root+0x43 #6 0xffffffff80cffa24 at vfs_cache_root_fallback+0x114 #7 0xffffffff80cfba9c at vflush+0x4c #8 0xffffffff80a9b6ce at devfs_unmount+0x3e #9 0xffffffff80cf09c1 at dounmount+0x391 #10 0xffffffff80cfd331 at vfs_unmountall+0xc1 #11 0xffffffff80cce9ec at bufshutdown+0x31c #12 0xffffffff80c0c3a3 at kern_reboot+0x203 #13 0xffffffff80c0c140 at sys_reboot+0x390 #14 0xffffffff810e9405 at amd64_syscall+0x145 #15 0xffffffff810bc1eb at fast_syscall_common+0xf8 Uptime: 1d2h59m35s acpi0: Powering system off
Yesterday I applied the patch from the comment #20 and rebooted with new kernel. And now I got another panic with one of new KASSERTs. Unread portion of the kernel message buffer: panic: devfs_free: cdp 0xfffff804490bf400 (tun0) still on active list cpuid = 5 time = 1695243602 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0152d4d540 vpanic() at vpanic+0x152/frame 0xfffffe0152d4d590 panic() at panic+0x43/frame 0xfffffe0152d4d5f0 devfs_free() at devfs_free+0x92/frame 0xfffffe0152d4d610 devfs_reclaim_vchr() at devfs_reclaim_vchr+0xef/frame 0xfffffe0152d4d640 VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x34/frame 0xfffffe0152d4d660 vgonel() at vgonel+0x3b4/frame 0xfffffe0152d4d6c0 vgone() at vgone+0x2f/frame 0xfffffe0152d4d6e0 devfs_delete() at devfs_delete+0x161/frame 0xfffffe0152d4d740 devfs_populate_loop() at devfs_populate_loop+0x665/frame 0xfffffe0152d4d7b0 devfs_populate() at devfs_populate+0x4a/frame 0xfffffe0152d4d7d0 devfs_populate_vp() at devfs_populate_vp+0x88/frame 0xfffffe0152d4d810 devfs_lookup() at devfs_lookup+0x2b/frame 0xfffffe0152d4d9e0 lookup() at lookup+0x491/frame 0xfffffe0152d4da80 namei() at namei+0x332/frame 0xfffffe0152d4db30 vn_open_cred() at vn_open_cred+0x4a0/frame 0xfffffe0152d4dca0 kern_openat() at kern_openat+0x298/frame 0xfffffe0152d4de00 amd64_syscall() at amd64_syscall+0x143/frame 0xfffffe0152d4df30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0152d4df30 --- syscall (499, FreeBSD ELF64, openat), rip = 0x2dbfc61e004a, rsp = 0x2dbfc3197398, rbp = 0x2dbfc3197480 --- Uptime: 4h57m21s Dumping 2437 out of 32715 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
(kgdb) bt #0 __curthread () at /data/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /data/src/sys/kern/kern_shutdown.c:396 #2 0xffffffff80c0dd43 in kern_reboot (howto=260) at /data/src/sys/kern/kern_shutdown.c:484 #3 0xffffffff80c0e1af in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0152d4d5d0) at /data/src/sys/kern/kern_shutdown.c:923 #4 0xffffffff80c0df33 in panic (fmt=<unavailable>) at /data/src/sys/kern/kern_shutdown.c:847 #5 0xffffffff80a994d2 in devfs_free (cdev=0xfffff804490bf400) at /data/src/sys/fs/devfs/devfs_devs.c:180 #6 0xffffffff80b9dc2d in dev_rel (dev=<unavailable>, dev@entry=0xfffff804490bf400) at /data/src/sys/kern/kern_conf.c:178 #7 0xffffffff80a9ffdf in devfs_reclaim_vchr (ap=<optimized out>) at /data/src/sys/fs/devfs/devfs_vnops.c:1574 #8 0xffffffff811eb084 in VOP_RECLAIM_APV (vop=0xffffffff81ab08a8 <devfs_specops>, a=a@entry=0xfffffe0152d4d670) at vnode_if.c:2180 #9 0xffffffff80cfceb4 in VOP_RECLAIM (vp=0xfffff80464929000) at ./vnode_if.h:1087 #10 vgonel (vp=vp@entry=0xfffff80464929000) at /data/src/sys/kern/vfs_subr.c:4150 #11 0xffffffff80cfd50f in vgone (vp=vp@entry=0xfffff80464929000) at /data/src/sys/kern/vfs_subr.c:3969 #12 0xffffffff80a99c81 in devfs_delete (dm=dm@entry=0xfffff80003f26d00, de=de@entry=0xfffff80440133a00, flags=flags@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:411 #13 0xffffffff80a9a7a5 in devfs_populate_loop (dm=dm@entry=0xfffff80003f26d00, cleanup=cleanup@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:551 #14 0xffffffff80a9a12a in devfs_populate (dm=dm@entry=0xfffff80003f26d00) at /data/src/sys/fs/devfs/devfs_devs.c:684 #15 0xffffffff80a9f3a8 in devfs_populate_vp (vp=0xfffff80003fcb5b8) at /data/src/sys/fs/devfs/devfs_vnops.c:359 #16 0xffffffff80a9d67b in devfs_lookup (ap=0xfffffe0152d4da30) at /data/src/sys/fs/devfs/devfs_vnops.c:1187 #17 0xffffffff80cecbb1 in VOP_LOOKUP (dvp=0xfffff80003fcb5b8, vpp=0xfffffe0152d4dd10, cnp=0xfffffe0152d4dd38) at ./vnode_if.h:69 #18 lookup (ndp=ndp@entry=0xfffffe0152d4dcb8) at /data/src/sys/kern/vfs_lookup.c:1092 #19 0xffffffff80cebba2 in namei (ndp=ndp@entry=0xfffffe0152d4dcb8) at /data/src/sys/kern/vfs_lookup.c:617 #20 0xffffffff80d11f90 in vn_open_cred (ndp=ndp@entry=0xfffffe0152d4dcb8, flagp=flagp@entry=0xfffffe0152d4ddd4, cmode=cmode@entry=0, vn_open_flags=vn_open_flags@entry=16, cred=0xfffff8044037dc00, fp=0xfffff804646c30a0) at /data/src/sys/kern/vfs_vnops.c:328 #21 0xffffffff80d08c58 in kern_openat (td=0xfffffe0037918c80, td@entry=<error reading variable: value is not available>, fd=-100, fd@entry=<error reading variable: value is not available>, path=0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>, path@entry=<error reading variable: value is not available>, pathseg=UIO_USERSPACE, pathseg@entry=<error reading variable: value is not available>, flags=6, flags@entry=<error reading variable: value is not available>, mode=<unavailable>, mode@entry=<error reading variable: value is not available>) at /data/src/sys/kern/vfs_syscalls.c:1158 #22 0xffffffff81117283 in syscallenter (td=<optimized out>) at /data/src/sys/amd64/amd64/../../kern/subr_syscall.c:190 #23 amd64_syscall (td=0xfffffe0037918c80, traced=0) at /data/src/sys/amd64/amd64/trap.c:1183 #24 <signal handler called> #25 0x00002dbfc61e004a in ?? () Backtrace stopped: Cannot access memory at address 0x2dbfc3197398
(kgdb) frame 21 #21 0xffffffff80d08c58 in kern_openat (td=0xfffffe0037918c80, td@entry=<error reading variable: value is not available>, fd=-100, fd@entry=<error reading variable: value is not available>, path=0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>, path@entry=<error reading variable: value is not available>, pathseg=UIO_USERSPACE, pathseg@entry=<error reading variable: value is not available>, flags=6, flags@entry=<error reading variable: value is not available>, mode=<unavailable>, mode@entry=<error reading variable: value is not available>) at /data/src/sys/kern/vfs_syscalls.c:1158 (kgdb) p nd $1 = {ni_dirp = 0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>, ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe0152d4ddb0, ni_startdir = 0x0, ni_rootdir = 0xfffff80443009988, ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = { fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0}, ni_vp = 0x0, ni_dvp = 0xfffff80003fcb5b8, ni_resflags = 1, ni_debugflags = 3, ni_loopcnt = 0, ni_pathlen = 1, ni_next = 0xfffff8068a66b00c "", ni_cnd = {cn_origflags = 8683588, cn_flags = 344227908, cn_thread = 0xfffffe0037918c80, cn_cred = 0xfffff8044037dc00, cn_nameiop = LOOKUP, cn_lkflags = 532480, cn_pnbuf = 0xfffff8068a66b000 "/dev/console", cn_nameptr = 0xfffff8068a66b005 "console", cn_namelen = 7}, ni_cap_tracker = { tqh_first = 0x0, tqh_last = 0xfffffe0152d4dd78}, ni_dvp_seqc = 932285568, ni_vp_seqc = 4294966784}
(In reply to Eugene Grosbein from comment #26) tun0 is open and active network interface used by site-to-site OpenVPN tunnel.
/etc/newsyslog.conf contains a line: /var/log/openvpn.log 640 30 * @T00 J /var/run/openvpn.pid So, newsyslogd sends SIGHUP to the openvpn daemon at midnight. Its logs before panic are lost, but I have its logs for previous midnight when there was no panic: Sep 20 00:00:02 slh openvpn[2314]: SIGHUP[hard,] received, process restarting Sep 20 00:00:02 slh openvpn[2314]: OpenVPN 2.6.5 amd64-portbld-freebsd13.2 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD] Sep 20 00:00:02 slh openvpn[2314]: library versions: OpenSSL 1.1.1u-freebsd 30 May 2023, LZO 2.10 Sep 20 00:00:02 slh openvpn[2314]: Restart pause, 1 second(s) Sep 20 00:00:03 slh openvpn[2314]: Diffie-Hellman initialized with 2048 bit key ... Sep 20 00:00:03 slh openvpn[2314]: TUN/TAP device /dev/tun0 opened Crashdump information points to some kind of entanglement between openvpn/tun0 and syslogd/console but it is beyond my comprehension.
(In reply to Eugene Grosbein from comment #29) So print out cdp 0xfffff804490bf400
(In reply to Konstantin Belousov from comment #31) (kgdb) frame 5 #5 0xffffffff80a994d2 in devfs_free (cdev=0xfffff804490bf400) at /data/src/sys/fs/devfs/devfs_devs.c:180 180 KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0, (kgdb) l 175 devfs_free(struct cdev *cdev) 176 { 177 struct cdev_priv *cdp; 178 179 cdp = cdev2priv(cdev); 180 KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0, 181 ("%s: cdp %p (%s) still on active list", 182 __func__, cdp, cdev->si_name)); 183 if (cdev->si_cred != NULL) 184 crfree(cdev->si_cred); (kgdb) p cdp $2 = (struct cdev_priv *) 0xfffff804490bf400 (kgdb) p *cdp $3 = {cdp_c = {si_spare0 = 0x0, si_flags = 0, si_atime = {tv_sec = 1695243602, tv_nsec = 0}, si_ctime = {tv_sec = 1695243602, tv_nsec = 0}, si_mtime = {tv_sec = 1695243602, tv_nsec = 0}, si_uid = 66, si_gid = 68, si_mode = 384, si_cred = 0xfffff8044037dc00, si_drv0 = 0, si_refcount = 0, si_list = {le_next = 0xfffff80029050000, le_prev = 0xffffffff81ac7860 <dead_cdevsw+160>}, si_clone = {le_next = 0x0, le_prev = 0xfffff80440bd0a30}, si_children = {lh_first = 0x0}, si_siblings = { le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, si_mountpt = 0x0, si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 65536, si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, si_name = "tun0", '\000' <repeats 251 times>}, cdp_list = { tqe_next = 0xfffff801ef2fb000, tqe_prev = 0xfffff80017c845e0}, cdp_inode = 203, cdp_flags = 8, cdp_inuse = 1, cdp_maxdirent = 0, cdp_dirents = 0xfffff804490bf608, cdp_dirent0 = 0x0, cdp_dtr_list = {tqe_next = 0x0, tqe_prev = 0x0}, cdp_dtr_cb = 0x0, cdp_dtr_cb_arg = 0x0, cdp_fdpriv = {lh_first = 0x0}, cdp_threadlock = {lock_object = { lo_name = 0xffffffff81321590 "devthrd", lo_flags = 16973824, lo_data = 0, lo_witness = 0xfffff8083fd73680}, mtx_lock = 0}}
Yes, this is the usual problem with dev_clone handlers. commit f041428f9d8aea9fe5a33c5c91ac31074d652672 Author: Konstantin Belousov <kib@FreeBSD.org> Date: Thu Sep 21 13:47:14 2023 +0300 tun/tap: correct ref count on cloned cdevs PR: 273418 diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c index f4dbf685fba6..a01bc11aa64c 100644 --- a/sys/net/if_tuntap.c +++ b/sys/net/if_tuntap.c @@ -548,6 +548,7 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, if (i != 0) i = tun_create_device(drv, unit, NULL, &dev, name); if (i == 0) { + dev_ref(dev); tuncreate(dev); struct tuntap_softc *tp = dev->si_drv1; *ifpp = tp->tun_ifp; @@ -611,8 +612,10 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, i = tun_create_device(drv, u, cred, dev, name); } - if (i == 0) + if (i == 0) { + dev_ref(*dev); if_clone_create(name, namelen, NULL); + } out: CURVNET_RESTORE(); }
(In reply to Konstantin Belousov from comment #33) Should I try it with stable/13?
(In reply to Eugene Grosbein from comment #34) Code seems to be same.
(In reply to Konstantin Belousov from comment #35) It is not, it lacks KPI change present in main: https://cgit.freebsd.org/src/commit/sys/net/if_tuntap.c?id=91ebcbe02a48ebd40edb49283b90f38d246da15a
devfs consumer is same. There was a trivial conflict. Below is the stable/13' equivalent. commit e84128814bcdc854c1670447ba1525711736d7f9 Author: Konstantin Belousov <kib@FreeBSD.org> Date: Thu Sep 21 13:47:14 2023 +0300 tun/tap: correct ref count on cloned cdevs PR: 273418 (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e) diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c index b81c2149a62f..bad09a3148b9 100644 --- a/sys/net/if_tuntap.c +++ b/sys/net/if_tuntap.c @@ -544,8 +544,10 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params) /* No preexisting struct cdev *, create one */ if (i != 0) i = tun_create_device(drv, unit, NULL, &dev, name); - if (i == 0) + if (i == 0) { + dev_ref(dev); tuncreate(dev); + } return (i); } @@ -605,8 +607,10 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, i = tun_create_device(drv, u, cred, dev, name); } - if (i == 0) + if (i == 0) { + dev_ref(*dev); if_clone_create(name, namelen, NULL); + } out: CURVNET_RESTORE(); }
Or better, this one. Previous patch would leak refcount on non-cloned cdevs. commit 0da5656dd70cd5a41c7fb3f4f1c5b8d7f72b9801 Author: Konstantin Belousov <kib@FreeBSD.org> Date: Thu Sep 21 13:47:14 2023 +0300 tun/tap: correct ref count on cloned cdevs PR: 273418 (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e) diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c index b81c2149a62f..723749c0dca9 100644 --- a/sys/net/if_tuntap.c +++ b/sys/net/if_tuntap.c @@ -541,6 +541,8 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params) /* find any existing device, or allocate new unit number */ dev = NULL; i = clone_create(&drv->clones, &drv->cdevsw, &unit, &dev, 0); + if (i == 0) + dev_ref(dev); /* No preexisting struct cdev *, create one */ if (i != 0) i = tun_create_device(drv, unit, NULL, &dev, name); @@ -596,6 +598,8 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, /* find any existing device, or allocate new unit number */ i = clone_create(&drv->clones, &drv->cdevsw, &u, dev, 0); + if (i == 0) { + dev_ref(*dev); if (i) { if (append_unit) { namelen = snprintf(devname, sizeof(devname), "%s%d",
(In reply to Konstantin Belousov from comment #38) Last chunk either lacks closing } with some code, or has extra opening {
commit 508e5f7d8d852b0f24cf0200d4486997d5d09409 Author: Konstantin Belousov <kib@FreeBSD.org> Date: Thu Sep 21 13:47:14 2023 +0300 tun/tap: correct ref count on cloned cdevs PR: 273418 (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e) diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c index b81c2149a62f..756aa04482cb 100644 --- a/sys/net/if_tuntap.c +++ b/sys/net/if_tuntap.c @@ -541,6 +541,8 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params) /* find any existing device, or allocate new unit number */ dev = NULL; i = clone_create(&drv->clones, &drv->cdevsw, &unit, &dev, 0); + if (i == 0) + dev_ref(dev); /* No preexisting struct cdev *, create one */ if (i != 0) i = tun_create_device(drv, unit, NULL, &dev, name); @@ -596,6 +598,8 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, /* find any existing device, or allocate new unit number */ i = clone_create(&drv->clones, &drv->cdevsw, &u, dev, 0); + if (i == 0) + dev_ref(*dev); if (i) { if (append_unit) { namelen = snprintf(devname, sizeof(devname), "%s%d",
(In reply to Konstantin Belousov from comment #40) Thank you very much. I built new kernel with this patch to tuntap in addition to previous one from comment #20. I will boot new kernel at the end of day and will wait.
(In reply to Konstantin Belousov from comment #40) A week has passed running kernel with this your patch. So far, so good. No panics.
Uptime of the system in question is 16 days already.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=27f1ec0be24b45559793e486a4fa5a2e7fdadc17 commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-09-21 10:47:14 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-10-09 23:36:59 +0000 tun/tap: correct ref count on cloned cdevs Reported and tested by: eugen PR: 273418 Discussed with: jah, kevans Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D42008 sys/net/if_tuntap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5230afce76a2de387cc162384ffef4cebf576e97 commit 5230afce76a2de387cc162384ffef4cebf576e97 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-09-21 10:47:14 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-10-16 07:16:01 +0000 tun/tap: correct ref count on cloned cdevs PR: 273418 (cherry picked from commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17) sys/net/if_tuntap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e900c81ede851f52ab50b541b9a6ef5fbc9bb919 commit e900c81ede851f52ab50b541b9a6ef5fbc9bb919 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-09-21 10:47:14 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-10-16 07:17:23 +0000 tun/tap: correct ref count on cloned cdevs PR: 273418 (cherry picked from commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17) sys/net/if_tuntap.c | 4 ++++ 1 file changed, 4 insertions(+)
Uptime is 30 days. Closing this.
Sorry to interrupt, and sorry for the bad news, but this fix has a side-effect: PR 276862 In short; it somehow mangles the vimage where openvpn is run, so it doesn't give back the vnet.interface to the host on jail termination. Sorry again, but You're in company - I already spent two nights hunting this down.