273418 – [panic] Repeating kernel panic on open(/dev/console)

Bug 273418 - [panic] Repeating kernel panic on open(/dev/console)

Summary: [panic] Repeating kernel panic on open(/dev/console)

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	13.2-STABLE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash, regression

Depends on:
Blocks:

Reported:	2023-08-29 11:53 UTC by Eugene Grosbein
Modified:	2024-02-08 05:11 UTC (History)
CC List:	5 users (show)

See Also:	276862

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Eugene Grosbein freebsd_committer

2023-08-29 11:53:38 UTC

A 13.2-STABLE/amd64 server sometimes runs flawlessly for several weeks, but sometimes panices with same backtrace at midnight after newsyslog rotates and compresses logs then sends SIGHUP to the syslogd that closes and reopens all channels including /dev/console. The kernel panices on open(/dev/console) system call sometimes. The system was source-updated from 12.4-STABLE/amd64 to the commit https://cgit.freebsd.org/src/commit/?h=stable/13&id=8711fd210

This is regression since 12.4-STABLE.

# conscontrol
Configured: ttyv0
 Available: uart,ttyv0,gdb
    Muting: off
# sysctl kern.vty
kern.vty: vt
# last | grep boot | head -5
boot time                                  Mon Aug 28 00:08
boot time                                  Sun Aug 27 00:07
boot time                                  Sat Aug 26 00:09
boot time                                  Tue Aug 22 00:08
boot time                                  Mon Aug 21 00:09

And I have 5 crashdumps with same backtrace. Custom kernel has debugging options that point to use-after-free (0xdeadc0dedeadc0de, see below).

options         KDB                     # Enable kernel debugger support.
options         KDB_UNATTENDED
options         KDB_TRACE
options         DDB                     # Support DDB.
options         GDB                     # Support remote GDB.
options        INVARIANTS              # Enable calls of extra sanity checkin
options        INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by IN
options        WITNESS                 # Enable checks to detect deadlocks and cycles
options        WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speedoptions

The backtrace:

#0  __curthread () at /data/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /data/src/sys/kern/kern_shutdown.c:396
#2  0xffffffff80c0dd43 in kern_reboot (howto=260) at /data/src/sys/kern/kern_shutdown.c:484
#3  0xffffffff80c0e1af in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0152e8d600)
    at /data/src/sys/kern/kern_shutdown.c:923
#4  0xffffffff80c0df33 in panic (fmt=<unavailable>) at /data/src/sys/kern/kern_shutdown.c:847
#5  0xffffffff811178b7 in trap_fatal (frame=0xfffffe0152e8d690, eva=0)
    at /data/src/sys/amd64/amd64/trap.c:942
#6  <signal handler called>
#7  devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0)
    at /data/src/sys/fs/devfs/devfs_devs.c:533
#8  0xffffffff80a9a0fa in devfs_populate (dm=dm@entry=0xfffff8044020a000)
    at /data/src/sys/fs/devfs/devfs_devs.c:677
#9  0xffffffff80a9f318 in devfs_populate_vp (vp=0xfffff804401d9988)
    at /data/src/sys/fs/devfs/devfs_vnops.c:359
#10 0xffffffff80a9d61b in devfs_lookup (ap=0xfffffe0152e8da30)
    at /data/src/sys/fs/devfs/devfs_vnops.c:1187
#11 0xffffffff80cecbb1 in VOP_LOOKUP (dvp=0xfffff804401d9988, vpp=0xfffffe0152e8dd10,
    cnp=0xfffffe0152e8dd38) at ./vnode_if.h:69
#12 lookup (ndp=ndp@entry=0xfffffe0152e8dcb8) at /data/src/sys/kern/vfs_lookup.c:1092
#13 0xffffffff80cebba2 in namei (ndp=ndp@entry=0xfffffe0152e8dcb8)
    at /data/src/sys/kern/vfs_lookup.c:617
#14 0xffffffff80d11f90 in vn_open_cred (ndp=ndp@entry=0xfffffe0152e8dcb8,
    flagp=flagp@entry=0xfffffe0152e8ddd4, cmode=cmode@entry=0,
    vn_open_flags=vn_open_flags@entry=16, cred=0xfffff80440282500, fp=0xfffff8005a296af0)
    at /data/src/sys/kern/vfs_vnops.c:328
#15 0xffffffff80d08c58 in kern_openat (td=0xfffffe003753b000, fd=-100,
    path=0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>,
    pathseg=UIO_USERSPACE, flags=6, mode=<optimized out>)
    at /data/src/sys/kern/vfs_syscalls.c:1158
#16 0xffffffff81118283 in syscallenter (td=<optimized out>)
    at /data/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#17 amd64_syscall (td=0xfffffe003753b000, traced=0) at /data/src/sys/amd64/amd64/trap.c:1183
#18 <signal handler called>
#19 0x00001fc80cc4504a in ?? ()
Backtrace stopped: Cannot access memory at address 0x1fc80b443d58

(kgdb) frame 15
#15 0xffffffff80d08c58 in kern_openat (td=0xfffffe003753b000, fd=-100,
    path=0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>,
    pathseg=UIO_USERSPACE, flags=6, mode=<optimized out>)
    at /data/src/sys/kern/vfs_syscalls.c:1158
1158            error = vn_open_cred(&nd, &flags, cmode, VN_OPEN_WANTIOCTLCAPS,

(kgdb) p nd
$4 = {ni_dirp = 0x1fc80b443e60 <error: Cannot access memory at address 0x1fc80b443e60>,
  ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe0152e8ddb0, ni_startdir = 0x0,
  ni_rootdir = 0xfffff80003f36000, ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {
    fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0},
  ni_vp = 0x0, ni_dvp = 0xfffff804401d9988, ni_resflags = 1, ni_debugflags = 3, ni_loopcnt = 0,
  ni_pathlen = 1, ni_next = 0xfffff8039aa8200c "", ni_cnd = {cn_origflags = 8683588,
    cn_flags = 344227908, cn_thread = 0xfffffe003753b000, cn_cred = 0xfffff80440282500,
    cn_nameiop = LOOKUP, cn_lkflags = 532480, cn_pnbuf = 0xfffff8039aa82000 "/dev/console",
    cn_nameptr = 0xfffff8039aa82005 "console", cn_namelen = 7}, ni_cap_tracker = {
    tqh_first = 0x0, tqh_last = 0xfffffe0152e8dd78}, ni_dvp_seqc = 928231424,
  ni_vp_seqc = 4294966784}

(kgdb) frame 7
#7  devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0)
    at /data/src/sys/fs/devfs/devfs_devs.c:533
533                         cdp->cdp_dirents[dm->dm_idx] != NULL) {
(kgdb) p cdp->cdp_dirents[dm->dm_idx]
Cannot access memory at address 0xdeadc0dedeadc0de

Comment 1 Eugene Grosbein freebsd_committer

2023-08-29 11:58:05 UTC

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 14
instruction pointer     = 0x20:0xffffffff80a9a1a1
stack pointer           = 0x28:0xfffffe0152e8d750
frame pointer           = 0x28:0xfffffe0152e8d7b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2031 (syslogd)
trap number             = 9
panic: general protection fault
cpuid = 3
time = 1693170002
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0152e8d570
vpanic() at vpanic+0x152/frame 0xfffffe0152e8d5c0
panic() at panic+0x43/frame 0xfffffe0152e8d620
trap_fatal() at trap_fatal+0x387/frame 0xfffffe0152e8d680
calltrap() at calltrap+0x8/frame 0xfffffe0152e8d680
--- trap 0x9, rip = 0xffffffff80a9a1a1, rsp = 0xfffffe0152e8d750, rbp = 0xfffffe0152e8d7b0 ---
devfs_populate_loop() at devfs_populate_loop+0x91/frame 0xfffffe0152e8d7b0
devfs_populate() at devfs_populate+0x4a/frame 0xfffffe0152e8d7d0
devfs_populate_vp() at devfs_populate_vp+0x88/frame 0xfffffe0152e8d810
devfs_lookup() at devfs_lookup+0x2b/frame 0xfffffe0152e8d9e0
lookup() at lookup+0x491/frame 0xfffffe0152e8da80
namei() at namei+0x332/frame 0xfffffe0152e8db30
vn_open_cred() at vn_open_cred+0x4a0/frame 0xfffffe0152e8dca0
kern_openat() at kern_openat+0x298/frame 0xfffffe0152e8de00
amd64_syscall() at amd64_syscall+0x143/frame 0xfffffe0152e8df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0152e8df30
--- syscall (499, FreeBSD ELF64, openat), rip = 0x1fc80cc4504a, rsp = 0x1fc80b443d58, rbp = 0x1fc80b443e40 ---
Uptime: 23h58m28s
Dumping 9723 out of 32715 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Comment 2 Eugene Grosbein freebsd_committer

2023-08-29 12:19:20 UTC

Adding several people that might have a clue to CC:

Comment 3 Mateusz Guzik freebsd_committer

2023-08-29 13:12:06 UTC

What exactly is freed here? Since this is in a multi-line if statement I would not trust the line number.

As in can you print cdp itself and then cdp->cdp_dirent if the former is not bogus? cdp in fact should be dumped to check the refcount on it

Comment 4 Eugene Grosbein freebsd_committer

2023-08-29 13:20:38 UTC

(In reply to Mateusz Guzik from comment #3)

cdp has lots of 0xdeadc0dedeadc0de :

(kgdb) frame 7
#7  devfs_populate_loop (dm=dm@entry=0xfffff8044020a000, cleanup=cleanup@entry=0)
    at /data/src/sys/fs/devfs/devfs_devs.c:533
533                         cdp->cdp_dirents[dm->dm_idx] != NULL) {
(kgdb) p *cdp
$1 = {cdp_c = {si_spare0 = 0xdeadc0dedeadc0de, si_flags = 3735929054, si_atime = {
      tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_ctime = {
      tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_mtime = {
      tv_sec = -2401050962867404578, tv_nsec = -2401050962867404578}, si_uid = 3735929054,
    si_gid = 3735929054, si_mode = 49374, si_cred = 0xdeadc0dedeadc0de, si_drv0 = -559038242,
    si_refcount = -559038242, si_list = {le_next = 0xdeadc0dedeadc0de,
      le_prev = 0xdeadc0dedeadc0de}, si_clone = {le_next = 0xdeadc0dedeadc0de,
      le_prev = 0xdeadc0dedeadc0de}, si_children = {lh_first = 0xdeadc0dedeadc0de},
    si_siblings = {le_next = 0xdeadc0dedeadc0de, le_prev = 0xdeadc0dedeadc0de},
    si_parent = 0xdeadc0dedeadc0de, si_mountpt = 0xdeadc0dedeadc0de,
    si_drv1 = 0xdeadc0dedeadc0de, si_drv2 = 0xdeadc0dedeadc0de, si_devsw = 0xdeadc0dedeadc0de,
    si_iosize_max = -559038242, si_usecount = 16045693110842147038,
    si_threadcount = 16045693110842147038, __si_u = {__sid_snapdata = 0xdeadc0dedeadc0de},
    si_name = "\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336\300\255\336\336"...}, cdp_list = {tqe_next = 0xdeadc0dedeadc0de,
    tqe_prev = 0xdeadc0dedeadc0de}, cdp_inode = 3735929054, cdp_flags = 3735929054,
  cdp_inuse = 3735929053, cdp_maxdirent = 3735929054, cdp_dirents = 0xdeadc0dedeadc0de,
  cdp_dirent0 = 0xdeadc0dedeadc0de, cdp_dtr_list = {tqe_next = 0xdeadc0dedeadc0de,
    tqe_prev = 0xdeadc0dedeadc0de}, cdp_dtr_cb = 0xdeadc0dedeadc0de,
  cdp_dtr_cb_arg = 0xdeadc0dedeadc0de, cdp_fdpriv = {lh_first = 0xdeadc0dedeadc0de},
  cdp_threadlock = {lock_object = {
      lo_name = 0xdeadc0dedeadc0de <error: Cannot access memory at address 0xdeadc0dedeadc0de>,
      lo_flags = 3735929054, lo_data = 3735929054, lo_witness = 0xdeadc0dedeadc0de},
    mtx_lock = 16045693110842147038}}

(kgdb) p cdp->cdp_dirents
$2 = (struct devfs_dirent **) 0xdeadc0dedeadc0de

Comment 5 Mateusz Guzik freebsd_committer

2023-08-29 13:33:12 UTC

So cdp is backed by dev, but freeing dev is decoupled from unlinking it from the global cdp list and it may be some of the frees happen when they should not. There is very funky refcount scheme in there which pins cdp in a way which does not pin dev(???).

That said I would start with adding a bool to dev indicating that cdp is hanging out on the list (or not). Then assert it is not when freeing.

Comment 6 Eugene Grosbein freebsd_committer

2023-08-29 13:56:10 UTC

I'm not familiar with the code and locking schema. But I'm ready to test patches :-) Could you provide some?

Comment 7 Konstantin Belousov freebsd_committer

2023-08-30 05:05:12 UTC

Do you have any third-party modules loaded?
Are there any changes in src comparing to stock git?
Were there unmounts of devfs mounts?

Comment 8 Eugene Grosbein freebsd_committer

2023-08-30 06:11:42 UTC

(In reply to Konstantin Belousov from comment #7)

> Do you have any third-party modules loaded?

No third-party, but lots of ones from base system:

Id Refs Address                Size Name
 1  109 0xffffffff80200000  2165030 kernel
 2    1 0xffffffff82366000     3000 alias_nbt.ko
 3    9 0xffffffff82369000    175c8 libalias.ko
 4    1 0xffffffff82382000     2b10 alias_dummy.ko
 5    1 0xffffffff82385000   6e5a88 zfs.ko
 6    1 0xffffffff82a6b000     2fe8 alias_irc.ko
 7    1 0xffffffff82a6e000     3130 alias_pptp.ko
 8    1 0xffffffff82a72000     6d70 ng_mppc.ko
 9    2 0xffffffff82a79000     27b8 rc4.ko
10   12 0xffffffff82a7c000    18de8 netgraph.ko
11    1 0xffffffff82a95000    22458 geom_mirror.ko
12    1 0xffffffff82ab8000     7d38 ipfw_nat.ko
13    1 0xffffffff82ac0000     3498 alias_smedia.ko
14    1 0xffffffff82ac4000     2c08 alias_skinny.ko
15    1 0xffffffff82ac7000     3598 alias_ftp.ko
16    1 0xffffffff82d21000     3530 fdescfs.ko
17    1 0xffffffff82d25000    105c0 ipsec.ko
18    1 0xffffffff82d36000     4250 ichsmb.ko
19    1 0xffffffff82d3b000     2180 smbus.ko
20    1 0xffffffff82d3e000     8db8 ioat.ko
21    1 0xffffffff82d47000     3340 uhid.ko
22    1 0xffffffff82d4b000     3380 usbhid.ko
23    1 0xffffffff82d4f000     32b0 hidbus.ko
24    1 0xffffffff82d53000     3320 wmt.ko
25    1 0xffffffff82d57000     4350 ums.ko
26    1 0xffffffff82d5c000    12520 dummynet.ko
27    1 0xffffffff82d6f000     4700 nullfs.ko
28    1 0xffffffff82d74000     2a08 mac_ntpd.ko
29    1 0xffffffff82d77000     31c8 ng_ether.ko
30    1 0xffffffff82d7b000     3148 ng_ipacct.ko
31    1 0xffffffff82d7f000     39c0 ng_socket.ko
32    1 0xffffffff82d83000     2138 ng_tee.ko
33    1 0xffffffff82d86000     2398 ng_iface.ko
34    1 0xffffffff82d89000     61e8 ng_ppp.ko
35    1 0xffffffff82d90000     51e8 ng_l2tp.ko
36    1 0xffffffff82d96000     31e8 ng_ksocket.ko
37    1 0xffffffff82d9a000     2138 ng_tcpmss.ko
38    1 0xffffffff82d9d000     3138 ng_vjc.ko

Custom kernel config is GENERIC plus debugging options cited above plus:

include GENERIC
ident           Base13
options         IPDIVERT
options         IPFIREWALL
options         IPFIREWALL_VERBOSE
options         IPFIREWALL_VERBOSE_LIMIT=100
options         IPFIREWALL_DEFAULT_TO_ACCEPT
device          enc

> Are there any changes in src comparing to stock git?

Only one you committed against another panic:
https://cgit.freebsd.org/src/commit/?id=7b335e9f690e77841e3eb7dbf3403429b10fe222

It was applied as single patch to the stable/13 tree updated to https://cgit.freebsd.org/src/commit/?h=stable/13&id=8711fd210 because I suspected that problem here, too. I was wrong.

> Were there unmounts of devfs mounts?

No. All crashes occured just after midnight and all backtraces point to syslogd opening /dev/console, so I'm confident that panic is triggered by syslogd closing/reopening /dev/console after receiving SIGHUP from newsyslogd.

Comment 9 Eugene Grosbein freebsd_committer

2023-08-30 06:32:00 UTC

I can upload compressed kernel.debug and crashdump, if needed. Smallest of crashdumps is 5,4G in size, uncompressed. And 483M compressed with xz -9.

Comment 10 Eugene Grosbein freebsd_committer

2023-09-05 11:01:14 UTC

It paniced again today generating another crashdump with same backtrace.

Comment 11 Eugene Grosbein freebsd_committer

2023-09-11 09:52:40 UTC

It seems I've got same panic today on different server running system built from same sources but completely different hardware: AMD Athlon(tm) II X2 245 Processor vs. Intel(R) Xeon(R) CPU E5620 etc. Its FreeBSD configuration is pretty same.

Can't tell for sure, though, because this system ran without any watchod enabled and it hang hard at midnight. It has no console attached and was power-cycled after weekend (it serves small office). Nevertheless, I suspect same problem but this time it panices and failer to reboot all by itself.

I've just enabled on-board watchdog and watchdogd:

amdsbwd0: <AMD SB600/SB7xx Watchdog Timer> at iomem 0xfec000f0-0xfec000f3,0xfec000f4-0xfec000f7 on isa0

And will restart it with INVARIANTS-enabled kernel.

Comment 12 Eugene Grosbein freebsd_committer

2023-09-11 10:10:01 UTC

This AMD machine was source-upgraded from 12.4-STABLE/amd64 to 13.2-STABLE/amd64 on July 2 and since then it suffered a hang one or two times a month: July 27, August 7, August 29 and today September 11.

Comment 13 Jason A. Harmening freebsd_committer

2023-09-18 19:20:56 UTC

Per mjg's comment above, would something like this be useful as a starting point for debugging? (WARNING: compile-tested only)

diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c
index 6d8ce5cc3a63..245dcdc22307 100644
--- a/sys/fs/devfs/devfs_devs.c
+++ b/sys/fs/devfs/devfs_devs.c
@@ -175,6 +175,8 @@ devfs_free(struct cdev *cdev)
        struct cdev_priv *cdp;
 
        cdp = cdev2priv(cdev);
+       KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0,
+           ("cdp %p still on active list", cdp));
        if (cdev->si_cred != NULL)
                crfree(cdev->si_cred);
        devfs_free_cdp_inode(cdp->cdp_inode);
@@ -521,6 +523,8 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup)
        dev_lock();
        TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {
                KASSERT(cdp->cdp_dirents != NULL, ("NULL cdp_dirents"));
+               KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0,
+                   ("cdp %p should not be on active list", cdp));
 
                /*
                 * If we are unmounting, or the device has been destroyed,
@@ -552,6 +556,7 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup)
                if (!(cdp->cdp_flags & CDP_ACTIVE)) {
                        if (cdp->cdp_inuse > 0)
                                continue;
+                       cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST;
                        TAILQ_REMOVE(&cdevp_list, cdp, cdp_list);
                        dev_unlock();
                        dev_rel(&cdp->cdp_c);
@@ -703,7 +708,9 @@ devfs_create(struct cdev *dev)
 
        dev_lock_assert_locked();
        cdp = cdev2priv(dev);
-       cdp->cdp_flags |= CDP_ACTIVE;
+       KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0,
+           ("cdp %p already on active list", cdp));
+       cdp->cdp_flags |= (CDP_ACTIVE | CDP_ON_ACTIVE_LIST);
        cdp->cdp_inode = alloc_unrl(devfs_inos);
        dev_refl(dev);
        TAILQ_INSERT_TAIL(&cdevp_list, cdp, cdp_list);
diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h
index 26589e0bded6..9cf50c58018d 100644
--- a/sys/fs/devfs/devfs_int.h
+++ b/sys/fs/devfs/devfs_int.h
@@ -57,6 +57,7 @@ struct cdev_priv {
 #define CDP_ACTIVE             (1 << 0)
 #define CDP_SCHED_DTR          (1 << 1)
 #define        CDP_UNREF_DTR           (1 << 2)
+#define CDP_ON_ACTIVE_LIST     (1 << 3)
 
        u_int                   cdp_inuse;
        u_int                   cdp_maxdirent;
diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
index e8c8956d36fd..a8c3c2a36db5 100644
--- a/sys/fs/devfs/devfs_vnops.c
+++ b/sys/fs/devfs/devfs_vnops.c
@@ -1676,6 +1676,9 @@ devfs_revoke(struct vop_revoke_args *ap)
        dev_lock();
        cdp->cdp_inuse--;
        if (!(cdp->cdp_flags & CDP_ACTIVE) && cdp->cdp_inuse == 0) {
+               KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0,
+                   ("cdp %p already not on active list", cdp));
+               cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST;
                TAILQ_REMOVE(&cdevp_list, cdp, cdp_list);
                dev_unlock();
                dev_rel(&cdp->cdp_c);

Comment 14 Mateusz Guzik freebsd_committer

2023-09-18 21:22:40 UTC

every spot which modifies the list has to come with an assert on the flag being appropriately set or unset, apart from the spot which does the actual free

i would even go as far as making it a non-assert check, panicking very loudly and *commit* for the time being

Comment 15 Konstantin Belousov freebsd_committer

2023-09-18 21:29:04 UTC

No, checking at list addition or removal is to late.
Problem is clearly that device is dereferenced while active, i.e. it sounds
as if somebody did dev_rel() manually.

So IMO the following assert is due, and it is indeed should be added to
the tree:

diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c
index c6efd0d421b1..f91e6e29b35d 100644
--- a/sys/fs/devfs/devfs_devs.c
+++ b/sys/fs/devfs/devfs_devs.c
@@ -175,6 +175,8 @@ devfs_free(struct cdev *cdev)
        struct cdev_priv *cdp;
 
        cdp = cdev2priv(cdev);
+       KASSERT((cdp->cdp_flags & CDP_ACTIVE) == 0,
+           ("cdev %p %s active and freed", cdp, cdev->si_name));
        if (cdev->si_cred != NULL)
                crfree(cdev->si_cred);
        devfs_free_cdp_inode(cdp->cdp_inode);

Comment 16 Mateusz Guzik freebsd_committer

2023-09-18 21:35:59 UTC

i am saying the current scheme is already buggy and needs to be reworked, but before that happens it would best to find the current bug, if only to make sure the rewrite does not just hide it

to that end a bunch of asserts i proposed should go a long way in narrowing it down

Comment 17 Konstantin Belousov freebsd_committer

2023-09-18 21:49:14 UTC

In which way the current scheme is buggy?

Asserting at active list manipulations would not help, the problem seems
to be that cdp was freed without removing from the list.

Comment 18 Jason A. Harmening freebsd_committer

2023-09-18 22:15:12 UTC

Shouldn't the asserts in my proposed patch already accomplish what we need here?

It may not be truly necessary to add a new flag solely tracking active list membership (vs. CDP_ACTIVE), but it seemed worth doing given that a devnode can be marked inactive while still on cdevp_list.  Beyond that, the patch already has:

1) An assert that the node isn't on cdevp_list when devfs_free() is called, which is probably the critical case here due to incorrect (un)referencing.

2) Asserts in the other places that add or remove from cdevp_list, to catch other incorrect behavior e.g. trying add a node to the tail of the list when it's already elsewhere on the list.

Comment 19 Konstantin Belousov freebsd_committer

2023-09-18 22:21:14 UTC

Well, the new flag CDP_ON_ACTIVE_LIST mostly duplicates the CDP_ACTIVE.

Crucial part is assert that CDP_ACTIVE is not set when freeing.  Adding
yet another flag would not hurt but it is purely debugging flag.  If you
add CDP_ACTIVE into devfs_free() assert, then the patch can be committed.

Comment 20 Jason A. Harmening freebsd_committer

2023-09-19 01:25:52 UTC

Expanded version that spits out devnode names, adds CDP_ACTIVE to devfs_free() assert, and asserts that we don't try to add a node to the freelist while still on the active list:

diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c
index 6d8ce5cc3a63..1e28db5966a7 100644
--- a/sys/fs/devfs/devfs_devs.c
+++ b/sys/fs/devfs/devfs_devs.c
@@ -175,6 +175,9 @@ devfs_free(struct cdev *cdev)
        struct cdev_priv *cdp;
 
        cdp = cdev2priv(cdev);
+       KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0,
+           ("%s: cdp %p (%s) still on active list",
+           __func__, cdp, cdev->si_name));
        if (cdev->si_cred != NULL)
                crfree(cdev->si_cred);
        devfs_free_cdp_inode(cdp->cdp_inode);
@@ -521,6 +524,9 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup)
        dev_lock();
        TAILQ_FOREACH(cdp, &cdevp_list, cdp_list) {
                KASSERT(cdp->cdp_dirents != NULL, ("NULL cdp_dirents"));
+               KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0,
+                   ("%s: cdp %p (%s) should not be on active list",
+                   __func__, cdp, cdp->cdp_c.si_name));
 
                /*
                 * If we are unmounting, or the device has been destroyed,
@@ -552,6 +558,7 @@ devfs_populate_loop(struct devfs_mount *dm, int cleanup)
                if (!(cdp->cdp_flags & CDP_ACTIVE)) {
                        if (cdp->cdp_inuse > 0)
                                continue;
+                       cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST;
                        TAILQ_REMOVE(&cdevp_list, cdp, cdp_list);
                        dev_unlock();
                        dev_rel(&cdp->cdp_c);
@@ -703,7 +710,10 @@ devfs_create(struct cdev *dev)
 
        dev_lock_assert_locked();
        cdp = cdev2priv(dev);
-       cdp->cdp_flags |= CDP_ACTIVE;
+       KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0,
+           ("%s: cdp %p (%s) already on active list",
+           __func__, cdp, dev->si_name));
+       cdp->cdp_flags |= (CDP_ACTIVE | CDP_ON_ACTIVE_LIST);
        cdp->cdp_inode = alloc_unrl(devfs_inos);
        dev_refl(dev);
        TAILQ_INSERT_TAIL(&cdevp_list, cdp, cdp_list);
diff --git a/sys/fs/devfs/devfs_int.h b/sys/fs/devfs/devfs_int.h
index 26589e0bded6..9cf50c58018d 100644
--- a/sys/fs/devfs/devfs_int.h
+++ b/sys/fs/devfs/devfs_int.h
@@ -57,6 +57,7 @@ struct cdev_priv {
 #define CDP_ACTIVE             (1 << 0)
 #define CDP_SCHED_DTR          (1 << 1)
 #define        CDP_UNREF_DTR           (1 << 2)
+#define CDP_ON_ACTIVE_LIST     (1 << 3)
 
        u_int                   cdp_inuse;
        u_int                   cdp_maxdirent;
diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c
index e8c8956d36fd..a71cfda9fa9a 100644
--- a/sys/fs/devfs/devfs_vnops.c
+++ b/sys/fs/devfs/devfs_vnops.c
@@ -1676,6 +1676,10 @@ devfs_revoke(struct vop_revoke_args *ap)
        dev_lock();
        cdp->cdp_inuse--;
        if (!(cdp->cdp_flags & CDP_ACTIVE) && cdp->cdp_inuse == 0) {
+               KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) != 0,
+                   ("%s: cdp %p (%s) not on active list",
+                   __func__, cdp, dev->si_name));
+               cdp->cdp_flags &= ~CDP_ON_ACTIVE_LIST;
                TAILQ_REMOVE(&cdevp_list, cdp, cdp_list);
                dev_unlock();
                dev_rel(&cdp->cdp_c);
diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index 866788530e7f..a3af24a43b61 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -119,6 +119,8 @@ dev_free_devlocked(struct cdev *cdev)
        cdp = cdev2priv(cdev);
        KASSERT((cdp->cdp_flags & CDP_UNREF_DTR) == 0,
            ("destroy_dev() was not called after delist_dev(%p)", cdev));
+       KASSERT((cdp->cdp_flags & CDP_ON_ACTIVE_LIST) == 0,
+           ("%s: cdp %p (%s) on active list", __func__, cdp, cdev->si_name));
        TAILQ_INSERT_HEAD(&cdevp_free_list, cdp, cdp_list);
 }

Comment 21 Konstantin Belousov freebsd_committer

2023-09-19 06:00:14 UTC

(In reply to Jason A. Harmening from comment #20)
Please go ahead.

Comment 22 Jason A. Harmening freebsd_committer

2023-09-19 17:17:22 UTC

Will do.  Testing on -current looks good so far, at least nothing blows up.
Eugene, can you apply the patch on your machine and see if the new asserts catch the bad actor?  It should apply cleanly against stable/13.

Comment 23 Eugene Grosbein freebsd_committer

2023-09-20 06:09:08 UTC

(In reply to Jason A. Harmening from comment #22)

Yesterday I grabbed 13.2-RELEASE/amd64 stock installation DVD image, created bhyve virtual machine installing 13.2 from the image, built custom kernel as GENERIC plus DIAGNOSTIC/INVARIANTS/WITNESS but without any patches.

I tried to reproduce the problem with naive shell script:

#!/bin/sh

i=0
while :
do
  i=$(($i+1))
  case "$i" in
  *000) printf "%u\r" $i ;;
  esac
  killall -1 syslogd
done

997000 iterations did not reproduce the problem. Either it does not present in the release but only in stable/13, or the script is too haive. All I got is a line printed to the console after its termination with Ctrl-C:

Expensive timeout(9) function: 0xffffffff80c6a980(0xfffffe00093e6560) 0.417209440s

Comment 24 Eugene Grosbein freebsd_committer

2023-09-20 06:18:29 UTC

(In reply to Jason A. Harmening from comment #20)

The patch applied cleanly to stable/13. However, panics are irregular. Current uptime is over 7 days but it paniced more often earlier this month:

# | fgrep boot
boot time                                  Wed Sep 13 00:08
boot time                                  Tue Sep 12 00:09
boot time                                  Sun Sep 10 00:07
boot time                                  Sat Sep  9 00:08
boot time                                  Wed Sep  6 00:08
boot time                                  Tue Sep  5 00:09

So, no estimated time for next one.

Comment 25 Eugene Grosbein freebsd_committer

2023-09-20 09:52:48 UTC

(In reply to Eugene Grosbein from comment #23)

Also, noted virtual machine had Uptime: 1d2h59m35s when I decided to power it down with "shutdown -p now" and it printed devfs-related LOR to its console:

Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 done
All buffers synced.
lock order reversal:
 1st 0xfffff80003cb2810 ufs (ufs, lockmgr) @ /usr/src/sys/kern/vfs_mount.c:1806
 2nd 0xfffff800037db628 devfs (devfs, lockmgr) @ /usr/src/sys/kern/vfs_subr.c:3047
lock order devfs -> ufs established at:
#0 0xffffffff80c7d76d at witness_checkorder+0x32d
#1 0xffffffff80bdaaf2 at lockmgr_lock_flags+0x182
#2 0xffffffff80f5449c at ffs_lock+0x8c
#3 0xffffffff80d12364 at _vn_lock+0x54
#4 0xffffffff80cf34ef at vfs_domount_first+0x5af
#5 0xffffffff80cefe39 at vfs_domount+0x309
#6 0xffffffff80cee9a5 at vfs_donmount+0x895
#7 0xffffffff80cf24f3 at kernel_mount+0x63
#8 0xffffffff80cf61c4 at parse_mount+0x4d4
#9 0xffffffff80cf4803 at vfs_mountroot+0x7c3
#10 0xffffffff80b96ed3 at start_init+0x23
#11 0xffffffff80bc4fc0 at fork_exit+0x80
#12 0xffffffff810bc94e at fork_trampoline+0xe
lock order ufs -> devfs attempted at:
#0 0xffffffff80c7e03d at witness_checkorder+0xbfd
#1 0xffffffff80bdc525 at lockmgr_xlock+0x55
#2 0xffffffff80d12364 at _vn_lock+0x54
#3 0xffffffff80cfaa4f at vget_finish+0x4f
#4 0xffffffff80a9bfff at devfs_allocv+0xbf
#5 0xffffffff80a9b7c3 at devfs_root+0x43
#6 0xffffffff80cffa24 at vfs_cache_root_fallback+0x114
#7 0xffffffff80cfba9c at vflush+0x4c
#8 0xffffffff80a9b6ce at devfs_unmount+0x3e
#9 0xffffffff80cf09c1 at dounmount+0x391
#10 0xffffffff80cfd331 at vfs_unmountall+0xc1
#11 0xffffffff80cce9ec at bufshutdown+0x31c
#12 0xffffffff80c0c3a3 at kern_reboot+0x203
#13 0xffffffff80c0c140 at sys_reboot+0x390
#14 0xffffffff810e9405 at amd64_syscall+0x145
#15 0xffffffff810bc1eb at fast_syscall_common+0xf8
Uptime: 1d2h59m35s
acpi0: Powering system off

Comment 26 Eugene Grosbein freebsd_committer

2023-09-21 07:28:50 UTC

Yesterday I applied the patch from the comment #20 and rebooted with new kernel. And now I got another panic with one of new KASSERTs.

Unread portion of the kernel message buffer:
panic: devfs_free: cdp 0xfffff804490bf400 (tun0) still on active list
cpuid = 5
time = 1695243602
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0152d4d540
vpanic() at vpanic+0x152/frame 0xfffffe0152d4d590
panic() at panic+0x43/frame 0xfffffe0152d4d5f0
devfs_free() at devfs_free+0x92/frame 0xfffffe0152d4d610
devfs_reclaim_vchr() at devfs_reclaim_vchr+0xef/frame 0xfffffe0152d4d640
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x34/frame 0xfffffe0152d4d660
vgonel() at vgonel+0x3b4/frame 0xfffffe0152d4d6c0
vgone() at vgone+0x2f/frame 0xfffffe0152d4d6e0
devfs_delete() at devfs_delete+0x161/frame 0xfffffe0152d4d740
devfs_populate_loop() at devfs_populate_loop+0x665/frame 0xfffffe0152d4d7b0
devfs_populate() at devfs_populate+0x4a/frame 0xfffffe0152d4d7d0
devfs_populate_vp() at devfs_populate_vp+0x88/frame 0xfffffe0152d4d810
devfs_lookup() at devfs_lookup+0x2b/frame 0xfffffe0152d4d9e0
lookup() at lookup+0x491/frame 0xfffffe0152d4da80
namei() at namei+0x332/frame 0xfffffe0152d4db30
vn_open_cred() at vn_open_cred+0x4a0/frame 0xfffffe0152d4dca0
kern_openat() at kern_openat+0x298/frame 0xfffffe0152d4de00
amd64_syscall() at amd64_syscall+0x143/frame 0xfffffe0152d4df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0152d4df30
--- syscall (499, FreeBSD ELF64, openat), rip = 0x2dbfc61e004a, rsp = 0x2dbfc3197398, rbp = 0x2dbfc3197480 ---
Uptime: 4h57m21s
Dumping 2437 out of 32715 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Comment 27 Eugene Grosbein freebsd_committer

2023-09-21 07:29:59 UTC

(kgdb) bt
#0  __curthread () at /data/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=textdump@entry=1) at /data/src/sys/kern/kern_shutdown.c:396
#2  0xffffffff80c0dd43 in kern_reboot (howto=260) at /data/src/sys/kern/kern_shutdown.c:484
#3  0xffffffff80c0e1af in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe0152d4d5d0)
    at /data/src/sys/kern/kern_shutdown.c:923
#4  0xffffffff80c0df33 in panic (fmt=<unavailable>) at /data/src/sys/kern/kern_shutdown.c:847
#5  0xffffffff80a994d2 in devfs_free (cdev=0xfffff804490bf400)
    at /data/src/sys/fs/devfs/devfs_devs.c:180
#6  0xffffffff80b9dc2d in dev_rel (dev=<unavailable>, dev@entry=0xfffff804490bf400)
    at /data/src/sys/kern/kern_conf.c:178
#7  0xffffffff80a9ffdf in devfs_reclaim_vchr (ap=<optimized out>)
    at /data/src/sys/fs/devfs/devfs_vnops.c:1574
#8  0xffffffff811eb084 in VOP_RECLAIM_APV (vop=0xffffffff81ab08a8 <devfs_specops>,
    a=a@entry=0xfffffe0152d4d670) at vnode_if.c:2180
#9  0xffffffff80cfceb4 in VOP_RECLAIM (vp=0xfffff80464929000) at ./vnode_if.h:1087
#10 vgonel (vp=vp@entry=0xfffff80464929000) at /data/src/sys/kern/vfs_subr.c:4150
#11 0xffffffff80cfd50f in vgone (vp=vp@entry=0xfffff80464929000)
    at /data/src/sys/kern/vfs_subr.c:3969
#12 0xffffffff80a99c81 in devfs_delete (dm=dm@entry=0xfffff80003f26d00,
    de=de@entry=0xfffff80440133a00, flags=flags@entry=0)
    at /data/src/sys/fs/devfs/devfs_devs.c:411
#13 0xffffffff80a9a7a5 in devfs_populate_loop (dm=dm@entry=0xfffff80003f26d00,
    cleanup=cleanup@entry=0) at /data/src/sys/fs/devfs/devfs_devs.c:551
#14 0xffffffff80a9a12a in devfs_populate (dm=dm@entry=0xfffff80003f26d00)
    at /data/src/sys/fs/devfs/devfs_devs.c:684
#15 0xffffffff80a9f3a8 in devfs_populate_vp (vp=0xfffff80003fcb5b8)
    at /data/src/sys/fs/devfs/devfs_vnops.c:359
#16 0xffffffff80a9d67b in devfs_lookup (ap=0xfffffe0152d4da30)
    at /data/src/sys/fs/devfs/devfs_vnops.c:1187
#17 0xffffffff80cecbb1 in VOP_LOOKUP (dvp=0xfffff80003fcb5b8, vpp=0xfffffe0152d4dd10,
    cnp=0xfffffe0152d4dd38) at ./vnode_if.h:69
#18 lookup (ndp=ndp@entry=0xfffffe0152d4dcb8) at /data/src/sys/kern/vfs_lookup.c:1092
#19 0xffffffff80cebba2 in namei (ndp=ndp@entry=0xfffffe0152d4dcb8)
    at /data/src/sys/kern/vfs_lookup.c:617
#20 0xffffffff80d11f90 in vn_open_cred (ndp=ndp@entry=0xfffffe0152d4dcb8,
    flagp=flagp@entry=0xfffffe0152d4ddd4, cmode=cmode@entry=0,
    vn_open_flags=vn_open_flags@entry=16, cred=0xfffff8044037dc00, fp=0xfffff804646c30a0)
    at /data/src/sys/kern/vfs_vnops.c:328
#21 0xffffffff80d08c58 in kern_openat (td=0xfffffe0037918c80,
    td@entry=<error reading variable: value is not available>, fd=-100,
    fd@entry=<error reading variable: value is not available>,
    path=0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>,
    path@entry=<error reading variable: value is not available>, pathseg=UIO_USERSPACE,
    pathseg@entry=<error reading variable: value is not available>, flags=6,
    flags@entry=<error reading variable: value is not available>, mode=<unavailable>,
    mode@entry=<error reading variable: value is not available>)
    at /data/src/sys/kern/vfs_syscalls.c:1158
#22 0xffffffff81117283 in syscallenter (td=<optimized out>)
    at /data/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#23 amd64_syscall (td=0xfffffe0037918c80, traced=0) at /data/src/sys/amd64/amd64/trap.c:1183
#24 <signal handler called>
#25 0x00002dbfc61e004a in ?? ()
Backtrace stopped: Cannot access memory at address 0x2dbfc3197398

Comment 28 Eugene Grosbein freebsd_committer

2023-09-21 07:31:43 UTC

(kgdb) frame 21
#21 0xffffffff80d08c58 in kern_openat (td=0xfffffe0037918c80,
    td@entry=<error reading variable: value is not available>, fd=-100,
    fd@entry=<error reading variable: value is not available>,
    path=0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>,
    path@entry=<error reading variable: value is not available>, pathseg=UIO_USERSPACE,
    pathseg@entry=<error reading variable: value is not available>, flags=6,
    flags@entry=<error reading variable: value is not available>, mode=<unavailable>,
    mode@entry=<error reading variable: value is not available>)
    at /data/src/sys/kern/vfs_syscalls.c:1158
(kgdb) p nd
$1 = {ni_dirp = 0x2dbfc31974a0 <error: Cannot access memory at address 0x2dbfc31974a0>,
  ni_segflg = UIO_USERSPACE, ni_rightsneeded = 0xfffffe0152d4ddb0, ni_startdir = 0x0,
  ni_rootdir = 0xfffff80443009988, ni_topdir = 0x0, ni_dirfd = -100, ni_lcf = 0, ni_filecaps = {
    fc_rights = {cr_rights = {0, 0}}, fc_ioctls = 0x0, fc_nioctls = -1, fc_fcntls = 0},
  ni_vp = 0x0, ni_dvp = 0xfffff80003fcb5b8, ni_resflags = 1, ni_debugflags = 3, ni_loopcnt = 0,
  ni_pathlen = 1, ni_next = 0xfffff8068a66b00c "", ni_cnd = {cn_origflags = 8683588,
    cn_flags = 344227908, cn_thread = 0xfffffe0037918c80, cn_cred = 0xfffff8044037dc00,
    cn_nameiop = LOOKUP, cn_lkflags = 532480, cn_pnbuf = 0xfffff8068a66b000 "/dev/console",
    cn_nameptr = 0xfffff8068a66b005 "console", cn_namelen = 7}, ni_cap_tracker = {
    tqh_first = 0x0, tqh_last = 0xfffffe0152d4dd78}, ni_dvp_seqc = 932285568,
  ni_vp_seqc = 4294966784}

Comment 29 Eugene Grosbein freebsd_committer

2023-09-21 07:33:44 UTC

(In reply to Eugene Grosbein from comment #26)

tun0 is open and active network interface used by site-to-site OpenVPN tunnel.

Comment 30 Eugene Grosbein freebsd_committer

2023-09-21 07:48:42 UTC

/etc/newsyslog.conf contains a line:

/var/log/openvpn.log 640  30    *    @T00  J         /var/run/openvpn.pid

So, newsyslogd sends SIGHUP to the openvpn daemon at midnight. Its logs before panic are lost, but I have its logs for previous midnight when there was no panic:

Sep 20 00:00:02 slh openvpn[2314]: SIGHUP[hard,] received, process restarting
Sep 20 00:00:02 slh openvpn[2314]: OpenVPN 2.6.5 amd64-portbld-freebsd13.2 [SSL (OpenSSL)] [LZO] [LZ4] [MH/RECVDA] [AEAD]
Sep 20 00:00:02 slh openvpn[2314]: library versions: OpenSSL 1.1.1u-freebsd  30 May 2023, LZO 2.10
Sep 20 00:00:02 slh openvpn[2314]: Restart pause, 1 second(s)
Sep 20 00:00:03 slh openvpn[2314]: Diffie-Hellman initialized with 2048 bit key
...
Sep 20 00:00:03 slh openvpn[2314]: TUN/TAP device /dev/tun0 opened

Crashdump information points to some kind of entanglement between openvpn/tun0 and syslogd/console but it is beyond my comprehension.

Comment 31 Konstantin Belousov freebsd_committer

2023-09-21 07:49:27 UTC

(In reply to Eugene Grosbein from comment #29)
So print out cdp 0xfffff804490bf400

Comment 32 Eugene Grosbein freebsd_committer

2023-09-21 08:41:53 UTC

(In reply to Konstantin Belousov from comment #31)

(kgdb) frame 5
#5  0xffffffff80a994d2 in devfs_free (cdev=0xfffff804490bf400)
    at /data/src/sys/fs/devfs/devfs_devs.c:180
180            KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0,
(kgdb) l
175     devfs_free(struct cdev *cdev)
176     {
177             struct cdev_priv *cdp;
178
179             cdp = cdev2priv(cdev);
180            KASSERT((cdp->cdp_flags & (CDP_ACTIVE | CDP_ON_ACTIVE_LIST)) == 0,
181                ("%s: cdp %p (%s) still on active list",
182                __func__, cdp, cdev->si_name));
183             if (cdev->si_cred != NULL)
184                     crfree(cdev->si_cred);
(kgdb) p cdp
$2 = (struct cdev_priv *) 0xfffff804490bf400
(kgdb) p *cdp
$3 = {cdp_c = {si_spare0 = 0x0, si_flags = 0, si_atime = {tv_sec = 1695243602, tv_nsec = 0},
    si_ctime = {tv_sec = 1695243602, tv_nsec = 0}, si_mtime = {tv_sec = 1695243602, tv_nsec = 0},
    si_uid = 66, si_gid = 68, si_mode = 384, si_cred = 0xfffff8044037dc00, si_drv0 = 0,
    si_refcount = 0, si_list = {le_next = 0xfffff80029050000,
      le_prev = 0xffffffff81ac7860 <dead_cdevsw+160>}, si_clone = {le_next = 0x0,
      le_prev = 0xfffff80440bd0a30}, si_children = {lh_first = 0x0}, si_siblings = {
      le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, si_mountpt = 0x0, si_drv1 = 0x0,
    si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 65536, si_usecount = 0, si_threadcount = 0,
    __si_u = {__sid_snapdata = 0x0}, si_name = "tun0", '\000' <repeats 251 times>}, cdp_list = {
    tqe_next = 0xfffff801ef2fb000, tqe_prev = 0xfffff80017c845e0}, cdp_inode = 203,
  cdp_flags = 8, cdp_inuse = 1, cdp_maxdirent = 0, cdp_dirents = 0xfffff804490bf608,
  cdp_dirent0 = 0x0, cdp_dtr_list = {tqe_next = 0x0, tqe_prev = 0x0}, cdp_dtr_cb = 0x0,
  cdp_dtr_cb_arg = 0x0, cdp_fdpriv = {lh_first = 0x0}, cdp_threadlock = {lock_object = {
      lo_name = 0xffffffff81321590 "devthrd", lo_flags = 16973824, lo_data = 0,
      lo_witness = 0xfffff8083fd73680}, mtx_lock = 0}}

Comment 33 Konstantin Belousov freebsd_committer

2023-09-21 10:53:05 UTC

Yes, this is the usual problem with dev_clone handlers.

commit f041428f9d8aea9fe5a33c5c91ac31074d652672
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Thu Sep 21 13:47:14 2023 +0300

    tun/tap: correct ref count on cloned cdevs
    
    PR:     273418

diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c
index f4dbf685fba6..a01bc11aa64c 100644
--- a/sys/net/if_tuntap.c
+++ b/sys/net/if_tuntap.c
@@ -548,6 +548,7 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len,
 	if (i != 0)
 		i = tun_create_device(drv, unit, NULL, &dev, name);
 	if (i == 0) {
+		dev_ref(dev);
 		tuncreate(dev);
 		struct tuntap_softc *tp = dev->si_drv1;
 		*ifpp = tp->tun_ifp;
@@ -611,8 +612,10 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen,
 
 		i = tun_create_device(drv, u, cred, dev, name);
 	}
-	if (i == 0)
+	if (i == 0) {
+		dev_ref(*dev);
 		if_clone_create(name, namelen, NULL);
+	}
 out:
 	CURVNET_RESTORE();
 }

Comment 34 Eugene Grosbein freebsd_committer

2023-09-21 11:03:52 UTC

(In reply to Konstantin Belousov from comment #33)

Should I try it with stable/13?

Comment 35 Konstantin Belousov freebsd_committer

2023-09-21 11:36:00 UTC

(In reply to Eugene Grosbein from comment #34)
Code seems to be same.

Comment 36 Eugene Grosbein freebsd_committer

2023-09-21 19:44:32 UTC

(In reply to Konstantin Belousov from comment #35)

It is not, it lacks KPI change present in main:

https://cgit.freebsd.org/src/commit/sys/net/if_tuntap.c?id=91ebcbe02a48ebd40edb49283b90f38d246da15a

Comment 37 Konstantin Belousov freebsd_committer

2023-09-22 03:37:05 UTC

devfs consumer is same. There was a trivial conflict. Below is the stable/13'
equivalent.

commit e84128814bcdc854c1670447ba1525711736d7f9
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Thu Sep 21 13:47:14 2023 +0300

    tun/tap: correct ref count on cloned cdevs
    
    PR:     273418
    
    (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e)

diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c
index b81c2149a62f..bad09a3148b9 100644
--- a/sys/net/if_tuntap.c
+++ b/sys/net/if_tuntap.c
@@ -544,8 +544,10 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 	/* No preexisting struct cdev *, create one */
 	if (i != 0)
 		i = tun_create_device(drv, unit, NULL, &dev, name);
-	if (i == 0)
+	if (i == 0) {
+		dev_ref(dev);
 		tuncreate(dev);
+	}
 
 	return (i);
 }
@@ -605,8 +607,10 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen,
 
 		i = tun_create_device(drv, u, cred, dev, name);
 	}
-	if (i == 0)
+	if (i == 0) {
+		dev_ref(*dev);
 		if_clone_create(name, namelen, NULL);
+	}
 out:
 	CURVNET_RESTORE();
 }

Comment 38 Konstantin Belousov freebsd_committer

2023-09-22 04:52:38 UTC

Or better, this one.  Previous patch would leak refcount on non-cloned cdevs.

commit 0da5656dd70cd5a41c7fb3f4f1c5b8d7f72b9801
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Thu Sep 21 13:47:14 2023 +0300

    tun/tap: correct ref count on cloned cdevs
    
    PR:     273418
    
    (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e)

diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c
index b81c2149a62f..723749c0dca9 100644
--- a/sys/net/if_tuntap.c
+++ b/sys/net/if_tuntap.c
@@ -541,6 +541,8 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 	/* find any existing device, or allocate new unit number */
 	dev = NULL;
 	i = clone_create(&drv->clones, &drv->cdevsw, &unit, &dev, 0);
+	if (i == 0)
+		dev_ref(dev);
 	/* No preexisting struct cdev *, create one */
 	if (i != 0)
 		i = tun_create_device(drv, unit, NULL, &dev, name);
@@ -596,6 +598,8 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen,
 
 	/* find any existing device, or allocate new unit number */
 	i = clone_create(&drv->clones, &drv->cdevsw, &u, dev, 0);
+	if (i == 0) {
+		dev_ref(*dev);
 	if (i) {
 		if (append_unit) {
 			namelen = snprintf(devname, sizeof(devname), "%s%d",

Comment 39 Eugene Grosbein freebsd_committer

2023-09-22 05:45:34 UTC

(In reply to Konstantin Belousov from comment #38)

Last chunk either lacks closing } with some code, or has extra opening {

Comment 40 Konstantin Belousov freebsd_committer

2023-09-22 05:46:58 UTC

commit 508e5f7d8d852b0f24cf0200d4486997d5d09409
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Thu Sep 21 13:47:14 2023 +0300

    tun/tap: correct ref count on cloned cdevs
    
    PR:     273418
    
    (cherry picked from commit 07fda41d2a2df0d0d422f30a8a2353de5f0d9c3e)

diff --git a/sys/net/if_tuntap.c b/sys/net/if_tuntap.c
index b81c2149a62f..756aa04482cb 100644
--- a/sys/net/if_tuntap.c
+++ b/sys/net/if_tuntap.c
@@ -541,6 +541,8 @@ tun_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
 	/* find any existing device, or allocate new unit number */
 	dev = NULL;
 	i = clone_create(&drv->clones, &drv->cdevsw, &unit, &dev, 0);
+	if (i == 0)
+		dev_ref(dev);
 	/* No preexisting struct cdev *, create one */
 	if (i != 0)
 		i = tun_create_device(drv, unit, NULL, &dev, name);
@@ -596,6 +598,8 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen,
 
 	/* find any existing device, or allocate new unit number */
 	i = clone_create(&drv->clones, &drv->cdevsw, &u, dev, 0);
+	if (i == 0)
+		dev_ref(*dev);
 	if (i) {
 		if (append_unit) {
 			namelen = snprintf(devname, sizeof(devname), "%s%d",

Comment 41 Eugene Grosbein freebsd_committer

2023-09-22 06:55:25 UTC

(In reply to Konstantin Belousov from comment #40)

Thank you very much. I built new kernel with this patch to tuntap in addition to previous one from comment #20. I will boot new kernel at the end of day and will wait.

Comment 42 Eugene Grosbein freebsd_committer

2023-09-28 09:27:39 UTC

(In reply to Konstantin Belousov from comment #40)

A week has passed running kernel with this your patch. So far, so good. No panics.

Comment 43 Eugene Grosbein freebsd_committer

2023-10-09 11:10:39 UTC

Uptime of the system in question is 16 days already.

Comment 44 commit-hook freebsd_committer

2023-10-09 23:38:30 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=27f1ec0be24b45559793e486a4fa5a2e7fdadc17

commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-09-21 10:47:14 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-10-09 23:36:59 +0000

    tun/tap: correct ref count on cloned cdevs

    Reported and tested by: eugen
    PR:     273418
    Discussed with: jah, kevans
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D42008

 sys/net/if_tuntap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comment 45 commit-hook freebsd_committer

2023-10-16 07:17:29 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5230afce76a2de387cc162384ffef4cebf576e97

commit 5230afce76a2de387cc162384ffef4cebf576e97
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-09-21 10:47:14 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-10-16 07:16:01 +0000

    tun/tap: correct ref count on cloned cdevs

    PR:     273418

    (cherry picked from commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17)

 sys/net/if_tuntap.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comment 46 commit-hook freebsd_committer

2023-10-16 07:18:31 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e900c81ede851f52ab50b541b9a6ef5fbc9bb919

commit e900c81ede851f52ab50b541b9a6ef5fbc9bb919
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-09-21 10:47:14 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-10-16 07:17:23 +0000

    tun/tap: correct ref count on cloned cdevs

    PR:     273418

    (cherry picked from commit 27f1ec0be24b45559793e486a4fa5a2e7fdadc17)

 sys/net/if_tuntap.c | 4 ++++
 1 file changed, 4 insertions(+)

Comment 47 Eugene Grosbein freebsd_committer

2023-10-22 19:57:21 UTC

Uptime is 30 days. Closing this.

Comment 48 Peter Much 2024-02-08 00:08:25 UTC

Sorry to interrupt, and sorry for the bad news, but
this fix has a side-effect: PR 276862

In short; 
it somehow mangles the vimage where openvpn is run, so it doesn't give
back the vnet.interface to the host on jail termination.

Sorry again, but You're in company - I already spent two nights hunting this down.