283312 – Kernel crash in sched_switch

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	14.2-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Some People
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash, regression

Depends on:
Blocks:

Reported:	2024-12-13 16:14 UTC by Alexey Vyskubov
Modified:	2024-12-14 10:01 UTC
CC List:	1 user

See Also:

Description Alexey Vyskubov 2024-12-13 16:14:47 UTC

I have no idea how to reproduce this, but since upgrading to 14.2-RELEASE I have been getting occasional crashes. The same computer previously ran 13.2-RELEASE, 14.0-RELEASE, 14.1-RELEASE, and 14-STABLE, and I saw no such crashes.

What I see in dmesg output is:

Fatal trap 12: page fault while in user mode
cpuid = 6; apic id = 06
fault virtual address   = 0x542350
fault code              = user read instruction, reserved bits in PTE
instruction pointer     = 0x43:0x542350
stack pointer           = 0x3b:0x8219fdae8
frame pointer           = 0x3b:0x8219fdb90
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 96419 (pkg-static)
rdi: 00000008219fe5b0 rsi: 00000008219fdc78 rdx: 00000008219fdc90
rcx: 0000000000000041  r8: 0000000000000081  r9: 0000000000000000
rax: 00002fae2698c588 rbx: 00000008219fdbe8 rbp: 00000008219fdb90
r10: 00000008219fdc78 r11: 000000000000004e r12: 0000000000000103
r13: 00000008219fdd18 r14: 00000008219fdc78 r15: 00000008219fdc90
trap number             = 12
panic: page fault
cpuid = 6
time = 1734089856
KDB: stack backtrace:
#0 0xffffffff80b8b89d at kdb_backtrace+0x5d
#1 0xffffffff80b3dc01 at vpanic+0x131
#2 0xffffffff80b3dac3 at panic+0x43
#3 0xffffffff81025a0b at trap_fatal+0x40b
#4 0xffffffff81025a56 at trap_pfault+0x46
#5 0xffffffff810251fb at trap+0x4ab
#6 0xffffffff80ffc398 at calltrap+0x8
Uptime: 9d17h46m22s
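
To correlate the panic with other logs (for example, to check whether portmaster was running), the `time =` and `Uptime:` lines can be converted to wall-clock timestamps. The following is a minimal sketch, assuming `time` is seconds since the Unix epoch and that the uptime was sampled at roughly the moment of the panic; the helper name `parse_uptime` is hypothetical and not part of any FreeBSD tool.

```python
from datetime import datetime, timedelta, timezone
import re

# Values copied from the panic message above.
PANIC_TIME = 1734089856   # "time = ..." line, assumed to be Unix epoch seconds
UPTIME = "9d17h46m22s"    # "Uptime: ..." line


def parse_uptime(text: str) -> timedelta:
    """Convert a string such as '9d17h46m22s' into a timedelta.

    Hypothetical helper: the day, hour, and minute fields are treated as
    optional so that shorter uptimes such as '46m22s' also parse.
    """
    match = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized uptime format: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Wall-clock time of the panic, in UTC.
panic_at = datetime.fromtimestamp(PANIC_TIME, tz=timezone.utc)
# Approximate boot time: panic time minus the reported uptime.
booted_at = panic_at - parse_uptime(UPTIME)

print("panic:", panic_at.isoformat())
print("boot: ", booted_at.isoformat())
```

The resulting timestamps are in UTC and would need converting to the local time zone before comparing them with entries in /var/log/messages.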

Here is what kgdb says:

sched_switch (td=td@entry=0xffffffff81b6eb20 <thread0_st>,
    flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2290
2290                    cpuid = td->td_oncpu = PCPU_GET(cpuid);
(kgdb) backtrace
#0  sched_switch (td=td@entry=0xffffffff81b6eb20 <thread0_st>,
    flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2290
#1  0xffffffff80b4adeb in mi_switch (flags=flags@entry=259)
    at /usr/src/sys/kern/kern_synch.c:548
#2  0xffffffff80b9b320 in sleepq_switch (
    wchan=wchan@entry=0xffffffff81b6e5b8 <proc0>, pri=pri@entry=52)
    at /usr/src/sys/kern/subr_sleepqueue.c:607
#3  0xffffffff80b9b91f in sleepq_timedwait (
    wchan=wchan@entry=0xffffffff81b6e5b8 <proc0>, pri=52)
    at /usr/src/sys/kern/subr_sleepqueue.c:689
#4  0xffffffff80b4a548 in _sleep (ident=0xffffffff81b6e5b8 <proc0>,
    lock=lock@entry=0x0, priority=priority@entry=52,
    wmesg=<optimized out>, sbt=42949670000, pr=pr@entry=0, flags=256)
    at /usr/src/sys/kern/kern_synch.c:219
#5  0xffffffff80ee3779 in swapper () at /usr/src/sys/vm/vm_swapout.c:753
#6  0xffffffff8037d023 in btext () at /usr/src/sys/amd64/amd64/locore.S:88
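
As a reading aid for frame #4, the `sbt` argument can be decoded into seconds. This is a minimal sketch, assuming `sbt` is an `sbintime_t`, which stores time as a 32.32 fixed-point value (upper 32 bits are whole seconds); the function name `sbt_to_seconds` is hypothetical.

```python
# One second in sbintime_t's 32.32 fixed-point representation.
SBT_1S = 1 << 32


def sbt_to_seconds(sbt: int) -> float:
    """Convert a 32.32 fixed-point sbintime_t value to seconds (hypothetical helper)."""
    return sbt / SBT_1S


# Value copied from frame #4 of the backtrace above.
timeout = sbt_to_seconds(42949670000)
print(f"sleep timeout: {timeout:.3f} s")
```

Under that assumption the value works out to about 10 seconds, which would mean the swapper thread shown in this backtrace was in an ordinary timed sleep when it was switched out.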

Let me know if I can help debug it somehow.

P.S. Intel Core i3-12100F; 64 GB RAM.

Comment 1 Ed Maste freebsd_committer 2024-12-13 18:27:29 UTC

Can you provide any insight on the workload running when the crash happened (and if there's any commonality)?

Comment 2 Alexey Vyskubov 2024-12-14 10:00:35 UTC

I did not notice anything in common between the crashes. Running at the time were portmaster and a number of iocage jails (they run continuously) with Angie, Nginx, and Lighttpd. There was also a bhyve VM with 15-CURRENT (also running continuously). All in all, the system was not heavily loaded. Two network interfaces (internal and external) but no routing. No IPv6. OpenVPN (only FIB 1 goes through it). Tailscale. PF. *Maybe* the crashes occurred while portmaster was running (portmaster -God), but I am not sure at all.

It may be relevant to memory usage that the system runs on ZFS (it always has). Also, just in case it matters: it has 64 GB of swap on the SSD, which is normally empty. I would have gone looking for problems earlier if any noticeable part of it had been in use.

❯ freecolor -om
             total       used       free     shared    buffers     cached
Mem:         63567      52477      11089          0          0          0
Swap:        65536          0      65536

Comment 3 Alexey Vyskubov 2024-12-14 10:01:40 UTC

Sorry, portmaster -Gad, typo.