I have found a way to crash an entire iocage host from within an unprivileged process running within a jail. Here is a backtrace from the serial console: [SOL Session operational. Use ~? for help] FreeBSD/amd64 (node3) (ttyu0) login: Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x20 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80bcc48b stack pointer = 0x28:0xfffffe014514cb10 frame pointer = 0x28:0xfffffe014514cb30 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 90474 (lua54) trap number = 12 panic: page fault cpuid = 1 time = 1673200821 KDB: stack backtrace: #0 0xffffffff80c694a5 at kdb_backtrace+0x65 #1 0xffffffff80c1bb5f at vpanic+0x17f #2 0xffffffff80c1b9d3 at panic+0x43 #3 0xffffffff810afdf5 at trap_fatal+0x385 #4 0xffffffff810afe4f at trap_pfault+0x4f #5 0xffffffff810875b8 at calltrap+0x8 #6 0xffffffff80bcbe37 at kqueue_drain+0x257 #7 0xffffffff80bcd222 at kqueue_close+0x42 #8 0xffffffff80bbdb91 at _fdrop+0x11 #9 0xffffffff80bc11eb at closef+0x24b #10 0xffffffff80bc0aac at fdescfree_fds+0xdc #11 0xffffffff80bc0553 at fdescfree+0x3a3 #12 0xffffffff80bd22f7 at exit1+0x4c7 #13 0xffffffff80bd1e2d at sys_sys_exit+0xd #14 0xffffffff810b06ec at amd64_syscall+0x10c #15 0xffffffff81087ecb at fast_syscall_common+0xf8 Uptime: 10m59s System details: FreeBSD 13.1-RELEASE-p3 GENERIC amd64 hw.model: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz hw.physmem: 34203000832 # cat /var/run/dmesg.boot Copyright (c) 1992-2021 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. 
FreeBSD 13.1-RELEASE-p3 GENERIC amd64 FreeBSD clang version 13.0.0 (git@github.com:llvm/llvm-project.git llvmorg-13.0.0-0-gd7b669b3a303) VT(efifb): resolution 800x600 CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (2100.11-MHz K8-class CPU) Origin="GenuineIntel" Id=0x50663 Family=0x6 Model=0x56 Stepping=3 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x121<LAHF,ABM,Prefetch> Structured Extended Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE> Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD> XSAVE Features=0x1<XSAVEOPT> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr TSC: P-state invariant, performance statistics real memory = 34358689792 (32767 MB) avail memory = 33214754816 (31676 MB) CPU microcode: updated from 0x7000012 to 0x700001c Event timer "LAPIC" quality 600 ACPI APIC Table: <ALASKA A M I > FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs FreeBSD/SMP: 1 package(s) x 8 core(s) x 2 hardware threads random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" random: unblocking device. 
ioapic0 <Version 2.0> irqs 0-23 ioapic1 <Version 2.0> irqs 24-47 Launching APs: 1 14 11 8 6 9 3 5 12 13 15 7 4 2 10 random: entropy device external interface kbd1 at kbdmux0 efirtc0: <EFI Realtime Clock> efirtc0: registered as a time-of-day clock, resolution 1.000000s smbios0: <System Management BIOS> at iomem 0xf05e0-0xf05fe smbios0: Version: 3.0, BCD Revision: 3.0 aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS> acpi0: <ALASKA A M I > acpi0: Power Button (fixed) cpu0: <ACPI CPU> numa-domain 0 on acpi0 atrtc0: <AT realtime clock> port 0x70-0x71,0x74-0x77 irq 8 on acpi0 atrtc0: registered as a time-of-day clock, resolution 1.000000s Event timer "RTC" frequency 32768 Hz quality 0 attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 350 Event timer "HPET1" frequency 14318180 Hz quality 340 Event timer "HPET2" frequency 14318180 Hz quality 340 Event timer "HPET3" frequency 14318180 Hz quality 340 Event timer "HPET4" frequency 14318180 Hz quality 340 Event timer "HPET5" frequency 14318180 Hz quality 340 Event timer "HPET6" frequency 14318180 Hz quality 340 Event timer "HPET7" frequency 14318180 Hz quality 340 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 pcib0: <ACPI Host-PCI bridge> on acpi0 pci0: <ACPI PCI bus> on pcib0 pci0: <dasp, performance counters> at device 11.1 (no driver attached) pci0: <dasp, performance counters> at device 11.2 (no driver attached) pci0: <dasp, performance counters> at device 16.1 (no driver attached) pci0: <dasp, performance counters> at device 16.6 (no driver attached) pci0: <dasp, performance counters> at device 18.1 (no driver attached) acpi_syscontainer0: <System Container> 
on acpi0 apei0: <ACPI Platform Error Interface> on acpi0 pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0 pci1: <ACPI PCI bus> numa-domain 0 on pcib1 pcib2: <ACPI PCI-PCI bridge> irq 26 at device 1.0 numa-domain 0 on pci1 pci2: <ACPI PCI bus> numa-domain 0 on pcib2 nvme0: <Generic NVMe Device> mem 0xfb400000-0xfb403fff irq 26 at device 0.0 numa-domain 0 on pci2 pcib3: <ACPI PCI-PCI bridge> irq 32 at device 2.0 numa-domain 0 on pci1 pci3: <ACPI PCI bus> numa-domain 0 on pcib3 pcib4: <ACPI PCI-PCI bridge> irq 40 at device 3.0 numa-domain 0 on pci1 pci4: <ACPI PCI bus> numa-domain 0 on pcib4 xhci0: <Intel Lynx Point USB 3.0 controller> mem 0xfb500000-0xfb50ffff irq 19 at device 20.0 numa-domain 0 on pci1 xhci0: 32 bytes context size, 64-bit DMA usbus0: waiting for BIOS to give up control xhci0: Port routing mask set to 0xffffffff usbus0 numa-domain 0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci1: <simple comms> at device 22.0 (no driver attached) pci1: <simple comms> at device 22.1 (no driver attached) pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.0 numa-domain 0 on pci1 pci5: <ACPI PCI bus> numa-domain 0 on pcib5 pcib6: <ACPI PCI-PCI bridge> at device 0.0 numa-domain 0 on pci5 pci6: <ACPI PCI bus> numa-domain 0 on pcib6 vgapci0: <VGA-compatible display> port 0xe000-0xe07f mem 0xfa000000-0xfaffffff,0xfb000000-0xfb01ffff irq 16 at device 0.0 numa-domain 0 on pci6 vgapci0: Boot video device pcib7: <ACPI PCI-PCI bridge> irq 18 at device 28.2 numa-domain 0 on pci1 pci7: <ACPI PCI bus> numa-domain 0 on pcib7 igb0: <Intel(R) I210 (Copper)> port 0xd000-0xd01f mem 0xfb200000-0xfb27ffff,0xfb280000-0xfb283fff irq 18 at device 0.0 numa-domain 0 on pci7 igb0: PHY reset is blocked due to SOL/IDER session. 
igb0: EEPROM V3.16-0 eTrack 0x800004d6 igb0: Using 1024 TX descriptors and 1024 RX descriptors igb0: Using 4 RX queues 4 TX queues igb0: Using MSI-X interrupts with 5 vectors igb0: Ethernet address: d0:50:99:c1:3f:9b igb0: link state changed to UP igb0: netmap queues/slots: TX 4/1024, RX 4/1024 pcib8: <ACPI PCI-PCI bridge> irq 19 at device 28.3 numa-domain 0 on pci1 pci8: <ACPI PCI bus> numa-domain 0 on pcib8 igb1: <Intel(R) I210 (Copper)> port 0xc000-0xc01f mem 0xfb100000-0xfb17ffff,0xfb180000-0xfb183fff irq 19 at device 0.0 numa-domain 0 on pci8 igb1: EEPROM V3.16-0 eTrack 0x800004d6 igb1: Using 1024 TX descriptors and 1024 RX descriptors igb1: Using 4 RX queues 4 TX queues igb1: Using MSI-X interrupts with 5 vectors igb1: Ethernet address: d0:50:99:c1:3f:9c igb1: netmap queues/slots: TX 4/1024, RX 4/1024 ehci0: <Intel Lynx Point USB 2.0 controller USB-A> mem 0xfb513000-0xfb5133ff irq 18 at device 29.0 numa-domain 0 on pci1 usbus1: EHCI version 1.0 usbus1 numa-domain 0 on ehci0 usbus1: 480Mbps High Speed USB v2.0 isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci1 isa0: <ISA bus> numa-domain 0 on isab0 ahci0: <Intel Lynx Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xfb512000-0xfb5127ff irq 16 at device 31.2 numa-domain 0 on pci1 ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported ahcich0: <AHCI channel> at channel 0 on ahci0 ahcich1: <AHCI channel> at channel 1 on ahci0 ahcich2: <AHCI channel> at channel 2 on ahci0 ahcich3: <AHCI channel> at channel 3 on ahci0 ahcich4: <AHCI channel> at channel 4 on ahci0 ahcich5: <AHCI channel> at channel 5 on ahci0 ahciem0: <AHCI enclosure management bridge> on ahci0 acpi_button0: <Power Button> on acpi0 uart0: <16550 or compatible> port 0x2f8-0x2ff irq 3 flags 0x10 on acpi0 uart0: console (115200,n,8,1) orm0: <ISA Option ROM> at iomem 0xc0000-0xc7fff pnpid ORM0000 on isa0 est0: <Enhanced SpeedStep Frequency Control> numa-domain 0 on cpu0 
Timecounter "TSC" frequency 2099998029 Hz quality 1000 Timecounters tick every 1.000 msec ZFS filesystem version: 5 ZFS storage pool version: features support (5000) ugen1.1: <Intel EHCI root HUB> at usbus1 ugen0.1: <Intel XHCI root HUB> at usbus0 uhub0 numa-domain 0 on usbus1 uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1 uhub1 numa-domain 0 on usbus0 uhub1: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0 nvd0: <INTEL SSDPEKKW256G7> NVMe namespace nvd0: 244198MB (500118192 512 byte sectors) ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 ada0: <INTEL SSDSC2BB480G7 N2010101> ACS-3 ATA SATA 3.x device ada0: Serial Number PHDV637500CA480BGN ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes) ada0: Command Queueing enabled ada0: 457862MB (937703088 512 byte sectors) ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 ada1: <INTEL SSDSC2BB480G7 N2010101> ACS-3 ATA SATA 3.x device ada1: Serial Number PHDV63750058480BGN ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes) ada1: Command Queueing enabled ada1: 457862MB (937703088 512 byte sectors) ses0 at ahciem0 bus 0 scbus6 target 0 lun 0 ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device ses0: SEMB SES Device GEOM_ELI: Device ada0p3.eli created. ses0: ada0,pass0 in 'Slot 00', SATA Slot: scbus0 target 0 GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: accelerated software ses0: ada1,pass1 in 'Slot 01', SATA Slot: scbus1 target 0 GEOM_MIRROR: Device mirror/swap launched (2/2). GEOM_ELI: Device ada1p3.eli created. GEOM_ELI: Encryption: AES-XTS 256 GEOM_ELI: Crypto: accelerated software Trying to mount root from zfs:zroot/ROOT/default []... 
uhub0: 2 ports with 2 removable, self powered uhub1: 21 ports with 21 removable, self powered ugen1.2: <vendor 0x8087 product 0x8000> at usbus1 uhub2 numa-domain 0 on uhub0 uhub2: <vendor 0x8087 product 0x8000, class 9/0, rev 2.00/0.05, addr 2> on usbus1 uhub2: 4 ports with 4 removable, self powered Dual Console: Video Primary, Serial Secondary GEOM_ELI: Device mirror/swap.eli created. GEOM_ELI: Encryption: AES-XTS 128 GEOM_ELI: Crypto: accelerated software ichsmb0: <Intel Lynx Point SMBus controller> port 0xf000-0xf01f mem 0xfb511000-0xfb5110ff irq 18 at device 31.3 numa-domain 0 on pci1 smbus0: <System Management Bus> numa-domain 0 on ichsmb0 pchtherm0: <Haswell Thermal Subsystem> irq 18 at device 31.6 numa-domain 0 on pci1 ioat0: <BDXDE IOAT Ch0> mem 0xfb306000-0xfb307fff irq 32 at device 0.0 numa-domain 0 on pci3 ioat0: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break> ioat1: <BDXDE IOAT Ch1> mem 0xfb304000-0xfb305fff irq 36 at device 0.1 numa-domain 0 on pci3 ioat1: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break> ioat2: <BDXDE IOAT Ch2> mem 0xfb302000-0xfb303fff irq 37 at device 0.2 numa-domain 0 on pci3 ioat2: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break> ioat3: <BDXDE IOAT Ch3> mem 0xfb300000-0xfb301fff irq 38 at device 0.3 numa-domain 0 on pci3 ioat3: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break> acpi_wmi0: <ACPI-WMI mapping> on acpi0 acpi_wmi0: cannot find EC device acpi_wmi1: <ACPI-WMI mapping> on acpi0 acpi_wmi1: cannot find EC device lo0: link state changed to UP lagg0: link state changed to UP igb1: link state changed to UP lagg0.4: changing 
name to 'jail'

It seems like this could be a pretty serious denial-of-service or unassisted jailbreak vulnerability. Note that Lua is not running as root when the kernel panic occurs: the Lua process drops privileges after some initialization and port binding. Having a privileged user stop the service is not the only way to end the Lua process and trigger the panic; it is quite possible to stop the process by other means, such as overloading the garbage collector.

Steps to reproduce:
1. Set up iocage
2. Set up a new jail in iocage
3. pkg install prosody
4. Configure prosody to use libevent kqueue connection handling instead of select (https://prosody.im/doc/network_backend)
5. # service start prosody
6. # service stop prosody

The system will now panic.
Thanks for the report. I tried to reproduce this in a 13.1-RELEASE VM but failed. I didn't use a jail since I'd expect this bug to be independent of that. I'm just using the default prosody config plus network_backend = "event", and I can see that the prosody process is using kevent(), whereas without that option it uses select().

> 5. # service start prosody
> 6. # service stop prosody

Presumably these should be "service prosody one(start|stop)"? Is it possible that I'd need some additional configuration to trigger the panic?
(In reply to Mark Johnston from comment #1) Do try to reproduce inside of a jail. I was using a zroot install and iocage. I also was on the latest patchlevel at the time using syspatch, though the system isn't reporting the versions properly. I was told this was expected for the current set of patches. I don't think using onestart/onestop will have a different effect. If needed I can provide a prosody config file but try replicating the crash from within a jail first.
(In reply to tom+fbsdbugzilla from comment #2) Also keep in mind I'm running on bare metal. Running in a virtual machine might be part of the reason you're not able to reproduce the crash. I am able to reproduce the crash very consistently across 2 identically specced machines.
Please give another attempt at reproducing this bug. MarkJ's attempt did not replicate the environment properly and this bug seems pretty important.
This bug is still an issue with the latest release patch level 13.2-RELEASE-p2. I was able to replicate it in a virtual machine so it's not hardware specific. I will be disclosing this bug because of how it was handled and the lack of response.
I wasn't able to reproduce this either. Would it be possible to provide a coredump and a debug kernel?
Also unable to reproduce here (13.2-RELEASE tiny dell xps13 laptop) atm.

- we all agree this should be fixed
- we need either more details on current setup
- or a coredump & debug kernel

It looks like you can trigger this in ~ 10m uptime, is this a 100% reliable reproduction for you? If so, a coredump from a default debug kernel would be invaluable https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/ and we can coordinate a secure transfer for a coredump.

For your current setup, can you provide:

- full prosody config (obfuscated is fine)
- full iocage config
- any other non-default config

maybe this will help us replicate it. For the VM, more config details will help.

thanks again for the bug report, please help us out with a bit more info!
To rule out one scenario, I tested: FreeBSD 14.0-BETA3 on a Ryzen 7 5800H with prosody-0.12.4 from packages.

In /usr/local/etc/prosody/prosody.cfg.lua I tried under "Server-wide settings":

network_backend = "event"

service prosody onestart
service prosody onestatus
service prosody onestop

Without issues, plus tried the "select" and "epoll" backends. Do you have any other prosody.cfg.lua settings?
Update: I have run through the same steps with a FreeBSD 14.0-BETA3 jail under py39-iocage-1.2_10, without issue. Please do share what additional configuration settings you might have. The connecting clients may also have an impact.

iocage create -n prosody ip4_addr="re0|10.0.0.104/24" -r 14.0-BETA3

(Start, add pkg, configure as mentioned)
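For readers unfamiliar with the kqueue backend being discussed, here is a minimal sketch of the user-space lifecycle involved: create a kqueue, register a read filter on a listening socket with kevent, then close the kqueue. It is an illustration only, not a reproducer, and it is not expected to trigger the panic on its own. The function name `kqueue_roundtrip` is hypothetical, and Python's `select.kqueue` exists only on BSD-derived systems, so the sketch returns `None` elsewhere.

```python
import select
import socket


def kqueue_roundtrip():
    """Register a listening socket with a kqueue, then close the kqueue.

    Returns the number of events returned by the registration call,
    or None on platforms that do not provide kqueue.
    """
    if not hasattr(select, "kqueue"):
        return None  # e.g. Linux: no kqueue support in the select module

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # any free loopback port
        listener.listen(1)

        kq = select.kqueue()
        try:
            # EV_ADD with a read filter: this goes through the kevent syscall.
            change = select.kevent(
                listener.fileno(),
                filter=select.KQ_FILTER_READ,
                flags=select.KQ_EV_ADD,
            )
            # max_events=0 and timeout=0: apply the change, wait for nothing.
            events = kq.control([change], 0, 0)
        finally:
            # Closing the descriptor releases the kqueue in the kernel.
            kq.close()
        return len(events)
    finally:
        listener.close()


if __name__ == "__main__":
    print(kqueue_roundtrip())
```

Stopping a service that uses this backend ends with the same kind of close on process exit, which is the path visible in the first backtrace in this report (exit1, fdescfree, closef, kqueue_close, kqueue_drain).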
I apologize for the long delay; I recently moved and had to take care of some emergencies. I would like to try the coredump route as I have a few of those collected already. I can reproduce it reliably and have another node with the exact same hardware and hardware configuration where I could intentionally reproduce the crash without affecting my production server. I am concerned about sensitive data within the dump. A secure file transfer for the dump would be appreciated so that it is not publicly accessible. Some of the configuration has changed since I first started experiencing this issue, such as my IP configuration, and I started using vnet. Allow me some time to remove the secrets from my prosody configuration file as well.
I have a kernel dump ready to send to a developer.
bug still happens on 14.0

Fatal trap 12: page fault while in kernel mode
cpuid = 14; apic id = 0e
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80af13c3
stack pointer = 0x28:0xfffffe017b2c8960
frame pointer = 0x28:0xfffffe017b2c8980
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 59187 (lua54)
rdi: fffffe01744b0be8 rsi: fffff801128afd20 rdx: 0000000000000000
rcx: 0000000000000000 r8: fffff80219164400 r9: fffff80219164418
rax: 0000000000000000 rbx: fffff801128afd20 rbp: fffffe017b2c8980
r10: 0000000000000000 r11: fffffe017a7b0700 r12: 0000000000000000
r13: fffffe017b2c8a20 r14: 0000000000000000 r15: fffff80112392100
trap number = 12
panic: page fault
cpuid = 14
time = 1702063138
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe39c8 at calltrap+0x8
#6 0xffffffff80aeec2a at kqueue_register+0x8ea
#7 0xffffffff80aefe06 at kqueue_kevent+0x106
#8 0xffffffff80aefba2 at kern_kevent_fp+0x52
#9 0xffffffff80aef816 at kern_kevent_generic+0xd6
#10 0xffffffff80aef6d1 at sys_kevent+0x61
#11 0xffffffff8100d119 at amd64_syscall+0x109
#12 0xffffffff80fe42db at fast_syscall_common+0xf8
Uptime: 16h3m25s
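When comparing panics across releases, it can help to reduce each console capture to its list of function names. The following is a small sketch of that, using a few frames copied from the 14.0 backtrace above; the helper names `frame_functions` and `first_kqueue_frame` are made up for illustration and are not part of any FreeBSD tool.

```python
import re

# Matches one KDB frame such as "#6 0xffffffff80aeec2a at kqueue_register+0x8ea".
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+at\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+"
)

# A few frames copied from the 14.0 panic in this report.
SAMPLE = (
    "#5 0xffffffff80fe39c8 at calltrap+0x8 "
    "#6 0xffffffff80aeec2a at kqueue_register+0x8ea "
    "#7 0xffffffff80aefe06 at kqueue_kevent+0x106 "
    "#8 0xffffffff80aefba2 at kern_kevent_fp+0x52 "
    "#10 0xffffffff80aef6d1 at sys_kevent+0x61"
)


def frame_functions(console_text):
    """Return the function names of all KDB frames, innermost first."""
    return [name for _, name in FRAME_RE.findall(console_text)]


def first_kqueue_frame(console_text):
    """Return the innermost frame whose name starts with 'kqueue_', if any."""
    for name in frame_functions(console_text):
        if name.startswith("kqueue_"):
            return name
    return None


if __name__ == "__main__":
    print(frame_functions(SAMPLE))
    print(first_kqueue_frame(SAMPLE))
```

Applied to the two panics in this report, the innermost kqueue frame is kqueue_drain on 13.1 (reached from process exit) and kqueue_register on 14.0 (reached from sys_kevent), so the fault is hit on two different entry paths into the kqueue code.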
(In reply to tom+fbsdbugzilla from comment #12) Send me an email to rew@FreeBSD.org and we can coordinate on transferring the dump.
(In reply to Robert Wing from comment #13) Uploaded to your server. file vmcore.0
I can't debug the kernel dump because the debugger has bugs. Do you want a coredump for that as well? root@node3:/var/crash # kgdb /boot/kernel/kernel /var/crash/vmcore.0 GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD] Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... (No debugging symbols found in /boot/kernel/kernel) /wrkdirs/usr/ports/devel/gdb/work-py39/gdb-13.2/gdb/thread.c:1337: internal-error: switch_to_thread: Assertion `thr != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- 0x130cde1 ??? 0x17a5926 ??? 0x17a5788 ??? 0x1c2e56e ??? 0x176538f ??? 0x14408f0 ??? 0x174695e ??? 0x1340095 ??? 0x17731cf ??? 0x15267a5 ??? 0x152518b ??? 0x1523a9b ??? 0x1219263 ??? 0x830a45af9 ??? 0x12187cf ??? --------------------- /wrkdirs/usr/ports/devel/gdb/work-py39/gdb-13.2/gdb/thread.c:1337: internal-error: switch_to_thread: Assertion `thr != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) n This is a bug, please report it. For instructions, see: <https://www.gnu.org/software/gdb/bugs/>. /wrkdirs/usr/ports/devel/gdb/work-py39/gdb-13.2/gdb/thread.c:1337: internal-error: switch_to_thread: Assertion `thr != NULL' failed. 
A problem internal to GDB has been detected, further debugging may prove unreliable. Create a core file of GDB? (y or n) n Command aborted. (kgdb) bt No stack. (kgdb)
(In reply to tom+fbsdbugzilla from comment #15) The kernel dump you provided is incomplete.
(In reply to Robert Wing from comment #16)
I'm not sure if you're getting my email replies, so I'm pasting it here:

On Wed, 13 Dec 2023 14:43:36 -0900 Rob Wing <rew@freebsd.org> wrote:
> I looked at the dump, it appears to be incomplete.
>
> The core indicates the dump was going to be ~32G but the core file is
> ~7G.
>
> Are any of your core dumps ~32G?

No, 32G is the amount of physical RAM installed on the server, but I'm never using all of it. Are you sure it's incomplete?

root@node3:/var/crash # ls -l /var/crash
total 1717824
-rw-r--r-- 1 root wheel 2 Dec 12 15:13 bounds
-rw-r--r-- 1 root wheel 84 Dec 12 15:13 core.txt.0
-rw------- 1 root wheel 471 Dec 12 15:13 info.0
lrwxr-xr-x 1 root wheel 6 Dec 12 15:13 info.last -> info.0
-rw-r--r-- 1 root wheel 5 May 12 2022 minfree
-rw------- 1 root wheel 7942991872 Dec 12 15:13 vmcore.0
lrwxr-xr-x 1 root wheel 8 Dec 12 15:13 vmcore.last -> vmcore.0

Going forward, what would you like me to try? I can trigger the bug again and generate another dump. Are you sure a dump generates a 32G file, rather than just the amount of RAM that's actually being used?
(In reply to tom+fbsdbugzilla from comment #17) This might be a duplicate of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275286 Can you apply the following fix and trigger the panic again? https://cgit.freebsd.org/src/commit/?id=24346a2f777598272caffbd7e4fb6d85cafe64ed I was able to open the dump, it was complete. I forgot minidump is turned on by default.
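As a quick sanity check on the size question, the numbers already posted in this thread can be compared directly. This is only back-of-the-envelope arithmetic using the "real memory" value from dmesg and the vmcore.0 size from the ls -l listing; the function name `dump_fraction` is made up for illustration.

```python
# Values copied from this report.
REAL_MEMORY_BYTES = 34358689792  # "real memory" line in dmesg
VMCORE_BYTES = 7942991872        # size of /var/crash/vmcore.0

GIB = 1024 ** 3


def dump_fraction(vmcore_bytes, ram_bytes):
    """Return the dump size as a fraction of installed physical memory."""
    return vmcore_bytes / ram_bytes


if __name__ == "__main__":
    fraction = dump_fraction(VMCORE_BYTES, REAL_MEMORY_BYTES)
    print(f"vmcore: {VMCORE_BYTES / GIB:.2f} GiB")
    print(f"RAM:    {REAL_MEMORY_BYTES / GIB:.2f} GiB")
    print(f"dump is {fraction:.1%} of RAM")
```

The dump comes out at a bit under a quarter of installed RAM. That is consistent with a minidump, which saves only the memory pages in use by the kernel rather than all of physical memory, so a vmcore much smaller than RAM does not by itself indicate truncation.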
(In reply to Robert Wing from comment #18) Yes, I will compile commit id=24346a2f777598272caffbd7e4fb6d85cafe64ed and attempt to trigger the bug with this version.
I was not able to trigger a kpanic using the same method with kernel commit id=24346a2f777598272caffbd7e4fb6d85cafe64ed Here is a dmesg: root@node3:~ # dmesg ---<<BOOT>>--- Copyright (c) 1992-2023 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 14.0-STABLE #0: Tue Dec 19 16:30:24 PST 2023 root@node3:/usr/obj/usr/src/src-24346a2f777598272caffbd7e4fb6d85cafe64ed/amd64.amd64/sys/MYKTEST1 amd64 FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152) VT(efifb): resolution 800x600 CPU microcode: updated from 0x7000012 to 0x700001c CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (2100.08-MHz K8-class CPU) Origin="GenuineIntel" Id=0x50663 Family=0x6 Model=0x56 Stepping=3 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x121<LAHF,ABM,Prefetch> Structured Extended Features=0x21cbfbb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,PQE,RDSEED,ADX,SMAP,PROCTRACE> Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD> XSAVE Features=0x1<XSAVEOPT> VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr TSC: P-state invariant, performance statistics real memory = 34358689792 (32767 MB) avail memory = 33210122240 (31671 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <ALASKA A M I > FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs FreeBSD/SMP: 1 package(s) x 8 core(s) x 2 hardware threads random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key 
RNG" random: unblocking device. Security policy loaded: TrustedBSD MAC/portacl (mac_portacl) ioapic0 <Version 2.0> irqs 0-23 ioapic1 <Version 2.0> irqs 24-47 Launching APs: 1 7 3 15 12 8 9 11 10 14 6 2 5 4 13 random: entropy device external interface kbd1 at kbdmux0 efirtc0: <EFI Realtime Clock> efirtc0: registered as a time-of-day clock, resolution 1.000000s smbios0: <System Management BIOS> at iomem 0xf05e0-0xf05fe smbios0: Version: 3.0, BCD Revision: 3.0 aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS> acpi0: <ALASKA A M I > acpi0: Power Button (fixed) cpu0: <ACPI CPU> numa-domain 0 on acpi0 atrtc0: <AT realtime clock> port 0x70-0x71,0x74-0x77 irq 8 on acpi0 atrtc0: registered as a time-of-day clock, resolution 1.000000s Event timer "RTC" frequency 32768 Hz quality 0 attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 350 Event timer "HPET1" frequency 14318180 Hz quality 340 Event timer "HPET2" frequency 14318180 Hz quality 340 Event timer "HPET3" frequency 14318180 Hz quality 340 Event timer "HPET4" frequency 14318180 Hz quality 340 Event timer "HPET5" frequency 14318180 Hz quality 340 Event timer "HPET6" frequency 14318180 Hz quality 340 Event timer "HPET7" frequency 14318180 Hz quality 340 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 pcib0: <ACPI Host-PCI bridge> on acpi0 pci0: <ACPI PCI bus> on pcib0 pci0: <dasp, performance counters> at device 11.1 (no driver attached) pci0: <dasp, performance counters> at device 11.2 (no driver attached) pci0: <dasp, performance counters> at device 16.1 (no driver attached) pci0: <dasp, performance counters> at device 16.6 (no driver attached) pci0: <dasp, 
performance counters> at device 18.1 (no driver attached) acpi_syscontainer0: <System Container> on acpi0 apei0: <ACPI Platform Error Interface> on acpi0 pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0 pci1: <ACPI PCI bus> numa-domain 0 on pcib1 pcib2: <ACPI PCI-PCI bridge> irq 26 at device 1.0 numa-domain 0 on pci1 pci2: <ACPI PCI bus> numa-domain 0 on pcib2 nvme0: <Generic NVMe Device> mem 0xfb400000-0xfb403fff irq 26 at device 0.0 numa-domain 0 on pci2 pcib3: <ACPI PCI-PCI bridge> irq 32 at device 2.0 numa-domain 0 on pci1 pci3: <ACPI PCI bus> numa-domain 0 on pcib3 pcib4: <ACPI PCI-PCI bridge> irq 40 at device 3.0 numa-domain 0 on pci1 pci4: <ACPI PCI bus> numa-domain 0 on pcib4 xhci0: <Intel Lynx Point USB 3.0 controller> mem 0xfb500000-0xfb50ffff irq 19 at device 20.0 numa-domain 0 on pci1 xhci0: 32 bytes context size, 64-bit DMA usbus0: waiting for BIOS to give up control xhci0: Port routing mask set to 0xffffffff usbus0 numa-domain 0 on xhci0 usbus0: 5.0Gbps Super Speed USB v3.0 pci1: <simple comms> at device 22.0 (no driver attached) pci1: <simple comms> at device 22.1 (no driver attached) pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.0 numa-domain 0 on pci1 pci5: <ACPI PCI bus> numa-domain 0 on pcib5 pcib6: <ACPI PCI-PCI bridge> at device 0.0 numa-domain 0 on pci5 pci6: <ACPI PCI bus> numa-domain 0 on pcib6 vgapci0: <VGA-compatible display> port 0xe000-0xe07f mem 0xfa000000-0xfaffffff,0xfb000000-0xfb01ffff irq 16 at device 0.0 numa-domain 0 on pci6 vgapci0: Boot video device pcib7: <ACPI PCI-PCI bridge> irq 18 at device 28.2 numa-domain 0 on pci1 pci7: <ACPI PCI bus> numa-domain 0 on pcib7 igb0: <Intel(R) I210 (Copper)> port 0xd000-0xd01f mem 0xfb200000-0xfb27ffff,0xfb280000-0xfb283fff irq 18 at device 0.0 numa-domain 0 on pci7 igb0: PHY reset is blocked due to SOL/IDER session. 
igb0: EEPROM V3.16-0 eTrack 0x800004d6
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 4 RX queues 4 TX queues
igb0: Using MSI-X interrupts with 5 vectors
igb0: Ethernet address: d0:50:99:c1:3f:9b
igb0: link state changed to UP
igb0: netmap queues/slots: TX 4/1024, RX 4/1024
pcib8: <ACPI PCI-PCI bridge> irq 19 at device 28.3 numa-domain 0 on pci1
pci8: <ACPI PCI bus> numa-domain 0 on pcib8
igb1: <Intel(R) I210 (Copper)> port 0xc000-0xc01f mem 0xfb100000-0xfb17ffff,0xfb180000-0xfb183fff irq 19 at device 0.0 numa-domain 0 on pci8
igb1: EEPROM V3.16-0 eTrack 0x800004d6
igb1: Using 1024 TX descriptors and 1024 RX descriptors
igb1: Using 4 RX queues 4 TX queues
igb1: Using MSI-X interrupts with 5 vectors
igb1: Ethernet address: d0:50:99:c1:3f:9c
igb1: netmap queues/slots: TX 4/1024, RX 4/1024
ehci0: <Intel Lynx Point USB 2.0 controller USB-A> mem 0xfb513000-0xfb5133ff irq 18 at device 29.0 numa-domain 0 on pci1
usbus1: EHCI version 1.0
usbus1 numa-domain 0 on ehci0
usbus1: 480Mbps High Speed USB v2.0
isab0: <PCI-ISA bridge> at device 31.0 numa-domain 0 on pci1
isa0: <ISA bus> numa-domain 0 on isab0
ahci0: <Intel Lynx Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xfb512000-0xfb5127ff irq 16 at device 31.2 numa-domain 0 on pci1
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
acpi_button0: <Power Button> on acpi0
uart0: <16550 or compatible> port 0x2f8-0x2ff irq 3 flags 0x10 on acpi0
ns8250: UART FCR is broken
uart0: console (115200,n,8,1)
ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
orm0: <ISA Option ROM> at iomem 0xc0000-0xc7fff pnpid ORM0000 on isa0
est0: <Enhanced SpeedStep Frequency Control> numa-domain 0 on cpu0
Timecounter "TSC" frequency 2099998110 Hz quality 1000
Timecounters tick every 1.000 msec
ugen1.1: <Intel EHCI root HUB> at usbus1
ugen0.1: <Intel XHCI root HUB> at usbus0
uhub0 numa-domain 0 on usbus1
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
uhub1 numa-domain 0 on usbus0
uhub1: <Intel XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ipmi0: IPMI device rev. 1, firmware rev. 0.17, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler
nda0 at nvme0 bus 0 scbus7 target 0 lun 1
nda0: <INTEL SSDPEKKW256G7 PSF100C BTPY63560ETA256D>
nda0: Serial Number BTPY63560ETA256D
nda0: nvme version 1.2
nda0: 244198MB (500118192 512 byte sectors)
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <INTEL SSDSC2BB480G7 N2010101> ACS-3 ATA SATA 3.x device
ada0: Serial Number PHDV637500CA480BGN
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 457862MB (937703088 512 byte sectors)
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <INTEL SSDSC2BB480G7 N2010101> ACS-3 ATA SATA 3.x device
ada1: Serial Number PHDV63750058480BGN
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada1: Command Queueing enabled
ada1: 457862MB (937703088 512 byte sectors)
ses0 at ahciem0 bus 0 scbus6 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ses0: ada0,pass0 in 'Slot 00', SATA Slot: scbus0 target 0
ses0: ada1,pass1 in 'Slot 01', SATA Slot: scbus1 target 0
GEOM_ELI: Device ada0p3.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: accelerated software
GEOM_MIRROR: Device mirror/swap launched (2/2).
GEOM_ELI: Device ada1p3.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI: Crypto: accelerated software
Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: 2 ports with 2 removable, self powered
uhub1: 21 ports with 21 removable, self powered
ugen1.2: <vendor 0x8087 product 0x8000> at usbus1
uhub2 numa-domain 0 on uhub0
uhub2: <vendor 0x8087 product 0x8000, class 9/0, rev 2.00/0.05, addr 2> on usbus1
uhub2: 4 ports with 4 removable, self powered
Dual Console: Video Primary, Serial Secondary
GEOM_ELI: Device mirror/swap.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI: Crypto: accelerated software
ichsmb0: <Intel Lynx Point SMBus controller> port 0xf000-0xf01f mem 0xfb511000-0xfb5110ff irq 18 at device 31.3 numa-domain 0 on pci1
smbus0: <System Management Bus> numa-domain 0 on ichsmb0
pchtherm0: <Haswell Thermal Subsystem> irq 18 at device 31.6 numa-domain 0 on pci1
ioat0: <BDXDE IOAT Ch0> mem 0xfb306000-0xfb307fff irq 32 at device 0.0 numa-domain 0 on pci3
ioat0: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break>
ioat1: <BDXDE IOAT Ch1> mem 0xfb304000-0xfb305fff irq 36 at device 0.1 numa-domain 0 on pci3
ioat1: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break>
ioat2: <BDXDE IOAT Ch2> mem 0xfb302000-0xfb303fff irq 37 at device 0.2 numa-domain 0 on pci3
ioat2: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break>
ioat3: <BDXDE IOAT Ch3> mem 0xfb300000-0xfb301fff irq 38 at device 0.3 numa-domain 0 on pci3
ioat3: Capabilities: c2641<Completion_Timeout_Support,DMA_with_Multicasting_Support,Descriptor_Write_Back_Error_Support,DMA_with_DIF,PQ,Block_Fill,Page_Break>
acpi_wmi0: <ACPI-WMI mapping> on acpi0
acpi_wmi0: cannot find EC device
acpi_wmi1: <ACPI-WMI mapping> on acpi0
acpi_wmi1: cannot find EC device
igb1: link state changed to UP
lo0: link state changed to UP
igb1: link state changed to DOWN
lagg0: link state changed to UP
igb1: link state changed to UP
Security policy loaded: MAC/ntpd (mac_ntpd)
warning: total configured swap (64611925 pages) exceeds maximum recommended amount (32499256 pages).
warning: increase kern.maxswzone or reduce amount of swap.
bridge1: Ethernet address: 58:9c:fc:10:ff:b6
epair0a: Ethernet address: 02:62:dc:4b:dd:0a
epair0b: Ethernet address: 02:62:dc:4b:dd:0b
epair0a: link state changed to UP
epair0b: link state changed to UP
epair0a: changing name to 'vnet1.1'
epair0b: changing name to 'epair1b'
igb0: promiscuous mode enabled
bridge1: link state changed to UP
igb1: promiscuous mode enabled
lagg0: promiscuous mode enabled
lagg0.4: promiscuous mode enabled
vnet1.1: promiscuous mode enabled
lo0: link state changed to UP
epair1a: Ethernet address: 02:99:4c:e3:56:0a
epair1b: Ethernet address: 02:99:4c:e3:56:0b
epair1a: link state changed to UP
epair1b: link state changed to UP
epair1a: changing name to 'vnet1.2'
vnet1.2: promiscuous mode enabled
lo0: link state changed to UP

root@node3:~ # iocage stop xmpp
* Stopping xmpp
  + Executing prestop OK
  + Stopping services OK
  + Tearing down VNET OK
  + Removing devfs_ruleset: 1004 OK
  + Removing jail process OK
  + Executing poststop OK
root@node3:~ # iocage start xmpp
No default gateway found for ipv4.
* Starting xmpp
  + Started OK
  + Using devfs_ruleset: 1004 (iocage generated default)
  + Configuring VNET OK
  + Using IP options: vnet
  + Starting services OK
  + Executing poststart OK
root@node3:~ #
(In reply to tom+fbsdbugzilla from comment #20)
Great, I'll mark this as a duplicate of 275286. Thanks for the help/report.

*** This bug has been marked as a duplicate of bug 275286 ***