Bug 260095 - page fault when restarting vnet jails: at htable_prefix_free+0xf4
Summary: page fault when restarting vnet jails: at htable_prefix_free+0xf4
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-11-28 19:59 UTC by Tim Foster
Modified: 2021-11-28 22:53 UTC
CC List: 2 users

See Also:
koobs: maintainer-feedback? (net)


Attachments
dmesg output from the bhyve instance (29.35 KB, text/plain)
2021-11-28 19:59 UTC, Tim Foster
jail.conf file (699 bytes, text/plain)
2021-11-28 20:00 UTC, Tim Foster
rc.conf file (714 bytes, text/plain)
2021-11-28 20:00 UTC, Tim Foster
bhyve.xml (2.67 KB, text/plain)
2021-11-28 20:07 UTC, Tim Foster

Description Tim Foster 2021-11-28 19:59:47 UTC
Created attachment 229768
dmesg output from the bhyve instance

I have an 8 GB bhyve instance running:

FreeBSD puroto 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

which panics reliably when I do:

# service jails restart

The system runs several vnet jails which have been working fine since FreeBSD
11.x or so. In case it's useful, I've attached recent dmesg output, as well as
my rc.conf and jail.conf.

I suspect this might be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234985 but that's just a gut feeling.

The panic looks like:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x440
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80be7cc2
stack pointer           = 0x28:0xfffffe00d3013620
frame pointer           = 0x28:0xfffffe00d30136a0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9000 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 2
time = 1638128325
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff8108b1b7 at trap_fatal+0x387
#4 0xffffffff8108b20f at trap_pfault+0x4f
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff80d27254 at htable_prefix_free+0xf4
#8 0xffffffff80d26fcd at lltable_prefix_free+0x6d
#9 0xffffffff80d9fcd1 at in_scrubprefix+0x281
#10 0xffffffff80d9f46d at in_difaddr_ioctl+0x30d
#11 0xffffffff80d9e8bf at in_control+0x5bf
#12 0xffffffff80d1f4df at ifioctl+0x55f
#13 0xffffffff80c76e6d at kern_ioctl+0x26d
#14 0xffffffff80c76b66 at sys_ioctl+0xf6
#15 0xffffffff8108babc at amd64_syscall+0x10c
#16 0xffffffff8106227e at fast_syscall_common+0xf8
Uptime: 2m13s
Dumping 443 out of 8152 MB:
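
For anyone comparing this trace against other reports, here is a minimal Python sketch that pulls the frames out of a KDB backtrace like the one above and isolates the ones below calltrap, i.e. the code that was running when the fault hit. The helper names (parse_kdb_backtrace, frames_after_trap) and the regular expression are illustrative assumptions, not part of any FreeBSD tool, and the sample text is only an excerpt of the trace in this report.

```python
import re

# One KDB frame looks like: "#7 0xffffffff80d27254 at htable_prefix_free+0xf4"
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")

# Excerpt copied from the panic output in this report.
SAMPLE = """\
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff80d27254 at htable_prefix_free+0xf4
#8 0xffffffff80d26fcd at lltable_prefix_free+0x6d
#9 0xffffffff80d9fcd1 at in_scrubprefix+0x281
"""


def parse_kdb_backtrace(text):
    """Return a list of dicts, one per frame line; other lines are ignored."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, address, function, offset = match.groups()
            frames.append({
                "index": int(index),
                "address": int(address, 16),
                "function": function,
                "offset": int(offset, 16),
            })
    return frames


def frames_after_trap(frames):
    """Return the frames listed after calltrap (the faulting call chain).

    If no calltrap frame is present, the full list is returned unchanged.
    """
    for position, frame in enumerate(frames):
        if frame["function"] == "calltrap":
            return frames[position + 1:]
    return frames


faulting = frames_after_trap(parse_kdb_backtrace(SAMPLE))
top = faulting[0]
# The function's start address is the return address minus the offset.
print(top["function"], hex(top["address"] - top["offset"]))
```

Subtracting the offset from the frame address gives the start of the function in this particular kernel build, which can help when matching the trace against a kernel with debug symbols.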


I've tried dropping into the kernel debugger on panic, but haven't been able to get that to work (even with 'boot -d'). I also can't inspect the crash dump, because kgdb finds no debugging symbols in /boot/kernel/kernel and then hits an internal error:

root@puroto:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
(No debugging symbols found in /boot/kernel/kernel)
/wrkdirs/usr/ports/devel/gdb/work-py38/gdb-11.1/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

/wrkdirs/usr/ports/devel/gdb/work-py38/gdb-11.1/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.
(kgdb)
Comment 1 Tim Foster 2021-11-28 20:00:23 UTC
Created attachment 229769
jail.conf file
Comment 2 Tim Foster 2021-11-28 20:00:45 UTC
Created attachment 229770
rc.conf file
Comment 3 Tim Foster 2021-11-28 20:07:23 UTC
Created attachment 229771
bhyve.xml

Finally, in case it's relevant, 6_puroto.xml is the bhyve configuration from the TrueNAS Core box that this bhyve instance is running on.
Comment 4 Tim Foster 2021-11-28 21:42:43 UTC
The core's now available at https://user.fm/files/v2-7705d4534f707028e76b7f09d507ea5f/vmcore.0.gz

It is only 34 MB compressed:

timf@puroto sha1 vmcore.0.gz
SHA1 (vmcore.0.gz) = 980c622e4a74304138a8b318552fe08a37f01527
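
For anyone fetching the core on a machine without the sha1 utility, the following is a minimal Python sketch of the same check. The function names (sha1_of_file, matches_expected) are illustrative, the expected digest is the one quoted in this comment, and the local file name in the usage note is an assumption about where the download was saved.

```python
import hashlib

# Digest quoted in this comment for vmcore.0.gz.
EXPECTED_SHA1 = "980c622e4a74304138a8b318552fe08a37f01527"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare the file's digest with the expected one, ignoring hex case."""
    return sha1_of_file(path) == expected.lower()
```

Calling matches_expected("vmcore.0.gz") in the download directory should return True if the file arrived intact.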