|Summary:||ifconfig epair create panics the kernel (arm64)|
|Product:||Base System||Reporter:||Heinz N. Gies <heinz>|
|Component:||kern||Assignee:||Bjoern A. Zeeb <bz>|
|Severity:||Affects Some People||CC:||Andrew, bz, emaste, marklmi26-fbsd|
Description Heinz N. Gies 2017-09-14 00:28:55 UTC
Starting a jail with vnets panics the kernel. This is not a duplicate of #213896, as the patch mentioned there was applied to the kernel in question. I'm not sure whether this is an arm64-related issue; it is an arm64 system, but the NIC used is a normal Intel NIC.

login: lock order reversal:
 1st 0xfffffd0094344418 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:849
 2nd 0xfffffd0094344240 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2533
stack backtrace:
#0 0xffff0000002eee98 at witness_debugger+0x64
#1 0xffff00000026ad88 at lockmgr_lock_fast_path+0x1b4
#2 0xffff000000590634 at VOP_LOCK1_APV+0xcc
#3 0xffff00000035cf3c at _vn_lock+0x6c
#4 0xffff00000034e98c at vget+0x78
#5 0xffff00000017d6a8 at devfs_allocv+0xdc
#6 0xffff00000017d1e0 at devfs_root+0x44
#7 0xffff000000345d2c at vfs_donmount+0x102c
#8 0xffff000000344ccc at sys_nmount+0x68
#9 0xffff000000573404 at do_el0_sync+0x8c8
#10 0xffff00000055c9f4 at handle_el0_sync+0x74
panic: vm_fault: fault on nofault entry, addr: ffff0000999ee000
cpuid = 18
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
  pc = 0xffff00000055ae28 lr = 0xffff00000005eb10
  sp = 0xffff00061d8dae40 fp = 0xffff00061d8db050
db_trace_self_wrapper() at vpanic+0x170
  pc = 0xffff00000005eb10 lr = 0xffff00000029346c
  sp = 0xffff00061d8db060 fp = 0xffff00061d8db0e0
vpanic() at panic+0x48
  pc = 0xffff00000029346c lr = 0xffff0000002934f8
  sp = 0xffff00061d8db0f0 fp = 0xffff00061d8db170
panic() at vm_fault_hold+0x1ab0
  pc = 0xffff0000002934f8 lr = 0xffff00000052f1dc
  sp = 0xffff00061d8db180 fp = 0xffff00061d8db2d0
vm_fault_hold() at vm_fault+0x70
  pc = 0xffff00000052f1dc lr = 0xffff00000052d6dc
  sp = 0xffff00061d8db2e0 fp = 0xffff00061d8db310
vm_fault() at data_abort+0xd8
  pc = 0xffff00000052d6dc lr = 0xffff0000005729dc
  sp = 0xffff00061d8db320 fp = 0xffff00061d8db3d0
data_abort() at handle_el1h_sync+0x74
  pc = 0xffff0000005729dc lr = 0xffff00000055c874
  sp = 0xffff00061d8db3e0 fp = 0xffff00061d8db4f0
handle_el1h_sync() at vnet_epair_init+0x2c
  pc = 0xffff00000055c874 lr = 0xffff00005996674c
  sp = 0xffff00061d8db500 fp = 0xffff00061d8db580
vnet_epair_init() at vnet_register_sysinit+0x100
  pc = 0xffff00005996674c lr = 0xffff00000039e000
  sp = 0xffff00061d8db590 fp = 0xffff00061d8db5b0
vnet_register_sysinit() at linker_load_module+0xaac
  pc = 0xffff00000039e000 lr = 0xffff000000266a68
  sp = 0xffff00061d8db5c0 fp = 0xffff00061d8db8e0
linker_load_module() at kern_kldload+0xec
  pc = 0xffff000000266a68 lr = 0xffff000000268120
  sp = 0xffff00061d8db8f0 fp = 0xffff00061d8db920
kern_kldload() at sys_kldload+0x64
  pc = 0xffff000000268120 lr = 0xffff000000268278
  sp = 0xffff00061d8db930 fp = 0xffff00061d8db950
sys_kldload() at do_el0_sync+0x8c8
  pc = 0xffff000000268278 lr = 0xffff000000573404
  sp = 0xffff00061d8db960 fp = 0xffff00061d8dba80
do_el0_sync() at handle_el0_sync+0x74
  pc = 0xffff000000573404 lr = 0xffff00000055c9f4
  sp = 0xffff00061d8dba90 fp = 0xffff00061d8dbba0
handle_el0_sync() at 0x21278
  pc = 0xffff00000055c9f4 lr = 0x0000000000021278
  sp = 0xffff00061d8dbbb0 fp = 0x0000ffffffffe2d0
KDB: enter: panic
[ thread pid 1053 tid 101161 ]
Stopped at kdb_enter+0x40: undefined d4200000
Comment 1 Heinz N. Gies 2017-09-14 01:48:36 UTC
I changed the description, as this can be triggered by the simple command:

ifconfig epair create

That it was called as part of a vnet jail seems to have been a coincidence (or a second bug?). I feel like this is more likely arm-specific, so I'll move it to the arm component rather than the kern one, since I've created epairs successfully in the past. OTOH, it might be more related to the 48 cores than to the architecture?
Comment 2 Andrew Turner 2018-07-30 16:05:39 UTC
This should be fixed, with VIMAGE enabled again in base r336915
Comment 3 Bjoern A. Zeeb 2018-10-15 16:26:09 UTC
I am closing it. If you still have problems with epair in 12, please re-open or follow up on PR 223670 (re-opening that one). Thanks a lot for reporting!

*** This bug has been marked as a duplicate of bug 223670 ***
Comment 4 Mark Millard 2018-10-18 05:59:12 UTC
A powerpc64 head -r339076 based context running

# kyua test -k /usr/tests/Kyuafile

reliably crashes (so far) during kyua displaying:

sys/netinet/reuseport_lb:basic_ipv4 -> failed: /usr/src/tests/sys/netinet/reuseport_lb.c:165: bind() failed: Address already in use [0.013s]
sys/netinet/reuseport_lb:basic_ipv6 -> failed: /usr/src/tests/sys/netinet/reuseport_lb.c:221: bind() failed: Address already in use [0.013s]
sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4 ->

Example details based on a debug kernel (invariants, witness, and diagnostics) . . .

Note the LOR backtrace and the crash backtrace are the same for the call chain that calls vnet_sysinit.

. . .
epair3a: Ethernet address: 02:60:27:70:4b:0a
epair3b: Ethernet address: 02:60:27:70:4b:0b
epair3a: link state changed to UP
epair3b: link state changed to UP
lock order reversal:
 1st 0x13be260 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:960
 2nd 0x15964a0 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:575
stack backtrace:
#0 0x6f6520 at witness_debugger+0xf4
#1 0x6f8440 at witness_checkorder+0xa1c
#2 0x675690 at _sx_slock_int+0x70
#3 0x675810 at _sx_slock+0x1c
#4 0x7f4338 at vnet_sysinit+0x38
#5 0x7f44dc at vnet_alloc+0x118
#6 0x62ab84 at kern_jail_set+0x3274
#7 0x62b62c at sys_jail_set+0x8c
#8 0xa8a798 at trap+0x9a0
#9 0xa7e660 at powerpc_interrupt+0x140

fatal kernel trap:

  exception = 0x300 (data storage interrupt)
  virtual address = 0xc00000008df1df30
  dsisr = 0x42000000
  srr0 = 0xe000000047854e98 (0xe000000047854e98)
  srr1 = 0x9000000000009032
  current msr = 0x9000000000009032
  lr = 0xe000000047854e90 (0xe000000047854e90)
  curthread = 0xc0000000206b6000
  pid = 9464, comm = jail

(Hand transcribed from here on:)

[ thread pid 9464 tid 100296 ]
Stopped at vnet_epair_init+0x78: stdx r3,r29,r30
db:0:kdb.enter.default> bt
Tracing pid 9464 tid 100296 td 0xc0000000206b6000
0xe000000047274240: at vnet_sysinit+0x70
0xe000000047274270: at vnet_alloc+0x118
0xe000000047274300: at kern_jail_set+0x3274
0xe000000047274610: at sys_jail_set+0x8c
0xe000000047274660: at trap+0x9a0
0xe000000047274790: at powerpc_interrupt+0x140
0xe000000047274820: user sc trap by 0x81016a888
  srr1 = 0x900000000000f032
  r1 = 0x3fffffffffffd080
  cr = 0x28002482
  xer = 0x20000000
  ctr = 0x81016a880
  r2 = 0x810322300

There are past reports of the lock order reversal, such as:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210907

but this did not report any crash.

Notes:

The powerpc64 -r339076 based system was built via devel/powerpc-xtoolchain-gcc and created system-cc-is-clang and is using base/binutils as well. kyua is as of ports -r480180 and system-clang built it (and other things).

I experiment with what the issues are with using fairly modern compiler toolchains for powerpc64 instead of gcc 4.2.1. At this point I do not see this as likely to be responsible for the above crash.
Comment 5 Bjoern A. Zeeb 2018-10-18 13:39:10 UTC
(In reply to Mark Millard from comment #4)

Ignoring the LOR. Ignoring the fact that this bug report was arm64 specific. Let's see if it is the same problem at least; otherwise we should track this elsewhere.

Shot in the dark: can you try adding powerpc to the place in sys/net/vnet.h, as was done in https://svnweb.freebsd.org/base?view=revision&revision=336909 for arm64, changing the line

#if defined(KLD_MODULE) && defined(__aarch64__)

to

#if defined(KLD_MODULE) && (defined(__aarch64__) || \
    defined(__powerpc__) || defined(__powerpc64__))

and see if this helps?

Be aware that (a) I hope I got the correct __<foo>__ for powerpc and (b) at the moment I am assuming that this applies to both and we need both. I am absolutely not sure which one is correct or needed for FreeBSD's powerpc support.
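One way to check point (a) on a given toolchain is to ask the compiler which of these macros it predefines. The following is a minimal userland sketch, not kernel code; report_arch_macros() is a hypothetical helper name introduced only for this illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of FreeBSD): prints which of the
 * architecture macros named in the proposed guard this compiler
 * predefines, and returns how many of them are defined.
 */
static int
report_arch_macros(void)
{
	int count = 0;

#if defined(__aarch64__)
	puts("__aarch64__ is defined");
	count++;
#endif
#if defined(__powerpc__)
	puts("__powerpc__ is defined");
	count++;
#endif
#if defined(__powerpc64__)
	puts("__powerpc64__ is defined");
	count++;
#endif
	if (count == 0)
		puts("none of the checked macros is defined");
	return (count);
}
```

Calling report_arch_macros() from a small main() built with the same compiler that builds the kernel modules would show whether one or both powerpc macros are set for that target.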
Comment 6 Mark Millard 2018-10-18 15:55:47 UTC
(In reply to Bjoern A. Zeeb from comment #5)

Tested: still crashes the same way when based on . . .

# svnlite diff /usr/src/sys/net/vnet.h
Index: /usr/src/sys/net/vnet.h
===================================================================
--- /usr/src/sys/net/vnet.h (revision 339076)
+++ /usr/src/sys/net/vnet.h (working copy)
@@ -273,7 +273,8 @@
 /* struct _hack is to stop this from being used with static data */
 #define VNET_DEFINE(t, n) \
     struct _hack; t VNET_NAME(n) __section(VNET_SETNAME) __used
-#if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv))
+#if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) || \
+ defined(__powerpc__) || defined(__powerpc64__))
 /*
  * As with DPCPU_DEFINE_STATIC we are unable to mark this data as static
  * in modules on some architectures.

(DPCPU_DEFINE_STATIC still not using such for powerpc family members)

So: not the same problem and track elsewhere. Thanks for the test.
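The diff excerpt stops before the definitions that the guard controls. As a rough userland sketch of the pattern, assuming (based only on the quoted comment) that the guarded branch omits "static" and the other branch keeps it, the selection could look like the following. DEMO_DEFINE_STATIC, DEMO_IS_STATIC, demo_counter, demo_counter_bump() and demo_report() are hypothetical names for illustration and do not appear in vnet.h.

```c
#include <stdio.h>

/*
 * Hypothetical stand-ins, not FreeBSD macros.  KLD_MODULE is normally
 * set by the kernel module build; here it stays undefined unless the
 * compiler command line defines it.
 */
#if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) || \
    defined(__powerpc__) || defined(__powerpc64__))
/* Guard matched: the definition drops "static". */
#define	DEMO_DEFINE_STATIC(t, n)	t n
#define	DEMO_IS_STATIC			0
#else
/* Guard did not match: the definition keeps "static". */
#define	DEMO_DEFINE_STATIC(t, n)	static t n
#define	DEMO_IS_STATIC			1
#endif

/* File-scope variable, zero-initialized in either branch. */
DEMO_DEFINE_STATIC(int, demo_counter);

/* Increment the demo variable and return its new value. */
static int
demo_counter_bump(void)
{
	return (++demo_counter);
}

/* Report which branch of the guard was taken. */
static void
demo_report(void)
{
	printf("demo_counter is %s\n",
	    DEMO_IS_STATIC ? "static" : "not static");
}
```

Building this with and without -DKLD_MODULE on an affected architecture shows the two branches; on other architectures the "static" branch is taken either way.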
Comment 7 Bjoern A. Zeeb 2018-10-18 15:59:05 UTC
(In reply to Mark Millard from comment #6)

Yes, testing dpcpu would be harder, and I only wanted to know if this is actually the same problem.

Can you please open a separate PR with the data (feel free to put a reference to this one in it, saying "different issue")? If you also have an objdump of the epair vnet function we are crashing in, adding it to the new PR would be helpful.

*** This bug has been marked as a duplicate of bug 223670 ***