Bug 222314 - ifconfig epair create panics the kernel (arm64)
Summary: ifconfig epair create panics the kernel (arm64)
Status: Closed DUPLICATE of bug 223670
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash, vimage
Depends on:
Blocks:
 
Reported: 2017-09-14 00:28 UTC by Heinz N. Gies
Modified: 2018-10-18 15:59 UTC

Description Heinz N. Gies 2017-09-14 00:28:55 UTC
Starting a jail with vnets panics the kernel. This is not a duplicate of #213896, as the patch mentioned there was applied to the kernel in question.

I'm not sure whether this is an arm64-related issue: it is an arm64 system, but the NIC used is a normal Intel NIC.

login: lock order reversal:
 1st 0xfffffd0094344418 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:849
 2nd 0xfffffd0094344240 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2533
stack backtrace:
#0 0xffff0000002eee98 at witness_debugger+0x64
#1 0xffff00000026ad88 at lockmgr_lock_fast_path+0x1b4
#2 0xffff000000590634 at VOP_LOCK1_APV+0xcc
#3 0xffff00000035cf3c at _vn_lock+0x6c
#4 0xffff00000034e98c at vget+0x78
#5 0xffff00000017d6a8 at devfs_allocv+0xdc
#6 0xffff00000017d1e0 at devfs_root+0x44
#7 0xffff000000345d2c at vfs_donmount+0x102c
#8 0xffff000000344ccc at sys_nmount+0x68
#9 0xffff000000573404 at do_el0_sync+0x8c8
#10 0xffff00000055c9f4 at handle_el0_sync+0x74
panic: vm_fault: fault on nofault entry, addr: ffff0000999ee000
cpuid = 18
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffff00000055ae28  lr = 0xffff00000005eb10
	 sp = 0xffff00061d8dae40  fp = 0xffff00061d8db050

db_trace_self_wrapper() at vpanic+0x170
	 pc = 0xffff00000005eb10  lr = 0xffff00000029346c
	 sp = 0xffff00061d8db060  fp = 0xffff00061d8db0e0

vpanic() at panic+0x48
	 pc = 0xffff00000029346c  lr = 0xffff0000002934f8
	 sp = 0xffff00061d8db0f0  fp = 0xffff00061d8db170

panic() at vm_fault_hold+0x1ab0
	 pc = 0xffff0000002934f8  lr = 0xffff00000052f1dc
	 sp = 0xffff00061d8db180  fp = 0xffff00061d8db2d0

vm_fault_hold() at vm_fault+0x70
	 pc = 0xffff00000052f1dc  lr = 0xffff00000052d6dc
	 sp = 0xffff00061d8db2e0  fp = 0xffff00061d8db310

vm_fault() at data_abort+0xd8
	 pc = 0xffff00000052d6dc  lr = 0xffff0000005729dc
	 sp = 0xffff00061d8db320  fp = 0xffff00061d8db3d0

data_abort() at handle_el1h_sync+0x74
	 pc = 0xffff0000005729dc  lr = 0xffff00000055c874
	 sp = 0xffff00061d8db3e0  fp = 0xffff00061d8db4f0

handle_el1h_sync() at vnet_epair_init+0x2c
	 pc = 0xffff00000055c874  lr = 0xffff00005996674c
	 sp = 0xffff00061d8db500  fp = 0xffff00061d8db580

vnet_epair_init() at vnet_register_sysinit+0x100
	 pc = 0xffff00005996674c  lr = 0xffff00000039e000
	 sp = 0xffff00061d8db590  fp = 0xffff00061d8db5b0

vnet_register_sysinit() at linker_load_module+0xaac
	 pc = 0xffff00000039e000  lr = 0xffff000000266a68
	 sp = 0xffff00061d8db5c0  fp = 0xffff00061d8db8e0

linker_load_module() at kern_kldload+0xec
	 pc = 0xffff000000266a68  lr = 0xffff000000268120
	 sp = 0xffff00061d8db8f0  fp = 0xffff00061d8db920

kern_kldload() at sys_kldload+0x64
	 pc = 0xffff000000268120  lr = 0xffff000000268278
	 sp = 0xffff00061d8db930  fp = 0xffff00061d8db950

sys_kldload() at do_el0_sync+0x8c8
	 pc = 0xffff000000268278  lr = 0xffff000000573404
	 sp = 0xffff00061d8db960  fp = 0xffff00061d8dba80

do_el0_sync() at handle_el0_sync+0x74
	 pc = 0xffff000000573404  lr = 0xffff00000055c9f4
	 sp = 0xffff00061d8dba90  fp = 0xffff00061d8dbba0

handle_el0_sync() at 0x21278
	 pc = 0xffff00000055c9f4  lr = 0x0000000000021278
	 sp = 0xffff00061d8dbbb0  fp = 0x0000ffffffffe2d0

KDB: enter: panic
[ thread pid 1053 tid 101161 ]
Stopped at      kdb_enter+0x40: undefined       d4200000
Comment 1 Heinz N. Gies 2017-09-14 01:48:36 UTC
I changed the description, as this can be triggered by the simple command:

ifconfig epair create

That it was called as part of a vnet jail seems to have been a coincidence (or a second bug?).

I feel this is more likely arm-specific, so I'll move it to the arm component rather than the kern one, since I've created epairs successfully in the past. OTOH, it might be related more to the 48 cores than to the architecture?
Comment 2 Andrew Turner freebsd_committer freebsd_triage 2018-07-30 16:05:39 UTC
This should be fixed, with VIMAGE enabled again in base r336915.
Comment 3 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-15 16:26:09 UTC
I am closing it; if you still have problems with epair in 12, please re-open or follow up on PR 223670 (re-opening that one).

Thanks a lot for reporting!

*** This bug has been marked as a duplicate of bug 223670 ***
Comment 4 Mark Millard 2018-10-18 05:59:12 UTC
A powerpc64 head -r339076 based context running

# kyua test -k /usr/tests/Kyuafile

reliably crashes (so far) during kyua displaying:

sys/netinet/reuseport_lb:basic_ipv4  ->  failed: /usr/src/tests/sys/netinet/reuseport_lb.c:165: bind() failed: Address already in use [0.013s]
sys/netinet/reuseport_lb:basic_ipv6  ->  failed: /usr/src/tests/sys/netinet/reuseport_lb.c:221: bind() failed: Address already in use [0.013s]
sys/netipsec/tunnel/aes_cbc_128_hmac_sha1:v4  ->  

Example details based on a debug kernel (invariants,
witness, and diagnostics) . . .

Note the LOR backtrace and the crash backtrace
are the same for the call chain that calls
vnet_sysinit.

. . .
epair3a: Ethernet address: 02:60:27:70:4b:0a
epair3b: Ethernet address: 02:60:27:70:4b:0b
epair3a: link state changed to UP
epair3b: link state changed to UP
lock order reversal:
1st 0x13be260 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:960
2nd 0x15964a0 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:575
stack backtrace:
#0 0x6f6520 at witness_debugger+0xf4
#1 0x6f8440 at witness_checkorder+0xa1c
#2 0x675690 at _sx_slock_int+0x70
#3 0x675810 at _sx_slock+0x1c
#4 0x7f4338 at vnet_sysinit+0x38

#5 0x7f44dc at vnet_alloc+0x118
#6 0x62ab84 at kern_jail_set+0x3274
#7 0x62b62c at sys_jail_set+0x8c
#8 0xa8a798 at trap+0x9a0
#9 0xa7e660 at powerpc_interrupt+0x140

fatal kernel trap:

  exception       = 0x300 (data storage interrupt)
  virtual address = 0xc00000008df1df30
  dsisr           = 0x42000000
  srr0            = 0xe000000047854e98 (0xe000000047854e98)
  srr1            = 0x9000000000009032
  current msr     = 0x9000000000009032
  lr              = 0xe000000047854e90 (0xe000000047854e90)
  curthread       = 0xc0000000206b6000
         pid = 9464, comm = jail

(Hand transcribed from here on:)

[ thread pid 9464 tid 100296 ]
Stopped at vnet_epair_init+0x78: stdx r3,r29,r30
db:0:kdb.enter.default> bt
Tracing pid 9464 tid 100296 td 0xc0000000206b6000
0xe000000047274240: at vnet_sysinit+0x70

0xe000000047274270: at vnet_alloc+0x118
0xe000000047274300: at kern_jail_set+0x3274
0xe000000047274610: at sys_jail_set+0x8c
0xe000000047274660: at trap+0x9a0
0xe000000047274790: at powerpc_interrupt+0x140

0xe000000047274820: user sc trap by 0x81016a888
srr1 = 0x900000000000f032
r1   = 0x3fffffffffffd080
cr   = 0x28002482
xer  = 0x20000000
ctr  = 0x81016a880
r2   = 0x810322300



There are past reports of the lock order
reversal, such as:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210907

but this did not report any crash.
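
For context on the "lock order reversal" messages: witness(4) remembers the order in which locks were acquired and reports when a later acquisition uses the opposite order. Such a report warns about a potential deadlock and is not itself a crash. The following is a minimal user-space sketch of that idea only. The function order_check() and the fixed-size table are hypothetical names for illustration, and this is not how the kernel implements witness.

```c
#include <string.h>

#define MAX_ORDERS 16

/* One recorded pair: `second` was acquired while `first` was held. */
struct lock_order {
	const char *first;
	const char *second;
};

static struct lock_order seen[MAX_ORDERS];
static int nseen;

/*
 * Hypothetical checker: record that `second` was acquired while `first`
 * was held. Returns 1 if the opposite order was recorded earlier
 * (a reversal), 0 otherwise.
 */
int
order_check(const char *first, const char *second)
{
	int i;

	/* A previously recorded (second, first) pair means a reversal. */
	for (i = 0; i < nseen; i++) {
		if (strcmp(seen[i].first, second) == 0 &&
		    strcmp(seen[i].second, first) == 0)
			return (1);
	}
	/* Do not store the same pair twice. */
	for (i = 0; i < nseen; i++) {
		if (strcmp(seen[i].first, first) == 0 &&
		    strcmp(seen[i].second, second) == 0)
			return (0);
	}
	/* Remember the new order if there is room in the table. */
	if (nseen < MAX_ORDERS) {
		seen[nseen].first = first;
		seen[nseen].second = second;
		nseen++;
	}
	return (0);
}
```

Assuming, purely for illustration, that some earlier code path had taken vnet_sysinit_sxlock before allprison, a later call such as order_check("allprison", "vnet_sysinit_sxlock") would return 1, which corresponds to the "1st ... 2nd ..." report quoted above.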

Notes:

The powerpc64 -r339076 based system was built via
devel/powerpc-xtoolchain-gcc and created
system-cc-is-clang and is using base/binutils as
well. kyua is as of ports -r480180 and system-clang
built it (and other things).

I experiment with what the issues are with using
fairly modern compiler toolchains for powerpc64
instead of gcc 4.2.1 . At this point I do not
see this as likely to be responsible for the
above crash.
Comment 5 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-18 13:39:10 UTC
(In reply to Mark Millard from comment #4)
 
Ignoring the LOR.  Ignoring the fact that this bug report was arm64 specific.  Let's see if it is the same problem at least; otherwise we should track this elsewhere.


Shot in the dark: can you try adding powerpc to the place in sys/net/vnet.h, as was done in https://svnweb.freebsd.org/base?view=revision&revision=336909 for arm64?

changing the line

#if defined(KLD_MODULE) && defined(__aarch64__)

to

#if defined(KLD_MODULE) && (defined(__aarch64__) || \
    defined(__powerpc__) || defined(__powerpc64__))

and see if this helps. Be aware that (a) I hope I got the correct __<foo>__ macros for powerpc and (b) at the moment I am assuming that this applies to both powerpc and powerpc64 and that we need both macros. I am absolutely not sure which one is correct or needed for FreeBSD's powerpc support.
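
The question of which __<foo>__ macros a given compiler defines can be checked directly. The following is a minimal sketch and not part of the proposed patch. The function names arch_macro_name(), arch_matches_proposed_condition() and report_arch_macros() are hypothetical, and the KLD_MODULE half of the condition is deliberately left out so the code builds as an ordinary user-space program.

```c
#include <stdio.h>

/* Hypothetical helper: name of the first candidate macro the compiler defines. */
const char *
arch_macro_name(void)
{
#if defined(__aarch64__)
	return ("__aarch64__");
#elif defined(__powerpc64__)
	return ("__powerpc64__");
#elif defined(__powerpc__)
	return ("__powerpc__");
#elif defined(__riscv)
	return ("__riscv");
#else
	return ("(none of the candidates)");
#endif
}

/* Mirrors only the architecture half of the condition proposed above. */
int
arch_matches_proposed_condition(void)
{
#if defined(__aarch64__) || defined(__powerpc__) || defined(__powerpc64__)
	return (1);
#else
	return (0);
#endif
}

/* Print what the current compiler target reports. */
void
report_arch_macros(void)
{
	printf("first matching macro: %s\n", arch_macro_name());
	printf("proposed condition matches: %d\n",
	    arch_matches_proposed_condition());
}
```

Calling report_arch_macros() from main() on the target in question shows whether the condition would be taken there. The same information should be available without compiling anything by dumping the predefined macros, for example with cc -dM -E - < /dev/null and searching the output for powerpc.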
Comment 6 Mark Millard 2018-10-18 15:55:47 UTC
(In reply to Bjoern A. Zeeb from comment #5)

Tested: still crashes the same way when based on . . .

# svnlite diff /usr/src/sys/net/vnet.h
Index: /usr/src/sys/net/vnet.h
===================================================================
--- /usr/src/sys/net/vnet.h	(revision 339076)
+++ /usr/src/sys/net/vnet.h	(working copy)
@@ -273,7 +273,8 @@
 /* struct _hack is to stop this from being used with static data */
 #define	VNET_DEFINE(t, n)	\
     struct _hack; t VNET_NAME(n) __section(VNET_SETNAME) __used
-#if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv))
+#if defined(KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) || \
+                            defined(__powerpc__) || defined(__powerpc64__))
 /*
  * As with DPCPU_DEFINE_STATIC we are unable to mark this data as static
  * in modules on some architectures.

(DPCPU_DEFINE_STATIC still does not use such a condition for powerpc family members.)
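
The pattern that the quoted vnet.h comment describes, dropping the static qualifier only for modules on selected architectures, can be sketched in user space as follows. This is an illustration under stated assumptions and not the text of vnet.h: all DEMO_* and demo_* names are hypothetical, and DEMO_KLD_MODULE stands in for KLD_MODULE.

```c
/*
 * Hypothetical stand-in for the condition in the diff above. When it is
 * true the definition is emitted without `static`; otherwise it is static.
 */
#if defined(DEMO_KLD_MODULE) && (defined(__aarch64__) || defined(__riscv) || \
    defined(__powerpc__) || defined(__powerpc64__))
#define	DEMO_DEFINE_STATIC(t, n)	t n
#define	DEMO_IS_STATIC			0
#else
#define	DEMO_DEFINE_STATIC(t, n)	static t n
#define	DEMO_IS_STATIC			1
#endif

/* Expands to `static int demo_counter = 0;` or `int demo_counter = 0;`. */
DEMO_DEFINE_STATIC(int, demo_counter) = 0;

/* Increment the counter and return its new value. */
int
demo_bump(void)
{
	return (++demo_counter);
}

/* Report which expansion of DEMO_DEFINE_STATIC was selected. */
int
demo_is_static(void)
{
	return (DEMO_IS_STATIC);
}
```

Built without -DDEMO_KLD_MODULE, the variable stays static on every architecture. Built with it on one of the listed architectures, the variable gets external linkage, which is the effect the diff tries to extend to powerpc.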

So: not the same problem and track elsewhere.

Thanks for the test.
Comment 7 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-18 15:59:05 UTC
(In reply to Mark Millard from comment #6)

Yes, testing dpcpu would be harder, and I only wanted to know if this is actually the same problem.

Can you please open a separate PR with the data (feel free to put in a reference to this one saying "different issue")? If you also have an objdump of the epair vnet function we are crashing in, adding it to the new PR would be helpful.

*** This bug has been marked as a duplicate of bug 223670 ***