Port Manager is experiencing repeatable panics on -current when trying to build packages on the Cavium Thunderx Platform:

FreeBSD thunderx1.nyi.freebsd.org 13.0-CURRENT FreeBSD 13.0-CURRENT r340050 GENERIC-NODEBUG arm64

Fatal data abort:
  x0: e
  x1: ffff000000d8c118
  x2: 21
  x3: 103
  x4: fffffd1f296aa000
  x5: ffff0001780e0418
  x6: ffff0001780e051c
  x7: ffff0001780e050c
  x8: 1
  x9: 1
  x10: 21
  x11: 0
  x12: 0
  x13: b5400703
  x14: ffffffffa160
  x15: 4062b344
  x16: 401b3f94
  x17: ffffffffa4f0
  x18: ffff0001780e0360
  x19: ffff0000011bd9a8
  x20: fffffd0000000000
  x21: fffffd1f296aa000
  x22: ffff0000011bd000
  x23: 0
  x24: ffff000062000000
  x25: 0
  x26: e
  x27: 407fc0001
  x28: 1
  x29: ffff0001780e03d0
  sp: ffff0001780e0360
  lr: ffff0000006f77a4
  elr: ffff0000006f77a8
  spsr: 60400345
  far: 5e
  esr: 96000007
panic: vm_fault failed: ffff0000006f77a8
cpuid = 8
time = 1541768414
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
	 pc = 0xffff0000006e4b3c  lr = 0xffff0000000f70b8
	 sp = 0xffff0001780dfd50  fp = 0xffff0001780dff60
db_trace_self_wrapper() at vpanic+0x1a8
	 pc = 0xffff0000000f70b8  lr = 0xffff0000003b0ffc
	 sp = 0xffff0001780dff70  fp = 0xffff0001780e0020
vpanic() at panic+0x44
	 pc = 0xffff0000003b0ffc  lr = 0xffff0000003b0e50
	 sp = 0xffff0001780e0030  fp = 0xffff0001780e00b0
panic() at data_abort+0x1d8
	 pc = 0xffff0000003b0e50  lr = 0xffff0000006fd93c
	 sp = 0xffff0001780e00c0  fp = 0xffff0001780e0170
data_abort() at do_el1h_sync+0x11c
	 pc = 0xffff0000006fd93c  lr = 0xffff0000006fd660
	 sp = 0xffff0001780e0180  fp = 0xffff0001780e01b0
do_el1h_sync() at handle_el1h_sync+0x74
	 pc = 0xffff0000006fd660  lr = 0xffff0000006e7074
	 sp = 0xffff0001780e01c0  fp = 0xffff0001780e02d0
handle_el1h_sync() at pmap_enter_l2+0x128
	 pc = 0xffff0000006e7074  lr = 0xffff0000006f77a0
	 sp = 0xffff0001780e02e0  fp = 0xffff0001780e03d0
pmap_enter_l2() at pmap_enter+0x104
	 pc = 0xffff0000006f77a0  lr = 0xffff0000006f6830
	 sp = 0xffff0001780e03e0  fp = 0xffff0001780e0470
pmap_enter() at vm_fault_hold+0xecc
	 pc = 0xffff0000006f6830  lr = 0xffff000000691174
	 sp = 0xffff0001780e0480  fp = 0xffff0001780e05f0
vm_fault_hold() at vm_fault+0x60
	 pc = 0xffff000000691174  lr = 0xffff000000690250
	 sp = 0xffff0001780e0600  fp = 0xffff0001780e0630
vm_fault() at data_abort+0xa0
	 pc = 0xffff000000690250  lr = 0xffff0000006fd804
	 sp = 0xffff0001780e0640  fp = 0xffff0001780e06f0
data_abort() at do_el1h_sync+0x11c
	 pc = 0xffff0000006fd804  lr = 0xffff0000006fd660
	 sp = 0xffff0001780e0700  fp = 0xffff0001780e0730
do_el1h_sync() at handle_el1h_sync+0x74
	 pc = 0xffff0000006fd660  lr = 0xffff0000006e7074
	 sp = 0xffff0001780e0740  fp = 0xffff0001780e0850
handle_el1h_sync() at exec_copyin_args+0x88
	 pc = 0xffff0000006e7074  lr = 0xffff0000003683bc
	 sp = 0xffff0001780e0860  fp = 0xffff0001780e0920
exec_copyin_args() at sys_execve+0x3c
	 pc = 0xffff0000003683bc  lr = 0xffff00000036821c
	 sp = 0xffff0001780e0930  fp = 0xffff0001780e09a0
sys_execve() at do_el0_sync+0x4f8
	 pc = 0xffff00000036821c  lr = 0xffff0000006fdeac
	 sp = 0xffff0001780e09b0  fp = 0xffff0001780e0a70
do_el0_sync() at handle_el0_sync+0x84
	 pc = 0xffff0000006fdeac  lr = 0xffff0000006e7200
	 sp = 0xffff0001780e0a80  fp = 0xffff0001780e0b90
handle_el0_sync() at 0x24940
	 pc = 0xffff0000006e7200  lr = 0x0000000000024940
	 sp = 0xffff0001780e0ba0  fp = 0x0000ffffffffa650
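For readers triaging similar reports, the esr value in the register dump can be decoded by hand. The sketch below is a hypothetical helper, not a FreeBSD tool, and decode_esr and its table names are made up for illustration. It assumes the standard AArch64 ESR_ELx layout for data aborts (exception class in bits 31:26, fault status code in the low six bits of the syndrome) and only covers the four common fault types.

```python
# Illustrative only: a hypothetical helper, not part of FreeBSD.
# Field positions assume the AArch64 ESR_ELx layout for data aborts.

# DFSC[5:2] selects the fault type; DFSC[1:0] is the lookup level.
FAULT_TYPES = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}

DATA_ABORT_ECS = {
    0x24: "data abort from a lower exception level",
    0x25: "data abort without a change in exception level",
}


def decode_esr(esr):
    """Split an ESR value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3F            # exception class, bits [31:26]
    iss = esr & 0x1FFFFFF              # syndrome, bits [24:0]
    info = {"ec": ec, "ec_name": DATA_ABORT_ECS.get(ec, "not a data abort")}
    if ec in DATA_ABORT_ECS:
        dfsc = iss & 0x3F              # data fault status code, bits [5:0]
        fault = FAULT_TYPES.get(dfsc >> 2)
        info["fault"] = fault if fault else "other"
        # The level bits are only meaningful for the fault types above.
        info["level"] = (dfsc & 0x3) if fault else None
        info["write"] = bool((iss >> 6) & 0x1)   # WnR: 1 = write, 0 = read
    return info


if __name__ == "__main__":
    # esr value copied from the panic report above
    print(decode_esr(0x96000007))
```

Applied to the reported value 96000007, this gives exception class 0x25 (a data abort taken in the kernel itself), a level 3 translation fault, on a read. Together with far: 5e, that suggests an access through a near-NULL pointer, though the sketch by itself does not say which pointer.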
Could the kernel be changed from GENERIC-NODEBUG to GENERIC to see if any INVARIANTS/WITNESS is triggered?
Created attachment 199094
proposed patch

Sean, if you're able to test kernel patches on this system, please give the attached patch a try.
(In reply to Mark Johnston from comment #2) Patch applied and bulk build restarted. If this survives ~24 hours of the bulk run, I'd mark this as fixed.
(In reply to Mark Johnston from comment #2) Still up and running, building packages. I think you got it.
(In reply to Sean Bruno from comment #4) Thanks. In a review Alan noted that the real problem actually appears to be in arm64-specific code, so I'll have to write an alternate patch. I'll update this PR once that's ready.
Created attachment 199188
proposed patch

Could you please try this patch instead of the old one?
markj@ working to have this fixed in 12.0
Created attachment 199259
proposed patch

(In reply to Mark Johnston from comment #6) Assuming you haven't already started testing the new patch, please try this one instead. It fixes a flaw in the first version.
(In reply to Mark Johnston from comment #8) We've applied this to the package builder this morning. I'll report back after it runs for a day or two.
(In reply to Sean Bruno from comment #9) The cavium box is running full out and seems to be super stable. Thank you! http://thunderx1.nyi.freebsd.org/index.html
A commit references this bug:

Author: markj
Date: Tue Nov 20 15:12:37 UTC 2018
New revision: 340678
URL: https://svnweb.freebsd.org/changeset/base/340678

Log:
  Handle kernel superpage mappings in pmap_remove_l2().

  PR:		233088
  Reviewed by:	alc, andrew, kib
  Tested by:	sbruno
  MFC after:	3 days
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D17981

Changes:
  head/sys/arm64/arm64/pmap.c
A commit references this bug:

Author: markj
Date: Tue Nov 20 17:43:24 UTC 2018
New revision: 340685
URL: https://svnweb.freebsd.org/changeset/base/340685

Log:
  MFstable/12 r340680:
  Handle kernel superpage mappings in pmap_remove_l2().

  PR:		233088
  Approved by:	re (gjb)

Changes:
_U  releng/12.0/
  releng/12.0/sys/arm64/arm64/pmap.c