Created attachment 210979
dmesg.boot contents from 12.1

A proposed fix for unwinding on SPARC64 (r356552) is looking for testers. I'd like to give it a try, but I cannot get any -CURRENT kernel to boot on my machine. Both cross-compiled and natively built kernels seem to hit the same problem, so it's likely not a GCC9 issue.

I'll attach a dmesg.boot file from a natively built 12-STABLE system to give people an idea of what hardware the system has. The newest kernel that I tested is a cross-built r356986.

I re-read AF3e's chapter on crash dumps, trying to provide something useful, but I guess the crash happens too early for the system to dump anything. So here's the serial output that I get:

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
jumping to kernel entry at 0xc00b8020.
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #0 r356986: Wed Jan 22 16:54:54 CET 2020
    root@fbsdtest.omc.net:/usr/obj/usr/src/sparc64.sparc64/sys/GENERIC sparc64
gcc version 9.2.0 (FreeBSD Ports Collection for sparc64)
WARNING: WITNESS option enabled, expect reduced performance.
real memory = 1073741824 (1024 MB)
avail memory = 1024761856 (977 MB)
cpu0: Sun Microsystems UltraSparc-IIe Processor (548.00 MHz CPU)
random: unblocking device.
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 13.0.
kbd0 at kbdmux0
WARNING: Device "openfirm" is Giant locked and may be deleted before FreeBSD 13.0.
WARNING: Device "openprom" is Giant locked and may be deleted before FreeBSD 13.0.
nexus0: <Open Firmware Nexus device>
pcib0: <U2P UPA-PCI bridge> mem 0x1fe00000000-0x1fe0000ffff,0x1fe01000000-0x1fe010000ff irq 2032,2030,2031,2021 on nexus0
pcib0: Sabre, impl 0, version 0, IGN 0x1f, bus A, 66MHz
pcib0: DVMA map: 0x60000000 to 0x63ffffff 8192 entries
pci0: <OFW PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <old, non-VGA display device> at device 3.0 (no driver attached)
dc0: <Davicom DM9102A 10/100BaseTX> port 0x10000-0x100ff mem 0-0xff at device 12.0 on pci0
miibus0: <MII bus> on dc0
amphy0: <DM9102 10/100 media interface> PHY 1 on miibus0
amphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: Ethernet address: 00:03:ba:4e:55:e6
dc1: <Davicom DM9102A 10/100BaseTX> port 0x10100-0x101ff mem 0x2000-0x20ff at device 5.0 on pci0
miibus1: <MII bus> on dc1
amphy1: <DM9102 10/100 media interface> PHY 1 on miibus1
amphy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc1: Ethernet address: 00:03:ba:4e:55:e6
ohci0: <AcerLabs M5237 (Aladdin-V) USB controller> mem 0x1000000-0x1000fff at device 10.0 on pci0
usbus0 on ohci0
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
atapci0: using PIO transfers above 137GB as workaround for 48bit DMA access bug, expect reduced performance
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
cryptosoft0: <software crypto> on nexus0
nexus0: <syscons> type unknown (no driver attached)
rtc0: <Real-Time Clock> at port 0x70-0x71 pnpid PNP0b00 on isa0
rtc0: registered as a time-of-day clock, resolution 1.000000s
uart0: console (9600,n,8,1)> at port 0x3f8-0x3ff irq 43 pnpid PNP0501 on isa0
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 pnpid PNP0501 on isa0
Timecounter "tick" frequency 548000000 Hz quality 1000
Event timer "tick" frequency 548000000 Hz quality 1000
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
Obsolete code will be removed soon: random(9) is the obsolete Park-Miller LCG from 1988
panic: invalid count 2
cpuid = 0
time = 1
KDB: stack backtrace:
_end() at 0xc1416fb8
vpanic() at vpanic+0x31c
panic() at panic+0x20
sched_switch() at sched_switch+0x8ac
mi_switch() at mi_switch+0x1dc
critical_exit_preempt() at critical_exit_preempt+0x88
spinlock_exit() at spinlock_exit+0x70
__mtx_unlock_spin_flags() at __mtx_unlock_spin_flags+0xb0
sched_add() at sched_add+0x2e8
gtaskqueue_start_threads() at gtaskqueue_start_threads+0x254
taskqgroup_cpu_create() at taskqgroup_cpu_create+0x124
taskqgroup_adjust() at taskqgroup_adjust+0x280
taskqgroup_adjust_softirq() at taskqgroup_adjust_softirq+0x34
mi_startup() at mi_startup+0x32c
btext() at btext+0x28
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x80: ta %xcc, 1
db>

I'll gladly provide additional information if required.

(BTW for those who care: The binutils fix for SPARC64 was accepted upstream.)
This happened on other architectures after r355784; see the thread at
https://lists.freebsd.org/pipermail/svn-src-all/2019-December/191362.html

It was fixed for the others in r355819. I'll apply the same change to sparc64.
A commit references this bug:

Author: emaste
Date: Thu Jan 23 14:11:03 UTC 2020
New revision: 357045
URL: https://svnweb.freebsd.org/changeset/base/357045

Log:
  Apply r355819 to sparc64 - fix assertion failure after r355784

  From r355819:
  Repeat the spinlock_enter/exit pattern from amd64 on other architectures
  to fix an assert violation introduced in r355784. Without this
  spinlock_exit() may see owepreempt and switch before reducing the
  spinlock count. amd64 had been optimized to do a single critical
  enter/exit regardless of the number of spinlocks which avoided the
  problem and this optimization had not been applied elsewhere.

  This is completely untested - I have no obsolete Sparc hardware - but
  someone did try testing recent changes on sparc64 (PR 243534).

  PR: 243534

Changes:
  head/sys/sparc64/sparc64/machdep.c
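For readers following along, the ordering problem described in that commit log can be illustrated with a small user-space model in C. This is only a sketch under stated assumptions: the struct, its fields, and every function name below are hypothetical stand-ins, not the kernel's machdep.c code. The "old" pair does one critical enter/exit per spinlock and reduces the count last; the "new" pair follows the amd64-style pattern the log describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state involved in the panic. */
struct model_thread {
	int critnest;        /* critical-section nesting depth */
	int spinlock_count;  /* spinlocks currently accounted to the thread */
	bool owepreempt;     /* a preemption was deferred and is still owed */
	int count_at_switch; /* spinlock count seen by the switch, -1 if none */
};

/*
 * Models the switch path. It takes the thread's own spin lock, so the
 * count it observes should be exactly 1.
 */
static void
model_switch(struct model_thread *td)
{
	td->owepreempt = false;
	td->spinlock_count++;
	td->count_at_switch = td->spinlock_count;
	td->spinlock_count--;
}

/* Leaving the outermost critical section performs any owed preemption. */
static void
model_critical_exit(struct model_thread *td)
{
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt)
		model_switch(td);
}

/* Old ordering: one critical section per spinlock, count reduced last. */
static void
old_spinlock_enter(struct model_thread *td)
{
	td->critnest++;
	td->spinlock_count++;
}

static void
old_spinlock_exit(struct model_thread *td)
{
	model_critical_exit(td);	/* may switch with the count still raised */
	td->spinlock_count--;
}

/* amd64-style ordering: one critical section for the outermost lock only. */
static void
new_spinlock_enter(struct model_thread *td)
{
	if (td->spinlock_count == 0) {
		td->spinlock_count = 1;
		td->critnest++;
	} else
		td->spinlock_count++;
}

static void
new_spinlock_exit(struct model_thread *td)
{
	td->spinlock_count--;		/* reduce the count first */
	if (td->spinlock_count == 0)
		model_critical_exit(td);
}

/*
 * Take one spinlock, let a preemption become owed while it is held, then
 * release it. Returns the spinlock count the switch path observed.
 */
static int
run_scenario(bool fixed)
{
	struct model_thread td = { 0, 0, false, -1 };

	if (fixed)
		new_spinlock_enter(&td);
	else
		old_spinlock_enter(&td);
	td.owepreempt = true;
	if (fixed)
		new_spinlock_exit(&td);
	else
		old_spinlock_exit(&td);
	return (td.count_at_switch);
}
```

In this model, run_scenario(false) reports a count of 2 at the switch, which corresponds to the "invalid count 2" panic above, while run_scenario(true) reports 1.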
FYI I have the GCC removal changes staged in a Git branch on GitHub at https://github.com/emaste/freebsd/tree/deorbit-gcc (which includes the change I just committed).
(In reply to Ed Maste from comment #1)

Thanks for the quick fix! It solved that problem and let the machine boot further. With r357045 I'm getting a new panic, though (this time obviously from the VM system):

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
jumping to kernel entry at 0xc00b8020.
GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #0 r357046: Thu Jan 23 15:54:21 CET 2020
    root@fbsdtest.omc.net:/usr/obj/usr/src/sparc64.sparc64/sys/GENERIC sparc64
gcc version 9.2.0 (FreeBSD Ports Collection for sparc64)
WARNING: WITNESS option enabled, expect reduced performance.
real memory = 1073741824 (1024 MB)
avail memory = 1024761856 (977 MB)
cpu0: Sun Microsystems UltraSparc-IIe Processor (548.00 MHz CPU)
random: unblocking device.
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 13.0.
kbd0 at kbdmux0
WARNING: Device "openfirm" is Giant locked and may be deleted before FreeBSD 13.0.
WARNING: Device "openprom" is Giant locked and may be deleted before FreeBSD 13.0.
nexus0: <Open Firmware Nexus device>
pcib0: <U2P UPA-PCI bridge> mem 0x1fe00000000-0x1fe0000ffff,0x1fe01000000-0x1fe010000ff irq 2032,2030,2031,2021 on nexus0
pcib0: Sabre, impl 0, version 0, IGN 0x1f, bus A, 66MHz
pcib0: DVMA map: 0x60000000 to 0x63ffffff 8192 entries
pci0: <OFW PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <old, non-VGA display device> at device 3.0 (no driver attached)
dc0: <Davicom DM9102A 10/100BaseTX> port 0x10000-0x100ff mem 0-0xff at device 12.0 on pci0
miibus0: <MII bus> on dc0
amphy0: <DM9102 10/100 media interface> PHY 1 on miibus0
amphy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc0: Ethernet address: 00:03:ba:4e:55:e6
dc1: <Davicom DM9102A 10/100BaseTX> port 0x10100-0x101ff mem 0x2000-0x20ff at device 5.0 on pci0
miibus1: <MII bus> on dc1
amphy1: <DM9102 10/100 media interface> PHY 1 on miibus1
amphy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
dc1: Ethernet address: 00:03:ba:4e:55:e6
ohci0: <AcerLabs M5237 (Aladdin-V) USB controller> mem 0x1000000-0x1000fff at device 10.0 on pci0
usbus0 on ohci0
atapci0: <AcerLabs M5229 UDMA66 controller> port 0x10200-0x10207,0x10218-0x1021b,0x10210-0x10217,0x10208-0x1020b,0x10220-0x1022f at device 13.0 on pci0
atapci0: using PIO transfers above 137GB as workaround for 48bit DMA access bug, expect reduced performance
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
cryptosoft0: <software crypto> on nexus0
nexus0: <syscons> type unknown (no driver attached)
rtc0: <Real-Time Clock> at port 0x70-0x71 pnpid PNP0b00 on isa0
rtc0: registered as a time-of-day clock, resolution 1.000000s
uart0: console (9600,n,8,1)> at port 0x3f8-0x3ff irq 43 pnpid PNP0501 on isa0
uart1: <16550 or compatible> at port 0x2e8-0x2ef irq 43 pnpid PNP0501 on isa0
Timecounter "tick" frequency 548000000 Hz quality 1000
Event timer "tick" frequency 548000000 Hz quality 1000
Timecounters tick every 1.000 msec
usbus0: 12Mbps Full Speed USB v1.0
Obsolete code will be removed soon: random(9) is the obsolete Park-Miller LCG from 1988
WARNING: WITNESS option enabled, expect reduced performance.
ugen0.1: <AcerLabs OHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <AcerLabs OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
Trying to mount root from ufs:/dev/ada1a [rw]...
Root mount waiting for: usbus0 CAM
cd0 at ata3 bus 0 scbus1 target 1 lun 0
cd0: <TEAC CD-224E P.9A> Removable CD-ROM SCSI device
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
uhub0: 2 ports with 2 removable, self powered
ada0 at ata2 bus 0 scbus0 target 0 lun 0
ada0: <IC35L060AVER07-0 ER6OA46A> ATA-5 device
ada0: Serial Number SZPTZ202544
ada0: 66.700MB/s transfers (UDMA4, PIO 8192bytes)
ada0: 58644MB (120103200 512 byte sectors)
ada1 at ata3 bus 0 scbus1 target 0 lun 0
ada1: <IBM-DTLA-307015 TX2OA60A> ATA-5 device
ada1: Serial Number YFEYFML4312
ada1: 66.700MB/s transfers (UDMA4, PIO 8192bytes)
ada1: 14649MB (30003120 512 byte sectors)
mountroot: waiting for device /dev/ada1a...
panic: vm_page_assert_xbusied: page 0xfffff8009f65cb90 not exclusive busy @ /usr/src/sys/vm/vm_page.c:1555
cpuid = 0
time = 1579793596
KDB: stack backtrace:
_end() at 0xc92e90f8
vpanic() at vpanic+0x31c
panic() at panic+0x20
vm_page_object_remove() at vm_page_object_remove+0x16c
vm_page_free_prep() at vm_page_free_prep+0xe4
vm_page_free_toq() at vm_page_free_toq+0x4
vm_page_free_zero() at vm_page_free_zero+0x10
pmap_release() at pmap_release+0xcc
vmspace_free() at vmspace_free+0x9c
start_init() at start_init+0x36c
fork_exit() at fork_exit+0x6c
fork_trampoline() at fork_trampoline+0x8
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at kdb_enter+0x80: ta %xcc, 1
db>

Again, I'll gladly provide additional information as needed.
sparc64's pmap_release() is freeing pages belonging to the TSB object, and the new vm_page_free() contract requires the caller to busy the page.

diff --git a/sys/sparc64/sparc64/pmap.c b/sys/sparc64/sparc64/pmap.c
index 46454795ad26..753bd6af5aa1 100644
--- a/sys/sparc64/sparc64/pmap.c
+++ b/sys/sparc64/sparc64/pmap.c
@@ -1301,6 +1301,7 @@ pmap_release(pmap_t pm)
 	while (!TAILQ_EMPTY(&obj->memq)) {
 		m = TAILQ_FIRST(&obj->memq);
 		m->md.pmap = NULL;
+		vm_page_xbusy(m);
 		vm_page_unwire_noq(m);
 		vm_page_free_zero(m);
 	}
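The contract mentioned above (the caller must exclusively busy a page before freeing it) can be pictured with a toy user-space model. This is a rough sketch, not the VM code: model_page, model_page_free() and model_release() are hypothetical names, and a false return stands in for the vm_page_assert_xbusied panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page that belongs to an object. */
struct model_page {
	bool xbusy;              /* exclusively busied by the caller */
	bool freed;              /* set once the page has been freed */
	struct model_page *next; /* next page in the object's list */
};

/*
 * Frees a page only if the caller busied it first. Returning false marks
 * the spot where the real kernel would hit the xbusied assertion.
 */
static bool
model_page_free(struct model_page *m)
{
	if (!m->xbusy)
		return (false);
	m->xbusy = false;
	m->freed = true;
	return (true);
}

/*
 * Walks the list and frees each page, optionally busying it first, as
 * the patch does. Stops at the first failure and returns how many pages
 * were freed.
 */
static int
model_release(struct model_page *head, bool busy_first)
{
	int freed = 0;

	for (struct model_page *m = head; m != NULL; m = m->next) {
		if (busy_first)
			m->xbusy = true;
		if (!model_page_free(m))
			break;
		freed++;
	}
	return (freed);
}
```

Calling model_release() with busy_first set to false frees nothing in this model, while setting it to true frees every page on the list, which mirrors the effect of the one-line change in the diff.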
(In reply to Mark Johnston from comment #5)

Thank you, that fixed the second issue! I rebuilt the kernel after applying your patch and was able to boot it successfully. So now we have a working SPARC64 -CURRENT kernel built using the xtoolchain. Next, I'll see if I can build the userland, too.
A commit references this bug:

Author: markj
Date: Thu Jan 23 17:18:59 UTC 2020
New revision: 357055
URL: https://svnweb.freebsd.org/changeset/base/357055

Log:
  sparc64: Busy the TSB page before freeing it in pmap_release().

  This is now required by vm_page_free().

  PR: 243534
  Reported and tested by: Michael Reim <kraileth@elderlinux.org>

Changes:
  head/sys/sparc64/sparc64/pmap.c