Hello, after updating a RPi 3 from r356417 to r356826 the RPi freezes while booting. It does not respond to USB keyboard or serial console any more, so I am unable to generate a crashdump or a core. The last console output is always: Creating and/or trimming log files. Starting syslogd. NFS access cache time=60 No core dumps found. Clearing /tmp (X related). Recovering vi editor sessions. Updating motd:. Mounting late filesystems:. Those RPi is booting from a ZFS mirrored pool. By bisecting I found that base r356776 is the first release which that behavior. r356767 is the last one which booted fine. Ralf
Created attachment 211016 [details] Verbose boot of with ntpd disabled till multiuser login and failed CPU reset. Doing some additional digging I learned that disabeling ntpd via /etc/rc.conf gets the system to boot till multiuser login with the r356776 kernel. Loading the mac_ntpd kernel module immediately freezes the system. I tried to load it without the -q option, but there are no additional messages shown before the system freezes. I also learned that rebooting via shutdown(8) -r or CTRL-ALT-DEL does not work any more. The last console message is always "cpu_reset failed". The tests were done with a r356776 and a r357073 kernel. Both give the same results. The attached console log is made from the newer r357073 kernel.
(In addition to my comment #1) > Loading the mac_ntpd kernel module immediately freezes the system. Preloading, via the loader(8), does not.
(In reply to iz-rpi03 from comment #2) Letting the system work with a preloaded mac_ntpd kernel module seems not to be a smart idea. Because during a make installworld the kernel panics: # panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd0000f33560, blocked for 1802833 ticks cpuid = 0 time = 1579967871 KDB: stack backtrace: db_trace_self() at db_trace_self_wrapper+0x28 pc = 0xffff00000073b34c lr = 0xffff000000106814 sp = 0xffff00005214b580 fp = 0xffff00005214b790 vpanic() at panic+0x44 pc = 0xffff000000409838 lr = 0xffff0000004095e0 sp = 0xffff00005214b860 fp = 0xffff00005214b8e0 panic() at deadlkres+0x2f8 pc = 0xffff0000004095e0 lr = 0xffff0000003a46f4 sp = 0xffff00005214b8f0 fp = 0xffff00005214b940 KDB: enter: panic [ thread pid 0 tid 100083 ] Stopped at 0
(In reply to iz-rpi03 from comment #3) It's not a good idea to try kldload mac_ntpd either 8-) On an RPI3 running FreeBSD 13.0-CURRENT (GENERIC) #0 r357296M: Thu Jan 30 13:39:42 PST 2020 the result was a silent hang, with backtrace reporting > KDB: enter: Break to debugger > [ thread pid 1116 tid 100085 ] > Stopped at 0x4031ed44 > db> bt > Tracing pid 1116 tid 100085 td 0xfffffd0001020000 > db_trace_self() at db_stack_trace+0xf8 > pc = 0xffff00000073c55c lr = 0xffff000000103c58 > sp = 0xffff00005a4bc830 fp = 0xffff00005a4bc860 > > db_stack_trace() at db_command+0x228 > pc = 0xffff000000103c58 lr = 0xffff0000001038d0 > sp = 0xffff00005a4bc870 fp = 0xffff00005a4bc950 > > db_command() at db_command_loop+0x58 > pc = 0xffff0000001038d0 lr = 0xffff000000103678 > sp = 0xffff00005a4bc960 fp = 0xffff00005a4bc980 > > db_command_loop() at db_trap+0xf4 > pc = 0xffff000000103678 lr = 0xffff00000010697c > sp = 0xffff00005a4bc990 fp = 0xffff00005a4bcbb0 > > db_trap() at kdb_trap+0x1d8 > pc = 0xffff00000010697c lr = 0xffff0000004510a0 > sp = 0xffff00005a4bcbc0 fp = 0xffff00005a4bcc70 > > kdb_trap() at do_el1h_sync+0xf4 > pc = 0xffff0000004510a0 lr = 0xffff000000759568 > sp = 0xffff00005a4bcc80 fp = 0xffff00005a4bccb0 > > do_el1h_sync() at handle_el1h_sync+0x78 > pc = 0xffff000000759568 lr = 0xffff00000073f078 > sp = 0xffff00005a4bccc0 fp = 0xffff00005a4bcdd0 > > handle_el1h_sync() at kdb_alt_break_internal+0x130 > pc = 0xffff00000073f078 lr = 0xffff000000450858 > sp = 0xffff00005a4bcde0 fp = 0xffff00005a4bce80 > > kdb_alt_break_internal() at kdb_alt_break+0xc > pc = 0xffff000000450858 lr = 0xffff000000450718 > sp = 0xffff00005a4bce90 fp = 0xffff00005a4bce90 > > kdb_alt_break() at uart_intr_rxready+0x88 > pc = 0xffff000000450718 lr = 0xffff000000263124 > sp = 0xffff00005a4bcea0 fp = 0xffff00005a4bcec0 > > uart_intr_rxready() at uart_intr+0xec > pc = 0xffff000000263124 lr = 0xffff000000263e0c > sp = 0xffff00005a4bced0 fp = 0xffff00005a4bcf00 > > uart_intr() at intr_event_handle+0xc8 > pc = 0xffff000000263e0c lr = 0xffff0000003cbaf4 > sp = 0xffff00005a4bcf10 fp = 0xffff00005a4bcf50 > > intr_event_handle() at intr_isrc_dispatch+0x34 > pc = 0xffff0000003cbaf4 lr = 0xffff00000078faa4 > sp = 0xffff00005a4bcf60 fp = 0xffff00005a4bcf70 > > intr_isrc_dispatch() at bcm2835_intc_intr+0xa8 > pc = 0xffff00000078faa4 lr = 0xffff000000729a50 > sp = 0xffff00005a4bcf80 fp = 0xffff00005a4bcfc0 > > bcm2835_intc_intr() at intr_event_handle+0xc8 > pc = 0xffff000000729a50 lr = 0xffff0000003cbaf4 > sp = 0xffff00005a4bcfd0 fp = 0xffff00005a4bd010 > > intr_event_handle() at intr_isrc_dispatch+0x34 > pc = 0xffff0000003cbaf4 lr = 0xffff00000078faa4 > sp = 0xffff00005a4bd020 fp = 0xffff00005a4bd030 > > intr_isrc_dispatch() at bcm_lintc_intr+0x1e8 > pc = 0xffff00000078faa4 lr = 0xffff00000072fc88 > sp = 0xffff00005a4bd040 fp = 0xffff00005a4bd0c0 > > bcm_lintc_intr() at intr_irq_handler+0x74 > pc = 0xffff00000072fc88 lr = 0xffff00000078f904 > sp = 0xffff00005a4bd0d0 fp = 0xffff00005a4bd0f0 > > intr_irq_handler() at handle_el1h_irq+0x74 > pc = 0xffff00000078f904 lr = 0xffff00000073f140 > sp = 0xffff00005a4bd100 fp = 0xffff00005a4bd210 > > handle_el1h_irq() at rms_wlock+0x124 > pc = 0xffff00000073f140 lr = 0xffff000000403fc4 > sp = 0xffff00005a4bd220 fp = 0xffff00005a4bd2b0 > > rms_wlock() at rms_wlock+0x124 > pc = 0xffff000000403fc4 lr = 0xffff000000403fc4 > sp = 0xffff00005a4bd2c0 fp = 0xffff00005a4bd360 > > rms_wlock() at mac_policy_modevent+0xc0 > pc = 0xffff000000403fc4 lr = 0xffff00000066ba44 > sp = 0xffff00005a4bd370 fp = 0xffff00005a4bd3c0 > > mac_policy_modevent() at module_register_init+0xc4 > pc = 0xffff00000066ba44 lr = 0xffff0000003e783c > sp = 0xffff00005a4bd3d0 fp = 0xffff00005a4bd400 > > module_register_init() at linker_load_module+0xab8 > pc = 0xffff0000003e783c lr = 0xffff0000003d8670 > sp = 0xffff00005a4bd410 fp = 0xffff00005a4bd740 > > linker_load_module() at kern_kldload+0xec > pc = 0xffff0000003d8670 lr = 0xffff0000003d9e40 > sp = 0xffff00005a4bd750 fp = 0xffff00005a4bd780 > > kern_kldload() at sys_kldload+0x64 > pc = 0xffff0000003d9e40 lr = 0xffff0000003d9f98 > sp = 0xffff00005a4bd790 fp = 0xffff00005a4bd7b0 > > sys_kldload() at do_el0_sync+0x514 > pc = 0xffff0000003d9f98 lr = 0xffff000000759bf4 > sp = 0xffff00005a4bd7c0 fp = 0xffff00005a4bd860 > > do_el0_sync() at handle_el0_sync+0x90 > pc = 0xffff000000759bf4 lr = 0xffff00000073f224 > sp = 0xffff00005a4bd870 fp = 0xffff00005a4bd980 > > handle_el0_sync() at 0x210368 > pc = 0xffff00000073f224 lr = 0x0000000000210368 > sp = 0xffff00005a4bd990 fp = 0x0000ffffffffeaf0 > > db> HTH, bob prohaska
(In reply to Bob Prohaska from comment #4) Sorry, forgot to add this machine is running ufs.
To answer a question form the ARM mailing-list if my system also silent hangs whenit kldloads other modules than mac_ntp: No, or better none I use regularly. Tested with r357112 kernel and world, immediately after a fresh boot: # kldstat Id Refs Address Size Name 1 21 0xffff000000000000 14f81e8 kernel 2 1 0xffff0000014f9000 2eed00 zfs.ko 3 1 0xffff0000017e8000 214a8 mac_ntpd.ko 4 2 0xffff00000180a000 284d8 opensolaris.ko 5 1 0xffff00005c400000 21000 uhid.ko 6 1 0xffff00005c421000 21000 wmt.ko # While kernel, zfs, opensolaris and mac_ntp are loaded by the loader, uhid and wmt are loaded later. From /var/run/dmesg.boot ... Configuring vt: keymap. Autoloading module: uhid.ko Autoloading module: wmt.ko uhid0 on uhub1 ... And after mounting a smbfs: # kldstat Id Refs Address Size Name 1 30 0xffff000000000000 14f81e8 kernel 2 1 0xffff0000014f9000 2eed00 zfs.ko 3 1 0xffff0000017e8000 214a8 mac_ntpd.ko 4 2 0xffff00000180a000 284d8 opensolaris.ko 5 1 0xffff00005c400000 21000 uhid.ko 6 1 0xffff00005c421000 21000 wmt.ko 7 1 0xffff00005c442000 32000 smbfs.ko 8 2 0xffff00005c474000 21000 libiconv.ko 9 2 0xffff00005c495000 21000 libmchain.ko #
Just an update: Tried r357408 because of the bug fix made in r357392. System still get unresponsive when kldloading mac_ntpd. "cpu_reset failed" still happens when trying to reboot. Regardless how.
Created attachment 211398 [details] Some debugger outputs after waiting more than 2 hours for somenthing (may be a panic) to happen. Inspired by the debugger commands in a syscaller crash report. It is made with a r356776 kernel. Trying to create a dump finally freezes the debugger as well.
(In reply to iz-rpi03 from comment #8) FYI: Your verbose boot console output (or dmesg -a outuput) shows multiple: psci0: PSCI version number mismatched with DT device_attach: psci0 attach returned 6 psci0: <ARM Power State Co-ordination Interface Driver> on ofwbus0 but none of: psci0: PSCI version 0.2 compatible So starting the extra cores and doing reset/reboot and such are not likely to work: The "Power State Co-ordination Interface" is apparently part of how such activities are done. From what I can tell not all that much should be expected to be well behaved after it fails to produce the "PSCI version 0.2 compatible" notice (and not produce the other type of notices). For example, the extra cores (APs) end up with the notice: Release APs...APs not started That seems to be an expected result given the earlier notices. At least on a RPi4B, the only earlier verbose boot output difference vs. an older kernel that worked was a difference for one range of "Excluded memory", 10 fewer pages excluded for the working kernel ( -r356767 ) for that specific range. (I've no information on if this is significant to the problems or not.) (I ignores the lines that list the rdddddd version and build time and such: expected to be different.)
(In reply to Mark Millard from comment #9) For the artifact.ci's kernel for -r356776 the PSCI problem still ooccurs but the "Excluded memory" was the smaller size ( like for the working -r356767 ). So, somehow, -r356776 makes the difference that leads to not identifying the PSCI version on an RPi4B (and likely on any RPi3). That in turn means much is messed up from lack of PSCI activity.
A couple of work arounds are now known. One updates the applicable sys_util/u-boot-rpi* . The example was for RPI4B's in : https://lists.freebsd.org/pipermail/freebsd-arm/2020-February/021258.html The other modified FreeBSD, as shown in the svnlite diff in: https://lists.freebsd.org/pipermail/freebsd-arm/2020-February/021240.html Both updates cause the FreeBSD kernel to be told to leave alone the first 2 pages of the address space. Without any changes, u-boot-rpi* is causing the kernel to be told to avoid only the first page. The FreeBSD specific armstub8*.bin is bigger than one page but does not fill 2 pages (yet?). The long term solution for an official update may be more involved, such as having the number of pages auto-adjust for the number of bytes of content in the armstub8*.bin involved. Nor is it obvious to me which of multiple ways to get the "avoid these pages" information to the kernel would be used.
(In reply to Mark Millard from comment #11) FYI: It is not Jeff's defect to solve. His changes just exposed that the kernel is not being told the correct information about what pages to avoid allocating or touching. Who would deal with this is unclear to me. How it is dealt with may or may not involve kernel or other FreeBSD changes.
This problem was dealt with long ago. It involved getting more complete information to the FreeBSD kernel, such that the kernel would avoid using the problematical low memory pages. Should be closed, probably as overcome by events because the change was not in the base system.