While trying to do an exp-run of the FreeBSD ports tree on riscv64, I found that the machine would hang after a few hours of crunching on the ports. I initially thought this was hardware related, but was able to reproduce the hang on two emulators (QEMU and RVVM).

Conditions under which this happens:
- poudriere bulk -a
- on a system with root on ZFS
- 4 or 8 cores
- 16 GB of RAM
- on a SiFive Unmatched, QEMU, or RVVM (see port emulators/rvvm)
- USE_TMPFS is set to "data localbase"
- sufficient disk space is available

Other conditions may apply. This may also reproduce with root on UFS, though I have not tried. The hang seems to be a livelock: in the emulators, all cores appear pegged at 100% load. The machine no longer responds to ping or to the serial console. The hang doesn't happen immediately but rather after a few hours to days of building. Once it has happened once, it seems to recur within a few (~6) hours when I restart the build. It is possible that certain ports trigger the conditions needed for the hang.

To reproduce, set up a riscv64 machine with 15-CURRENT root on ZFS, get a ports tree, install Poudriere, build a jail, and run "poudriere bulk -a" (a sketch of the relevant configuration follows). Then wait until the machine hangs.
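For reference, a minimal sketch of the Poudriere setup described above; file paths are the stock defaults, and the jail and ports tree names (build-riscv64, default) are placeholders rather than the exact names used on this machine:

```
# /usr/local/etc/poudriere.conf (excerpt; everything else left at defaults)
ZPOOL=zroot
BASEFS=/usr/local/poudriere
USE_TMPFS="data localbase"

# jail and ports tree names below are hypothetical
# poudriere jail -c -j build-riscv64 -v 15.0-CURRENT -a riscv.riscv64 -m src=/usr/src
# poudriere ports -c -p default
# poudriere bulk -a -j build-riscv64 -p default
```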
Problem also occurs when building with tmpfs disabled.
Hi,

A shot in the dark, but this might be related to new vnlru problems I'm observing on some full-ZFS machines when building multiple worlds at once since an upgrade to a recent stable/14 (commit ab05a1cf321aca0fe632c1ab40f68630b477422c might have something to do with it, but I have not thoroughly analyzed the situation yet).

Are you able to obtain backtraces of all processes during a livelock (`procstat -a -kk`)? Try keeping a console with `top` already running; that lets you kill some processes during the livelock without spawning new ones, which after some time apparently allows the system to run again (after a long while, in my experiments).

Also, please post the output of `sysctl kern.maxfiles kern.maxfilesperproc vfs.vnode` (before the livelock, and if you're able to, also during it).

Regards.
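In case the console stops responding before anything can be typed, a rough sketch of how those numbers could be recorded continuously beforehand; the log file name and sampling interval are arbitrary choices, not part of the request above:

```
# log vnode and file-table statistics every 30 seconds so the last
# samples before the livelock survive on disk
while true; do
    date >> /var/log/vnode-stats.log
    sysctl kern.maxfiles kern.maxfilesperproc vfs.vnode >> /var/log/vnode-stats.log
    sleep 30
done
```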
(In reply to Olivier Certner from comment #2) The system completely locks up and doesn't respond to any input whatsoever. No console, no ping. If it happens in an emulator, I observe that all cores are pegged at 100%. So unfortunately I cannot introspect the system further.
(In reply to Robert Clausecker from comment #3) If your system doesn't respond to any input, then that's most probably something else, as I'm still able to type and switch virtual consoles when this happens.
As has been said multiple times on IRC, attach KGDB to QEMU when this happens and report on the system state.
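For the QEMU case, a minimal sketch of how that could look; the kernel debug file path and the TCP port are the usual defaults and may need adjusting to the local build:

```
# start the guest with QEMU's built-in gdb stub enabled
# (-s is shorthand for -gdb tcp::1234)
qemu-system-riscv64 ... -s

# on the host, attach kgdb (from the gdb package) to the running guest
kgdb /usr/obj/usr/src/riscv.riscv64/sys/GENERIC/kernel.debug
(kgdb) target remote localhost:1234
(kgdb) info threads
(kgdb) thread apply all bt
```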
I managed to catch the problem in the act with some threads hanging as I tried to shut down the system. Here's a transcript: http://fuz.su/~fuz/freebsd/riscv-hang.txt Unfortunately I then exited ddb and the system immediately hung completely.
(In reply to Robert Clausecker from comment #6) Most of the threads are in sbi_remote_fence_i(), i.e., they're waiting for other cores to finish flushing their icaches. Aside from that being super inefficient (we really need to be tracking whether the flush is required when mapping a given page with PROT_EXEC; most of the time it's not required), it's hard to tell why the system apparently isn't making progress. Do the remote harts need to have interrupts enabled in order to acknowledge SBI calls?
I am going to take an informed guess that this might be a bug in OpenSBI. The version provided by sysutils/opensbi sat at v1.4 for some time. A quick log of the commits to the IPI code since that version yields one interesting candidate:

commit be9752a071475ae1d9e58a2dfcb8e83185fb7ae5
Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Fri Oct 25 11:59:46 2024 -0700

    lib: sbi_ipi: Make .ipi_clear always target the current hart

    All existing users of this operation target the current hart, and it
    seems unlikely that a future user will need to clear the pending IPI
    status of a remote hart. Simplify the logic by changing .ipi_clear
    (and its wrapper sbi_ipi_raw_clear()) to always operate on the
    current hart. This incidentally fixes a bug introduced in commit
    78c667b6fc07 ("lib: sbi: Prefer hartindex over hartid in IPI
    framework"), which changed the .ipi_clear parameter from a hartid to
    a hart index, but failed to update the warm_init functions to match.

    Fixes: 78c667b6fc07 ("lib: sbi: Prefer hartindex over hartid in IPI framework")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>

A bug in clearing the IPI status when multiple harts attempt an IPI broadcast concurrently might explain the livelock we are seeing; I did not inspect the implementation to verify this. Notably, the buggy commit was present in the v1.4 release, but this fix was not.

I recently (last week) updated the sysutils/opensbi port to v1.6, and the dependent u-boot ports were bumped. So, I suggest you update your firmware, keep running things the usual way, and if the livelocks continue to manifest, report back here.
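For the Unmatched, a rough sketch of how updating the firmware bits from packages could look; the device node and partition indexes are placeholders, so check the layout of your boot medium and the port's pkg-message before writing anything:

```
# pull in the rebuilt firmware packages
pkg upgrade opensbi u-boot-sifive-fu740

# write the new SPL and U-Boot FIT image to their dedicated GPT partitions
# (daX and the partition indexes p1/p2 are placeholders -- verify against
#  "gpart show" output for your own SD card / SPI flash first)
dd if=/usr/local/share/u-boot/u-boot-sifive-fu740/u-boot-spl.bin of=/dev/daXp1 bs=1m
dd if=/usr/local/share/u-boot/u-boot-sifive-fu740/u-boot.itb     of=/dev/daXp2 bs=1m
```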
(In reply to Mitchell Horne from comment #8) Thanks, I'll give it a try. Unfortunately emulators/rvvm requires a version of u-boot more recent than what we package, so that'll be a bit annoying.
I have today updated the machine to the U-Boot package shipped in current main branch (that is, u-boot-sifive-fu740-2024.07_1) and started another poudriere-bulk run. Unfortunately the problem still occurs; the system just livelocked again.
(In reply to Robert Clausecker from comment #10) Recently I got a SiFive Unmatched; I'm going to take a look at this problem. That is the MAKE_JOBS setting on the machine?
(In reply to Huan Zhou from comment #11) s/that/what/g
(In reply to Huan Zhou from comment #11) MAKE_JOBS_NUMBER is set to 2, PARALLEL_JOBS is set to 3. The machine runs on ZFS. Happy investigation!
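For completeness, a sketch of where those two knobs typically live in a stock Poudriere setup (paths are the defaults; the values are the ones quoted above):

```
# /usr/local/etc/poudriere.conf
PARALLEL_JOBS=3

# /usr/local/etc/poudriere.d/make.conf
MAKE_JOBS_NUMBER=2
```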
(In reply to Robert Clausecker from comment #13) Hey fuz, I've run `poudriere bulk -a` with default settings for about 15 hours, and the whole system still works fine... fastfetch output:

```
root@bsd
OS: FreeBSD 14.2-RELEASE riscv
Host: SiFive HiFive Unmatched A00
Kernel: FreeBSD 14.2-RELEASE
Uptime: 20 hours, 42 mins
Shell: zsh 5.9
Terminal: /dev/pts/0
Memory: 10.80 GiB / 15.94 GiB (68%)
Swap: 0 B / 2.00 GiB (0%)
Disk (/): 4.86 GiB / 446.60 GiB (1%) - zfs
Disk (/boot/efi): 1.44 MiB / 259.91 MiB (1%)
Disk (/zroot): 96.00 KiB / 441.73 GiB (0%)
Local IP (cgem0): 192.168.4.204/24
Locale: C.UTF-8
```

Does it only appear on 15-CURRENT? After I bulk all the ports I'll check again on 15-CURRENT.

Btw, I also ran `poudriere bulk -a` with qemu_user_static in Poudriere and it also worked fine for several days until I manually shut it down. The config of that machine (fastfetch output):

```
root@freebsd
Kernel: FreeBSD 14.2-RELEASE
Uptime: 1 day, 11 hours, 52 mins
Packages: 1171 (pkg)
Shell: zsh 5.9
Terminal: /dev/pts/0
CPU: Intel(R) Core(TM) i5-3470 (4) @ 3.19 GH
GPU: Intel Xeon E3-1200 v2/3rd Gen Core proc
Memory: 7.35 GiB / 7.85 GiB (94%)
Swap: 0 B / 2.00 GiB (0%)
Disk (/): 60.42 GiB / 873.58 GiB (7%) - zfs
Disk (/zroot): 96.00 KiB / 813.16 GiB (0%)
Local IP (re0): 192.168.4.102/24
Locale: C.UTF-8
```

If the problem were caused by too high a workload, I think most platforms (*nix/Windows) would have the problem... Also, could your power supply be a little weak? I have a 550 watt gold PSU for the Unmatched and a 512 GB SSD, and the fans also run at full speed (14000 rpm) the whole time...
(In reply to Huan Zhou from comment #14)

> Hey fuz, I've run `poudriere bulk -a` with default settings for about 15 hours, and the whole system still works fine...

It can take a few days for the issue to manifest. I run 15-CURRENT and cannot say whether the same issue appears on 14.2. Please do test with the settings I gave you. Although it would be nice if things worked with other settings, we're trying to fix the bug here, not find workarounds.

> Btw, I also ran `poudriere bulk -a` with qemu_user_static in Poudriere and it also worked fine for several days until I manually shut it down.

Yeah, no shit. This is a firmware or kernel issue pertaining to riscv64, and with qemu_user_static (which I never got to work since an update a few years ago) you don't run a riscv64 kernel or firmware. That said, the issue did reproduce for me both with emulators/rvvm and qemu-system.

> If the problem were caused by too high a workload, I think most platforms (*nix/Windows) would have the problem...

Absolutely not. I never had this kind of lockup on any of my other Poudriere setups, and those are heavily loaded, too.

> Also, could your power supply be a little weak? I have a 550 watt gold PSU for the Unmatched and a 512 GB SSD, and the fans also run at full speed (14000 rpm) the whole time...

The datasheet says the maximum power consumption of the board is 150 W, and in my measurements I didn't get even close to that. Don't worry about that. Also, the consequence of a brownout is not usually a livelock.
There seems to be a correlation between the system livelocking and it attempting to build the Go toolchain. Maybe the way the Go garbage collector works exacerbates the OpenSBI livelock issue?
Created attachment 259681 [details]
proposed patch

I think this might be the result of a mistake in commit c226f193515c. Does the attached patch help?
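For anyone wanting to try it, a quick sketch of applying and testing a patch like this against a source tree in /usr/src; the attachment URL follows the usual FreeBSD Bugzilla pattern and the kernel config name GENERIC is an assumption:

```
# fetch the patch from the bug's attachment page and apply it to the tree
# (URL pattern assumed; -p0 may be needed instead of -p1 depending on how
#  the patch was generated)
cd /usr/src
fetch -o /tmp/riscv-ipi.patch 'https://bugs.freebsd.org/bugzilla/attachment.cgi?id=259681'
patch -p1 < /tmp/riscv-ipi.patch

# rebuild and install the kernel, then reboot and re-run the bulk build
make -j"$(sysctl -n hw.ncpu)" buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now
```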
(In reply to Mark Johnston from comment #17) I was on a kernel state too old to have this commit. Let me update to the most recent one and test your patch. Specifically, I was on src 4771c2e9d1c7db949a82dfe4f2b9878bb358a50e.
Updated to main-n276657-ccaf78c962e8 and got an almost-hang like this (it was possible to panic the kernel with CR ~ ^P and enter the debugger): http://fuz.su/~fuz/freebsd/riscv-hang2.txt

At the same time, there was a ZFS problem:

nvme0: Resetting controller due to a timeout and possible hot unplug.
nvme0: event="start"
nvme0: event="timed_out"
nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: <SanDisk Extreme Pro 1TB 111110WD 20480E801756> s/n 20480E801756 detached
nvme0: Failed controller, stopping watchdog timeout.
Solaris: WARNING: Pool 'risotto' has encountered an uncorrectable I/O failure and has been suspended.

So this may or may not be the same hang. I'll keep investigating.
Got another hang with the current kernel version. That wasn't it.
So I'm wondering: I usually run my poudriere bulk runs under idle priority (idprio 5). Could this be a factor in causing this livelock?
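Concretely, the invocation looks roughly like the following; the jail and ports tree names are placeholders:

```
# run the whole bulk build in the idle scheduling class
# (running this as a non-root user may additionally require the
#  unprivileged idprio sysctl to be enabled)
idprio 5 poudriere bulk -a -j build-riscv64 -p default
```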
(In reply to Robert Clausecker from comment #21) That'd be a bug of course, but it's possible. Have you been able to reproduce the hang without idprio? Are you able to get kernel dumps from this system?
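In case the box can still be made to panic (e.g., via the serial console break sequence), a minimal sketch of a crash-dump setup; the swap device name is just an example:

```
# /etc/rc.conf: reserve the swap device for kernel dumps at boot
dumpdev="AUTO"

# or enable it immediately on a running system (device name is an example)
dumpon /dev/gpt/swap0

# after the next boot, savecore(8) writes any dump to /var/crash
```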
(In reply to Robert Clausecker from comment #21) I'd say it's quite unlikely. There's a mention somewhere (though right now I don't remember where) that the reason for not allowing all users to use idprio is the risk of deadlocks. I've reviewed quite a lot of code related to idprio (and also have lots of changes, yet uncommitted), and that convinced me that deadlocks should not be possible: if the kernel has to sleep to obtain some resource, it will normally boost its priority above the idle class and will eventually make progress even on a loaded machine (if it's not held by realtime processes), so even if it holds another resource itself, it should eventually release it. I might have missed some problems, though (or maybe I've already "fixed" some I've forgotten about in some now-old uncommitted code).

Did you tweak `kern.sched.static_boost` or other scheduler tunables?

I would anyway follow Mark's advice: try without idprio and see if you can reproduce the deadlock/livelock, to determine whether this peculiar configuration plays a role here.
(In reply to Mark Johnston from comment #22)

> Have you been able to reproduce the hang without idprio?

I'll try that next. Once again, there are no kernel dumps, as the system is completely stuck in a livelock, not even reacting to CR ~ ^B on the serial console. I can only reboot it by turning the power off and back on.

> Did you tweak `kern.sched.static_boost`, or other scheduler tunables?

/etc/sysctl.conf only has this entry:

security.bsd.unprivileged_idprio=1

/boot/loader.conf has:

kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
radeonkms_load="YES"
kern.vty="vt"
hint.uart.1.disabled="1"

which should not affect anything.

It unfortunately takes a very long time for me to get another shot after a hang: the machine needs to be physically power-cycled, and the admin of the datacenter it is colocated in is a bit fed up with doing so, so he'll hold it off until the next time he has to go down into the datacenter (about once a week).
(In reply to Robert Clausecker from comment #24)

> /etc/sysctl.conf only has this entry: (snip)
> /boot/loader.conf has: (snip)

Yes, this shouldn't affect anything. Let's see.
I changed two things:
- only one job per build jail (as opposed to two)
- no idprio

The build has been running for a day without crashing now. Maybe this avoids the problem?
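In config terms, and assuming the "one job per jail" change was made via MAKE_JOBS_NUMBER (paths are the Poudriere defaults, jail and tree names are the placeholders used in the earlier sketches), the changed run would correspond to roughly this:

```
# /usr/local/etc/poudriere.d/make.conf -- one job per builder instead of two
MAKE_JOBS_NUMBER=1

# and the bulk run started without the idprio wrapper
poudriere bulk -a -j build-riscv64 -p default
```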
(In reply to Robert Clausecker from comment #26) Have you continued running the build? Still no issues? I understand that it takes time, but it would help if you could also replay your scenario with only one change at a time, e.g., just "no idprio", but still two jobs per build jail, just to be sure.
(In reply to Olivier Certner from comment #27) My test build went through with no problems.

> I understand that it takes time, but it would help if you could also replay your scenario with only one change at a time, e.g., just "no idprio", but still two jobs per build jail, just to be sure.

I'll try “with idprio, but only one job per jail” next. I'm fairly certain it's the “one job per jail” setting that works around the issue, as it avoids having languages like Go build with multiple threads.
(In reply to Robert Clausecker from comment #28) If you're fairly certain, then maybe test "one job per jail" first?
(In reply to Olivier Certner from comment #29) Right now I'm testing “idprio on, one job per jail”, which also seems not to crash. This differs from the crashing configuration only in that I run one job per jail instead of two. I could also test “idprio off, two jobs per jail”, but once that has crashed the machine, it may take a week or two for it to come back up.