Created attachment 245683 [details]
head of /var/crash/core.txt.2

Around fourteen minutes after beginning a run of poudriere-devel,

poudriere bulk -j main -J 3 -Ctv mail/thunderbird

Dump header from device: /dev/ada1p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 4845637632
  Blocksize: 512
  Compression: none
  Dumptime: 2023-10-16 15:18:14 +0100
  Hostname: mowa219-gjp4-8570p-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 15.0-CURRENT #1 main-n265830-fee14577d590-dirty: Mon Oct 9 13:08:19 BST 2023
    grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
  Panic String: sleepq_add: td 0xfffffe0144345740 to sleep on wchan 0xffffffff8543a120 with sleeping prohibited
  Dump Parity: 1912812553
  Bounds: 2
  Dump Status: good
Created attachment 245684 [details] poudriere bulk results, viewed after the event
linux-nvidia-libs-470-470.161.03 nvidia-driver-470-470.161.03
locking(9) <https://man.freebsd.org/cgi/man.cgi?query=locking&sektion=9&manpath=freebsd-release>

<https://www.freshports.org/x11/nvidia-driver-470/>

Cc: danfe@ (maintainer)

At <https://discord.com/channels/727023752348434432/831066226074976267/1163537365134999622> I mentioned a post in NVIDIA Developer Forums:

<https://forums.developer.nvidia.com/t/panic-related-to-nvkms-timers-lock-sx-lock/55376> (2017-11-20)

> panic related to nvkms_timers.lock (sx lock)

Re: comment #2:

> That one might not have the fix …
>
> Basically it got switched to use a spin mutex instead
Created attachment 248704 [details]
A photograph of one of two external displays that were in use when the panic occurred

Today's panic occurred whilst playing GeoGuessr <https://www.geoguessr.com/> in Chromium, full screen. Space bar pressed and held after a click on the on-screen blue arrow (to travel as fast as possible).

On the second external display: nothing other than Teams in Firefox.

Note: play in Chromium coincided with a kernel panic at least once before … for what it's worth, I tend to avoid Chromium for activities such as this (I did suspect that a panic would occur before I began play today).

----

Dump header from device: /dev/ada1p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 3358982144
  Blocksize: 512
  Compression: none
  Dumptime: 2024-02-23 13:29:55 +0000
  Hostname: mowa219-gjp4-zbook-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 15.0-CURRENT main-n268493-759a996d610d GENERIC
  Panic String: sleepq_add: td 0xfffff80032314000 to sleep on wchan 0xffffffff8443a120 with sleeping prohibited
  Dump Parity: 2544365881
  Bounds: 3
  Dump Status: good

----

<https://cgit.freebsd.org/src/log/?qt=range&q=759a996d610d>
Created attachment 248705 [details] head of today's /var/crash/core.txt.3
Created attachment 248706 [details]
A photograph of one of two external displays that were in use on 14th February

(In reply to Graham Perrin from comment #4)

> … Note: play in Chromium coincided with a kernel panic at least once before …

I found a photograph from 14th February. The same as today: GeoGuessr in Chromium, full screen.

Original photograph metadata: 14 Feb Wed, 12:57GMT+00:00 – that is, around eight minutes after the time of the dump (see below); the freeze remains visible for a long time before an automated restart of the OS.

----

Dump header from device: /dev/ada1p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 4074110976
  Blocksize: 512
  Compression: none
  Dumptime: 2024-02-14 12:49:14 +0000
  Hostname: mowa219-gjp4-zbook-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 15.0-CURRENT main-n268149-eb86c6c5b462 GENERIC
  Panic String: sleepq_add: td 0xfffff80013259000 to sleep on wchan 0xffffffff8443a120 with sleeping prohibited
  Dump Parity: 3755260517
  Bounds: 0
  Dump Status: good
Just an FYI: I found that NVIDIA released legacy driver 470.239.06 on 2024-02-22.

https://www.nvidia.com/Download/driverResults.aspx/218854/en-us/

You can try it by overriding DISTVERSION and PKGNAMESUFFIX, with NO_CHECKSUM=YES, on x11/nvidia-driver, just as the x11/nvidia-driver-470 port does. As I'm not using the -470 version of the driver or x11/linux-nvidia-libs, I cannot say whether it builds and installs cleanly.

BTW, I found this line in Mk/bsd.port.mk:

  DISTINFO_FILE?= ${MASTERDIR}/distinfo

It seems to allow conditional switching of distinfo. What do you think about introducing something like

  DISTINFO_FILE?= ${MASTERDIR}/distinfo${PKGNAMESUFFIX}

in x11/nvidia-driver/Makefile and splitting distinfo as below?

  distinfo for the master port (PKGNAMESUFFIX is not set),
  distinfo-470 for *-470 ports (PKGNAMESUFFIX is set to -470 in the slave port),
  ...

I haven't tried this yet, so it may not work as intended. But if it does, I think it would make things easier when NVIDIA releases an updated legacy driver: only a bump of DISTVERSION and distinfo would be required, independently of any work in progress on the master port, as there is just now.
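For illustration only, the distinfo proposal in this comment might look like the fragment below. It is an untested sketch, not what the port does today. The placement (in the master port's Makefile, before the ports framework is included) and the file name distinfo-470 are assumptions taken from the description in this comment.

```make
# Hypothetical, untested fragment for x11/nvidia-driver/Makefile (master port).
#
# Assumption: PKGNAMESUFFIX is empty when the master port is built directly,
# so the path below resolves to ${MASTERDIR}/distinfo.
# Assumption: a slave port sets PKGNAMESUFFIX=-470 before including this
# Makefile, so the path resolves to ${MASTERDIR}/distinfo-470 instead.
#
# "?=" assigns only if DISTINFO_FILE is not already set, so a slave port
# or the command line can still override it.
DISTINFO_FILE?=	${MASTERDIR}/distinfo${PKGNAMESUFFIX}
```

If this worked as hoped, each legacy branch would keep its checksums in its own file under the master port's directory, and an update to one branch would touch only that branch's DISTVERSION and distinfo file.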
(In reply to Tomoaki AOKI from comment #7)

Thanks.

<https://discord.com/channels/727023752348434432/757305573866733680/1210783179766501377>

Austin Shafer mentions fixes in a later release that are probably not yet backported to 470.⋯.
<https://discord.com/channels/727023752348434432/757305573866733680/1215471550862598195> from Austin Shafer:

> You can also compare it with this:
> <https://github.com/amshafer/nvidia-driver/blob/535.98/nvidia/src/nvidia-modeset/nvidia-modeset-freebsd.c#L76>
> and see the differences in locking and the task queue
>
> See the sx lock for nvkms_lock and the fast nvkms task queue. These are
> what 470 is missing iirc
<https://github.com/amshafer/freebsd-ports/commit/b2d030183a661703c8b3c0000169df077284e1b8>

> WIP: x11/nvidia-driver/470: Backport fix for nvidia-modeset panic
>
> This backports a fix where a non-sleepable lock is held while sleeping
> occurs. This is fixed in more recent versions but is still causing issues
> in 470.
>
> PR: 274519
(In reply to Graham Perrin from comment #10) Instead (again, work in progress): <https://github.com/amshafer/freebsd-ports/commit/635c3df3fefbe00ffe6aaf51df8aa20f906594ac> …
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=cbbce9a123da84852f289ba5aa53b4955b53a2dd

commit cbbce9a123da84852f289ba5aa53b4955b53a2dd
Author:     Austin Shafer <ashafer@badland.io>
AuthorDate: 2024-03-22 16:26:58 +0000
Commit:     Gleb Popov <arrowd@FreeBSD.org>
CommitDate: 2024-03-22 17:38:25 +0000

    x11/nvidia-driver-470: Backport fix for nvidia-modeset panic

    PR: 274519
    Differential Revision: https://reviews.freebsd.org/D44432

 x11/nvidia-driver-470/Makefile                     |   2 +-
 ...tch-src_nvidia-modeset_nvidia-modeset-freebsd.c | 102 +++++++++++++++++++--
 2 files changed, 94 insertions(+), 10 deletions(-)
👍 thanks