Created attachment 219520
- FreeBSD 12.1-RELEASE-p6 #6 r362488M, built with debug
- unattended reboot (sysctl debug.debugger_on_panic=0)
- Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
- NVIDIA GPU Quadro 1000M (GF108GL) at PCI:1:0:0
- 24 GB main memory
- ports at latest
- using x11/nvidia-driver-390 to drive the graphics card
- KDE running
- Even without much graphics activity (the user may be away), FreeBSD crashes regularly
- The core dumps indicate issues with the nvidia driver
Three /var/crash/core.txt.* files are attached as well as Xorg.0.log.
The crashes keep occurring regularly - always at _nv007402rm+0x12. Now with FreeBSD 12.2 instead of 12.1.
With such a definite crash source, shouldn't it be easy to fix it? :-)
(In reply to Martin Birgmeier from comment #1)
I can't tell if you are joking or not.
I'm seeing immediate panics on an iMac9,1 GeForce 9400 since upgrading to 12.2-RELEASE. Under 12.1-RELEASE, it was mostly fine but did panic on rare occasions.
(In reply to Jason W. Bacon from comment #3)
Scrap that, I misread the version. My iMac is running 340. 390 reports that the chipset is supported by 340.
Could you as maintainer please contact Nvidia about this PR and get them to fix the issue?
Created attachment 224733
more nvidia crashes
Here are some more recent crashes... it is always at the same symbol _nv007402rm.
Maybe it would be easy to contact Nvidia with this information and ask for a fix?
This might be related to bug #195097... Could you try to apply (by hand, the current code is a bit different) the patch https://bz-attachments.freebsd.org/attachment.cgi?id=170499 and see if it makes any difference?
Thank you for the pointer.
The crash happens randomly during operation, but the patch seems to address an open/close issue. What leads you to believe it might help?
(In reply to Martin Birgmeier from comment #8)
> What leads you to believe it might help?
There were quite a few similar reports in the past (bug #193622, https://bugzilla.redhat.com/show_bug.cgi?id=589007, https://forums.developer.nvidia.com/t/gpu-stuck-during-deep-learning-training/115258), and in all of them the last non-obfuscated function call before the obfuscated _nvXXXXrm() chain was rm_free_unused_clients(), so it seemed that something was wrong with the resource management teardown logic.
> #8 0xffffffff82077bf2 in _nv007402rm () from /boot/modules/nvidia.ko
> #9 0xfffffe00a7bebd50 in ?? ()
> #10 0xffffffff82077a69 in _nv007400rm () from /boot/modules/nvidia.ko
> #11 0xfffffe00a7bebd50 in ?? ()
> #12 0xfffffe00a7bebda0 in ?? ()
> #13 0x0000000000000000 in ?? ()
However, in your case I don't see that call (and the stack trace is rather short), so you're probably right: it must be something else. Too bad nVidia obfuscates the Resource Manager API. :-(
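For anyone who wants to run the same check across several core.txt.* files, here is a minimal sketch in Python. It assumes the backtraces use the kgdb-style frame lines quoted above. The helper names (parse_frames, first_obfuscated_symbol, has_symbol) are made up for this illustration and are not part of any FreeBSD or NVIDIA tool.

```python
import re

# Matches kgdb-style frame lines such as:
#   #8 0xffffffff82077bf2 in _nv007402rm () from /boot/modules/nvidia.ko
# The address part is optional because the innermost frame may omit it.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\(")

# Obfuscated Resource Manager symbols look like _nv<digits>rm.
OBFUSCATED_RE = re.compile(r"_nv\d+rm")


def parse_frames(text):
    """Return (frame_number, symbol) pairs for every frame line in text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(2)))
    return frames


def first_obfuscated_symbol(frames):
    """Return the innermost _nvNNNNNNrm symbol, or None if there is none."""
    for _, sym in frames:
        if OBFUSCATED_RE.fullmatch(sym):
            return sym
    return None


def has_symbol(frames, name):
    """Report whether a named function appears anywhere in the trace."""
    return any(sym == name for _, sym in frames)


# Frames copied from the excerpt quoted in this comment.
SAMPLE = """\
#8 0xffffffff82077bf2 in _nv007402rm () from /boot/modules/nvidia.ko
#9 0xfffffe00a7bebd50 in ?? ()
#10 0xffffffff82077a69 in _nv007400rm () from /boot/modules/nvidia.ko
#11 0xfffffe00a7bebd50 in ?? ()
#12 0xfffffe00a7bebda0 in ?? ()
#13 0x0000000000000000 in ?? ()
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print("innermost obfuscated symbol:", first_obfuscated_symbol(frames))
    print("rm_free_unused_clients present:",
          has_symbol(frames, "rm_free_unused_clients"))
```

To check a real dump, read the file contents (for example with open(path).read(), where path is one of the /var/crash/core.txt.* files) and pass the text to parse_frames() in place of SAMPLE.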