(This is in response to bug #235487 comment 13 asking to submit a separate bug report.)
I have been experiencing crashes for a while which are probably related to 3D acceleration with nvidia.
They are usually triggered by activity in Firefox, and sometimes seemingly also by VirtualBox clients, even though I run them mostly headless.
The setup is as follows:
- Thinkpad W520
- GF108 [Quadro 1000M] graphics card
- releng/12.0 with latest patches
The symptoms are as follows:
- In most cases, the machine suddenly locks up, consuming 100% CPU (judging by the fan noise, on between 1 and 4 CPUs).
- In one case so far, the machine has rebooted instantaneously.
A crash dump is never produced.
The crashes seem to happen after activity in Firefox, but sometimes also with a VirtualBox client running. Regarding the latter, I recently switched from an emulated le(4) device to a vtnet(4) device, so the issue might actually lie there: the lockups are probably not due to nvidia, but rather to vtnet misbehaving.
It seems that I also regularly run into this problem, but I do have crash information (below).
Essentially, the panic occurs for me when there is some activity in Firefox (e.g. opening new tabs or links).
/usr/libexec/kgdb kernel.debug /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
panic: vm_page_free_prep: freeing wired page 0xfffff8082bfe8570
cpuid = 5
time = 1557074560
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff80ef5bc7 at vm_page_free_prep+0x137
#4 0xffffffff80ef1e93 at vm_page_free_toq+0x13
#5 0xffffffff80ee0680 at _kmem_unback+0xf0
#6 0xffffffff80ee072d at kmem_free+0x2d
#7 0xffffffff83d251fe at nv_free_system_pages+0x9e
#8 0xffffffff83d252b7 at nv_free_pages+0x17
#9 0xffffffff83cef7d7 at _nv029865rm+0x97
Dumping 1985 out of 32674 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
#0 doadump (textdump=<value optimized out>) at pcpu.h:230
#1 0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:446
#2 0xffffffff80b9b5c3 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:872
#3 0xffffffff80b9b3b3 in panic (fmt=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:799
#4 0xffffffff80ef5bc7 in vm_page_free_prep (m=<value optimized out>) at atomic.h:444
#5 0xffffffff80ef1e93 in vm_page_free_toq (m=0xfffff8082bfe8570) at /usr/jails/src/src_12.0/src/sys/vm/vm_page.c:3521
#6 0xffffffff80ee0680 in _kmem_unback (object=<value optimized out>, addr=<value optimized out>, size=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/vm/vm_kern.c:588
#7 0xffffffff80ee072d in kmem_free (addr=18446741878459031552, size=4096) at /usr/jails/src/src_12.0/src/sys/vm/vm_kern.c:614
#8 0xffffffff83d251fe in nv_free_system_pages () from /boot/modules/nvidia.ko
#9 0xffffffff83d252b7 in nv_free_pages () from /boot/modules/nvidia.ko
#10 0xffffffff83cef7d7 in _nv029865rm () from /boot/modules/nvidia.ko
#11 0x0000000000400000 in ?? ()
#12 0xfffffe00df990000 in ?? ()
#13 0xfffffe00b7804000 in ?? ()
#14 0xfffff80598791600 in ?? ()
#15 0xfffff80598a91c08 in ?? ()
#16 0xffffffff83a6459d in _nv007254rm () from /boot/modules/nvidia.ko
#17 0x0000000000000000 in ?? ()
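For anyone reading the backtrace: kgdb prints the kmem_free() address in frame #7 in decimal and the panic time as a raw epoch value, which makes both hard to compare with the rest of the output. The following is only a small sketch for decoding them; the 4096-byte page size is an assumption (the amd64 base page size), and the variable names are made up for illustration.

```python
from datetime import datetime, timezone

PAGE_SIZE = 4096  # assumption: amd64 base page size

# Values copied from the kgdb output above.
addr = 18446741878459031552  # kmem_free() addr argument, frame #7
size = 4096                  # kmem_free() size argument, frame #7
panic_time = 1557074560      # "time = ..." line of the panic message

addr_hex = hex(addr)                       # same address, in hex
page_aligned = addr % PAGE_SIZE == 0       # is the address on a page boundary?
pages = size // PAGE_SIZE                  # number of pages being freed
when = datetime.fromtimestamp(panic_time, tz=timezone.utc).isoformat()

print(addr_hex, page_aligned, pages, when)
```

The address comes out as 0xfffffe00e0df6000, i.e. a single page-aligned page in the same 0xfffffe00... range as the unresolved frames #12 and #13, and the panic time is 2019-05-05 16:42:40 UTC.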
I am not sure yet, but it seems that disabling swap completely improves stability. One day of solid work without swap, and still no panic.
This is interesting. I had crashes with geli swap which improved after turning off encryption.
Was your swap geli-encrypted? If yes, how does it work using regular swap?
If anyone would like to work on this: The crash mostly seems to occur when scrolling (I regularly experience it with Firefox but recently also with Thunderbird).
And it seems to be more likely if I have VirtualBox running at the same time.
I guess that the NVidia driver is writing outside its allocated memory range, or that it maybe tries to allocate video memory from an already allocated range. I suspect the latter because the issue seems less likely to occur if VirtualBox is started a long time after Firefox has been started and used heavily, which leads me to believe that in such a case the NVidia driver has probably already allocated all the memory needed for any scrolling operations. But if, after booting, VirtualBox is started first and then Firefox, the issue occurs with high probability.
Maybe it is also VirtualBox which improperly uses memory which the NVidia driver later tries to claim.
Unfortunately I can never get a dump or anything because the machine just deadlocks with high CPU usage.
Btw, this may be the same issue as #239822.
Regarding this bug report #235865, there are basically two issues:
- Very rarely, the machine would crash when scrolling in Firefox
- Very often, the machine would crash when using it as a VirtualBox host and then doing graphics operations (like scrolling in Firefox)
With ports r549922 I hope that the second issue is gone - I am currently running several vbox clients with no ill effects so far.
The question is, could it be that nvidia-driver needs similar changes to kernel memory handling as were done in r549922?