Bug 235865 - x11/nvidia-driver leads to deadlock with high cpu or instant reboot
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Alexey Dokuchaev
 
Reported: 2019-02-19 19:10 UTC by Martin Birgmeier
Modified: 2019-08-31 14:33 UTC

See Also:
bugzilla: maintainer-feedback? (danfe)


Description Martin Birgmeier 2019-02-19 19:10:18 UTC
(This is in response to bug #235487 comment 13 asking to submit a separate bug report.)

I have been experiencing crashes for a while now which are probably related to 3D acceleration with nvidia.

They are usually triggered by activities in Firefox, sometimes seemingly also by VirtualBox clients even though I run them mostly headless.

The setup is as follows:
- Thinkpad W520
- GF108 [Quadro 1000M] graphics card
- releng/12.0 with latest patches
- mesa-dri-18.3.2
- nvidia-driver-390.87_2
- xorg-server-1.18.4_11,1

The symptoms are as follows:
- In most cases, the machine suddenly locks up, consuming 100% CPU (judging from the fan noise, on between 1 and 4 CPUs).
- In one case so far, the machine has rebooted instantaneously.
A crash dump is never produced.

The crashes seem to happen after activities in Firefox, but sometimes also with a VirtualBox client running. Regarding the latter, I recently switched from an emulated le(4) device to a vtnet(4) device, and the issue might actually lie there. In that case the lockups would not be due to nvidia, but rather to vtnet misbehaving.

-- Martin
Comment 1 olevole 2019-05-05 17:02:54 UTC
I seem to run into this problem regularly as well, but I do have crash information (below).

For me, the panic occurs when there is some activity in Firefox (e.g. opening new tabs or links).

My info:

12.0-RELEASE-p3 amd64
nvidia-driver-390.87_2

/usr/libexec/kgdb kernel.debug /var/crash/vmcore.1 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: vm_page_free_prep: freeing wired page 0xfffff8082bfe8570
cpuid = 5
time = 1557074560
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff80ef5bc7 at vm_page_free_prep+0x137
#4 0xffffffff80ef1e93 at vm_page_free_toq+0x13
#5 0xffffffff80ee0680 at _kmem_unback+0xf0
#6 0xffffffff80ee072d at kmem_free+0x2d
#7 0xffffffff83d251fe at nv_free_system_pages+0x9e
#8 0xffffffff83d252b7 at nv_free_pages+0x17
#9 0xffffffff83cef7d7 at _nv029865rm+0x97
Uptime: 1h8m2s
Dumping 1985 out of 32674 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:230
#1  0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:446
#2  0xffffffff80b9b5c3 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:872
#3  0xffffffff80b9b3b3 in panic (fmt=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/kern/kern_shutdown.c:799
#4  0xffffffff80ef5bc7 in vm_page_free_prep (m=<value optimized out>) at atomic.h:444
#5  0xffffffff80ef1e93 in vm_page_free_toq (m=0xfffff8082bfe8570) at /usr/jails/src/src_12.0/src/sys/vm/vm_page.c:3521
#6  0xffffffff80ee0680 in _kmem_unback (object=<value optimized out>, addr=<value optimized out>, size=<value optimized out>) at /usr/jails/src/src_12.0/src/sys/vm/vm_kern.c:588
#7  0xffffffff80ee072d in kmem_free (addr=18446741878459031552, size=4096) at /usr/jails/src/src_12.0/src/sys/vm/vm_kern.c:614
#8  0xffffffff83d251fe in nv_free_system_pages () from /boot/modules/nvidia.ko
#9  0xffffffff83d252b7 in nv_free_pages () from /boot/modules/nvidia.ko
#10 0xffffffff83cef7d7 in _nv029865rm () from /boot/modules/nvidia.ko
#11 0x0000000000400000 in ?? ()
#12 0xfffffe00df990000 in ?? ()
#13 0xfffffe00b7804000 in ?? ()
#14 0xfffff80598791600 in ?? ()
#15 0xfffff80598a91c08 in ?? ()
#16 0xffffffff83a6459d in _nv007254rm () from /boot/modules/nvidia.ko
#17 0x0000000000000000 in ?? ()
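
For anyone comparing panics from several crash dumps, here is a minimal Python sketch that picks out the innermost driver frame from the KDB stack backtrace above. The helper names (parse_frames, is_nvidia_symbol, innermost_driver_frame) are made up for illustration, and the "nv"/"_nv" prefix test is only a heuristic assumption based on the symbol names that kgdb attributes to /boot/modules/nvidia.ko in this trace.

```python
import re

# Frames copied verbatim from the KDB stack backtrace in this comment.
KDB_BACKTRACE = """\
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff80ef5bc7 at vm_page_free_prep+0x137
#4 0xffffffff80ef1e93 at vm_page_free_toq+0x13
#5 0xffffffff80ee0680 at _kmem_unback+0xf0
#6 0xffffffff80ee072d at kmem_free+0x2d
#7 0xffffffff83d251fe at nv_free_system_pages+0x9e
#8 0xffffffff83d252b7 at nv_free_pages+0x17
#9 0xffffffff83cef7d7 at _nv029865rm+0x97
"""

# One frame looks like: "#7 0xffffffff83d251fe at nv_free_system_pages+0x9e"
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")


def parse_frames(text):
    """Turn KDB backtrace lines into a list of dicts, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append({
                "index": int(match.group(1)),
                "pc": int(match.group(2), 16),
                "symbol": match.group(3),
                "offset": int(match.group(4), 16),
            })
    return frames


def is_nvidia_symbol(symbol):
    # Assumption (heuristic): driver symbols start with "nv" or "_nv".
    return symbol.lstrip("_").startswith("nv")


def innermost_driver_frame(frames):
    """Return the first (innermost) frame whose symbol looks like a driver symbol."""
    for frame in frames:
        if is_nvidia_symbol(frame["symbol"]):
            return frame
    return None


if __name__ == "__main__":
    frames = parse_frames(KDB_BACKTRACE)
    driver = innermost_driver_frame(frames)
    callee = frames[driver["index"] - 1]
    print("driver frame: #%d %s" % (driver["index"], driver["symbol"]))
    print("kernel function it called: %s" % callee["symbol"])
```

On the trace above this reports frame #7, nv_free_system_pages, calling into kmem_free, which is where the driver hands the page back to the kernel before the "freeing wired page" panic fires.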
Comment 2 olevole 2019-05-06 21:30:54 UTC
I am not sure yet, but it seems that disabling swap completely increases stability. I have had one day of solid work without swap, and still no panic.
Comment 3 Martin Birgmeier 2019-05-07 14:30:33 UTC
This is interesting. I had crashes with geli swap, and things improved after turning off encryption.

Was your swap geli-encrypted? If yes, how does it work using regular swap?

-- Martin
Comment 4 Martin Birgmeier 2019-08-31 14:33:29 UTC
If anyone would like to work on this: The crash mostly seems to occur when scrolling (I regularly experience it with Firefox but recently also with Thunderbird).

And it seems to be more likely if I have VirtualBox running at the same time.

I guess that the NVidia driver is writing outside its allocated memory range, or that it tries to allocate video memory from an already allocated range. I suspect the latter because the issue seems less likely to occur if VirtualBox is started a long time after Firefox has been started and used heavily. In that case the NVidia driver has probably already allocated all the memory it needs for any scrolling operations. But if, after booting, VirtualBox is started first and then Firefox, the issue occurs with high probability.

Maybe it is also VirtualBox which improperly uses memory which the NVidia driver later tries to claim.

Unfortunately I can never get a dump or anything because the machine just deadlocks with high CPU usage.

Btw, this may be the same issue as #239822.

-- Martin