Bug 229899 - x11/nvidia-driver-340 graphics hangs after upgrade from 11.1-p11 to 11.2
Summary: x11/nvidia-driver-340 graphics hangs after upgrade from 11.1-p11 to 11.2
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Alexey Dokuchaev
Reported: 2018-07-20 03:18 UTC by Donn Seeley
Modified: 2022-10-19 08:59 UTC

Attachments
failing Xorg.0.log file (5.92 KB, text/plain)
2018-07-20 03:18 UTC, Donn Seeley
Xorg.0.log file from a successful boot with an 11.1-p11 kernel (11.09 KB, text/plain)
2018-07-20 03:19 UTC, Donn Seeley
xorg.conf file, lightly modified from nvidia-xconfig (3.02 KB, text/plain)
2018-07-20 03:20 UTC, Donn Seeley
/var/log/messages from 11.2 boots (108.67 KB, text/plain)
2018-07-20 03:20 UTC, Donn Seeley
/var/log/messages from an 11.1-p11 boot (with 11.2 userspace) (16.96 KB, text/plain)
2018-07-20 03:21 UTC, Donn Seeley

Description Donn Seeley 2018-07-20 03:18:23 UTC
Created attachment 195288 [details]
failing Xorg.0.log file
Comment 1 Donn Seeley 2018-07-20 03:19:31 UTC
Created attachment 195289 [details]
Xorg.0.log file from a successful boot with an 11.1-p11 kernel
Comment 2 Donn Seeley 2018-07-20 03:20:04 UTC
Created attachment 195290 [details]
xorg.conf file, lightly modified from nvidia-xconfig
Comment 3 Donn Seeley 2018-07-20 03:20:53 UTC
Created attachment 195291 [details]
/var/log/messages from 11.2 boots
Comment 4 Donn Seeley 2018-07-20 03:21:38 UTC
Created attachment 195292 [details]
/var/log/messages from an 11.1-p11 boot (with 11.2 userspace)
Comment 5 Donn Seeley 2018-07-20 03:41:01 UTC
I have an aging but still quite usable Dell Latitude E6510 laptop that has been running FreeBSD 11.1 quite happily for months.  I upgraded to 11.2 earlier this week, and had a nasty problem: Xorg gets stuck during start-up.  The Xorg.0.log file shows that it gets almost to the point of printing out the GPU model before it wedges.

After the problem showed up, I updated nvidia-driver-340 to 340.107 using 'make reinstall' from ports just to be sure that I was fully up to date.  It had no effect on the problem.

I normally boot to a console login rather than gdm, then run startx.  When I do this under 11.2, the screen clears, a block cursor is painted in the upper left corner and a mouse pointer in the center of the screen, and then Xorg makes no further progress.  Oddly, ps and top show Xorg at a constant 100% CPU (on one CPU), yet its accumulated CPU time in the stats does not grow.  Here's an example:

  # ps axo pid,time,systime,usertime,pcpu,command
   PID        TIME    SYSTIME USERTIME  %CPU COMMAND
     0     1:01.53    1:01.53  0:00.00 100.0 [kernel]
  [...]
    11 10244:51.74 2369:09.03  0:00.00 600.0 [idle]
  [...]
  3346     0:01.70    0:01.70  0:00.00 100.0 /usr/local/bin/X :0 -auth /home/donn
  [...]
  #
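
The pattern in the listing above (a userland process pinned near 100% CPU with no user time at all) can be picked out mechanically.  What follows is only a rough sketch: the function name flag_spinners and the 90% threshold are made up for illustration, and the heuristic is a guess at what is distinctive here, not a diagnostic for this bug.

```sh
#!/bin/sh
# Flag userland processes that report high %CPU but zero user time.
# Expects the column layout of: ps axo pid,time,systime,usertime,pcpu,command
# The default threshold of 90 is an arbitrary choice for this sketch.
flag_spinners() {
    awk -v min="${1:-90}" '
        # Skip the header line, skip kernel threads (shown in brackets),
        # and print pid, %cpu and command for everything that matches.
        NR > 1 && $6 !~ /^\[/ && $5 + 0 >= min && $4 == "0:00.00" {
            print $1, $5, $6
        }
    '
}

# Fed with the rows quoted in this report:
flag_spinners <<'EOF'
   PID        TIME    SYSTIME USERTIME  %CPU COMMAND
     0     1:01.53    1:01.53  0:00.00 100.0 [kernel]
    11 10244:51.74 2369:09.03  0:00.00 600.0 [idle]
  3346     0:01.70    0:01.70  0:00.00 100.0 /usr/local/bin/X :0 -auth /home/donn
EOF
# prints: 3346 100.0 /usr/local/bin/X
```

On a live system the same function could be fed directly from the ps command shown above.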

At this point Xorg is unkillable, untraceable with ktrace -p (it just returns immediately without generating any trace data) and un-gcoreable (it wedges).  Console switching does work (but it's very slow); I tried kern.vty=sc, but it didn't help.  I ran 'sysctl debug.kdb.panic=1' to get a crash dump, but the backtrace for Xorg is uninteresting:

  (kgdb) info thread
    Id   Target Id         Frame 
  [...]
    141  Thread 100145 (PID=3346: Xorg) 0xffffffff80b25ebd in sched_switch ()
    142  Thread 100211 (PID=3372: sysctl) 0xffffffff80b25ebd in sched_switch ()
  (kgdb) thread 141
  [Switching to thread 141 (Thread 100145)]
  #0  0xffffffff80b25ebd in sched_switch ()
  (kgdb) bt
  #0  0xffffffff80b25ebd in sched_switch ()
  #1  0xffffffff8293bdc8 in ?? ()
  #2  0x0000000000000000 in ?? ()
  (kgdb) 

Xorg appears to be running constantly, but it doesn't rack up any CPU time -- maybe it's in a loop yielding the CPU?

I tried the nv driver after removing xorg.conf, but it failed, as it has always done in the past with this laptop.  I tried minimal xorg.conf files; they didn't help.  I tried running Xorg as root; it made no difference.

Finally I just booted kernel.old, which brought up the 11.1-p11 kernel, and that DID work, running with the 11.2 userspace, including Xorg and its drivers.  I've attached the Xorg.0.log from that boot, along with the /var/log/messages contents.

Was there some kernel API change in 11.2 that caused breakage for the Nvidia 340 driver?

For what it's worth, here is the pciconf output for the GPU:

  vgapci0@pci0:1:0:0:     class=0x030000 card=0x040b1028 chip=0x0a6c10de rev=0xa2 hdr=0x00
      vendor     = 'NVIDIA Corporation'
      device     = 'GT218M [NVS 3100M]'
      class      = display
      subclass   = VGA
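
For anyone comparing this against other reports: the chip= and card= values in the pciconf line each pack two 16-bit IDs, with the vendor in the low half and the device in the high half.  A small sketch of the split (split_pci_id is just an illustrative helper name, not an existing tool):

```sh
#!/bin/sh
# Split a 32-bit pciconf ID (chip= or card=) into its two 16-bit halves.
# Low 16 bits: vendor ID.  High 16 bits: device ID.
split_pci_id() {
    id=$(( $1 ))
    printf 'vendor=0x%04x device=0x%04x\n' \
        $(( id & 0xffff )) $(( (id >> 16) & 0xffff ))
}

split_pci_id 0x0a6c10de   # chip= value; prints vendor=0x10de device=0x0a6c
split_pci_id 0x040b1028   # card= value; prints vendor=0x1028 device=0x040b
```

Here 0x10de matches the 'NVIDIA Corporation' vendor line, and 0x1028 in the card= field is the subsystem vendor (Dell).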

My desktop machine is still running 11.1 and it has Nvidia graphics.  I'm putting off upgrading it to 11.2 until I can get some resolution for my poor old laptop...
Comment 6 Donn Seeley 2018-08-04 22:25:05 UTC
I decided to try some bisection, so I built a -CURRENT kernel from subversion.  It booted without a problem; here's the uname -v output:

  FreeBSD 12.0-CURRENT #2 r337326: Sat Aug  4 14:23:12 MDT 2018     donn@callao:/usr/obj/scratch/freebsd/base/head/amd64.amd64/sys/GENERIC

I rebuilt the Nvidia 340 driver with these commands:

  cd /usr/ports/x11/nvidia-driver-340
  make OSREL=12 OSVERSION=120000 ALLOW_UNSUPPORTED_SYSTEM=1 SYSDIR=/scratch/freebsd/base/head/sys clean
  make OSREL=12 OSVERSION=120000 ALLOW_UNSUPPORTED_SYSTEM=1 SYSDIR=/scratch/freebsd/base/head/sys reinstall

That didn't actually install the driver, due to a makefile glitch, so I copied the nvidia.ko file to /boot/modules by hand.  I'm winging it here -- I don't know whether this is the usual way to rebuild drivers in /usr/ports based on custom kernels.

Anyway, the results were good: the kernel loads the driver, and Xorg appears to run just fine.

It's possible that a fix has gone into -CURRENT, or that the change that introduced the problem never made it into -CURRENT, or that the problem is flakey and depends on some nasty concurrency or latency issue (and it was a coincidence that upgrading to 11.2 exposed the problem).  I could check out an older version of the kernel and see whether it fails; is there any interest in that?
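
If checking out older kernels does turn out to be worthwhile, the search itself is a plain binary search over revision numbers.  The sketch below assumes a single linear range in which the behaviour changes exactly once, which may not hold here since the failing kernel comes from the 11.2 release branch and the working one from head.  The predicate is_old_behaviour is hypothetical: in practice it would stand for "build and boot a kernel at that revision, start Xorg, and report what happened".

```sh
#!/bin/sh
# Binary search for the first revision at which the behaviour changes.
# $1: a revision known to show the old behaviour
# $2: a later revision known to show the new behaviour
first_changed_rev() {
    lo=$1 hi=$2
    # Invariant: lo shows the old behaviour, hi shows the new one.
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if is_old_behaviour "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$hi"
}

# Stand-in predicate for demonstration only.  1500 is a made-up number,
# not a real revision.
is_old_behaviour() { [ "$1" -lt 1500 ]; }

first_changed_rev 1000 2000   # prints 1500
```

Each probe costs a kernel build and a reboot, so the logarithmic number of steps is the main attraction.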
Comment 7 Tijl Coosemans freebsd_committer freebsd_triage 2018-11-09 21:28:41 UTC
Is this still a problem?  It sounds like this may be a duplicate of bug 228536.
Comment 8 Donn Seeley 2018-11-10 17:09:44 UTC
Re bug 228536: I haven't seen the characteristic message about 'rm_init_adapter() failed', so I'm skeptical that this bug is directly related to 228536.  Also, I did rebuild and reinstall the kernel module for 11.2, and it didn't help, whereas that fixes the problem in 228536 (if I'm reading the bug report correctly).  I'm still running 11.1 on my desktop, and my laptop is running the same -CURRENT kernel with an 11.2 userspace.  If there are any tests that would be useful to run on the laptop, let me know.
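
The check for the 'rm_init_adapter() failed' message can be repeated against any saved log with something like the sketch below.  The function name is made up, and the pattern is deliberately loose because the exact wording of the kernel message is taken from the description above rather than from the driver source.

```sh
#!/bin/sh
# Succeed if the given log file mentions an rm_init_adapter failure.
# The loose pattern tolerates variations such as "rm_init_adapter() failed"
# or "rm_init_adapter failed"; the exact wording is an assumption.
has_rm_init_failure() {
    grep -q 'rm_init_adapter.*failed' "$1" 2>/dev/null
}

# Example use on the system log:
#   has_rm_init_failure /var/log/messages && echo "looks like bug 228536"
```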
Comment 9 Tijl Coosemans freebsd_committer freebsd_triage 2018-11-10 18:03:27 UTC
(In reply to Donn Seeley from comment #8)
Can you try the "SUGGESTED PATCH" in bug 205903?  Just save it as /usr/ports/x11/nvidia-driver/files/patch-src-nvidia_subr.c and then rebuild x11/nvidia-driver-340.
Comment 10 Alexey Dokuchaev freebsd_committer freebsd_triage 2022-10-19 08:59:11 UTC
As of September 30, 2021, FreeBSD 11.4 and the stable/11 branch have reached end-of-life and are no longer supported (11.2 reached EoL earlier, on October 31, 2019), hence I'm closing this PR.  Feel free to reopen if the problem persists on supported FreeBSD versions.