On recent CURRENT (FreeBSD 16.0-CURRENT #14 master-n282165-718acd74657f: Wed Nov 26 08:26:23 CET 2025 amd64) with a recent NVIDIA 5060 Ti GPU, the port x11/nvidia-kmod (nvidia-kmod-580.105.08.1600004) fails to bring up xdm/the GUI. The x11/nvidia-kmod and x11/nvidia-driver ports are recompiled every time world/kernel is built. xdm is set up via /etc/ttys (ttyv9 "/usr/local/bin/xdm -nodaemon" xterm onifexists insecure). For a couple of days now, it seems that with the recent nvidia kernel module loaded, a reboot (either reboot or shutdown -r now) takes a long time until an error occurs ("... some processes would not die, ... advised"). It is not possible to interrupt the shutdown or to see which process refuses to die, but I guess it is xdm. Without the nvidia kernel module loaded, FreeBSD behaves as expected in most cases and performs a clean reboot; working without a GUI is possible so far. I use a custom kernel! The custom kernel worked before, I guess, commit 9562994a7aacee2baae6ddee1a7b558b48ae39ef. This commit is a marker for me: I rebuilt the kernel yesterday, before this commit was made, and it worked, even with the GUI. After that commit, I had to follow the now-vanished UPDATING remarks about setting some sysctl flags (see my PR 291212 on that).
(In reply to O. Hartmann from comment #0) Which commit ran fine for you? Can you bisect to find the commit that actually broke it? I'm currently in the middle of massive poudriere rebuilds on stable/15 and cannot boot into or upgrade the main branch on that same computer. And which GPU is affected? The RTX 5xxx series is known to still have some rough edges, and the GSP (GPU System Processor, which Turing and later generations, including RTX 5xxx [Blackwell], have) needs to be active, unlike on prior generations of GPUs (through Ada Lovelace).
(In reply to Tomoaki AOKI from comment #1) Second question first:
[...]
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 580.105.08 Wed Oct 29 22:04:36 UTC 2025
kldunload: attempt to unload file that was loaded by the kernel
[...]
Vendor/OEM is MSI.
Your first question: I have never done a bisection. Yesterday the box worked; I remember that the last update/compilation I did was with commit 64ee9c166ce5e807e575d205ac2e15cc5cf6581b. I remember it because I use an Intel igb/em NIC on my servers and I was wondering ...
Just for the record: unloading nvidia-modeset via kldunload results in a hung console (it is still possible to connect via SSH, but F1..FXX console/TTY switching no longer works on the console). The hang is permanent ...
(In reply to O. Hartmann from comment #2) So at commit base 64ee9c166ce5e807e575d205ac2e15cc5cf6581b, x11/nvidia-kmod in conjunction with x11/nvidia-driver 580.105.08 worked fine, right? Then upgrading src to commit base 9562994a7aacee2baae6ddee1a7b558b48ae39ef broke it, and it is still broken at commit base 718acd74657fdf21cfd03c721bb7484d3789aaa0, right? If so, there is no need to bisect, as commit base 9562994a7aacee2baae6ddee1a7b558b48ae39ef is just the next one after commit base 64ee9c166ce5e807e575d205ac2e15cc5cf6581b. That makes it clear that commit base 9562994a7aacee2baae6ddee1a7b558b48ae39ef broke things. Fortunately, stable/15 got the related code before it branched (what's missing is just the commit that flipped the default). I can test by flipping the tunable debug.link_elf_obj_leak_locals from 1 to 0 in my /boot/loader.conf and restarting to see what happens. (For me, debug.link_elf_leak_locals is somehow already 0 and working fine with my Quadro P1000 (notebook).) But that will be a couple of days from now, after the massive poudriere rebuilds finish. If it affects me too, the issue would be that, as of the flip, nvidia.ko cannot fetch the tunables you defined in /boot/loader.conf for your GPU (assuming RTX 5xxx series); for me, the affected module would be nvidia-drm.ko, though. If not, possibly something else is affected and nvidia.ko is killed indirectly by it.
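For checking which values the two tunables actually get from the loader configuration, a minimal Python sketch could look like the following. It assumes the usual name="value" line format of loader.conf; the helper name read_tunables and the sample text are hypothetical and made up for illustration. Only the two tunable names come from this thread.

```python
import re

# Tunable names as mentioned in this thread.
TUNABLES = ("debug.link_elf_leak_locals", "debug.link_elf_obj_leak_locals")

# Assumed line format: name="value" or name=value, optional trailing comment.
_LINE = re.compile(r'^\s*([A-Za-z0-9_.]+)\s*=\s*"?([^"#\s]*)"?\s*(?:#.*)?$')


def read_tunables(text, names=TUNABLES):
    """Return {name: value or None}; in this sketch the last assignment wins."""
    found = {name: None for name in names}
    for line in text.splitlines():
        match = _LINE.match(line)
        # Commented-out lines start with '#' and therefore never match.
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    # Hypothetical sample; on a real system one would read the contents of
    # /boot/loader.conf and /boot/loader.conf.local instead.
    sample = 'debug.link_elf_leak_locals="0"\ndebug.link_elf_obj_leak_locals=0\n'
    for name, value in read_tunables(sample).items():
        print(name, "=", value if value is not None else "(not set)")
```

A tunable reported as "(not set)" simply means no uncommented assignment was found in the text, so the kernel default would apply.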
(In reply to O. Hartmann from comment #3) This is not limited to nvidia: unloading kernel mode setting GPU drivers (including the non-generic nvidia-modeset.ko) usually causes a crash or hang, as the kernel / console driver no longer knows how to set modes properly. With ancient GPU drivers (nowadays called User Mode Setting drivers, in contrast with KMS), the kernel / console driver did not care about mode setting (the UMS driver drove and released the video buffer by itself), so the vanilla kernel/console drivers ran as if nothing had changed, IIUC.
(In reply to Tomoaki AOKI from comment #4) Sorry, I was too hasty and imprecise and, to be honest, I do not understand the situation anymore! I went back as far as commit 120f8a4c2ae8a011827d83b098ecf70c791f794b (git reset --hard 120f8a4c2ae8a011827d83b098ecf70c791f794b). Then I recompiled both world and kernel (I use NOCLEAN, so this might be the wrong approach). The x11/nvidia-xxxx ports were also recompiled, as set in src.conf. No change! Not loading nvidia-modeset leaves the system fully operational as far as I can judge, as stated in my earlier messages. So, reflecting on what I could have missed, I'm rebuilding the whole OS with the latest CURRENT commit and also rebuilding both x11/nvidia-driver and x11/nvidia-kmod. Prior to that whole rebuild of CURRENT (after cleanworld, with a rebuilt nvidia driver), and with debug.link_elf_leak_locals=0 and debug.link_elf_obj_leak_locals=0 carefully set in /boot/loader.conf.local, I rebooted and manually tried to load the nvidia-modeset module, and I get this error on the console:
[...]
bridge0: link state changed to UP
nvidia0: <NVIDIA GeForce RTX 5060 Ti> on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
link_elf_obj: symbol nv_kthread_q_schedule_q_item undefined
nvidia0: detached
linker_load_file: /boot/modules/nvidia-modeset.ko - unsupported file type
(In reply to O. Hartmann from comment #6) With FreeBSD 16.0-CURRENT #0 master-n282166-23af364630b1: Wed Nov 26 15:36:30 CET 2025 amd64, a recent ports tree, rebuilt nvidia modules, and debug.link_elf_leak_locals=0 and debug.link_elf_obj_leak_locals=0 properly set in /boot/loader.conf.local, the result is the same as in my previous comment:
[...]
nvidia0: <NVIDIA GeForce RTX 5060 Ti> on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
link_elf_obj: symbol nv_kthread_q_schedule_q_item undefined
nvidia0: detached
linker_load_file: /boot/modules/nvidia-modeset.ko - unsupported file type
I'm sorry for not being of much help in terms of debugging. I guess the latest "issue" is more convenient than a frozen, non-responsive console.
(In reply to O. Hartmann from comment #6) NOCLEAN (and ccache or the like, if you're using it) can do harm when rolling back. At worst, `rm -rf /usr/obj/` is needed. I've been bitten by this worst case before (unrelated to nvidia GPU drivers), and I struggled with bisecting until I noticed this problem. I mention it because "linker_load_file: /boot/modules/nvidia-modeset.ko - unsupported file type" usually means a mismatch in (interface) versions between the running kernel and the kernel modules.
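To spot this kind of failure quickly in console or dmesg output, a rough Python sketch like the one below can pull out the two messages seen in this report. The regular expressions are derived only from the log lines quoted in this thread, so the exact message formats are an assumption, and the function name find_linker_errors is hypothetical.

```python
import re

# Patterns modeled on the messages quoted in this thread (assumed formats).
UNDEFINED_SYMBOL = re.compile(r"^(link_elf(?:_obj)?): symbol (\S+) undefined$")
LOAD_FAILURE = re.compile(r"^linker_load_file: (\S+) - (.+)$")


def find_linker_errors(log_text):
    """Return (undefined_symbols, failed_modules) found in log_text."""
    symbols, failures = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = UNDEFINED_SYMBOL.match(line)
        if m:
            symbols.append(m.group(2))
            continue
        m = LOAD_FAILURE.match(line)
        if m:
            # (module path, reason text as printed by the kernel)
            failures.append((m.group(1), m.group(2)))
    return symbols, failures


if __name__ == "__main__":
    # Sample taken from the console output quoted earlier in this report.
    sample = """\
nvidia0: <NVIDIA GeForce RTX 5060 Ti> on vgapci0
link_elf_obj: symbol nv_kthread_q_schedule_q_item undefined
nvidia0: detached
linker_load_file: /boot/modules/nvidia-modeset.ko - unsupported file type
"""
    symbols, failures = find_linker_errors(sample)
    print("undefined symbols:", symbols)
    print("failed modules:", failures)
```

Listing the undefined symbols next to the failed module may help tell a symbol resolution problem apart from a plain kernel/module version mismatch, since both end in the same "unsupported file type" line.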
(In reply to Tomoaki AOKI from comment #8) Having been bitten before by out-of-sync kernel and kernel modules from the ports tree, I try to recompile the kernel modules I use whenever world is built. In this specific case, regarding the recent kernel and the recent nvidia driver, both world/kernel and ports tree/nvidia driver are up to date and were recompiled from scratch after deleting /usr/obj/*. So I guess the issue is now a real-world issue and not a phenomenon of an out-of-sync ABI. A further observation: Xorg is eating up 100% WCPU (top) when the nvidia driver is loaded and debug.link_elf_leak_locals=1 and debug.link_elf_obj_leak_locals=1 are both left at their defaults.
(In reply to O. Hartmann from comment #9) What I mentioned is not only "keep the src and pkg trees in sync", but also that /usr/obj/ needs to be cleaned after rolling back the tree. This is because if *.o (including *.pico) files already built from the updated source files are left in /usr/obj/, rebuilding them from the rolled-back source can be skipped, so things do not change, or even get worse. This is what bit me before. And (IIRC) the *.depends and *.meta files generated for record keeping are not cleaned by the usual `make clean` (including cleanworld and cleankernel). For *.depends, `make cleandepends` would work (IIRC, it did not work as expected for the kernel, unlike for world, though). But IIRC, there is no `cleanmeta` target to clean *.meta. After being bitten by this, and as I'm using a Root-on-ZFS installation, I've started taking a snapshot of /usr/obj (an independent dataset in my setup), with the commit in the snapshot name, every time buildworld and buildkernel succeed. So I don't have to bother with clean rebuilds: I just roll back to a known working commit and install from there.
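The snapshot routine described above can be sketched in Python roughly as follows. The dataset name zroot/usr/obj, the src- snapshot prefix, and the helper names are hypothetical and need adjusting to the local pool layout. The sketch only builds the `zfs snapshot` and `zfs rollback` command lines and prints them rather than running them.

```python
import re


def snapshot_name(dataset, commit, prefix="src-"):
    """Build 'dataset@prefix<commit>' after checking commit looks like a git hash."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        raise ValueError(f"not a git commit hash: {commit!r}")
    return f"{dataset}@{prefix}{commit}"


def snapshot_cmd(dataset, commit):
    """Command to run once buildworld and buildkernel have succeeded."""
    return ["zfs", "snapshot", snapshot_name(dataset, commit)]


def rollback_cmd(dataset, commit, destroy_newer=False):
    """Command to return the object tree to a known working build.

    zfs rollback only accepts the most recent snapshot unless -r is given,
    and -r destroys every snapshot newer than the target, so it is opt-in.
    """
    cmd = ["zfs", "rollback"]
    if destroy_newer:
        cmd.append("-r")
    cmd.append(snapshot_name(dataset, commit))
    return cmd


if __name__ == "__main__":
    dataset = "zroot/usr/obj"  # hypothetical dataset name
    # Last commit reported as working earlier in this thread.
    commit = "64ee9c166ce5e807e575d205ac2e15cc5cf6581b"
    print(" ".join(snapshot_cmd(dataset, commit)))
    print(" ".join(rollback_cmd(dataset, commit)))
```

Keeping the commit hash in the snapshot name means the matching source revision can be checked out before installing from the rolled-back /usr/obj.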
Fun fact: with FreeBSD 16.0-CURRENT #6 master-n282239-57c0a337dbc5: Sat Nov 29 09:07:33 CET 2025 amd64, everything is back to normal!
(In reply to O. Hartmann from comment #11) That would simply be because the offending commit was temporarily reverted in commit base fad4c92b78a123f87195173ac118655fa8e325cd, wouldn't it? But the offending commit base 9562994a7aacee2baae6ddee1a7b558b48ae39ef is planned to be reapplied, so an actual fix is needed anyway. I'm now working on it in Bug 291212.
I mark this as a duplicate of Bug 291212. *** This bug has been marked as a duplicate of bug 291212 ***