Bug 253733

Summary: vesa.ko: Invalid BIOS call when resuming from S3 suspend/sleep causes nvidia driver hang
Product: Base System
Reporter: Stefan B. <sblachmann>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: In Progress
Severity: Affects Many People
CC: emaste, grahamperrin, jkim, linimon, pi, sblachmann
Priority: ---
Version: 12.2-RELEASE
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224069
Attachments:
  Disable POST and save/restore states on NVIDIA cards (flags: none)

Description Stefan B. 2021-02-21 00:00:48 UTC
vesa.ko contains a function that is called when resuming from sleep, e.g. when resuming after an S3 suspend via 'zzz'.
This function makes a BIOS call that restores the state the graphics card had before it was powered off.
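For context: as far as can be told from vesa_bios_save_restore(), this is VBE function 4F04h (Save/Restore State), issued through INT 10h. DL selects the subfunction (00h size query, 01h save, 02h restore) and CX is the mask of states to handle. The following is a minimal sketch of that register setup only. The struct and function names are hypothetical and do not mirror vesa.c or FreeBSD's x86bios register type, and nothing here actually calls the BIOS.

```c
#include <stdint.h>

/* Hypothetical model of the real-mode registers used by a VBE call. */
struct vbe_regs {
	uint16_t ax;	/* in: function number; out: VBE status */
	uint8_t  dl;	/* in: subfunction */
	uint16_t cx;	/* in: requested state mask */
	uint16_t es;	/* in: state buffer segment */
	uint16_t bx;	/* in: buffer offset; out (size query): 64-byte blocks */
};

/* VBE function 04h subfunctions, per the VBE 2.0/3.0 specification. */
enum vbe_state_subfn {
	VBE_STATE_SIZE = 0x00,
	VBE_STATE_SAVE = 0x01,
	VBE_STATE_RESTORE = 0x02,
};

/* CX bits: D0 controller hardware, D1 BIOS data, D2 DAC, D3 register state. */
#define VBE_STATE_ALL	0x000f

/* Fill in the registers for a save/restore-state call (to be issued via INT 10h). */
static void
vbe_prepare_state_call(struct vbe_regs *r, enum vbe_state_subfn subfn,
    uint16_t buf_seg, uint16_t buf_off)
{
	r->ax = 0x4f04;
	r->dl = (uint8_t)subfn;
	r->cx = VBE_STATE_ALL;
	r->es = buf_seg;
	r->bx = buf_off;
}

/* AX == 004Fh after the call means "function supported and successful". */
static int
vbe_call_succeeded(const struct vbe_regs *r)
{
	return r->ax == 0x004f;
}
```

The resume path corresponds to the restore case (DL=02h). That is the single call this report is about.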

On Nvidia cards this BIOS function seems to be implemented in a different way than on most other cards.
For this reason, calling this BIOS function causes the Nvidia graphics driver to hang, and the system fails to resume.
(For technical background, read my discussion with jkim in PR 224069: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224069 )

Reproducing the bug is easy:
-Install FreeBSD (e.g. using the GENERIC kernel).
-Enable sc (kern.vty="sc" in /boot/loader.conf).
-Install xorg. Install and configure the nvidia driver. Reboot and start xorg via startx.
-Enter "zzz" in an xterm.
-Watch the system/driver hang: the keyboard (PS/2) and mouse become unresponsive when the system attempts to switch back to graphics mode.
-Pressing the power button produces no visible change until, after a timeout, a message about an unresponsive stop job (presumably the nvidia driver) appears shortly before the machine powers off.

To reproduce the bug, it is essential to use the GENERIC kernel!
This is because building and installing a custom kernel without "options VESA" works around the hang after suspend/resume. For that workaround it is also important not to have vt and its helper modules (vt_efifb etc.) in the kernel, as these pull in the vesa.ko showstopper module.

Back in 2017 I already found that skipping (commenting out) this BIOS call on Nvidia cards fixes the issue and makes resume work reliably.

So I believe the proper fix would be:
1. Check whether the graphics card is an Nvidia card.
2. If it is, skip that BIOS call in /usr/src/sys/dev/fb/vesa.c line 520.

Pseudocode for a patch might look like this:

+    if (!nvidia_card_is_installed) {    /* pseudocode: true if an Nvidia card is present */
         x86bios_intr(&regs, 0x10);      /* existing VBE BIOS call via INT 10h */
+    }
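One way to give the pseudocode flag a concrete meaning is to compare the PCI vendor ID of the display device against NVIDIA's ID, 0x10DE. The following sketch assumes the vendor ID has already been obtained (in the kernel it could come from pci_get_vendor() on the device found during probing). The helper names and the counting stub that stands in for x86bios_intr() are hypothetical, used only so the guard can be exercised on its own.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_NVIDIA	0x10de	/* NVIDIA's PCI vendor ID */

/* Hypothetical helper: is the display device an NVIDIA card? */
static bool
vesa_card_is_nvidia(uint16_t pci_vendor)
{
	return pci_vendor == PCI_VENDOR_NVIDIA;
}

/* Stand-in for the real BIOS call; counts invocations instead. */
static int bios_calls;

static void
fake_bios_state_call(void)
{
	bios_calls++;
}

/* The guard from the pseudocode: only non-NVIDIA cards get the BIOS call. */
static void
vesa_resume_state(uint16_t pci_vendor)
{
	if (!vesa_card_is_nvidia(pci_vendor))
		fake_bios_state_call();
}
```

Keying on the vendor ID is independent of where the video BIOS is mapped. However, it relies on finding the correct PCI device in the first place.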


Side note:
All my systems use the sc console, not the vt console.
So I do not know the system behaviour when using vt.
Thus please use sc in text mode when reproducing the bug!
Comment 1 Jung-uk Kim freebsd_committer freebsd_triage 2021-03-12 19:50:19 UTC
Created attachment 223218
Disable POST and save/restore states on NVIDIA cards

Please try the attached patch.  I cannot test it because I don't use syscons(4) any more.
Comment 2 Stefan B. 2021-03-13 01:09:44 UTC
Thank you very much, Jung-uk!

I am going to test the patch on my computers using different nvidia cards/drivers.

To make sure it works reliably, I want to test long enough to accumulate sufficient uptime and suspend/resume cycles.
So it might take a few days until I report back.
Comment 3 Stefan B. 2021-04-04 06:07:07 UTC
Sorry for the late update. I was busy with other things.

I tested the patch. It does not work: the system still hangs in text mode when resuming.
I am not yet sure of the exact cause.

Apparently I didn't express myself clearly enough: it is *only* the STATE_LOAD call that breaks resume and needs to be omitted on Nvidia cards/chips.
I verified this is still valid by commenting out the x86bios_intr() call in the case STATE_LOAD: of vesa_bios_save_restore().

So I believe the other VESA calls, including POST, do *not* have a negative impact on suspend/resume.
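The narrower rule described here (skip only the restore call on Nvidia hardware and leave everything else alone) can be sketched as a predicate over the save/restore operation. The STATE_* names follow the cases in vesa_bios_save_restore(). Their values are assumed here to match the VBE 4F04h DL codes, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Named after the STATE_* cases in vesa_bios_save_restore(); values assumed. */
enum state_op { STATE_SIZE = 0, STATE_SAVE = 1, STATE_LOAD = 2 };

/*
 * Hypothetical predicate: should the BIOS be called for this operation?
 * Only the restore (STATE_LOAD) is suppressed on NVIDIA cards; size
 * queries and saves still go to the BIOS.
 */
static bool
state_call_allowed(enum state_op op, bool is_nvidia)
{
	switch (op) {
	case STATE_LOAD:
		return !is_nvidia;
	case STATE_SIZE:
	case STATE_SAVE:
	default:
		return true;
	}
}
```

Unlike the attached patch, this leaves POST and the save path untouched, matching the observation that only the restore call is harmful.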


I have not yet tested whether vesa_find_pci_device() actually finds the card that responds to the VESA BIOS call (I will do so soon using some debug printfs).
So I cannot yet rule out that a problem there is the reason the patch does not work.


Another issue, where I am not yet sure whether it matters:

Some OEMs have, in some cases, placed their onboard video BIOS at locations other than C000. I remember cases I personally encountered where the video BIOS was at E000.
For this reason I am not sure whether checking for a C000 BIOS start address is 100% safe.
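To illustrate the concern, the following sketch checks a candidate ROM image for the standard expansion-ROM header (0x55 0xAA, then the image length in 512-byte units) instead of assuming segment C000h. It also shows the real-mode segment arithmetic (C000h maps to 0xC0000 and E000h to 0xE0000). The function names are hypothetical. Note that a valid header alone does not tell you which device the ROM belongs to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does this buffer start with a legacy expansion-ROM
 * header (0x55 0xAA, size byte in 512-byte units) that fits in `avail`?
 */
static bool
rom_header_valid(const uint8_t *rom, size_t avail)
{
	if (avail < 3 || rom[0] != 0x55 || rom[1] != 0xaa)
		return false;
	/* The declared image size must be non-zero and fit in the buffer. */
	return rom[2] != 0 && (size_t)rom[2] * 512 <= avail;
}

/* Real-mode segment to physical address: segment * 16. */
static uint32_t
segment_to_phys(uint16_t seg)
{
	return (uint32_t)seg << 4;
}
```

A scan over candidate segments using such a check would not depend on the BIOS being at C000.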


I am now considering scanning the OEM string returned by function 4F00 (i.e. the string that the VESA 1.2 OEMStringPtr points to) for "nvidia" (case-insensitive).
This approach would be independent of the Option BIOS memory address.
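The OEM-string idea can be sketched in two parts. The first converts OemStringPtr, a real-mode far pointer with the offset in the low word and the segment in the high word, to a linear address. The second does a case-insensitive search for "nvidia" in the string. The function names are hypothetical. Whether NVIDIA video BIOSes actually report "nvidia" in this string is exactly what still needs to be verified on real cards.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* OemStringPtr: low word = offset, high word = segment. */
static uint32_t
vbe_far_ptr_to_linear(uint32_t farptr)
{
	uint32_t seg = farptr >> 16;
	uint32_t off = farptr & 0xffff;

	return (seg << 4) + off;
}

/* Case-insensitive substring search in a NUL-terminated string. */
static bool
contains_nocase(const char *hay, const char *needle)
{
	size_t n = strlen(needle);

	for (; *hay != '\0'; hay++) {
		size_t i = 0;

		while (i < n && hay[i] != '\0' &&
		    tolower((unsigned char)hay[i]) ==
		    tolower((unsigned char)needle[i]))
			i++;
		if (i == n)
			return true;
	}
	return n == 0;
}

/* Hypothetical check applied to the string OemStringPtr points to. */
static bool
vbe_oem_is_nvidia(const char *oem_string)
{
	return contains_nocase(oem_string, "nvidia");
}
```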

I have about ten different Nvidia cards and onboard chips, NV4 and higher, and will read out their OEM strings via debug printfs to find out whether this alternative approach is viable.

As I am currently moving my hardware lab, it will take about 1-3 weeks until I report back, maybe with an updated patch.
Comment 4 Ed Maste freebsd_committer freebsd_triage 2021-11-27 20:52:49 UTC
Since VESA is not supported by the default vt(4) console anyway I propose removing it from GENERIC: https://reviews.freebsd.org/D33141

sc(4) users who want to use VESA modes can still load it as a module.
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-11-28 16:38:20 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b8cf1c5c30a5e6da4e2c9702ffd607a90453fb33

commit b8cf1c5c30a5e6da4e2c9702ffd607a90453fb33
Author:     Ed Maste <emaste@FreeBSD.org>
AuthorDate: 2021-11-27 20:27:45 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-11-28 16:29:17 +0000

    Remove options VESA from x86 GENERIC

    options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
    the legacy sc(4) console.  It is not used by the default console, vt(4).

    There is a report[1] of an incompatibility between VESA and the Nvidia
    driver breaking suspend/resume.  Since VESA is not used by the default
    configuration anyway, just remove options VESA from GENERIC.  The kernel
    module is still available and may be loaded by sc(4) users who want to
    select a VBE mode.

    (Note that vt(4) does not support selecting a VBE mode.  The loader can
    set a VBE mode and vt(4) will use it via the vt_vbefb driver.)

    [1] https://lists.freebsd.org/archives/freebsd-hackers/2021-November/000469.html

    PR:             253733
    Reported by:    Stefan Blachmann [1]
    Reviewed by:    imp, manu, tsoome
    Relnotes:       Yes
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D33141

 sys/amd64/conf/GENERIC | 1 -
 sys/i386/conf/GENERIC  | 1 -
 2 files changed, 2 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2021-11-28 19:42:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=777526ed83822e1af2b7f7ea4186dbf7d3d3d60a

commit 777526ed83822e1af2b7f7ea4186dbf7d3d3d60a
Author:     Ed Maste <emaste@FreeBSD.org>
AuthorDate: 2021-11-28 19:10:28 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-11-28 19:37:46 +0000

    Remove options VESA from x86 MINIMAL

    Followup to b8cf1c5c30a5, remove from MINIMAL in addition to GENERIC.

    options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
    the legacy sc(4) console.  It is not used by the default console, vt(4).

    PR:             253733
    Fixes:          b8cf1c5c30a5 ("Remove options VESA from x86 GENERIC")
    Relnotes:       Yes
    Sponsored by:   The FreeBSD Foundation

 sys/amd64/conf/MINIMAL | 1 -
 sys/i386/conf/MINIMAL  | 1 -
 2 files changed, 2 deletions(-)
Comment 7 Stefan B. 2022-02-21 19:10:05 UTC
Hmm, I just noticed that the commits refer to x86.
Does this mean i386, amd64, or both?

My systems are all amd64, so I can only confirm for amd64 that removing vesa.ko from the kernel makes suspend/resume work. I haven't done any tests on i386 yet.
Comment 8 Ed Maste freebsd_committer freebsd_triage 2022-02-21 20:16:23 UTC
(In reply to Stefan B. from comment #7)
Both i386 and amd64.
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 03:00:07 UTC
^Triage: assign to committer.

To committer: does this need MFC to 13?
Comment 10 Ed Maste freebsd_committer freebsd_triage 2024-01-22 14:32:18 UTC
Unassign: I did not commit a fix for this issue, just removed VESA from 14 and later. The removal won't be merged back to stable/13.

Someone will need to investigate the underlying issue and produce a real fix.