Bug 224069 - (Fix included) Use of uninitalized register value in vesa.ko, causing X, text console and suspend/resume to fail
Summary: (Fix included) Use of uninitalized register value in vesa.ko, causing X, text...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-STABLE
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-03 23:58 UTC by Stefan B.
Modified: 2017-12-04 23:06 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan B. 2017-12-03 23:58:26 UTC
PROBLEMS:
-Switching between X and text console fails, screen is garbled or no video at all
-Resuming after suspend results either in garbled or no video at all

ANALYSYS:
As I was getting annoyed by so many people affected by the buggy vesa.ko, and felt frustrated that these problems with X, particulartly switching between X and text console and the failure of suspend/resume to restore the screen, I finally took a look into vesa.ko and found immediately a thing very wrong.

In line 515 of /usr/src/sys/dev/fb/vesa.c you see this:
  /* regs.R_DL = STATE_SIZE; */

The DL register contains the subfunction of the BIOS call retrieving the buffer size. It must not be undefined, as it is unpredictable which subfunction will get called.

Thus I wish I knew who has commented out that line for what reason.
I think this justifies an explanation.
Because, the commenting out of this line caused all these problems, which have done big damage to FreeBSD's reputation regarding its usability as desktop OS.


FIX:
Uncomment that line.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2017-12-04 00:43:52 UTC
(In reply to Stefan B. from comment #0)
> Thus I wish I knew who has commented out that line for what reason.

(Note: jkim is already on the Cc: here, so I don't need to take any action.)

fwiw, there's a way to look that up.

Look up the src file on FreeBSD's svnweb:

  https://svnweb.freebsd.org/base/head/sys/dev/fb/vesa.c

Do an "annotate" based on the latest commit, here rev 326255:

  https://svnweb.freebsd.org/base/head/sys/dev/fb/vesa.c?annotate=326255

Search for the commented out line (shows as 517 there rather than 515):

  https://svnweb.freebsd.org/base/head/sys/dev/fb/vesa.c?annotate=326255#l517

Clicking the line number in that block takes you to:

  https://svnweb.freebsd.org/base/head/sys/dev/fb/vesa.c?r1=198250&r2=198251&

Clicking the 'latest revision' link on the right takes you to:

  https://svnweb.freebsd.org/base/head/sys/dev/fb/vesa.c?revision=198251&view=markup

which is a very large commit from jkim Mon Oct 19 20:58:10 2009 UTC (8 years, 1 month ago).  I have no knowledge of what's all involved there.
Comment 2 Conrad Meyer freebsd_committer 2017-12-04 00:59:12 UTC
(In reply to Mark Linimon from comment #1)
Yeah, I copied him based on that commit.

Notably, before that commit, the DX register was initialized to that same constant.  Afterwards, the comment containing the initialization of DL (low half of DX) replaced it.
Comment 3 Stefan B. 2017-12-04 02:03:52 UTC
Now that is a bit ironic...

From the last link in Mark's post, I read in jkim's commit note:

   - Fix a long standing bug in state save/restore function.  
     The state buffer pointer should be ES:BX, not ES:DI according
     to VBE 3.0.  If it ever worked, that's because BX was always zero. :-)

The last sentence made me smile, because it may explain why the uninitialized DL which stood for eight years got undetected that long :)

Anyway, people who do much like jkim, do more mistakes than people who do little. Big commits, easier to overlook a detail...
So no problem, and let's thank jkim for his great contributions!

(Thank you Mark for the good explanation how to find out :)
Comment 4 Jung-uk Kim freebsd_committer 2017-12-04 19:27:39 UTC
You guys are not reading the code right.  x86bios_init_regs(&regs) before the commented line initializes all registers to zero.  In other words, I commented out the line because it is unnecessary, i.e., STATE_SIZE is zero.  I am not really sure what's going on but you can do "sysctl debug.x86bios.call=1" to trace VESA function calls.
Comment 5 Jung-uk Kim freebsd_committer 2017-12-04 19:32:53 UTC
(In reply to Stefan B. from comment #0)
> Because, the commenting out of this line caused all these problems, which have
> done big damage to FreeBSD's reputation regarding its usability as desktop OS.

I implemented amd64 suspend/resume support.  I don't think I deserve the harsh comment. :-(
Comment 6 Jung-uk Kim freebsd_committer 2017-12-04 19:45:02 UTC
FYI, switching between X and text console is NOT handled by FreeBSD console drivers.  In fact, X/DRM/KMS drivers are responsible for saving/restoring frame buffer and GPU state.
Comment 7 Jung-uk Kim freebsd_committer 2017-12-04 21:17:37 UTC
(In reply to Jung-uk Kim from comment #4)
> ... you can do "sysctl debug.x86bios.call=1" to trace VESA function calls.

Do "sysctl debug.x86bios.int=1" to trace VESA interrupt handlers, sorry.
Comment 8 Stefan B. 2017-12-04 21:28:43 UTC
(In reply to Jung-uk Kim from comment #4, #5 and #6)

#4: Didn't know about register zeroing. Such was not common back then when I did VGA programming via hardware and INT10 decades ago using mixed C and assembly back in the 16-bit DOS times.
The sysctl tip is great for debugging!


#5 No, you don't deserve that at all.
When I saw the commit Mark pointed me at, I instantly recognized that it was a lot of work involved. And you know, the bigger the work, the easier it is to overlook a detail. See please my comment #3, too.

And I did not know you were the one who implemented FreeBSD suspend/resume. That was great work, good to know. Thank you!
However, I think the suspend/resume thing has been neglected, and it does not  work for many people. Below I discuss part of possible reasons.
Btw, are you still interested in improving suspend/resume from STR to STD?


#6 That is very interesting, because:
In the FreeBSD forums, there are constant complaints regarding Nvidia cards. 
Garbled display when switching between console and X, and after resume. I have made the discovery that this apparently *only* happens when vesa.ko is present.
And that is in the GENERIC kernel or as loaded module.

In the forums I have talked much of that problem. Apparently all people that followed my advice to build a custom kernel *without* option VESA and switch from vt to sc console (because the default vt console pulls in vesa.ko, if not present in kernel), got rid of these problems.

And, in my naive opinion I think suspend/resume should *not* be broken by just doing kldload vesa.ko. (this is factually the way to reproduce the problem)

So I have the strong feeling that there is a serious problem with the vesa module. But that is just my (possibly misleading) intuition.
Comment 9 Jung-uk Kim freebsd_committer 2017-12-04 22:22:15 UTC
(In reply to Stefan B. from comment #8)
> Btw, are you still interested in improving suspend/resume from STR to STD?

No, not really.

> In the FreeBSD forums, there are constant complaints regarding Nvidia cards.
...
> So I have the strong feeling that there is a serious problem with the vesa
> module. But that is just my (possibly misleading) intuition.

Ah, NVIDIA...  It is a well known problem for many years.  The root cause is NVIDIA does not really support VESA BIOS re-POST and save/restore state calls.  It just checks few bits and returns immediately if they are set/clear.  I have a feeling that they intentionally did it to make reverse engineering harder.  On top of that, they don't provide any information to improve the situation.  Long time ago, I even bought couple of cheap NVIDIA controllers to fix the problem but soon I gave up because I realized it was too complicated and I wasn't desperate.  So, no, VESA module have no "serious" problem.  Just NVIDIA doesn't want us to mess around with their hardware although we own it.

We can add few knobs in the driver to work around these issues and turn them on by default if NVIDIA controller is found.
Comment 10 Stefan B. 2017-12-04 22:54:19 UTC
(In reply to Jung-uk Kim from comment #9)

> We can add few knobs in the driver to work around these issues 
> and turn them on by default if NVIDIA controller is found.

That would be the right thing to do then.
Would save a lot of bad UX.

And the supporting guys on the forums would be happy if these constantly recurring nvidia problems would be mitigated (at least to a degree).
It's not nice to see people going back to Linux just because the text console is unusable etc.

> The root cause is NVIDIA does not really support VESA BIOS re-POST 
> and save/restore state calls.  It just checks few bits and returns 
> immediately if they are set/clear.

Aww. Didn't know that. You did obviously a lot of research. Kudos!

This is a very bad thing to learn, as this could make STD with Nvidia cards virtually impossible without involving X.

(Btw, do you know whether ATI cards behave better in this regard, supporting state saving/restoring?)
Comment 11 Jung-uk Kim freebsd_committer 2017-12-04 23:06:06 UTC
(In reply to Stefan B. from comment #10)
> (Btw, do you know whether ATI cards behave better in this regard, supporting
> state saving/restoring?)

Yes.  I've never heard any complaint about ATI/AMD controllers for many years.