Bug 207898

Summary: kernel linker behaves differently on amd64 vs. i386
Product: Base System Reporter: Don Lewis <truckman>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Some People CC: desk, emaste, franco, jilles
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
example kernel module source that illustrates the differing kernel linker behavior on amd64 vs i386 none

Description Don Lewis freebsd_committer freebsd_triage 2016-03-11 09:27:29 UTC
Created attachment 168000 [details]
example kernel module source that illustrates the differing kernel linker behavior on amd64 vs i386

If one source file in a kernel module defines a symbol as static and another declares it as extern, the module fails to load on i386, with the kernel logging a message about the symbol being undefined.  I believe this is the correct behavior.  On amd64, the module loads and the code in the second source file is able to access the static variable in the first source file.

In the attached example, the main module source file is able to access static character arrays in the other source file when loaded on an amd64 machine.

The behavior is the same on FreeBSD 10.1 through recent 11.0-CURRENT.  FreeBSD 9.x has not been tested.
Comment 1 Don Lewis freebsd_committer freebsd_triage 2016-03-12 07:56:36 UTC
Most of the kernel linker code is MI, but there is some MD code in /usr/src/sys/{amd64/amd64,i386/i386}/elf_machdep.c.  I didn't see anything suspicious there.

The MI code is difficult to figure out, but that is where I suspect the problem is.  I suspect that whether or not the problem is triggered depends on the order of the relocation entries in the .ko file.  On amd64, I see this when I run nm on the .ko file:

[snip]
                 U module_register_init
0000000000000000 b msg1
                 U msg1
0000000000000050 b msg2
                 U msg2
                 U strcpy
                 U uprintf

On i386, I see this:

[snip]
         U module_register_init
         U msg1
000014c4 b msg1
         U msg2
00001514 b msg2
         U strcpy
         U uprintf

Note that the "b" entries for msg1 and msg2 precede the "U" entries on amd64, but the reverse is true on i386.

Unfortunately this is difficult to test because swapping the order of SRCS does not change the order as reported by nm.
Comment 2 Jilles Tjoelker freebsd_committer freebsd_triage 2016-03-13 22:50:56 UTC
There is another MD aspect of the kernel linker: whether kernel modules are object files (file says "ELF xx-bit yyy relocatable") or DSOs (file says "ELF xx-bit yyy shared object"). Of the architectures you are looking at, i386 uses DSOs and amd64 uses object files.

Using object files may reduce overhead slightly but bypasses functionality that may be useful. For example, DSOs have a symbol table for dynamic linking separate from the one for debugging, while object files only have a single symbol table. Although there is a flag for static (local) symbols, the kernel linker ignores it and some code may have started abusing this ignoring.

Note that, although i386 kernel modules are DSOs, they are not PIC and do not use a GOT and PLT. Therefore, there is no overhead from the DSO format while running the code.
Comment 3 Don Lewis freebsd_committer freebsd_triage 2016-03-14 20:01:32 UTC
The dummynet AQM patch accidentally abused this when it added two new files that referenced an existing static variable one of the pre-existing source files.  This was not caught because the authors only tested on amd64.

Why would the linker ignore the flag for local symbols?  That seems like it could be the source of difficult to debug problems.
Comment 4 Justin Cady 2021-05-10 16:20:52 UTC
> Why would the linker ignore the flag for local symbols?  That seems like it could be the source of difficult to debug problems.

I have the same question, and I can confirm that such problems are very difficult to debug. :(

In my case, Module-A, which depends on Modules B and C, is intended to pick up a globally exported symbol from Module-B, but picks up a _local_ symbol of the same name from Module-C (which happens to be scanned by the linker first). I believe that is because of this bug.

I verified this was causing my issue by editing link_elf_lookup_symbol() in sys/kern/link_elf_obj.c to populate the sym pointer and return success only if the symbol has global binding (STB_GLOBAL). With that change, the local symbol from Module-C is ignored, and the global symbol from Module-B is correctly selected.

1. Am I correct that because of this bug, symbol names on amd64 are effectively required to be unique across all kernel module dependencies?

2. Is there any risk to actually fixing this? I tried to understand all of the potential callers of link_elf_lookup_symbol(), but doing so is not straightforward through all the indirect calls (function pointers, macros). Stated differently: is there any expectation that link_elf_lookup_symbol() should return a local symbol?
Comment 5 Justin Cady 2021-06-09 18:00:28 UTC
If anyone knows the answer to my above questions I would still appreciate it.

But I wanted to share a bit of history I found...an old freebsd-hackers thread titled "Kernel linker and undefined references in KLD":

https://lists.freebsd.org/pipermail/freebsd-hackers/2010-July/032418.html

That thread contains a discussion of this issue along with a proposed patch that apparently was never taken forward. One portion of it is quite similar to the small patch I used to validate my theory (only returning STB_GLOBAL symbols from link_elf_lookup_symbol).
Comment 6 Ed Maste freebsd_committer freebsd_triage 2021-11-05 14:54:17 UTC
(In reply to Don Lewis from comment #3)
> Why would the linker ignore the flag for local symbols?

I suspect there is no satisfying reason. It was probably just never implemented, and had no (or very limited) effect on the set of (relatively simple) in-tree kernel modules.

(In reply to Justin Cady from comment #4)
> 1. Am I correct that because of this bug, symbol names on amd64 are effectively
> required to be unique across all kernel module dependencies?

Yes, it appears that is the case.

> 2. Is there any risk to actually fixing this? I tried to understand all of the
> potential callers of link_elf_lookup_symbol(), but doing so is not
> straightforward through all the indirect calls (function pointers, macros).
> Stated differently: is there any expectation that link_elf_lookup_symbol()
> should return a local symbol?

There are two separate risks:

1) changing link_elf_lookup_symbol's beavhiour, if there are callers other than via module loading that depend on the current behaviour

Notice that this is already covered in Kostik's patch (in the thread you referenced). It adds link_elf_lookup_debug_symbol which continues searching local symbols.

2) breaking in-tree or out-of-tree kernel modules that depend on this

Also in Kostik's patch there were changes to sys/modules/mii and sys/modules/pseudofs, because they were missing entries in EXPORT_SYMS. (I did not look to see if they have been fixed since, or whether there are other newly-introduced cases.)

In-tree modules would have to be fixed if/when this patch goes in, but there may well be third party modules that need similar changes.

All of that said, this is definitely a bug that should be fixed.

I propose:

1) augment Kostik's patch to add a sysctl controlling whether kernel module symbol resolution uses local symbols or not
2) initially set it to maintain the current behaviour (use local symbols)
3) commit to main, send a call for testing
4) MFC to stable/13 (as well as stable/12 if the merge is straightforward) maintaining current behaviour
5) fix any in-tree modules
6) toggle the the sysctl in main (i.e., no longer search local symbols in kernel module lookup)
7) consider emitting a warning if the sysctl is changed by the user
8) wait for FreeBSD 14.0's release
9) remove the sysctl, leaving only no-local-lookup for kernel modules
Comment 7 Justin Cady 2021-11-05 20:49:30 UTC
(In reply to Ed Maste from comment #6)

Thank you, Ed. I appreciate the answers to those questions, and your proposal for resolving this issue looks good to me.

I have not exhaustively searched in-tree modules for missing entries in EXPORT_SYMS, but I am aware of a few that will need updated as a part of this: dtrace.ko and opensolaris.ko, for example.
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2021-11-07 09:21:03 UTC
I revived by 10+ years old patches in https://reviews.freebsd.org/D32878

WRT to sysctl, I am not sure that this is soo great idea, because nobody would
toggle it to help debug.  Might be, it indeed makes sense as the emergency revert
switch, indeed.
Comment 9 Ed Maste freebsd_committer freebsd_triage 2021-11-07 16:13:12 UTC
(In reply to Konstantin Belousov from comment #8)
I agree wide-scale testing is unlikely, but I'm sure some of our downstreams would test. In any case, having it as an emergency revert option seems useful and although I wouldn't change the behaviour in a stable branch having it available as opt-in also seems like it could have some use.
Comment 10 commit-hook freebsd_committer freebsd_triage 2021-12-08 22:00:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=95c20faf11a1af6924f97ec4aafc32d899fea8b0

commit 95c20faf11a1af6924f97ec4aafc32d899fea8b0
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-11-07 08:37:48 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-12-08 21:32:29 +0000

    kernel linker: do not read debug symbol tables for non-debug symbols

    In particular, this prevents resolving locals from other files.
    To access debug symbol tables, add LINKER_LOOKUP_DEBUG_SYMBOL and
    LINKER_DEBUG_SYMBOL_VALUES kobj methods, which are allowed to use
    any types of present symbols in all tables.

    PR:     207898
    Reviewed by:    emaste, markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D32878

 sys/kern/kern_linker.c  |  2 +-
 sys/kern/link_elf.c     | 79 +++++++++++++++++++++++++++++++++++++++----------
 sys/kern/link_elf_obj.c | 61 +++++++++++++++++++++++++++++++-------
 sys/kern/linker_if.m    | 12 ++++++++
 4 files changed, 126 insertions(+), 28 deletions(-)
Comment 11 commit-hook freebsd_committer freebsd_triage 2021-12-08 22:00:04 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ecd8245e0d7784bcd78dafce233f303eee765068

commit ecd8245e0d7784bcd78dafce233f303eee765068
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-11-07 09:26:26 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-12-08 21:32:30 +0000

    Kernel linkers: add emergency sysctl to restore old behavior

    allowing linking to static symbols from other files.  Default the new
    settings to true, delaying the change of the kernel linker behavior
    for other day.

    Suggested by:   emaste
    PR:     207898
    Reviewed by:    emaste, markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D32878

 sys/kern/link_elf.c     |  9 +++++++++
 sys/kern/link_elf_obj.c | 12 ++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)
Comment 12 commit-hook freebsd_committer freebsd_triage 2021-12-15 01:56:22 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4305fd126c77143f3f3205e201ec413ae2012a0f

commit 4305fd126c77143f3f3205e201ec413ae2012a0f
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-11-07 09:26:26 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-12-15 01:41:29 +0000

    Kernel linkers: add emergency sysctl to restore old behavior

    PR:     207898

    (cherry picked from commit ecd8245e0d7784bcd78dafce233f303eee765068)

 sys/kern/link_elf.c     |  9 +++++++++
 sys/kern/link_elf_obj.c | 12 ++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)
Comment 13 commit-hook freebsd_committer freebsd_triage 2021-12-15 01:56:23 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=da536d64b7ee713d0299d817db172ab0352d26a2

commit da536d64b7ee713d0299d817db172ab0352d26a2
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-11-07 08:37:48 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-12-15 01:41:29 +0000

    kernel linker: do not read debug symbol tables for non-debug symbols

    PR:     207898

    (cherry picked from commit 95c20faf11a1af6924f97ec4aafc32d899fea8b0)

 sys/kern/kern_linker.c  |  2 +-
 sys/kern/link_elf.c     | 79 +++++++++++++++++++++++++++++++++++++++----------
 sys/kern/link_elf_obj.c | 61 +++++++++++++++++++++++++++++++-------
 sys/kern/linker_if.m    | 12 ++++++++
 4 files changed, 126 insertions(+), 28 deletions(-)