Bug 207898 - kernel linker behaves differently on amd64 vs. i386
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
Reported: 2016-03-11 09:27 UTC by Don Lewis
Modified: 2016-03-14 20:01 UTC (History)

example kernel module source that illustrates the differing kernel linker behavior on amd64 vs i386 (862 bytes, application/x-tar)
2016-03-11 09:27 UTC, Don Lewis

Description Don Lewis freebsd_committer 2016-03-11 09:27:29 UTC
Created attachment 168000
example kernel module source that illustrates the differing kernel linker behavior on amd64 vs i386

If one source file in a kernel module defines a symbol as static and another declares it as extern, the module fails to load on i386, with the kernel logging a message about the symbol being undefined.  I believe the i386 behavior is the correct one.  On amd64, the module loads and the code in the second source file is able to access the static variable in the first source file.

In the attached example, the main module source file is able to access static character arrays in the other source file when loaded on an amd64 machine.

The behavior is the same on FreeBSD 10.1 through recent 11.0-CURRENT.  FreeBSD 9.x has not been tested.
Comment 1 Don Lewis freebsd_committer 2016-03-12 07:56:36 UTC
Most of the kernel linker code is MI, but there is some MD code in /usr/src/sys/{amd64/amd64,i386/i386}/elf_machdep.c.  I didn't see anything suspicious there.

The MI code is difficult to figure out, but that is where I suspect the problem is.  I suspect that whether or not the problem is triggered depends on the order of the relocation entries in the .ko file.  On amd64, I see this when I run nm on the .ko file:

                 U module_register_init
0000000000000000 b msg1
                 U msg1
0000000000000050 b msg2
                 U msg2
                 U strcpy
                 U uprintf

On i386, I see this:

         U module_register_init
         U msg1
000014c4 b msg1
         U msg2
00001514 b msg2
         U strcpy
         U uprintf

Note that the "b" entries for msg1 and msg2 precede the "U" entries on amd64, but the reverse is true on i386.

Unfortunately this is difficult to test because swapping the order of SRCS does not change the order as reported by nm.
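
Regardless of ordering, the nm listings above share one signature: the same name appears once as a local definition (lowercase "b") and once as undefined ("U"). A minimal sketch of a check for that pattern follows. It assumes nm-style lines are already in memory, and the function name nm_local_and_undefined is hypothetical, not an existing tool. It treats lowercase type letters as local, except "u", "v" and "w", which GNU nm uses for special global symbols.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `name` appears in the nm-style `lines` both as an
 * undefined reference ("U") and as a local definition (lowercase type
 * letter), otherwise 0.  Hypothetical helper for illustration only.
 */
int
nm_local_and_undefined(const char *const *lines, size_t n, const char *name)
{
	int saw_local = 0, saw_undef = 0;

	for (size_t i = 0; i < n; i++) {
		char a[64], b[64], c[64];
		const char *type, *sym;
		int fields = sscanf(lines[i], "%63s %63s %63s", a, b, c);

		if (fields == 2) {		/* "U name" (no address) */
			type = a;
			sym = b;
		} else if (fields == 3) {	/* "addr type name" */
			type = b;
			sym = c;
		} else {
			continue;
		}
		if (strlen(type) != 1 || strcmp(sym, name) != 0)
			continue;
		if (type[0] == 'U')
			saw_undef = 1;
		else if (islower((unsigned char)type[0]) &&
		    strchr("uvw", type[0]) == NULL)
			saw_local = 1;
	}
	return (saw_local && saw_undef);
}
```

Fed the amd64 or i386 listing above, this should flag msg1 and msg2 but not strcpy or uprintf, which gives a way to spot the problem at build time on either architecture.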
Comment 2 Jilles Tjoelker freebsd_committer 2016-03-13 22:50:56 UTC
There is another MD aspect of the kernel linker: whether kernel modules are object files (file says "ELF xx-bit yyy relocatable") or DSOs (file says "ELF xx-bit yyy shared object"). Of the architectures you are looking at, i386 uses DSOs and amd64 uses object files.

Using object files may reduce overhead slightly but bypasses functionality that may be useful. For example, DSOs have a symbol table for dynamic linking separate from the one for debugging, while object files only have a single symbol table. Although there is a flag for static (local) symbols, the kernel linker ignores it, and some code may have come to rely on that.
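
To make the effect of ignoring that flag concrete, here is a small userland model of a lookup over a single symbol table. It is only a sketch and is not taken from the kernel linker sources: struct sketch_sym and sketch_lookup are hypothetical names, and the table layout is simplified. Only the binding values (0 for local, 1 for global) come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Binding values as defined by the ELF specification. */
enum { SKETCH_STB_LOCAL = 0, SKETCH_STB_GLOBAL = 1 };

/* Hypothetical, simplified symbol table entry (not the real Elf_Sym). */
struct sketch_sym {
	const char *name;
	int binding;	/* SKETCH_STB_LOCAL for "static" definitions */
	int defined;	/* 0 for an undefined reference */
};

/*
 * Resolve `name` against one symbol table.  With honor_binding == 0 a
 * local definition satisfies the lookup, modelling a linker that
 * ignores the local flag; with honor_binding != 0 it is skipped.
 */
const struct sketch_sym *
sketch_lookup(const struct sketch_sym *tab, size_t n, const char *name,
    int honor_binding)
{
	assert(tab != NULL && name != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!tab[i].defined || strcmp(tab[i].name, name) != 0)
			continue;
		if (honor_binding && tab[i].binding == SKETCH_STB_LOCAL)
			continue;
		return (&tab[i]);
	}
	return (NULL);
}
```

With a table holding a local, defined msg1 and an undefined reference to msg1, the lenient lookup returns the local entry (the amd64 outcome reported here), while the strict lookup returns NULL (matching the "undefined symbol" failure seen on i386).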

Note that, although i386 kernel modules are DSOs, they are not PIC and do not use a GOT and PLT. Therefore, there is no overhead from the DSO format while running the code.
Comment 3 Don Lewis freebsd_committer 2016-03-14 20:01:32 UTC
The dummynet AQM patch accidentally abused this when it added two new files that referenced an existing static variable in one of the pre-existing source files.  This was not caught because the authors only tested on amd64.

Why would the linker ignore the flag for local symbols?  That seems like it could be the source of difficult-to-debug problems.