Summary: | kernel linker behaves differently on amd64 vs. i386 | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Don Lewis <truckman> | ||||
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||
Status: | New --- | ||||||
Severity: | Affects Some People | CC: | desk, emaste, franco, jilles | ||||
Priority: | --- | ||||||
Version: | CURRENT | ||||||
Hardware: | Any | ||||||
OS: | Any | ||||||
Attachments: |
|
Most of the kernel linker code is MI, but there is some MD code in /usr/src/sys/{amd64/amd64,i386/i386}/elf_machdep.c. I didn't see anything suspicious there. The MI code is difficult to figure out, but that is where I suspect the problem is. I suspect that whether or not the problem is triggered depends on the order of the relocation entries in the .ko file. On amd64, I see this when I run nm on the .ko file: [snip] U module_register_init 0000000000000000 b msg1 U msg1 0000000000000050 b msg2 U msg2 U strcpy U uprintf On i386, I see this: [snip] U module_register_init U msg1 000014c4 b msg1 U msg2 00001514 b msg2 U strcpy U uprintf Note that the "b" entries for msg1 and msg2 precede the "U" entries on amd64, but the reverse is true on i386. Unfortunately this is difficult to test because swapping the order of SRCS does not change the order as reported by nm. There is another MD aspect of the kernel linker: whether kernel modules are object files (file says "ELF xx-bit yyy relocatable") or DSOs (file says "ELF xx-bit yyy shared object"). Of the architectures you are looking at, i386 uses DSOs and amd64 uses object files. Using object files may reduce overhead slightly but bypasses functionality that may be useful. For example, DSOs have a symbol table for dynamic linking separate from the one for debugging, while object files only have a single symbol table. Although there is a flag for static (local) symbols, the kernel linker ignores it and some code may have started abusing this ignoring. Note that, although i386 kernel modules are DSOs, they are not PIC and do not use a GOT and PLT. Therefore, there is no overhead from the DSO format while running the code. The dummynet AQM patch accidentally abused this when it added two new files that referenced an existing static variable one of the pre-existing source files. This was not caught because the authors only tested on amd64. Why would the linker ignore the flag for local symbols? That seems like it could be the source of difficult to debug problems. > Why would the linker ignore the flag for local symbols? That seems like it could be the source of difficult to debug problems.
I have the same question, and I can confirm that such problems are very difficult to debug. :(
In my case, Module-A, which depends on Modules B and C, is intended to pick up a globally exported symbol from Module-B, but picks up a _local_ symbol of the same name from Module-C (which happens to be scanned by the linker first). I believe that is because of this bug.
I verified this was causing my issue by editing link_elf_lookup_symbol() in sys/kern/link_elf_obj.c to populate the sym pointer and return success only if the symbol has global binding (STB_GLOBAL). With that change, the local symbol from Module-C is ignored, and the global symbol from Module-B is correctly selected.
1. Am I correct that because of this bug, symbol names on amd64 are effectively required to be unique across all kernel module dependencies?
2. Is there any risk to actually fixing this? I tried to understand all of the potential callers of link_elf_lookup_symbol(), but doing so is not straightforward through all the indirect calls (function pointers, macros). Stated differently: is there any expectation that link_elf_lookup_symbol() should return a local symbol?
If anyone knows the answer to my above questions I would still appreciate it. But I wanted to share a bit of history I found...an old freebsd-hackers thread titled "Kernel linker and undefined references in KLD": https://lists.freebsd.org/pipermail/freebsd-hackers/2010-July/032418.html That thread contains a discussion of this issue along with a proposed patch that apparently was never taken forward. One portion of it is quite similar to the small patch I used to validate my theory (only returning STB_GLOBAL symbols from link_elf_lookup_symbol). (In reply to Don Lewis from comment #3) > Why would the linker ignore the flag for local symbols? I suspect there is no satisfying reason. It was probably just never implemented, and had no (or very limited) effect on the set of (relatively simple) in-tree kernel modules. (In reply to Justin Cady from comment #4) > 1. Am I correct that because of this bug, symbol names on amd64 are effectively > required to be unique across all kernel module dependencies? Yes, it appears that is the case. > 2. Is there any risk to actually fixing this? I tried to understand all of the > potential callers of link_elf_lookup_symbol(), but doing so is not > straightforward through all the indirect calls (function pointers, macros). > Stated differently: is there any expectation that link_elf_lookup_symbol() > should return a local symbol? There are two separate risks: 1) changing link_elf_lookup_symbol's beavhiour, if there are callers other than via module loading that depend on the current behaviour Notice that this is already covered in Kostik's patch (in the thread you referenced). It adds link_elf_lookup_debug_symbol which continues searching local symbols. 2) breaking in-tree or out-of-tree kernel modules that depend on this Also in Kostik's patch there were changes to sys/modules/mii and sys/modules/pseudofs, because they were missing entries in EXPORT_SYMS. (I did not look to see if they have been fixed since, or whether there are other newly-introduced cases.) In-tree modules would have to be fixed if/when this patch goes in, but there may well be third party modules that need similar changes. All of that said, this is definitely a bug that should be fixed. I propose: 1) augment Kostik's patch to add a sysctl controlling whether kernel module symbol resolution uses local symbols or not 2) initially set it to maintain the current behaviour (use local symbols) 3) commit to main, send a call for testing 4) MFC to stable/13 (as well as stable/12 if the merge is straightforward) maintaining current behaviour 5) fix any in-tree modules 6) toggle the the sysctl in main (i.e., no longer search local symbols in kernel module lookup) 7) consider emitting a warning if the sysctl is changed by the user 8) wait for FreeBSD 14.0's release 9) remove the sysctl, leaving only no-local-lookup for kernel modules (In reply to Ed Maste from comment #6) Thank you, Ed. I appreciate the answers to those questions, and your proposal for resolving this issue looks good to me. I have not exhaustively searched in-tree modules for missing entries in EXPORT_SYMS, but I am aware of a few that will need updated as a part of this: dtrace.ko and opensolaris.ko, for example. I revived by 10+ years old patches in https://reviews.freebsd.org/D32878 WRT to sysctl, I am not sure that this is soo great idea, because nobody would toggle it to help debug. Might be, it indeed makes sense as the emergency revert switch, indeed. (In reply to Konstantin Belousov from comment #8) I agree wide-scale testing is unlikely, but I'm sure some of our downstreams would test. In any case, having it as an emergency revert option seems useful and although I wouldn't change the behaviour in a stable branch having it available as opt-in also seems like it could have some use. A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=95c20faf11a1af6924f97ec4aafc32d899fea8b0 commit 95c20faf11a1af6924f97ec4aafc32d899fea8b0 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-11-07 08:37:48 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-12-08 21:32:29 +0000 kernel linker: do not read debug symbol tables for non-debug symbols In particular, this prevents resolving locals from other files. To access debug symbol tables, add LINKER_LOOKUP_DEBUG_SYMBOL and LINKER_DEBUG_SYMBOL_VALUES kobj methods, which are allowed to use any types of present symbols in all tables. PR: 207898 Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32878 sys/kern/kern_linker.c | 2 +- sys/kern/link_elf.c | 79 +++++++++++++++++++++++++++++++++++++++---------- sys/kern/link_elf_obj.c | 61 +++++++++++++++++++++++++++++++------- sys/kern/linker_if.m | 12 ++++++++ 4 files changed, 126 insertions(+), 28 deletions(-) A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ecd8245e0d7784bcd78dafce233f303eee765068 commit ecd8245e0d7784bcd78dafce233f303eee765068 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-11-07 09:26:26 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-12-08 21:32:30 +0000 Kernel linkers: add emergency sysctl to restore old behavior allowing linking to static symbols from other files. Default the new settings to true, delaying the change of the kernel linker behavior for other day. Suggested by: emaste PR: 207898 Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32878 sys/kern/link_elf.c | 9 +++++++++ sys/kern/link_elf_obj.c | 12 ++++++++++-- 2 files changed, 19 insertions(+), 2 deletions(-) A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=4305fd126c77143f3f3205e201ec413ae2012a0f commit 4305fd126c77143f3f3205e201ec413ae2012a0f Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-11-07 09:26:26 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-12-15 01:41:29 +0000 Kernel linkers: add emergency sysctl to restore old behavior PR: 207898 (cherry picked from commit ecd8245e0d7784bcd78dafce233f303eee765068) sys/kern/link_elf.c | 9 +++++++++ sys/kern/link_elf_obj.c | 12 ++++++++++-- 2 files changed, 19 insertions(+), 2 deletions(-) A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=da536d64b7ee713d0299d817db172ab0352d26a2 commit da536d64b7ee713d0299d817db172ab0352d26a2 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2021-11-07 08:37:48 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2021-12-15 01:41:29 +0000 kernel linker: do not read debug symbol tables for non-debug symbols PR: 207898 (cherry picked from commit 95c20faf11a1af6924f97ec4aafc32d899fea8b0) sys/kern/kern_linker.c | 2 +- sys/kern/link_elf.c | 79 +++++++++++++++++++++++++++++++++++++++---------- sys/kern/link_elf_obj.c | 61 +++++++++++++++++++++++++++++++------- sys/kern/linker_if.m | 12 ++++++++ 4 files changed, 126 insertions(+), 28 deletions(-) |
Created attachment 168000 [details] example kernel module source that illustrates the differing kernel linker behavior on amd64 vs i386 If one source file in a kernel module defines a symbol as static and another declares it as extern, the module fails to load on i386, with the kernel logging a message about the symbol being undefined. I believe this is the correct behavior. On amd64, the module loads and the code in the second source file is able to access the static variable in the first source file. In the attached example, the main module source file is able to access static character arrays in the other source file when loaded on an amd64 machine. The behavior is the same on FreeBSD 10.1 through recent 11.0-CURRENT. FreeBSD 9.x has not been tested.