Bug 246462

Summary: rtld/dlopen incorrect resolution of symbols [RTLD_DEEPBIND]
Product: Base System Reporter: Martin Birgmeier <d8zNeCFG>
Component: binAssignee: Konstantin Belousov <kib>
Status: Closed FIXED    
Severity: Affects Only Me CC: emaste, kib, markj
Priority: ---    
Version: 12.1-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3
none
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3, updated
none
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3, updated again to use -Bsymbolic instead of RTLD_DEEPBIND none

Description Martin Birgmeier 2020-05-14 08:14:57 UTC
Introduction: This issue arose in the context of trying to upgrade a port (databases/postgresql-mysql_fdw) but is actually about the behavior of rtld/dlopen which seems to have changed (probably with the switch to clang/llvm).

Scenario:
- System running FreeBSD 12.1-RELEASE-p4 #4 r360692M
- Ports at latest, mysql57-server and postgresql12-server installed
- Trying to upgrade databases/postgresql-mysql_fdw from its current 2.5.1 to 2.5.3

Note: databases/postgresql-mysql_fdw is a foreign data wrapper which enables a Postgres server to access remote MySQL databases.

Scenario (continued):
- The upgrade is straightforward, just adapt the Makefile and distinfo, and it compiles and can be installed.
- After installation, the local database is extended to be able to access the remote MySQL database using
    CREATE EXTENSION mysql_fdw;
    CREATE_SERVER ...
    CREATE USER MAPPING ...
    GRANT USAGE ON FOREIGN SERVER ...
    CREATE SCHEMA ...
    IMPORT FOREIGN SCHEMA ...
- Then a foreign table is accessed using SELECT * FROM <foreign table>;

Result:
- When accessing the foreign table, the first access retrieves a correct result, and then the postgres process handling the postgres database client crashes with segmentation violation (signal 11), producing a core file.

Scenario (continued):
- Debugging the core file using lldb -c /tmp/postgres.core /usr/local/bin/postgres

Result:
(lldb) thread backtrace
* thread #1, name = 'postgres', stop reason = signal SIGSEGV
  * frame #0: 0x00000000009002eb postgres`pfree + 11
    frame #1: 0x00000000006a50de postgres`list_delete + 206
    frame #2: 0x000000080b4f7d36 libmysqlclient.so`mysql_stmt_close + 102

Explanation: I believe what is happening here is that mysql_stmt_close calls list_delete, but this list_delete gets resolved to the version in the postgres binary instead of the function of the same name in libmysqlclient.so. This is because MySQL and Postgres are using the same names for different functions of their own, and in this case the wrong one is being called.

A while ago, when the databases/postgresql-mysql_fdw port was last upgraded some years ago, this seemingly was not the case. Specifically, in the source code of this port one finds the following comment:

/*
 * mysql_load_library function dynamically load the mysql's library
 * libmysqlclient.so. The only reason to load the library using dlopen
 * is that, mysql and postgres both have function with same name like
 * "list_delete", "list_delete" and "list_free" which cause compiler
 * error "duplicate function name" and erroneously linking with a function.
 * This port of the code is used to avoid the compiler error.
 *
 * #define list_delete mysql_list_delete
 * #include <mysql.h>
 * #undef list_delete
 *
 * But system crashed on function mysql_stmt_close function because
 * mysql_stmt_close internally calling "list_delete" function which
 * wrongly binds to postgres' "list_delete" function.
 *
 * The dlopen function provides a parameter "RTLD_DEEPBIND" which
 * solved the binding issue.
 *
 * RTLD_DEEPBIND:
 * Place the lookup scope of the symbols in this library ahead of the
 * global scope. This means that a self-contained library will use its
 * own symbols in preference to global symbols with the same name contained
 * in libraries that have already been loaded.
 */
bool
mysql_load_library(void)
{
#if defined(__APPLE__) || defined(__FreeBSD__)
        /*
         * Mac OS/BSD does not support RTLD_DEEPBIND, but it still
         * works without the RTLD_DEEPBIND
         */
        mysql_dll_handle = dlopen(_MYSQL_LIBNAME, RTLD_LAZY);
#else
        mysql_dll_handle = dlopen(_MYSQL_LIBNAME, RTLD_LAZY | RTLD_DEEPBIND);
#endif
        if(mysql_dll_handle == NULL)
                return false;

Conclusion: It seems that the behavior of the runtime loader changed such that now the executable's symbols are searched before a library's symbols when resolving references arising from that library (in this case, libmysqlclient.so). It further seems that there are two possible resolutions: Either re-introduce the old behavior (which made it unnecessary to use RTLD_DEEPBIND in FreeBSD), or implement RTLD_DEEPBIND for FreeBSD.

-- Martin
Comment 1 Martin Birgmeier 2020-05-14 08:34:31 UTC
Created attachment 214480 [details]
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3

Just for reference, here are the patches for upgrading databases/postgresql-mysql_fdw from 2.5.1 to 2.5.3; they enable the port to be used with Postgres 12 and also add a small patch for mapping more data types from MySQL to Postgres.

When the rtld issue is resolved these patches should be applied in ports.

-- Martin
Comment 2 Ed Maste freebsd_committer freebsd_triage 2020-05-14 17:40:00 UTC
What FreeBSD version was working?
Comment 3 Martin Birgmeier 2020-05-14 18:22:18 UTC
I can't tell. I just started working on this port now, and only from the comment in the code (as cited) I was assuming it had been working at some time.

Most likely it is working on Linux, which seems to have RTLD_DEEPBIND.

But your question is valid - maybe it was never really working on FreeBSD.

-- Martin
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2020-05-14 18:27:34 UTC
Show the minimal example that worked and then stopped.

Generally, what you describe as the bug is an expected behavior: symbols are resolved using the global order of the loaded dso.  If some object wants
a priority for its own symbols, it should specify -Bsymbolic during linking.
Usually it is the object that determines its own correct mode, which means that
it is strange to tune the resulution mode by a dlopen flag.

Is there a description of the RTLD_DEEPBIND semantic ?
Comment 5 Martin Birgmeier 2020-05-14 18:43:12 UTC
Well, I can't "show the minimal example that worked and then stopped" because I only inferred that it has been working at some previous time from the code of mysql_fdw I quoted.

RTLD_DEEPBIND is described in https://linux.die.net/man/3/dlopen

But what you write might be the solution. Does it mean that if the library built by mysql57-client is linked with -Bsymbolic it will, when loaded, preferentially resolve the symbols it requires with those found in the library itself? Basically what is required that mysql_fdw resolves its symbols in the standard way (thereby normally to the libraries installed by postgresql12-client and any other libraries) and only the symbols in libmysqlclient.so should be resolved preferentially to those in libmysqlclient.so itself.

If yes then the linker flags in databases/mysql57-client need to be changed.

-- Martin
Comment 6 Konstantin Belousov freebsd_committer freebsd_triage 2020-05-14 19:10:42 UTC
(In reply to Martin Birgmeier from comment #5)
In principle yes, -Bsymbolic flag is better solution than RTLD_DEEPBIND, because
it is the property of the object, not the caller, to require specific order of
resolution.

But on other principle of adding missed functions to our rtld, could you please
test this patch before using -Bsymbolic:
https://reviews.freebsd.org/D24841
Comment 7 Martin Birgmeier 2020-05-15 11:22:59 UTC
Created attachment 214517 [details]
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3, updated

Hi Konstantin,

That was quick - and D24841 is a nice trick!

I applied the patches from D24841 and reinstalled libexec/ld-elf.so.1, libexec/ld-elf32.so.1, and usr/include/dlfcn.h. (I had to adapt them minimally to apply to releng/12.1.) I also modified the port to use RTLD_DEEPBIND for FreeBSD - the new patches are included.

And voila, it is working!

Nonetheless, your original remarks show the better solution: libmysqlclient.so should be compiled with -Bsymbolic (I did not do this in the test above, to be sure). Is there a way to set this flag directly on an already installed .so file?

Thank you,

Martin
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2020-05-15 11:46:01 UTC
(In reply to Martin Birgmeier from comment #7)
Thank you for the testing.

I am not aware of a tool to patch flags in  the linked ELF objects, except of a hex editor (which I sometimes do).  Adding -Bsymbolic to the linker command line, e.g. in a form of -Wl,-Bsymbolic if called from cc(1) driver, is a way to go.
Comment 9 commit-hook freebsd_committer freebsd_triage 2020-05-15 11:58:11 UTC
A commit references this bug:

Author: kib
Date: Fri May 15 11:58:02 UTC 2020
New revision: 361073
URL: https://svnweb.freebsd.org/changeset/base/361073

Log:
  Implement RTLD_DEEPBIND.

  PR:	246462
  Tested by:	Martin Birgmeier <d8zNeCFG@aon.at>
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week
  Differential revision:	https://reviews.freebsd.org/D24841

Changes:
  head/include/dlfcn.h
  head/lib/libc/gen/dlopen.3
  head/libexec/rtld-elf/rtld.c
  head/libexec/rtld-elf/rtld.h
Comment 10 commit-hook freebsd_committer freebsd_triage 2020-05-15 13:50:29 UTC
A commit references this bug:

Author: kib
Date: Fri May 15 13:50:08 UTC 2020
New revision: 361077
URL: https://svnweb.freebsd.org/changeset/base/361077

Log:
  Implement RTLD_DEEPBIND.

  PR:	246462
  Tested by:	Martin Birgmeier <d8zNeCFG@aon.at>
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week
  Differential revision:	https://reviews.freebsd.org/D24841

Changes:
  head/sys/kern/kern_mib.c
Comment 11 commit-hook freebsd_committer freebsd_triage 2020-05-22 13:15:30 UTC
A commit references this bug:

Author: kib
Date: Fri May 22 13:14:22 UTC 2020
New revision: 361380
URL: https://svnweb.freebsd.org/changeset/base/361380

Log:
  MFC r361073:
  Implement RTLD_DEEPBIND.

  PR:	246462

Changes:
_U  stable/12/
  stable/12/include/dlfcn.h
  stable/12/lib/libc/gen/dlopen.3
  stable/12/libexec/rtld-elf/rtld.c
  stable/12/libexec/rtld-elf/rtld.h
Comment 12 commit-hook freebsd_committer freebsd_triage 2020-05-22 13:19:34 UTC
A commit references this bug:

Author: kib
Date: Fri May 22 13:18:44 UTC 2020
New revision: 361382
URL: https://svnweb.freebsd.org/changeset/base/361382

Log:
  MFC r361073:
  Implement RTLD_DEEPBIND.

  PR:	246462

Changes:
_U  stable/11/
  stable/11/include/dlfcn.h
  stable/11/lib/libc/gen/dlopen.3
  stable/11/libexec/rtld-elf/rtld.c
  stable/11/libexec/rtld-elf/rtld.h
Comment 13 Martin Birgmeier 2020-05-25 19:00:29 UTC
Created attachment 214850 [details]
patches to upgrade postgresql-mysql_fdw from 2.5.1 to 2.5.3, updated again to use -Bsymbolic instead of RTLD_DEEPBIND

Attached are an updated set of patches which implement the "better" way of making libmysqlclient.so preferentially resolve symbols within this library itself by adding "-Bsymbolic" when building this library.

Conversely, RTLD_DEEPBIND is not used anymore in postgresql-mysql_fdw.

A few quick tests show that mysql57, postgresql12, and postgresql-mysql_fdw are working as expected with these changes.

-- Maritn