Bug 255698

Summary: dlerror() returns non-NULL after a successful call to dlopen()/dlsym()
Product: Base System Reporter: Eugene M. Kim <astralblue>
Component: binAssignee: Konstantin Belousov <kib>
Status: Closed FIXED    
Severity: Affects Only Me CC: kib, markj
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Eugene M. Kim 2021-05-08 07:10:55 UTC
dlerror() used to return NULL after a successful call to other dl family of functions, such as dlopen() and dlsym().  Starting recently, dlerror() started returning non-NULL message after such successful calls.  The message is initially an empty string, or an error message most recently seen—even if the most recent dlXXXX() call was successful.

Test program:

#include <dlfcn.h>
#include <stdio.h>

#define PRINT_DLERROR() do { \
	char const *err = dlerror(); \
	printf("dlerror()=%p", err); \
	if (err) printf(" (%s)", err); \
	printf("\n"); \
} while (0)

void
test_dlopen(char const *path)
{
	void *handle = dlopen(path, RTLD_NOW);
	printf("dlopen(%s, RTLD_NOW)=%p\n", path, handle);
	PRINT_DLERROR();
	PRINT_DLERROR();
	if (handle)
		dlclose(handle);
}

int
main()
{
	PRINT_DLERROR();
	PRINT_DLERROR();
	test_dlopen("/foobar.so");
	test_dlopen("/usr/lib/libm.so");
	return 0;
}

Expected result (taken on a 12.2-RELEASE-p2 jail, note dlerror() returns NULL upon a successful dlopen()):

dlerror()=0x0
dlerror()=0x0
dlopen(/foobar.so, RTLD_NOW)=0x0
dlerror()=0x8002296c0 (Cannot open "/foobar.so")
dlerror()=0x0
dlopen(/usr/lib/libm.so, RTLD_NOW)=0x80022e800
dlerror()=0x0
dlerror()=0x0

Actual result (taken on a -CURRENT system, built from main branch commit 34abb05be77a32a2b2673fb4b85eaa53a3d1446e, note dlerror() returns non-NULL on the first dlerror() call, even after a successful dlopen() and also before the first dlopen(), also note the dlerror() message after a successful dlopen() is from the previous failure):

34abb05be77a32a2b2673fb4b85eaa53a3d1446e
Comment 1 Eugene M. Kim 2021-05-08 07:14:30 UTC
Actual result (taken on a -CURRENT system, built from main branch commit 34abb05be77a32a2b2673fb4b85eaa53a3d1446e, note dlerror() returns non-NULL on the first dlerror() call, even after a successful dlopen() and also before the first dlopen(), also note the dlerror() message after a successful dlopen() is from the previous failure):

dlerror()=0x800222770 ()
dlerror()=0x0
dlopen(/foobar.so, RTLD_NOW)=0x0
dlerror()=0x800222770 (Cannot open "/foobar.so")
dlerror()=0x0
dlopen(/usr/lib/libm.so, RTLD_NOW)=0x800226808
dlerror()=0x800222770 (Cannot open "/foobar.so")
dlerror()=0x0
Comment 2 Eugene M. Kim 2021-05-08 07:17:43 UTC
I suspect main branch commit 4d9128da54f8f8e2a29190ffb18880c4f116a205 might have introduced this.  FWIW, this breaks software that checks for successful calls to dlopen()/dlsym() using dlerror() == NULL (audio/jack is one of them.)
Comment 3 commit-hook freebsd_committer freebsd_triage 2021-05-10 23:48:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=630caa95d46191220dd457c2ae2d06460cb4f71b

commit 630caa95d46191220dd457c2ae2d06460cb4f71b
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-05-10 19:02:19 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-05-10 23:47:00 +0000

    rtld: preserve the 'seen' state of the dlerror message in errmsg_save()

    rtld preserves its current error message around calls to user init/fini
    lists, to not override original error with potential secondary errors
    caused by user code recursing into rtld.  After 4d9128da54f8f8e2a29190,
    the preservation of the string itself is not enough, the 'seen'
    indicator must be preserved as well.  Otherwise, since new code does not
    clear string (it cannot), call to _rtld_error() from errmsg_restore()
    revived whatever message was consumed last.

    Change errmsg_save() to return structure recording both 'seen' indicator
    and the message, if any.

    PR:     255698
    Reported by:    Eugene M. Kim <astralblue@gmail.com>
    Sponsored by:   The FreeBSD Foundation
    MFC after:      3 days

 libexec/rtld-elf/rtld.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2021-05-10 23:57:08 UTC
I committed the fix for the issue reproduced by your example.
It is only for dlopen/dlclose.

But you also mentioned that there is a similar issue with dlsym.
Can you provide a sample demonstrating it (or confirm that after the fix
above, there is no issues you see).

Thanks.
Comment 5 Eugene M. Kim 2021-05-11 22:51:48 UTC
Hi Konstantin, thank you for fixing this!  I rebuilt my world and the dltest program now outputs this:

dlerror()=0x8002227f0 ()
dlerror()=0x0
dlopen(/foobar.so, RTLD_NOW)=0x0
dlerror()=0x8002227f0 (Cannot open "/foobar.so")
dlerror()=0x0
dlopen(/usr/lib/libm.so, RTLD_NOW)=0x800226808
dlerror()=0x0
dlerror()=0x0

This is very close to the expected output, except the first call to dlerror() still returns a non-NULL (albeit empty) message, instead of NULL.  The wording of dlerror(3) makes me believe that returning NULL is more appropriate in this case: "The dlerror() function returns a null-terminated character string describing the last error that occurred during a call to dlopen(), dladdr(), dlinfo(), dlsym(), dlvsym(), dlfunc(), or dlclose(). If no such error has occurred, dlerror() returns a null pointer."

From practical POV, your fix does address the use case where callers check for errors using dlerror() == NULL, as such checks always occur after a call to one of those other functions.
Comment 6 Eugene M. Kim 2021-05-11 22:52:10 UTC
As for the dlsym() case you asked, I must have been mistaken, because I myself could not reproduce the case (before the world build).  Program used:

#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>

#define PRINT_DLERROR() do { \
	char const *err = dlerror(); \
	printf("dlerror()=%p", err); \
	if (err) printf(" (%s)", err); \
	printf("\n"); \
} while (0)

void
test_dlsym(char const *path, ...)
{
	va_list va;
	void *handle = dlopen(path, RTLD_NOW);
	char const *symbol;
	void *addr;
	printf("dlopen(%s, RTLD_NOW)=%p\n", path, handle);
	PRINT_DLERROR();
	PRINT_DLERROR();
	va_start(va, path);
	while ((symbol = va_arg(va, char const *)) != NULL) {
		addr = dlsym(handle, symbol);
		printf("dlsym(%s, %s) = %p\n", path, symbol, addr);
		PRINT_DLERROR();
		PRINT_DLERROR();
	}
	if (handle)
		dlclose(handle);
}

int
main()
{
	PRINT_DLERROR();
	PRINT_DLERROR();
	test_dlsym("/foobar.so", 0);
	test_dlsym("/usr/lib/libm.so", "omgwtf", "sqrt", 0);
	return 0;
}

Output:

dlerror()=0x800222770 ()
dlerror()=0x0
dlopen(/foobar.so, RTLD_NOW)=0x0
dlerror()=0x800222770 (Cannot open "/foobar.so")
dlerror()=0x0
dlopen(/usr/lib/libm.so, RTLD_NOW)=0x800226808
dlerror()=0x800222770 (Cannot open "/foobar.so")
dlerror()=0x0
dlsym(/usr/lib/libm.so, omgwtf) = 0x0
dlerror()=0x800222770 (Undefined symbol "omgwtf")
dlerror()=0x0
dlsym(/usr/lib/libm.so, sqrt) = 0x8006a3900
dlerror()=0x0
dlerror()=0x0

(Note the first dlerror() after dlsym(libm.so, sqrt) correctly returned NULL.  I thought I had seen a non-NULL message here; it seems my memory defied me. :-p)
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2021-05-11 23:39:47 UTC
The following should fix the initial dlerror() state.

diff --git a/libexec/rtld-elf/rtld_lock.c b/libexec/rtld-elf/rtld_lock.c
index 94e931c2f760..e501c03f0722 100644
--- a/libexec/rtld-elf/rtld_lock.c
+++ b/libexec/rtld-elf/rtld_lock.c
@@ -60,7 +60,7 @@ void _rtld_atfork_pre(int *) __exported;
 void _rtld_atfork_post(int *) __exported;
 
 static char def_dlerror_msg[512];
-static int def_dlerror_seen_val;
+static int def_dlerror_seen_val = 1;
 
 static char *
 def_dlerror_loc(void)
Comment 8 commit-hook freebsd_committer freebsd_triage 2021-05-12 23:32:35 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=529ab5a75925c9c1eeea0b2712911048119d06ae

commit 529ab5a75925c9c1eeea0b2712911048119d06ae
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-05-11 23:36:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-05-12 23:31:29 +0000

    rtld: initialize default dlerror_seen_val as true

    There should be no error after startup.

    PR:     255698
    Reported by:    Eugene M. Kim <astralblue@gmail.com>
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week

 libexec/rtld-elf/rtld_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2021-05-19 03:14:05 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=29b896e67ddb055d0b6878ded41738b6a294db0e

commit 29b896e67ddb055d0b6878ded41738b6a294db0e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-05-10 19:02:19 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-05-19 03:13:14 +0000

    rtld: preserve the 'seen' state of the dlerror message in errmsg_save()

    PR:     255698

    (cherry picked from commit 630caa95d46191220dd457c2ae2d06460cb4f71b)

 libexec/rtld-elf/rtld.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2021-05-19 03:14:06 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=db4a2731e9ec9723038dc3a63d963336cda8aecd

commit db4a2731e9ec9723038dc3a63d963336cda8aecd
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2021-05-11 23:36:09 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2021-05-19 03:13:14 +0000

    rtld: initialize default dlerror_seen_val as true

    PR:     255698

    (cherry picked from commit 529ab5a75925c9c1eeea0b2712911048119d06ae)

 libexec/rtld-elf/rtld_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)