Bug 199800

Summary: Thread Stack Size - Segmentation Fault
Product: Base System
Reporter: mr-spott
Component: threads
Assignee: freebsd-threads (Nobody) <threads>
Status: Closed Feedback Timeout
Severity: Affects Many People
CC: emaste, jhb, markj, mr-spott
Priority: ---    
Version: 10.1-RELEASE   
Hardware: Any   
OS: Any   

Description mr-spott 2015-04-30 10:11:10 UTC
Original Post: https://forums.freebsd.org/threads/thread-stack-size-segmentation-fault.51419/

It seems like FreeBSD has a serious issue with the thread stack size.

# =====================================================================

For quite some time I have had problems with two applications:

    OpenLDAP
    Munin

Both seem to suffer from the same underlying issue: each receives a SIGSEGV at sporadic points during execution. The debugging done so far has not revealed a clear pattern for reproducing the crash.

    Munin runs more than about 100 plug-ins and about 150 graphs (times 4, for day, week, month, and year).
    OpenLDAP does not actually have much of a workload: regular PAM via nslcd(8). Crashes used to happen every couple of hours, or even only every few days, so I wrote a supervisor script as a workaround for production use. But since the mail servers (Dovecot & Postfix) were connected to it, LDAP crashes with a segmentation fault after a few seconds of work, even though only a single test user is connected. My logs are now flooded with more connection errors than successful connections, and the workaround is no longer usable.

This was enough for me to finally debug this issue more deeply than before. It turns out this may well be a FreeBSD OS bug. I was not happy to find this out, because I would love to continue using FreeBSD for my server environment.


Here are the relevant links about the OpenLDAP related SIGSEGV issue and the debug output:

FreeBSD Forum:

    https://forums.freebsd.org/threads/openldap-slapd-dies-sporadically.47634/


OpenLDAP Mailing List:

    http://www.openldap.org/lists/openldap-technical/201504/msg00220.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00228.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00237.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00241.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00238.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00239.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00248.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00249.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00250.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00254.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00255.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00256.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00257.html
    http://www.openldap.org/lists/openldap-technical/201504/msg00282.html

Links 11, 12 and 13 point out that it is likely to be a FreeBSD problem, described at:

    http://www.openldap.org/lists/openldap-bugs/200506/msg00174.html
    http://lists.freebsd.org/pipermail/freebsd-current/2014-August/051646.html


Also, ldd(1) shows that slapd is correctly linked against libthr.so.3:

root@FreeBSD [~]$ ldd /usr/local/libexec/slapd
/usr/local/libexec/slapd:
  libldap_r-2.4.so.2 => /usr/local/lib/libldap_r-2.4.so.2 (0x8009a7000)
  liblber-2.4.so.2 => /usr/local/lib/liblber-2.4.so.2 (0x800bf5000)
  libltdl.so.7 => /usr/local/lib/libltdl.so.7 (0x800e03000)
  libcrypt.so.5 => /lib/libcrypt.so.5 (0x80100c000)
  libwrap.so.6 => /usr/lib/libwrap.so.6 (0x80122c000)
  libssl.so.7 => /usr/lib/libssl.so.7 (0x801435000)
  libcrypto.so.7 => /lib/libcrypto.so.7 (0x8016a0000)
  libthr.so.3 => /lib/libthr.so.3 (0x801a93000)
  libc.so.7 => /lib/libc.so.7 (0x801cb8000)


root@FreeBSD [~]$ ls -lach /lib/libthr.so.3
-r--r--r--  1 root  wheel  103K 18 Jan 15:36 /lib/libthr.so.3
Comment 1 mr-spott 2015-05-01 17:54:36 UTC
I just recompiled the entire system with the same configuration, and the results are still the same. Why is this happening? Please let me know if I can provide more information to get this fixed as soon as possible.

Thanks
Comment 2 John Baldwin freebsd_committer freebsd_triage 2015-05-09 11:09:22 UTC
Try setting the environment variable LIBPTHREAD_BIGSTACK_MAIN before starting slapd.  This will use the entire RLIMIT_STACK for the initial thread.  This is now the default behavior in 10-stable and will be the default in 10.2 (so 10.2 should work out of the box).
Comment 3 Ed Maste freebsd_committer freebsd_triage 2016-02-29 20:23:34 UTC
Can you confirm that this is fixed either in 10.2+ or with LIBPTHREAD_BIGSTACK_MAIN?
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2020-04-03 15:00:06 UTC
This PR hasn't seen any activity in a while and it seems that the problem should be fixed on recent versions of FreeBSD.  Please reopen if it is still a problem.