Bug 122750 - net/openldap24-client: nss_ldap not working correctly with OpenLDAP 2.4
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Xin LI
Reported: 2008-04-14 17:20 UTC by Ulrich Spoerlein
Modified: 2010-11-12 20:51 UTC

Description Ulrich Spoerlein 2008-04-14 17:20:04 UTC
Ever since I upgraded my LDAP servers to 2.4, *all* of them have some classes
of problems related to LDAP and NSS.

For example, some assertions trigger during boot (they no longer occur once
the system has finished booting):

<dmesg>
Starting privoxy.
Assertion failed: (r != NULL), function ldap_parse_result, file error.c, line 272.
pid 1261 (csh), uid 201: exited on signal 6 (core dumped)

It is *always* privoxy that is affected. When I was still running
dbus/hald/policykit, they would crash on boot too. Once I've logged in, I
can restart the services just fine.

But logging in is not working for 60-90 seconds after the getty prompt appears.
I enter my username, then it hangs for several seconds (20-30) and drops me
back to login with an LDAP error.

The third try usually is the charm ...

One very annoying thing is that I continually get errors like this:
Apr 14 13:43:05 roadrunner sudo: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:43:05 roadrunner sudo: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:43:33 roadrunner xterm: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:43:34 roadrunner xterm: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:47:37 roadrunner sudo: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:47:40 roadrunner xterm: nss_ldap: could not search LDAP server - Server is unavailable
Apr 14 13:47:41 roadrunner xterm: nss_ldap: could not search LDAP server - Server is unavailable

Please note, that LDAP and NSS are set up correctly and they *work*, the
message above is totally bogus!

Another weird thing that started right around when I switched to OpenLDAP
2.4 is that the groups for my user are gone under X. Running id(1) on the
console lists all the groups I'm a member of. Running id(1) inside an xterm, I
get *no* secondary groups. The same happens when logging in via ssh.

getent(1) on the other hand works fine.

How-To-Repeat: Upgrade your LDAP client installation from OpenLDAP 2.3 to 2.4. Rebuild nss_ldap and pam_ldap
ports.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-04-14 21:23:41 UTC
Responsible Changed
From-To: freebsd-ports-bugs->delphij

Over to maintainer.
Comment 2 Ulrich Spoerlein 2008-05-05 18:40:59 UTC
New data point: Everything seems to work fine, when using ldapi:// URLs
(which is not possible, during system startup, though).

If nss_ldap.conf contains:
uri ldaps://roadrunner.spoerlein.net/ ldaps://coyote.spoerlein.net/ ldaps://vpn.spoerlein.net

Then id(1) under xterm produces:
% id
May  5 19:24:20 roadrunner xterm: nss_ldap: could not search LDAP server - Server is unavailable
uid=1000(uqs) gid=1000(uqs) groups=1000(uqs)

If I change the URIs to
uri ldapi://%2fvar%2frun%2fopenldap%2fldapi/ ldaps://roadrunner.spoerlein.net/ ldaps://coyote.spoerlein.net/ ldaps://vpn.spoerlein.net

I get
% id
uid=1000(uqs) gid=1000(uqs) groups=1000(uqs),999(users),80(www),0(wheel),5(operator),69(network),68(dialer),13(games)
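
For reference, the ldapi:// URI above is just the path of the slapd UNIX socket with every '/' percent-encoded as %2f. A minimal C sketch of that encoding follows; the helper name ldapi_uri_from_path is hypothetical (it is not part of OpenLDAP or nss_ldap), and it assumes '/' is the only character in the path that needs escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: build an ldapi:// URI from
 * the path of a UNIX socket by replacing each '/' with "%2f".
 * Assumption: no other character in the path needs escaping.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int ldapi_uri_from_path(const char *path, char *out, size_t outlen)
{
    static const char prefix[] = "ldapi://";
    size_t n;

    assert(path != NULL && out != NULL);

    /* Need room for the prefix plus at least the terminating NUL. */
    if (outlen < sizeof(prefix))
        return -1;
    memcpy(out, prefix, sizeof(prefix) - 1);
    n = sizeof(prefix) - 1;

    for (; *path != '\0'; path++) {
        if (*path == '/') {
            if (n + 3 >= outlen)
                return -1;
            memcpy(out + n, "%2f", 3);
            n += 3;
        } else {
            if (n + 1 >= outlen)
                return -1;
            out[n++] = *path;
        }
    }

    /* Trailing '/' as in the URI used in nss_ldap.conf, then NUL. */
    if (n + 1 >= outlen)
        return -1;
    out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

Called with "/var/run/openldap/ldapi" and a large enough buffer, this yields the URI used in the uri line above.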


It breaks even more, if I turn on start_tls and use ldapi:// + ldap://
URLs instead. ldapsearch(1) can cope with this setting, but nss_ldap
simply barfs.

nss_ldap via id(1)
May  5 19:26:11 roadrunner slapd[1249]: conn=721 fd=57 ACCEPT from PATH=/var/run/openldap/ldapi (PATH=/var/run/openldap/ldapi)
May  5 19:26:11 roadrunner slapd[1249]: conn=721 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May  5 19:26:11 roadrunner slapd[1249]: conn=721 op=0 STARTTLS
May  5 19:26:11 roadrunner slapd[1249]: conn=721 op=0 RESULT oid= err=0 text=
May  5 19:26:11 roadrunner slapd[1249]: conn=721 fd=57 TLS established tls_ssf=256 ssf=256
May  5 19:26:11 roadrunner slapd[1249]: conn=721 op=1 UNBIND

ldapsearch -Lx -H ldapi://%2fvar%2frun%2fopenldap%2fldapi/ -Z
May  5 19:33:11 roadrunner slapd[1249]: conn=819 fd=57 ACCEPT from PATH=/var/run/openldap/ldapi (PATH=/var/run/openldap/ldapi)
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=0 EXT oid=1.3.6.1.4.1.1466.20037
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=0 STARTTLS
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=0 RESULT oid= err=0 text=
May  5 19:33:11 roadrunner slapd[1249]: conn=819 fd=57 TLS established tls_ssf=256 ssf=256
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=1 BIND dn="" method=128
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=1 RESULT tag=97 err=0 text=
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=2 SRCH base="dc=spoerlein,dc=net" scope=2 deref=0 filter="(objectClass=*)"
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=2 SEARCH RESULT tag=101 err=0 nentries=34 text=
May  5 19:33:11 roadrunner slapd[1249]: conn=819 op=3 UNBIND
May  5 19:33:11 roadrunner slapd[1249]: conn=819 fd=57 closed


So, to see the problem you have to use ldap:// and/or ldaps:// URLs.

And, if you're curious, you can then run while :;do id;done and see your
slapd version 2.4 segfault after 10-30 seconds.

Fun!

Cheers,
Ulrich Spoerlein
Comment 3 Xin LI freebsd_committer 2008-05-07 22:31:46 UTC
State Changed
From-To: open->feedback

Hi, 

Could you please test if 2.4.9 fixed this?  There are a bunch of medium and serious 
priority bugs fixed in this version.
Comment 4 Ulrich Spoerlein 2008-05-14 18:20:10 UTC
Thanks for the update. slapd seems to be running much more stably now.
No change in nss_ldap behaviour though:

Using ldaps:// everything is fine
Using ldapi:// + start_tls id(1) stops working
Using ldap:// + start_tls id(1) will only report primary group, not the
secondary groups

Tested on both 6.3 and 7.0

Cheers,
Ulrich Spoerlein
Comment 5 Xin LI freebsd_committer 2008-05-14 18:39:06 UTC
State Changed
From-To: feedback->open

Submitter indicates that the problem still exists despite other improvements.
Comment 6 Ulrich Spoerlein 2008-05-19 17:18:03 UTC
Oh boy,

now that I've switched to using ldapi:// URLs, I have rediscovered the
ldapi socket leak that we were also experiencing with a Cyrus mailserver,
which got its user/group information from an LDAP server via nss_ldap.

I can no longer bind to this socket, as the maximum has been reached:

May 19 18:08:12 acme slapd[57863]: daemon: 4096 beyond descriptor table size 4096

root@acme: ~# lsof -p 57863 | awk '$5 == "unix" {i+=1} END{print i}'
4071

Restarting slapd of course works around the problem, but this is just
another annoyance with nss_ldap. I think there's a re-written one in the
ports, which I'm going to try next.

root@acme: ~# netstat -an -funix | head -20
Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
c964d7e0 stream      0      0        0 c9587e10        0        0 /var/run/openldap/ldapi
c9587e10 stream      0      0        0 c964d7e0        0        0
c9581090 stream      0      0        0 ca1825a0        0        0 /var/run/openldap/ldapi
ca1825a0 stream      0      0        0 c9581090        0        0
c9f78870 stream      0      0        0 ca182240        0        0
ca182240 stream      0      0        0 c9f78870        0        0
c96cf5a0 stream      0      0        0 ca233750        0        0 /var/run/openldap/ldapi
ca233750 stream      0      0        0 c96cf5a0        0        0
c96d2cf0 stream      0      0        0 ca189990        0        0 /var/run/openldap/ldapi
ca189990 stream      0      0        0 c96d2cf0        0        0
ca195750 stream      0      0        0 c9e376c0        0        0 /var/run/openldap/ldapi
c9e376c0 stream      0      0        0 ca195750        0        0
ca1953f0 stream      0      0        0 ca2337e0        0        0 /var/run/openldap/ldapi
ca2337e0 stream      0      0        0 ca1953f0        0        0
c55e1090 stream      0      0        0 c9f79e10        0        0 /var/run/openldap/ldapi
c9f79e10 stream      0      0        0 c55e1090        0        0
ca182900 stream      0      0        0 c741ebd0        0        0 /var/run/openldap/ldapi
c741ebd0 stream      0      0        0 ca182900        0        0

I'm CC'ing Robert Watson, as he might be interested in the UNIX socket
issue. I will try reproducing it on RELENG_7.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.
Comment 7 Maxim Dounin 2008-11-27 15:31:12 UTC
Hello!

I've recently dug into the nss_ldap-related assertions we are 
observing periodically in our environment:

Assertion failed: (r != NULL), function ldap_parse_result, file error.c, line 272.
Abort trap: 6

They are indeed start_tls related.  The reason is insufficient 
error checking in ldap-nss.c's do_start_tls().  The patch is fairly 
trivial:

--- ldap-nss.c.backup	Thu Nov  6 23:59:33 2008
+++ ldap-nss.c	Fri Nov  7 00:05:20 2008
@@ -1387,7 +1387,7 @@
     }
 
   rc = ldap_result (session->ls_conn, msgid, 1, timeout, &res);
-  if (rc == -1)
+  if (rc <= 0)
     {
 #if defined(HAVE_LDAP_GET_OPTION) && defined(LDAP_OPT_ERROR_NUMBER)
       if (ldap_get_option (session->ls_conn, LDAP_OPT_ERROR_NUMBER, &rc) != LDAP_SUCCESS)

Anyway, this part of the code seems to have been rewritten in nss_ldap 1.264, 
so an update should resolve the issue too (see ports/129030).
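
A minimal C sketch of why the one-line change in the patch matters, assuming the convention documented in ldap_result(3): -1 on error, 0 when the timeout expires, and a positive message type otherwise. The function names below are hypothetical and exist only for illustration; they are not part of nss_ldap or libldap. With the old test a timeout is treated as if a message had arrived, so the code presumably goes on to call ldap_parse_result with a NULL result and trips the (r != NULL) assertion.

```c
/* Hypothetical names, for illustration only. */
enum wait_outcome { WAIT_ERROR, WAIT_TIMEOUT, WAIT_GOT_MESSAGE };

/* Classify an ldap_result(3)-style return code:
 * -1 is an error, 0 is a timeout (no message was stored),
 * and anything positive is the type of the message received. */
enum wait_outcome classify_result(int rc)
{
    if (rc < 0)
        return WAIT_ERROR;
    if (rc == 0)
        return WAIT_TIMEOUT;
    return WAIT_GOT_MESSAGE;
}

/* Old check from the patch context: only rc == -1 is treated as failure,
 * so a timeout (rc == 0) continues on to parse a result that is not there. */
int old_check_proceeds(int rc)
{
    return !(rc == -1);
}

/* Patched check: both error and timeout stop before the result is parsed. */
int new_check_proceeds(int rc)
{
    return !(rc <= 0);
}
```

The two checks differ only for rc == 0, which is exactly the timeout case.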

Maxim Dounin
Comment 8 Martin Matuska freebsd_committer 2008-12-06 00:24:45 UTC
According to user reports this gets fixed in ports/129445 (was:
ports/129030).
Comment 9 dfilter service freebsd_committer 2008-12-10 16:11:34 UTC
mm          2008-12-10 16:11:25 UTC

  FreeBSD ports repository

  Modified files:
    net/nss_ldap         Makefile distinfo 
    net/nss_ldap/files   patch-ldap-pwd.c 
  Added files:
    net/nss_ldap/files   patch-Makefile.am patch-configure.in 
  Removed files:
    net/nss_ldap/files   patch-Makefile.in patch-configure 
  Log:
  - Update to 1.264 [1]
  - use more autotools [2]
  - fixes assertion problems related to openldap 2.4 [3]
  
  PR:     ports/129445 [1], ports/127675 [2], ports/122750 [3]
  Submitted by:   mm [1], "Eugene M. Kim" <gene@nttmcl.com> [2]
  Approved by:    maintainer (timeout ports/127675, ports/129030, ports/127675)
  
  Revision  Changes    Path
  1.26      +5 -3      ports/net/nss_ldap/Makefile
  1.15      +3 -3      ports/net/nss_ldap/distinfo
  1.1       +39 -0     ports/net/nss_ldap/files/patch-Makefile.am (new)
  1.8       +0 -82     ports/net/nss_ldap/files/patch-Makefile.in (dead)
  1.6       +0 -89     ports/net/nss_ldap/files/patch-configure (dead)
  1.1       +26 -0     ports/net/nss_ldap/files/patch-configure.in (new)
  1.3       +3 -3      ports/net/nss_ldap/files/patch-ldap-pwd.c
Comment 10 Martin Matuska freebsd_committer 2008-12-12 00:09:59 UTC
State Changed
From-To: open->closed

Fixed in ports/129445.