Bug 122172 - [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Summary: [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, fine on amd6
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 6.3-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2008-03-28 00:00 UTC by Lee Damon
Modified: 2021-12-21 22:38 UTC
CC List: 3 users

Description Lee Damon 2008-03-28 00:00:03 UTC
 amd(8) is launched at boot (or later), runs briefly, then aborts.
 When it is launched at boot, it never gets past reaping the
 children it starts to help it come up.  One of the children
 (or the parent in some cases) aborts with signal 11 (SIGSEGV).

 The attached gdb & truss output were obtained by starting amd
 manually after boot.  In that case it gets past the point where
 the children finish setup but eventually dies, sometimes with
 signal 10 (SIGBUS), sometimes with signal 11 (SIGSEGV).

 I have a truss output and amd log file available but gnats thought
 they were too big to include in the pr email. The core file and
 amd binary are available for examination if needed.

 The amd.conf and map files are the same on all 10 systems.

Fix: none known.




Script started on Thu Mar 27 15:34:48 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) frame 0
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) list
302	flush_nfs_fhandle_cache(fserver *fs)
303	{
304	  fh_cache *fp;
305	
306	  ITER(fp, fh_cache, &fh_head) {
307	    if (fp->fh_fs == fs || fs == NULL) {
308	      /*
309	       * Only invalidate port info for non-WebNFS servers
310	       */
311	      if (!(fp->fh_fs->fs_flags & FSF_WEBNFS))
(gdb) info frame
Stack level 0, frame at 0xbfbfe450:
 eip = 0x805d8fa in flush_nfs_fhandle_cache
    (/usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307); 
    saved eip 0x805272d
 called by frame at 0xbfbfe470
 source language c.
 Arglist at 0xbfbfe448, args: fs=0x0
 Locals at 0xbfbfe448, Previous frame's sp is 0xbfbfe450
 Saved registers:
  ebp at 0xbfbfe448, eip at 0xbfbfe44c
(gdb) info args
fs = (fserver *) 0x0
(gdb) info locals
fp = (fh_cache *) 0x8
(gdb) print fp
$1 = (fh_cache *) 0x8
(gdb) 

Script done on Thu Mar 27 15:35:19 2008
--- gdb1.out ends here ---

--- gdb.out begins here ---

Script started on Thu Mar 27 14:42:26 2008
goose# gdb -c amd.core amd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
Core was generated by `amd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/X11R6/lib/nss_ldap.so.1...done.
Loaded symbols for /usr/X11R6/lib/nss_ldap.so.1
Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
Loaded symbols for /usr/local/lib/libldap-2.3.so.2
Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
Loaded symbols for /usr/local/lib/liblber-2.3.so.2
Reading symbols from /usr/local/lib/libgssapi_krb5.so...done.
Loaded symbols for /usr/local/lib/libgssapi_krb5.so
Reading symbols from /usr/local/lib/libssl.so.5...done.
Loaded symbols for /usr/local/lib/libssl.so.5
Reading symbols from /usr/local/lib/libcrypto.so.5...done.
Loaded symbols for /usr/local/lib/libcrypto.so.5
Reading symbols from /usr/local/lib/libkrb5.so...done.
Loaded symbols for /usr/local/lib/libkrb5.so
Reading symbols from /usr/local/lib/libk5crypto.so...done.
Loaded symbols for /usr/local/lib/libk5crypto.so
Reading symbols from /usr/local/lib/libcom_err.so...done.
Loaded symbols for /usr/local/lib/libcom_err.so
Reading symbols from /usr/local/lib/libkrb5support.so...done.
Loaded symbols for /usr/local/lib/libkrb5support.so
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
307	    if (fp->fh_fs == fs || fs == NULL) {
(gdb) bt
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb) where
#0  0x0805d8fa in flush_nfs_fhandle_cache (fs=0x0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/ops_nfs.c:307
#1  0x0805272d in amqproc_setopt_1_svc (argp=0xbfbfe4a0, rqstp=0xbfbfe9c0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_subr.c:157
#2  0x0805337b in amq_program_1 (rqstp=0xbfbfe9c0, transp=0x80b9080)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amq_svc.c:215
#3  0x28112673 in svc_getreq_common () from /lib/libc.so.6
#4  0x281126e8 in svc_getreqset () from /lib/libc.so.6
#5  0x0805c2a5 in run_rpc ()
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:294
#6  0x0805c505 in mount_automounter (ppid=2487)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/nfs_start.c:448
#7  0x0804deaa in main (argc=5, argv=0xbfbfecd0)
    at /usr/src/usr.sbin/amd/amd/../../../contrib/amd/amd/amd.c:564
(gdb) goose# exit
exit

Script done on Thu Mar 27 14:42:38 2008
How-To-Repeat:  Configure and launch amd on an i386 6.3-STABLE system.
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2008-04-05 09:11:45 UTC
Responsible Changed
From-To: freebsd-i386->freebsd-fs

The backtraces show that amd(8) has a problem; reassigning to the
fs team to investigate.
Comment 2 John E. Hein 2008-04-08 04:16:37 UTC
This doesn't help your problem directly, but we've been using amd with
NIS maps and 6.3/i386 without any problems.  What's your configuration?

You might have to debug a little further to find out how fp ends up
as an invalid pointer (0x8 in your core, so not quite NULL).

You could also try the newer version of am-utils in ports just
to see if it behaves differently.

Have you tried searching back from your cvsup date to see when
it stops seg faulting for you?
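
Searching back by date does not have to be a linear walk. If you have one
source date where amd is known to run and a later one where it crashes, a
binary search needs only about log2(N) rebuilds for N candidate dates. Below
is a minimal sketch in C of that search. The names first_bad, probe_fn and
threshold_probe are hypothetical; a real probe would check out the sources
for the given date, rebuild amd, run it, and report whether it dumped core.
The sketch also assumes that once the crash appears it is present at every
later date.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical probe callback: returns nonzero if candidate number `index`
 * (e.g. the index-th source date, oldest first) crashes.
 */
typedef int (*probe_fn)(size_t index, void *ctx);

/*
 * Binary search for the first crashing candidate.
 * Preconditions (assumed, not verified here): candidate `good` does not
 * crash, candidate `bad` does, good < bad, and the crash persists once
 * it has been introduced.
 */
static size_t
first_bad(size_t good, size_t bad, probe_fn crashes, void *ctx)
{
  assert(good < bad);

  while (bad - good > 1) {
    size_t mid = good + (bad - good) / 2;

    if (crashes(mid, ctx))
      bad = mid;   /* crash already present at mid: look earlier */
    else
      good = mid;  /* still fine at mid: look later */
  }
  return bad;
}

/*
 * Stand-in probe for demonstration only: pretends the crash was
 * introduced at the index stored in *ctx.
 */
static int
threshold_probe(size_t index, void *ctx)
{
  const size_t *introduced_at = ctx;

  return index >= *introduced_at;
}
```

With the stand-in probe and the crash introduced at index 7, calling
first_bad(0, 10, threshold_probe, &t) probes indices 5, 7 and 6 and
returns 7, so ten candidate dates cost three rebuilds instead of up to nine.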
Comment 3 John E. Hein 2008-04-08 18:52:18 UTC
Lee Damon wrote at 09:40 -0700 on Apr  8, 2008:
 > John Hein wrote:
 > > This doesn't help your problem directly, but we've been using amd with
 > > NIS maps and 6.3/i386 without any problems.  What's your configuration?
 > 
 > The maps are flat files but we use LDAP.
 > 
 > > You could also try the newer version of am-utils in ports just
 > > to see if it behaves differently.
 > 
 > Thanks for the hints.  Sadly, the version in the ports tree died the same
 > horrible death.

You should put that information in the PR (CC restored).


 > > Have you tried searching back from your cvsup date to see when
 > > it stops seg faulting for you?
 > 
 > These are production machines, I can't take them down for the time it 
 > would take to do that :(

Unfortunately, all I have are debugging suggestions...

 - Bring up a non-production machine to play with.

 - Bring up a virtual machine or jail to play with.

 - Start with a bare bones amd config (e.g., without anything
   but the default maps & .conf files).  If there's no core
   dump, then add back parts of your config until it dies.

 - Compile amd with debug on and turn up the debug level to
   see if you get any hints.

 - Trace deeper into the code to find the source of the bad pointer.

 - Try asking on the am-utils mailing list.
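
On tracing deeper: the gdb session in the PR shows fp = 0x8 at
ops_nfs.c:307, inside the ITER loop over fh_head. A small non-NULL value
like that often means a list link was overwritten or read from freed
memory, though the core alone does not prove it. A consistency check along
the following lines could be called before the loop while debugging. This
is only a sketch in C: struct qnode, check_queue, looks_bogus and the
4096-byte threshold are hypothetical stand-ins, not the am-utils
definitions, and it assumes a circular doubly linked list with a sentinel
head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intrusive queue element (not the am-utils type). */
struct qnode {
  struct qnode *forw;
  struct qnode *back;
};

/* Illustrative result codes for check_queue(). */
enum {
  QUEUE_OK = 0,
  QUEUE_BAD_POINTER = -1,   /* a link holds a tiny, implausible address */
  QUEUE_BAD_BACKLINK = -2,  /* forw/back links disagree */
  QUEUE_TOO_LONG = -3       /* more nodes than expected: likely a cycle */
};

/*
 * Addresses below one typical page (4096 is an assumption) are treated as
 * corrupt: they are usually NULL plus a small structure offset.
 */
static int
looks_bogus(const struct qnode *p)
{
  return (uintptr_t)p < 4096;
}

/*
 * Walk the circular list starting at the sentinel `head`, checking each
 * link before following it. On success stores the node count in *count
 * (if count is non-NULL) and returns QUEUE_OK.
 */
static int
check_queue(const struct qnode *head, size_t max_nodes, size_t *count)
{
  const struct qnode *prev = head;
  const struct qnode *p;
  size_t n = 0;

  assert(head != NULL);

  for (p = head->forw; p != head; prev = p, p = p->forw) {
    if (looks_bogus(p))
      return QUEUE_BAD_POINTER;
    if (p->back != prev)
      return QUEUE_BAD_BACKLINK;
    if (++n > max_nodes)
      return QUEUE_TOO_LONG;
  }
  if (head->back != prev)
    return QUEUE_BAD_BACKLINK;
  if (count != NULL)
    *count = n;
  return QUEUE_OK;
}
```

Logging the return value at the points where the list is modified, not
just where it is walked, narrows down which operation leaves the bad link
behind.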
Comment 4 nomad 2008-04-08 18:59:27 UTC
 > You could also try the newer version of am-utils in ports just
 > to see if it behaves differently.

Just tried, same failure (exited with signal 10).  Corefile & binary are 
available if you want them but the port compile defaulted to no 
debugging and I forgot to turn it on so there's not a lot of information 
there.  Since these are both production machines and amd crashing 
requires the host to reboot I can't easily test again.

nomad
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:45:11 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email, it might be worthwhile to double-check whether this bug ought to be closed.