Bug 204013 - net/ntp: latest 4.2.8p4 crashes in background mode
Summary: net/ntp: latest 4.2.8p4 crashes in background mode
Status: Closed DUPLICATE of bug 204048
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: Normal Affects Some People
Assignee: Cy Schubert
URL:
Keywords: crash, needs-qa, regression
Depends on:
Blocks:
 
Reported: 2015-10-25 12:11 UTC by Franco Fichtner
Modified: 2016-04-17 09:30 UTC
CC List: 2 users

See Also:
koobs: maintainer-feedback+
koobs: merge-quarterly?


Attachments
patch for ntp in base (405 bytes, patch)
2015-10-30 04:28 UTC, Cy Schubert
patch for ports/net/ntp (908 bytes, patch)
2015-10-30 04:30 UTC, Cy Schubert

Description Franco Fichtner 2015-10-25 12:11:58 UTC
Reverting to the previous version works... This is on FreeBSD 10.1 specifically. Core dump backtrace below.  If more info is needed just let me know.

# gdb -c ntpd.core -f /usr/local/sbin/ntpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libmd.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /usr/local/lib/libcrypto.so.35...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libcrypto.so.35
Reading symbols from /usr/local/lib/libintl.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libintl.so.8
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0000000801311641 in pthread_mutex_destroy () from /lib/libthr.so.3
[New Thread 802006c00 (LWP 101319/ntpd)]
[New Thread 802006400 (LWP 101172/ntpd)]
(gdb) bt
#0  0x0000000801311641 in pthread_mutex_destroy () from /lib/libthr.so.3
#1  0x0000000801659987 in flockfile () from /lib/libc.so.7
#2  0x000000080163d8ab in rewind () from /lib/libc.so.7
#3  0x0000000801612fed in getservbyname_r () from /lib/libc.so.7
#4  0x000000080162de7f in nsdispatch () from /lib/libc.so.7
#5  0x0000000801613de1 in getservbyname () from /lib/libc.so.7
#6  0x0000000801613ce9 in getservbyname () from /lib/libc.so.7
#7  0x0000000801610e93 in getaddrinfo () from /lib/libc.so.7
#8  0x000000080160ead1 in getaddrinfo () from /lib/libc.so.7
#9  0x0000000000464456 in ?? ()
#10 0x0000000000465340 in ?? ()
#11 0x0000000000467a19 in ?? ()
#12 0x000000080130b4f5 in pthread_create () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
(gdb)
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2015-10-25 12:35:32 UTC
@Franco, thank you for the report. In future, please attach large snippets of text (such as error logs) as files; it makes the thread and the context of the conversation much easier to follow :)
Comment 2 Franco Fichtner 2015-10-25 12:37:59 UTC
Understood. :)
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2015-10-26 15:22:48 UTC
@Franco, can you test whether setting -L in your ntpd startup args addresses the behaviour?

CC'ing Dan (who mentioned it worked for him for the recent update of NTP in base to the same version)
Comment 4 Dan Langille freebsd_committer freebsd_triage 2015-10-26 15:27:59 UTC
In case it helps, after recent freebsd-update fetch install, ntpd would not stay running on one of my servers.

kernel: pid 73253 (ntpd), uid 0: exited on signal 11 (core dumped)

Adding -L to the flags lets it run:

ntpd_flags="-L -p /var/run/ntpd.pid -f /var/db/ntpd.drift"

NOTE: with this change, it runs, but if I issue a restart, it cores.
Comment 5 Dan Langille freebsd_committer freebsd_triage 2015-10-26 15:46:42 UTC
More often than not, ntpd fails to start.  Eventually it does.  Seems I was lucky the first time I tried -L.

Conclusion: the -L option is not related.
Comment 6 Franco Fichtner 2015-10-26 15:52:38 UTC
I tried `-L', I also flushed all virtual IPs and tried to start again, but no change.

I can't really debug this since `-n' doesn't crash. I then noticed it stays up when started from the shell. I'm currently using daemon(8) to push it to the background, and it stays up as it should with no modifications.
Comment 7 Cy Schubert freebsd_committer freebsd_triage 2015-10-27 00:36:43 UTC
(In reply to Dan Langille from comment #5)
Are you using ntp in ports or base? CURRENT or STABLE?
Comment 8 Dan Langille freebsd_committer freebsd_triage 2015-10-27 00:37:50 UTC
(In reply to Cy Schubert from comment #7)
From base:

[dan@supernews:~] $ pkg info -x ntp
printproto-1.0.5
[dan@supernews:~] $
Comment 9 Dan Langille freebsd_committer freebsd_triage 2015-10-27 00:40:06 UTC
In case it helps:

Oct 26 15:46:11 supernews ntpd[81207]: ntpd 4.2.8p4-a (1): Starting
Oct 26 15:46:11 supernews kernel: pid 81208 (ntpd), uid 0: exited on signal 11 (core dumped)
Comment 10 Cy Schubert freebsd_committer freebsd_triage 2015-10-27 00:55:55 UTC
Could both of you be using libressl (/usr/local/lib/libcrypto.so.35) by any chance? No problem with OpenSSL (in ports or base). I'll try to reproduce on a testbed with libressl installed.
Comment 11 Dan Langille freebsd_committer freebsd_triage 2015-10-27 00:56:33 UTC
(In reply to Cy Schubert from comment #10)
I am not.
Comment 12 Dan Langille freebsd_committer freebsd_triage 2015-10-27 00:57:25 UTC
FYI, this server has been continuously upgraded since 4.x I think.
Comment 13 Cy Schubert freebsd_committer freebsd_triage 2015-10-27 00:58:34 UTC
(In reply to Dan Langille from comment #11)
Can you post a backtrace?
Comment 14 Dan Langille freebsd_committer freebsd_triage 2015-10-27 01:01:52 UTC
(In reply to Cy Schubert from comment #13)
I'm looking for you on IRC
Comment 15 Dan Langille freebsd_committer freebsd_triage 2015-10-27 01:08:10 UTC
(In reply to Cy Schubert from comment #13)
I installed via freebsd-update... is my .core useless?
Comment 16 Cy Schubert freebsd_committer freebsd_triage 2015-10-27 02:59:03 UTC
Seeing the following on Dan's machine.

#0  0x0000000800b73b71 in pthread_mutex_destroy () from /lib/libthr.so.3
#1  0x00000008012c2557 in flockfile () from /lib/libc.so.7
#2  0x00000008012a60ab in rewind () from /lib/libc.so.7
#3  0x0000000801270cc0 in getservbyname_r () from /lib/libc.so.7
#4  0x0000000801293cff in nsdispatch () from /lib/libc.so.7
#5  0x0000000801272121 in getservbyname () from /lib/libc.so.7
#6  0x0000000801272029 in getservbyname () from /lib/libc.so.7
#7  0x000000080126e373 in getaddrinfo () from /lib/libc.so.7
#8  0x000000080126ba61 in getaddrinfo () from /lib/libc.so.7
#9  0x00000000004c676b in blocking_getaddrinfo (c=0x801c66700, req=0x801c46300)
    at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_intres.c:352
#10 0x00000000004c5b82 in blocking_child_common (c=0x801c66700)
    at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_worker.c:288
#11 0x00000000004c4d5d in blocking_thread (ThreadArg=0x801c66700)
    at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/work_thread.c:663
#12 0x0000000800b6d7c5 in pthread_create () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
(gdb) down 9
#0  0x0000000800b73b71 in pthread_mutex_destroy () from /lib/libthr.so.3
(gdb) up 9
#9  0x00000000004c676b in blocking_getaddrinfo (c=0x801c66700, req=0x801c46300)
    at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_intres.c:352
352		gai_resp->retcode = getaddrinfo(node, service, &gai_req->hints,
Current language:  auto; currently minimal
slippy$ 

Looks similar to http://bugs.ntp.org/show_bug.cgi?id=1851.
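
For anyone trying to reduce this outside of ntpd, the call pattern in the backtrace above can be sketched roughly as follows. This is a minimal sketch under assumptions, not code taken from ntpd: the function names run_lookup_in_child and resolver_thread, the loopback address and the use of fork() to stand in for background mode are all hypothetical choices made for illustration. It is not known to reproduce the crash; it only mirrors the path pthread_create -> getaddrinfo -> getservbyname_r in a process that has forked.

```c
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: resolves a service name in a separate thread,
 * the same way frames #9 to #12 of the backtrace reach getaddrinfo(). */
static void *resolver_thread(void *arg)
{
    int *rc = arg;
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    /* Passing a service name ("ntp") makes libc consult the services
     * database, which is where getservbyname_r() shows up in the trace. */
    *rc = getaddrinfo("127.0.0.1", "ntp", &hints, &res);
    if (*rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/* Forks (a stand-in for background mode), runs the lookup in a thread
 * inside the child, and reports how the child ended.
 * Returns the signal number if the child was killed by a signal,
 * 0 if it exited normally, and -1 if fork() or waitpid() failed. */
int run_lookup_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t tid;
        int rc = -1;

        if (pthread_create(&tid, NULL, resolver_thread, &rc) != 0)
            _exit(2);
        pthread_join(tid, NULL);
        _exit(rc == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

Calling run_lookup_in_child() from a small main() built with cc -pthread and checking for a return value of 11 would show whether the segmentation fault can be triggered without ntpd itself. A return of 0 only means this reduced case did not crash on that system.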
Comment 17 Franco Fichtner 2015-10-27 13:42:31 UTC
Yes, we build against LibreSSL 2.2.4 from ports. I can try OpenSSL tomorrow if needed.
Comment 18 Franco Fichtner 2015-10-28 10:03:21 UTC
Same behaviour when being built against OpenSSL 1.0.2d 9 Jul 2015 from ports: crashes when in background mode. Reverting to 4.2.8p3 brings it back as expected.
Comment 19 Cy Schubert freebsd_committer freebsd_triage 2015-10-29 03:19:11 UTC
Just to document another data point, this also affects ntpd in base on at least one reported system.
Comment 20 Cy Schubert freebsd_committer freebsd_triage 2015-10-29 03:27:31 UTC
Information request:

1. Are both systems 10.2-RELEASE updated through freebsd-update?
2. Can both ntp.conf files be posted?
3. Can both kernel config files be posted?
4. ifconfig -a listing (to protect PI, you may block out real IPs)
5. netstat -nr listing (you may block out real IPs but keep them consistent with #4 above).

I cannot reproduce locally on i386 or amd64 on real hardware and virtualbox VMs, updated through buildworld and freebsd-update processes. We're looking for something unique on these systems that is tickling the bug.
Comment 21 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 04:26:12 UTC
I will attach two patches to this ticket. One for ntp in base and the other for the ntp port. If you're using base (dvl@), please apply the base patch and rebuild/reinstall all of ntp.

If you use the port, apply the patch to net/ntp and rebuild/reinstall the port.

Patches to follow.
Comment 22 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 04:28:25 UTC
Created attachment 162588 [details]
patch for ntp in base

For users of ntp in base only: apply this patch from the root of src/, e.g. patch -p0 < base-ntp_worker.c.diff.
Comment 23 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 04:30:39 UTC
Created attachment 162589 [details]
patch for ports/net/ntp

For users of ports/net/ntp only

cd /usr/ports/net/ntp && patch -p0 < ports-net-ntp.diff
make && make deinstall install clean
Comment 24 Kubilay Kocak freebsd_committer freebsd_triage 2015-10-30 05:55:54 UTC
@Cy thanks for the patches, where were they "Obtained from: " ?

References/links/comment in the patch header would be great for our future selves
Comment 25 Franco Fichtner 2015-10-30 08:10:27 UTC
Hi Cy,

The patch does not help.  I have also been unable to reproduce this in VirtualBox with the exact build.  It seems more or less hardware, driver or timing related.

What I'm doing now is using daemon(8) together with --nofork; this combination doesn't crash. It's a workaround that isn't very nice, but at least it works reliably now.

Would a core dump log with debug symbols help?  Shall we take this to the NTP bug tracker instead?


Cheers,
Franco
Comment 26 Kubilay Kocak freebsd_committer freebsd_triage 2015-10-30 08:16:38 UTC
@Franco, if we want to do that, it will be useful to reduce and isolate the test case as far away from ports as possible.

- Disable all options
- Use default ./configure options
- Isolate compiler flags/args overrides

A debug build that crashes (it might not) with traces should help too.
Comment 27 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 12:45:17 UTC
(In reply to Kubilay Kocak from comment #24)
The patch was my own creation, written on the expectation that this is a regression from 4.2.8p3. I do not experience the bug here, so I cannot test whether it fixes anything. All my NTP installations, using either src or ports, have worked flawlessly since upgrading. Having said that, there is something unique about these sites that is tickling this bug.

Are you yourself also experiencing the bug at your site?
Comment 28 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 12:54:01 UTC
(In reply to Kubilay Kocak from comment #26)
Koobs@, this bug is not limited to ports/net/ntp. It also affects ntpd in base after the latest freebsd-update to 10.2-RELEASE, as described by dvl@. He updated his ntpd using freebsd-update, so installed ports have not affected the build process. The problem is in either ntp, libc, or libthr.
Comment 29 Franco Fichtner 2015-10-30 12:55:07 UTC
Cy, we think we pinned this down, see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204048
Comment 30 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 13:04:27 UTC
(In reply to Franco Fichtner from comment #29)
Good catch.
Comment 31 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 19:41:53 UTC
Can you apply r287846 to your 10.2-RELEASE systems, then build and install a new kernel? An errata notice documenting the bug will be out.
Comment 32 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 19:43:19 UTC
Comment on attachment 162588 [details]
patch for ntp in base

Doesn't work.
Comment 33 Cy Schubert freebsd_committer freebsd_triage 2015-10-30 19:46:40 UTC
Comment on attachment 162588 [details]
patch for ntp in base

Doesn't work.
Comment 34 Franco Fichtner 2015-11-02 07:44:37 UTC
Cy, we're shipping OPNsense with the source tree fix on Wednesday. For me the issue is resolved, feel free to close this unless you want to keep this open until the errata hits the interwebs.

Cheers,
Franco
Comment 35 Cy Schubert freebsd_committer freebsd_triage 2015-11-03 20:43:25 UTC

*** This bug has been marked as a duplicate of bug 204048 ***