Reverting to the previous version works. This is on FreeBSD 10.1 specifically. Core dump backtrace below; if more info is needed, just let me know.

# gdb -c ntpd.core -f /usr/local/sbin/ntpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libmd.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libmd.so.6
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /usr/local/lib/libcrypto.so.35...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libcrypto.so.35
Reading symbols from /usr/local/lib/libintl.so.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libintl.so.8
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0 0x0000000801311641 in pthread_mutex_destroy () from /lib/libthr.so.3
[New Thread 802006c00 (LWP 101319/ntpd)]
[New Thread 802006400 (LWP 101172/ntpd)]
(gdb) bt
#0 0x0000000801311641 in pthread_mutex_destroy () from /lib/libthr.so.3
#1 0x0000000801659987 in flockfile () from /lib/libc.so.7
#2 0x000000080163d8ab in rewind () from /lib/libc.so.7
#3 0x0000000801612fed in getservbyname_r () from /lib/libc.so.7
#4 0x000000080162de7f in nsdispatch () from /lib/libc.so.7
#5 0x0000000801613de1 in getservbyname () from /lib/libc.so.7
#6 0x0000000801613ce9 in getservbyname () from /lib/libc.so.7
#7 0x0000000801610e93 in getaddrinfo () from /lib/libc.so.7
#8 0x000000080160ead1 in getaddrinfo () from /lib/libc.so.7
#9 0x0000000000464456 in ?? ()
#10 0x0000000000465340 in ?? ()
#11 0x0000000000467a19 in ?? ()
#12 0x000000080130b4f5 in pthread_create () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
(gdb)
@Franco, thank you for the report. For future issues, please attach large snippets of text (such as error logs) as files; it makes it much easier to follow the thread and context of the conversation :)
Understood. :)
@Franco, can you test whether setting -L in your ntpd startup args addresses the behaviour? CC'ing Dan (who mentioned it worked for him after the recent update of NTP in base to the same version).
In case it helps: after a recent freebsd-update fetch install, ntpd would not stay running on one of my servers:

kernel: pid 73253 (ntpd), uid 0: exited on signal 11 (core dumped)

Adding -L to the flags lets it run:

ntpd_flags="-L -p /var/run/ntpd.pid -f /var/db/ntpd.drift"

NOTE: with this change it runs, but if I issue a restart, it cores.
More often than not, ntpd fails to start; eventually it does. It seems I was lucky the first time I tried -L. Conclusion: the -L option is not related.
I tried `-L'. I also flushed all virtual IPs and tried to start it again, but there was no change. I can't really debug this, since with `-n' it doesn't crash. Then I noticed it would stay up when started from the shell. I'm currently using daemon(8) to push it to the background, and it stays up as it should with no other modifications.
(In reply to Dan Langille from comment #5) Are you using ntp in ports or base? CURRENT or STABLE?
(In reply to Cy Schubert from comment #7)

From base:

[dan@supernews:~] $ pkg info -x ntp
printproto-1.0.5
[dan@supernews:~] $
In case it helps:

Oct 26 15:46:11 supernews ntpd[81207]: ntpd 4.2.8p4-a (1): Starting
Oct 26 15:46:11 supernews kernel: pid 81208 (ntpd), uid 0: exited on signal 11 (core dumped)
Could both of you be using LibreSSL (/usr/local/lib/libcrypto.so.35) by any chance? I see no problem with OpenSSL (in ports or base). I'll try to reproduce on a testbed with LibreSSL installed.
(In reply to Cy Schubert from comment #10) I am not.
FYI, this server has been continuously upgraded since 4.x I think.
(In reply to Dan Langille from comment #11) Can you post a backtrace?
(In reply to Cy Schubert from comment #13) I'm looking for you on IRC
(In reply to Cy Schubert from comment #13) I installed via freebsd-update... is my .core useless?
Seeing the following on Dan's machine:

#0 0x0000000800b73b71 in pthread_mutex_destroy () from /lib/libthr.so.3
#1 0x00000008012c2557 in flockfile () from /lib/libc.so.7
#2 0x00000008012a60ab in rewind () from /lib/libc.so.7
#3 0x0000000801270cc0 in getservbyname_r () from /lib/libc.so.7
#4 0x0000000801293cff in nsdispatch () from /lib/libc.so.7
#5 0x0000000801272121 in getservbyname () from /lib/libc.so.7
#6 0x0000000801272029 in getservbyname () from /lib/libc.so.7
#7 0x000000080126e373 in getaddrinfo () from /lib/libc.so.7
#8 0x000000080126ba61 in getaddrinfo () from /lib/libc.so.7
#9 0x00000000004c676b in blocking_getaddrinfo (c=0x801c66700, req=0x801c46300) at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_intres.c:352
#10 0x00000000004c5b82 in blocking_child_common (c=0x801c66700) at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_worker.c:288
#11 0x00000000004c4d5d in blocking_thread (ThreadArg=0x801c66700) at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/work_thread.c:663
#12 0x0000000800b6d7c5 in pthread_create () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
(gdb) down 9
#0 0x0000000800b73b71 in pthread_mutex_destroy () from /lib/libthr.so.3
(gdb) up 9
#9 0x00000000004c676b in blocking_getaddrinfo (c=0x801c66700, req=0x801c46300) at /usr/home/cy/svn-stable10/usr.sbin/ntp/libntp/../../../contrib/ntp/libntp/ntp_intres.c:352
352 gai_resp->retcode = getaddrinfo(node, service, &gai_req->hints,
Current language: auto; currently minimal
slippy$

Looks similar to http://bugs.ntp.org/show_bug.cgi?id=1851.
Yes, we build against LibreSSL 2.2.4 from ports. I can try OpenSSL tomorrow if needed.
Same behaviour when being built against OpenSSL 1.0.2d 9 Jul 2015 from ports: crashes when in background mode. Reverting to 4.2.8p3 brings it back as expected.
Just to document another data point: this also affects ntpd in base on at least one reported system.
Information request:

1. Are both systems running 10.2-RELEASE, updated through freebsd-update?
2. Can both ntp.conf files be posted?
3. Can both kernel config files be posted?
4. ifconfig -a listing (to protect PI, you may block out real IPs).
5. netstat -nr listing (you may block out real IPs, but keep them consistent with #4 above).

I cannot reproduce this locally on i386 or amd64, on real hardware or in VirtualBox VMs, updated through both buildworld and freebsd-update. We're looking for something unique on these systems that is tickling the bug.
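A minimal sketch for gathering the installed versions plus items 2 to 5 in one pass, assuming a FreeBSD 10.x host. The helper names (mask_ipv4, collect_ntp_diag) and the NTP_CONF default of /etc/ntp.conf are hypothetical, and kern.conftxt only holds the kernel config if the kernel was built with options INCLUDE_CONFIG_FILE. Everything is piped through a single awk pass so that each distinct IPv4 address gets the same token (IP1, IP2, ...) in both the ifconfig and netstat listings, as item 5 asks. IPv6 addresses are not masked by this sketch.

```shell
#!/bin/sh
# Hypothetical helpers for collecting the information requested above.

# Replace each distinct IPv4 address with a stable token (IP1, IP2, ...).
# The awk array persists across input lines, so one address always maps
# to the same token. IPv6 addresses are left untouched.
mask_ipv4() {
    awk '{
        out = ""
        rest = $0
        while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            ip = substr(rest, RSTART, RLENGTH)
            if (!(ip in tok))
                tok[ip] = "IP" (++n)
            out = out substr(rest, 1, RSTART - 1) tok[ip]
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}

# Gather versions, ntp.conf, kernel config, interfaces and routes, then
# mask them in one pass so the tokens stay consistent between sections.
collect_ntp_diag() {
    conf=${NTP_CONF:-/etc/ntp.conf}
    {
        echo "== freebsd-version -ku"; freebsd-version -ku
        echo "== $conf"; cat "$conf"
        echo "== kern.conftxt"; sysctl -n kern.conftxt
        echo "== ifconfig -a"; ifconfig -a
        echo "== netstat -nr"; netstat -nr
    } 2>&1 | mask_ipv4
}
```

Usage would be something like `collect_ntp_diag > ntp-diag.txt`, with NTP_CONF pointed at your config if it lives elsewhere. Review the output before posting; hostnames and MAC addresses are not masked.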
I will attach two patches to this ticket. One for ntp in base and the other for the ntp port. If you're using base (dvl@), please apply the base patch and rebuild/reinstall all of ntp. If you use the port, apply the patch to net/ntp and rebuild/reinstall the port. Patches to follow.
Created attachment 162588
patch for ntp in base

For users of ntp in base only: apply this patch from the root of src/, e.g.

patch -p0 < base-ntp_worker.c.diff
Created attachment 162589
patch for ports/net/ntp

For users of ports/net/ntp only:

cd /usr/ports/net/ntp && patch -p0 < ports-net-ntp.diff
make && make deinstall install clean
@Cy thanks for the patches, where were they "Obtained from: " ? References/links/comment in the patch header would be great for our future selves
Hi Cy,

The patch does not help. I have also been unable to reproduce this in VirtualBox with the exact same build. It seems to be hardware, driver or timing related. What I'm doing now is using daemon(8) together with --nofork; this combination doesn't crash. It's a workaround that isn't very nice, but at least it works reliably now.

Would a core dump log with debug symbols help? Shall we take this to the NTP bug tracker instead?

Cheers,
Franco
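A minimal sketch of the workaround described above: ntpd is kept in the foreground with -n (--nofork) and daemon(8) does the detaching instead of ntpd forking itself. It assumes the base-system binary paths and reuses the pidfile and drift file paths from the ntpd_flags line posted earlier in this thread. The function name start_ntpd_nofork and the DAEMON/NTPD overrides are hypothetical; set NTPD=/usr/local/sbin/ntpd for the ports build. This only sidesteps the crash as reported here; it is not a fix.

```shell
#!/bin/sh
# Hypothetical helper: run ntpd without letting it fork (-n), and let
# daemon(8) put it in the background. daemon -f sends stdin, stdout and
# stderr to /dev/null. DAEMON and NTPD may be overridden by the caller.
start_ntpd_nofork() {
    "${DAEMON:-/usr/sbin/daemon}" -f \
        "${NTPD:-/usr/sbin/ntpd}" -n \
        -p /var/run/ntpd.pid -f /var/db/ntpd.drift
}
```

Calling `start_ntpd_nofork` as root starts ntpd in the background; ntpd writes its own pidfile, so stopping it is still `kill $(cat /var/run/ntpd.pid)`.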
@Franco, if we want to do that, it will be useful to reduce and isolate the test case as far away from ports as possible:

- Disable all options
- Use default ./configure options
- Isolate compiler flags/args overrides

A debug build that crashes (it might not) with traces should help too.
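For the debug build, rebuilding the port with the ports framework's WITH_DEBUG=yes knob (which adds -g and skips stripping) should be enough to turn the `?? ()` frames in the first trace into named ones. A minimal sketch for then pulling the trace out of a core non-interactively, assuming gdb is in PATH; the core_backtrace name and the GDB override are hypothetical. `thread apply all bt full` covers the worker thread that crashed in getaddrinfo() as well as the main thread, and -batch and -x are long-standing gdb options, so this should also work with the gdb 6.1.1 from base.

```shell
#!/bin/sh
# Hypothetical helper: print the thread list and full backtraces of all
# threads from a core file, without an interactive gdb session.
# Usage: core_backtrace /path/to/ntpd /path/to/ntpd.core
core_backtrace() {
    cmds=$(mktemp "${TMPDIR:-/tmp}/ntpd-bt.XXXXXX") || return 1
    # gdb reads these commands from the file given with -x.
    printf 'info threads\nthread apply all bt full\n' > "$cmds"
    "${GDB:-gdb}" -batch -x "$cmds" "$1" "$2"
    rc=$?
    rm -f "$cmds"
    return "$rc"
}
```

For example, after `cd /usr/ports/net/ntp && make clean && make WITH_DEBUG=yes deinstall install clean` and a fresh crash, `core_backtrace /usr/local/sbin/ntpd ntpd.core > ntpd-bt.txt` gives a file that can be attached here.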
(In reply to Kubilay Kocak from comment #24) I wrote the patch myself, on the assumption that this is a regression from 4.2.8p3. I do not experience the bug here, therefore I cannot test whether it fixes anything. All my NTP installations, using either src or ports, have worked flawlessly since upgrading. Having said that, there is something unique about these sites that is tickling this bug. Are you yourself also experiencing the bug at your site?
(In reply to Kubilay Kocak from comment #26) Koobs@, this bug is not limited to ports/net/ntp. It also affects base in the latest freebsd-update to 10.2-RELEASE, as described by dvl@. He updated his ntpd using freebsd-update, so installed ports have not affected the build process. The problem is in either ntp, libc, or libthr.
Cy, we think we pinned this down, see: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204048
(In reply to Franco Fichtner from comment #29) Good catch.
Can you apply r287846 to your 10.2-RELEASE systems, then build and install a new kernel? An errata notice documenting the bug will be out.
Comment on attachment 162588
patch for ntp in base

Doesn't work.
Cy, we're shipping OPNsense with the source tree fix on Wednesday. For me the issue is resolved; feel free to close this, unless you want to keep it open until the errata notice hits the interwebs.

Cheers,
Franco
*** This bug has been marked as a duplicate of bug 204048 ***