Summary: | [hast] regression, unable to start hast, pid XXX (hastd), uid 0: exited on signal 6 (core dumped) | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | emz | ||||||
Component: | kern | Assignee: | Eugene Grosbein <eugen> | ||||||
Status: | Closed FIXED | ||||||||
Severity: | Affects Some People | CC: | emaste, eugen, net, pjd | ||||||
Priority: | --- | Keywords: | regression | ||||||
Version: | 11.1-RELEASE | Flags: | eugen:
mfc-stable12+
eugen: mfc-stable11+ |
||||||
Hardware: | Any | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
emz
2018-04-12 10:09:33 UTC
Now hastd contunues to constantly dump cores even when I'm trying to launch it on a single node as a primary. Backtrace: # gdb /sbin/hastd /hastd.core GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Core was generated by `/sbin/hastd'. Program terminated with signal 6, Aborted. Reading symbols from /lib/libgeom.so.5...Reading symbols from /usr/lib/debug//lib/libgeom.so.5.debug...done. done. Loaded symbols for /lib/libgeom.so.5 Reading symbols from /lib/libutil.so.9...Reading symbols from /usr/lib/debug//lib/libutil.so.9.debug...done. done. Loaded symbols for /lib/libutil.so.9 Reading symbols from /lib/libcrypto.so.8...Reading symbols from /usr/lib/debug//lib/libcrypto.so.8.debug...done. done. Loaded symbols for /lib/libcrypto.so.8 Reading symbols from /lib/libthr.so.3...Reading symbols from /usr/lib/debug//lib/libthr.so.3.debug...done. done. Loaded symbols for /lib/libthr.so.3 Reading symbols from /lib/libc.so.7...Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...done. done. Loaded symbols for /lib/libc.so.7 Reading symbols from /lib/libbsdxml.so.4...Reading symbols from /usr/lib/debug//lib/libbsdxml.so.4.debug...done. done. Loaded symbols for /lib/libbsdxml.so.4 Reading symbols from /lib/libsbuf.so.6...Reading symbols from /usr/lib/debug//lib/libsbuf.so.6.debug...done. done. Loaded symbols for /lib/libsbuf.so.6 Reading symbols from /libexec/ld-elf.so.1...Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...done. done. Loaded symbols for /libexec/ld-elf.so.1 #0 0x000000080155a84a in thr_kill () from /lib/libc.so.7 (gdb) bt #0 0x000000080155a84a in thr_kill () from /lib/libc.so.7 #1 0x000000080155a814 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52 #2 0x000000080155a789 in abort () at /usr/src/lib/libc/stdlib/abort.c:65 #3 0x0000000000414579 in pjdlog_abort (func=0x420e3f "descriptors_assert", file=0x420aeb "/usr/src/sbin/hastd/hastd.c", line=303, failedexpr=0x0, fmt=<value optimized out>) at /usr/src/sbin/hastd/pjdlog.c:613 #4 0x0000000000408267 in descriptors_assert (res=0x80204b400, pjdlogmode=<value optimized out>) at /usr/src/sbin/hastd/hastd.c:303 #5 0x00000000004146eb in hastd_primary (res=0x80204b400) at /usr/src/sbin/hastd/primary.c:1030 #6 0x000000000040a55a in check_signals () at /usr/src/sbin/hastd/hastd.c:359 #7 0x0000000000408852 in main (argc=<value optimized out>, argv=<value optimized out>) at /usr/src/sbin/hastd/hastd.c:1138 #8 0x0000000000403b0f in _start () #9 0x000000080064f000 in ?? () #10 0x0000000000000000 in ?? () (gdb) Now I got the situation when the secondary node logs the mesages "Descriptor X is open (pipe or FIFO), but should be closed.", and I noticed that in the same time the primary node logs this: Apr 15 19:51:02 gw1 hastd[50310]: [hastb] (primary) Unable to receive handshake header from tcp4://192.168.0.248: Socket is not connected. Apr 15 19:51:10 gw1 hastd[50306]: [hasta] (primary) Unable to receive handshake header from tcp4://192.168.0.248: Socket is not connected. Apr 15 19:51:12 gw1 hastd[50310]: [hastb] (primary) Unable to receive handshake header from tcp4://192.168.0.248: Socket is not connected. The fd that the hast is complaining about is a pipe, for instance now it's blaming fd 7 to be open, and fd 7 is a pipe: PID COMM FD T V FLAGS REF OFFSET PRO NAME 22817 hastd text v r r------- - - - /sbin/hastd 22817 hastd cwd v d r------- - - - / 22817 hastd root v d r------- - - - / 22817 hastd 0 v c rw------ 3 0 - /dev/null 22817 hastd 1 v c rw------ 3 0 - /dev/null 22817 hastd 2 v c rw------ 3 0 - /dev/null 22817 hastd 3 v r -w---n-l 1 0 - /var/run/hastd.pid 22817 hastd 4 s - rw------ 1 0 UDS /var/run/hastctl 22817 hastd 5 s - rw------ 1 0 TCP 192.168.0.247:8457 0.0.0.0:0 22817 hastd 6 s - rw------ 1 0 UDD /var/run/logpriv 22817 hastd 7 p - rw------ 4 0 - - 22817 hastd 9 s - rw------ 1 0 UDS - 22817 hastd 13 s - rw------ 1 0 UDS - 22817 hastd 16 s - rw------ 1 0 UDS - Created attachment 192583 [details]
HAST nodebug patch
This patch removes several debug lines leading to SIGABRT.
This patch removes the debugging leading to SIGABRT, the author is ae@freebsd.org. ae@ is pessimistic about it, saying that it solves nothing, and just makes HAST to contunue by removing the abort() call in pjdlog_abort(). However, this got me a working HAST, first time without a bunch of warning/error/assert messages on both nodes. This probably need more attention. Created attachment 220060 [details]
proposed fix
hastd wrongly assumes that parent process (f.e. shell) starts it with no high file descriptors open, use closefrom() to fix
Dear submitter, is this problem still relevant to you? If so, please try attached one-line patch. It solves the problem for me. A commit references this bug: Author: eugen Date: Sun Nov 29 13:45:54 UTC 2020 New revision: 368148 URL: https://svnweb.freebsd.org/changeset/base/368148 Log: hastd(8) assumes it has no extra file descriptors opened and aborts otherwise, so call closefrom() early. PR: 227461 MFC after: 2 weeks Changes: head/sbin/hastd/hastd.c ^Triage: Assign to committer resolving, prior assignee CC'd A commit references this bug: Author: eugen Date: Fri Dec 18 12:20:30 UTC 2020 New revision: 368758 URL: https://svnweb.freebsd.org/changeset/base/368758 Log: MFC r368148: fix hastd(8) hastd(8) assumes it has no extra file descriptors opened and aborts otherwise, so call closefrom() early. PR: 227461 Changes: _U stable/12/ stable/12/sbin/hastd/hastd.c A commit references this bug: Author: eugen Date: Fri Dec 18 12:24:33 UTC 2020 New revision: 368759 URL: https://svnweb.freebsd.org/changeset/base/368759 Log: MFC r368148: fix hastd(8) hastd(8) assumes it has no extra file descriptors opened and aborts otherwise, so call closefrom() early. PR: 227461 Changes: _U stable/11/ stable/11/sbin/hastd/hastd.c Fixed and merged to stable branches. |