Based on some initial testing, a samba port built on a base system built from the git repo dumps core. A samba port built on a base system built from the svn repo does not.
The core dump shows:
(lldb) thread backtrace all
* thread #1, name = 'smbd', stop reason = signal SIGILL
* frame #0: 0x00000008016c7ff6 libsmbconf.so.0`___lldb_unnamed_symbol100$$libsmbconf.so.0 + 118
frame #1: 0x0000000801bc087c libmessages-dgm-samba4.so`___lldb_unnamed_symbol32$$libmessages-dgm-samba4.so + 108
frame #2: 0x0000000801bbf4f7 libmessages-dgm-samba4.so`___lldb_unnamed_symbol18$$libmessages-dgm-samba4.so + 615
frame #3: 0x0000000802eb3e5c libtevent.so.0`tevent_common_invoke_fd_handler + 140
frame #4: 0x0000000802eb6cdd libtevent.so.0`___lldb_unnamed_symbol40$$libtevent.so.0 + 1901
frame #5: 0x0000000802eb3071 libtevent.so.0`_tevent_loop_once + 225
frame #6: 0x0000000802eb4fe1 libtevent.so.0`tevent_req_poll + 49
frame #7: 0x0000000001031dbe smbd`___lldb_unnamed_symbol25$$smbd + 622
frame #8: 0x0000000001030728 smbd`main + 2824
frame #9: 0x000000000102d0f2 smbd`_start + 226
I wonder whether the base system built from the git repo exports the system version incorrectly, hence the SIGILL.
I jumped from r368387 (svn) to r368820+7d8ff3245227-c255291(main) (git)
As extra info, I tested multiple samba versions (4.12 and 4.13), and they all show the same behavior.
I had the same problem on my FreeBSD CURRENT main-c255394-gf20c0e33195.
I tried recompiling net/samba412 with the following options in /etc/make.conf.
The rebuilt smbd server works for me.
It seems to be a compiler problem.
Created attachment 221054 [details]
Default to llvm10 for the samba build. This does not result in core dumps of the daemon. llvm11 from ports is also fine (for now), as there have been patches in base which are not present in the port version. Not sure for how long. I was building llvm10 anyway for mesa-libs.
Thanks for the direction, Yuichiro NAITO. It works fine now!
CC llvm maintainer from base. Maybe he knows the proper solution.
I've reproduced the SIGILL on 13.0-CURRENT main-c255407-g4f4111d2c5ab with samba413-4.13.1_1, and I'm doing some debugging. No clues yet. :)
What seems to happen is that messaging_recv_cb() has a variable length array (aka VLA) 'fds64', which is declared with a length of zero when num_fds is 0, and this is undefined behavior:
Program received signal SIGSEGV, Segmentation fault.
0x0000000801c784a7 in messaging_recv_cb (ev=0x805475060, msg=0x7fffffffdbe8 "\035#", msg_len=98, fds=0x7fffffffdbdc, num_fds=0, private_data=0x80546e300) at ../../source3/lib/messages.c:394
394 int64_t fds64[MIN(num_fds, INT8_MAX)];
(gdb) print num_fds
$6 = 0
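To illustrate the problem, here is a minimal sketch of one common way to keep a VLA length from reaching zero: clamp it to at least 1. This is not Samba code and not necessarily what the attached patches do; sum_fds() and the local MIN/MAX macros are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

/*
 * Hypothetical helper, for illustration only: widens up to INT8_MAX
 * file descriptors into int64_t values and returns their sum, so the
 * result is easy to check.
 */
int64_t sum_fds(const int *fds, size_t num_fds)
{
	size_t count = MIN(num_fds, (size_t)INT8_MAX);
	/* A VLA length must be greater than zero, so clamp it to 1. */
	int64_t fds64[MAX(count, (size_t)1)];
	int64_t sum = 0;
	size_t i;

	/* The array always has room for at least one element. */
	assert(sizeof(fds64) >= sizeof(int64_t));

	/* Only the first 'count' elements are ever written or read. */
	for (i = 0; i < count; i++) {
		fds64[i] = fds[i];
	}
	for (i = 0; i < count; i++) {
		sum += fds64[i];
	}
	return sum;
}
```

With num_fds == 0 the array still has one (unused) element, so the declaration is well defined, and the loops simply do not run.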
Created attachment 221099 [details]
Fix zero-sized VLAs in messaging part of net/samba413
Here is a patch for net/samba413 which should fix the undefined behavior with zero-sized VLAs in source3/lib/messages*.c. I will also attach patches for samba411 and samba412.
Created attachment 221100 [details]
Fix zero-sized VLAs in messaging part of net/samba412
Created attachment 221101 [details]
Fix zero-sized VLAs in messaging part of net/samba411
My advice would be to upstream these patches to Samba. In fact, they should probably do a full sweep of their source for these possibly zero-sized VLAs, compile the whole of Samba with -fsanitize=undefined, and then run a full regression test.
(I tried adding -fsanitize=undefined to the CFLAGS of this port, but I could not get the waf build tools to correctly link the various dynamic libraries. So I will gladly leave that to the waf and/or samba experts. :)
@Dries, could you please check whether one of the patches fixes the crashes for you?
I runtime tested the patch for samba413, no more core dumps. Thanks!
I am observing the same thing on my box, which I recently upgraded to the sources from the git repo, when I upgraded my samba:
A samba port built on a base system from the git repo doesn't work, whereas the previous samba port (samba412), built on a base system from the svn repo, does.
I don't see any core dumps, though, only the "signal 4" message at startup in /var/log/messages:
Dec 30 14:32:41 pcgyver kernel: pid 19312 (smbd), jid 0, uid 0: exited on signal 4
Good news: the patch provided for samba413 seems to work for me too.
Reported upstream: https://bugzilla.samba.org/show_bug.cgi?id=14605
Merge request: https://gitlab.com/samba-team/samba/-/merge_requests/1743
The fix was accepted by upstream:
Given the maintainer timeout (> 3 weeks), I think it's fine for you to commit it.
A commit references this bug:
Date: Sat Jan 30 13:22:41 UTC 2021
New revision: 563405
net/samba411 net/samba412 net/samba413: Fix zero-sized VLAs
With recent versions of clang, samba could dump core shortly after
startup, terminating with either SIGILL or SIGSEGV.
Investigation showed that samba is using C99 variable length arrays
(VLAs), and in some cases the length of these arrays would become zero.
Since this is undefined behavior, various interesting things would
happen, often ending in segfaults.
Fix this by avoiding zero as the length for these VLAs.
A similar patch was also sent upstream, and was accepted and included in
subsequent samba releases.
See also: https://bugzilla.samba.org/show_bug.cgi?id=14605
Reported by: Dries Michiels <email@example.com>