Bug 252157

Summary: net/samba411 net/samba412 net/samba413: samba core dumps
Product: Ports & Packages Reporter: Dries Michiels <driesm>
Component: Individual Port(s)Assignee: Timur I. Bakeyev <timur>
Status: Closed FIXED    
Severity: Affects Only Me CC: contact, dim, driesm, martin.mato, naito.yuichiro
Priority: --- Flags: bugzilla: maintainer-feedback? (timur)
dim: merge-quarterly+
Version: Latest   
Hardware: Any   
OS: Any   
See Also: https://bugzilla.samba.org/show_bug.cgi?id=14605
Attachments:
Description Flags
samba.diff
none
Fix zero-sized VLAs in messaging part of net/samba413
none
Fix zero-sized VLAs in messaging part of net/samba412
none
Fix zero-sized VLAs in messaging part of net/samba411 none

Description Dries Michiels freebsd_committer freebsd_triage 2020-12-26 11:04:32 UTC
Based on some initial testing, a samba port built on a base system based from the git repo core dumps. A samba port built on a base system based of the svn repo does not.

The core dump demonstrates:
(lldb) thread backtrace all
* thread #1, name = 'smbd', stop reason = signal SIGILL
  * frame #0: 0x00000008016c7ff6 libsmbconf.so.0`___lldb_unnamed_symbol100$$libsmbconf.so.0 + 118
    frame #1: 0x0000000801bc087c libmessages-dgm-samba4.so`___lldb_unnamed_symbol32$$libmessages-dgm-samba4.so + 108
    frame #2: 0x0000000801bbf4f7 libmessages-dgm-samba4.so`___lldb_unnamed_symbol18$$libmessages-dgm-samba4.so + 615
    frame #3: 0x0000000802eb3e5c libtevent.so.0`tevent_common_invoke_fd_handler + 140
    frame #4: 0x0000000802eb6cdd libtevent.so.0`___lldb_unnamed_symbol40$$libtevent.so.0 + 1901
    frame #5: 0x0000000802eb3071 libtevent.so.0`_tevent_loop_once + 225
    frame #6: 0x0000000802eb4fe1 libtevent.so.0`tevent_req_poll + 49
    frame #7: 0x0000000001031dbe smbd`___lldb_unnamed_symbol25$$smbd + 622
    frame #8: 0x0000000001030728 smbd`main + 2824
    frame #9: 0x000000000102d0f2 smbd`_start + 226

I'm wondering that maybe the base system built from the git repo exports the system version wrong? Hence the SIGILL.

I jumped from r368387 (svn) to r368820+7d8ff3245227-c255291(main) (git)
Comment 1 Dries Michiels freebsd_committer freebsd_triage 2020-12-26 11:06:52 UTC
As extra info, I tested multiple samba versions, 4.12, 4.13 they all have the same behavior.
Comment 2 Yuichiro NAITO 2020-12-28 11:43:51 UTC
I had same problem on my FreeBSD CURRENT main-c255394-gf20c0e33195.
And I have tried to recompile net/samba412 with following options in /etc/make.conf.

```
CC=clang11
CXX=clang++11
CPP=clang-cpp11
```

The rebuilt smbd server works for me.
It seems about a compiler problem.
Comment 3 Dries Michiels freebsd_committer freebsd_triage 2020-12-28 15:38:55 UTC
Created attachment 221054 [details]
samba.diff

Default to llvm10 for the samba built. This does not result in core dumps of the daemon. llvm11 from ports is also fine (for now), as there have been patches in base which are not present in the port version. Not sure for how long. I was building llvm10 anyway for mesa-libs.
Comment 4 Dries Michiels freebsd_committer freebsd_triage 2020-12-28 15:39:55 UTC
Thanks for the direction Yuichiro NAITO works fine now!
Comment 5 Dries Michiels freebsd_committer freebsd_triage 2020-12-28 16:51:37 UTC
CC llvm maintainer from base. Maybe he knows the proper solution.
Comment 6 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 11:09:18 UTC
I've reproduced the SIGILL on 13.0-CURRENT main-c255407-g4f4111d2c5ab with samba413-4.13.1_1, and I'm doing some debugging. No clues yet. :)
Comment 7 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 11:56:55 UTC
What seems to happen is that messaging_recv_cb() has a variable length array (aka VLA) 'fds64[]', which is initialized with a zero count, and this is undefined behavior:

Program received signal SIGSEGV, Segmentation fault.
0x0000000801c784a7 in messaging_recv_cb (ev=0x805475060, msg=0x7fffffffdbe8 "\035#", msg_len=98, fds=0x7fffffffdbdc, num_fds=0, private_data=0x80546e300) at ../../source3/lib/messages.c:394
394             int64_t fds64[MIN(num_fds, INT8_MAX)];
(gdb) print num_fds
$6 = 0

Digging deeper.
Comment 8 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 13:10:44 UTC
Created attachment 221099 [details]
Fix zero-sized VLAs in messaging part of net/samba413

Here is a patch for net/samba413 which should fix the undefined behavior with zero-sized VLAs in lib/source3/messages*.c. I will also attach patches for samba411 and samba412.
Comment 9 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 13:11:13 UTC
Created attachment 221100 [details]
Fix zero-sized VLAs in messaging part of net/samba412
Comment 10 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 13:11:32 UTC
Created attachment 221101 [details]
Fix zero-sized VLAs in messaging part of net/samba411
Comment 11 Dimitry Andric freebsd_committer freebsd_triage 2020-12-30 13:15:36 UTC
My advice would be to upstream these patches to Samba. In fact, they should probably do a full sweep of their source for these possibly zero-sizes VLAs, and compile the whole of Samba with -fsanitize=undefined, then doing a full regression test.

(I tried adding -fsanitize=undefined to the CFLAGS of this port, but I could not get the waf build tools to correctly link the various dynamic libraries. So I will gladly leave that to the waf and/or samba experts. :)

@Dries, if you could please check whether one of the patches fixes the crashes for you?
Comment 12 Dries Michiels freebsd_committer freebsd_triage 2020-12-30 13:32:12 UTC
I runtime tested the patch for samba413, no more core dumps. Thanks!
Comment 13 Martin MATO 2020-12-30 13:58:41 UTC
I am observing the same thing on my box recently upgraded to the sources of the git repo when i upgraded my samba: 
A samba port built on a git based system repo doesn't work  where the previously samba port (samba412) built on the svn repo of the system does.

i don't see any core dumps thought; but the "signal 4" message at start on /var/log/messages:

Dec 30 14:32:41 pcgyver kernel: pid 19312 (smbd), jid 0, uid 0: exited on signal 4


Good news: the patch provided for samba413 seems working for me too.

Thanks
Comment 14 Dimitry Andric freebsd_committer freebsd_triage 2021-01-01 21:18:15 UTC
Reported upstream: https://bugzilla.samba.org/show_bug.cgi?id=14605
Merge request: https://gitlab.com/samba-team/samba/-/merge_requests/1743
Comment 15 Dimitry Andric freebsd_committer freebsd_triage 2021-01-04 14:02:24 UTC
The fix was accepted by upstream:

https://git.samba.org/samba.git/?p=samba.git;a=commitdiff;h=3e96c95d41e4ccd0bf43b3ee78af644e2bc32e30
Comment 16 Dimitry Andric freebsd_committer freebsd_triage 2021-01-10 18:55:31 UTC
Ping? :)
Comment 17 Dries Michiels freebsd_committer freebsd_triage 2021-01-23 19:32:24 UTC
Given maintainer timeout I think its fine you commit it. (> 3 weeks)
Comment 18 commit-hook freebsd_committer freebsd_triage 2021-01-30 13:23:00 UTC
A commit references this bug:

Author: dim
Date: Sat Jan 30 13:22:41 UTC 2021
New revision: 563405
URL: https://svnweb.freebsd.org/changeset/ports/563405

Log:
  net/samba411 net/samba412 net/samba413: Fix zero-sized VLAs

  With recent versions of clang, samba could dump core shortly after
  startup, terminating with either SIGILL or SIGSEGV.

  Investigation showed that samba is using C99 variable length arrays
  (VLAs), and in some cases the length of these arrays would become zero.
  Since this is undefined behavior, various interesting things would
  happen, often ending in segfaults.

  Fix this by avoiding to use zero as the length for these VLA
  declarations.

  A similar patch was also sent upstream, and was accepted and included in
  subsequent samba releases.

  See also: https://bugzilla.samba.org/show_bug.cgi?id=14605

  Reported by:	Dries Michiels <driesm.michiels@gmail.com>
  PR:		252157
  MFH:		2021Q1

Changes:
  head/net/samba411/Makefile
  head/net/samba411/files/patch-source3_lib_messages.c
  head/net/samba412/Makefile
  head/net/samba412/files/patch-source3_lib_messages.c
  head/net/samba413/Makefile
  head/net/samba413/files/patch-source3_lib_messages.c