Bug 275597 - Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap'
Summary: Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap'
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
: --- Affects Only Me
Assignee: Timur I. Bakeyev
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-12-07 10:55 UTC by uratan
Modified: 2024-02-11 07:21 UTC
5 users

See Also:


Attachments
dmesg output of evaluating 14.0R (7.76 KB, text/plain)
2023-12-07 10:55 UTC, uratan
actual full path names of error dirs/files (2.12 KB, text/plain)
2023-12-07 10:57 UTC, uratan
a script to make a set of test files that is used by the debugging (121.51 KB, text/plain)
2023-12-18 01:45 UTC, uratan
direntcpy.c test case (597 bytes, text/plain)
2023-12-29 18:41 UTC, martin
readdir.c - readdir() test code (579 bytes, text/plain)
2023-12-31 12:46 UTC, uratan

Description uratan 2023-12-07 10:55:41 UTC
Created attachment 246852
dmesg output of evaluating 14.0R

I started evaluating FreeBSD 14.0R (i386) for use as my home server,
and I noticed that Samba's smbd often aborts abnormally, as shown below.
  +----------------------------
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.286037,  0] ../../lib/util/fault.c:172(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   ===============================================================
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307281,  0] ../../lib/util/fault.c:176(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   INTERNAL ERROR: Signal 11: Segmentation fault in pid 10655 (4.16.11)
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307391,  0] ../../lib/util/fault.c:181(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307505,  0] ../../lib/util/fault.c:182(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   ===============================================================
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307601,  0] ../../lib/util/fault.c:184(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   PANIC (pid 10655): Signal 11: Segmentation fault in 4.16.11
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.370441,  0] ../../lib/util/fault.c:245(log_stack_trace)
  |Dec  4 13:06:58 oxygen smbd[10655]:   BACKTRACE:
  |Dec  4 13:06:58 oxygen smbd[10655]:    #0 log_stack_trace + 0x43 [ip=0x203e84d3] [sp=0xffbfd35c]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #1 smb_panic_log + 0x6c [ip=0x203e835c] [sp=0xffbfd96c]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #2 smb_panic + 0x1a [ip=0x203e86da] [sp=0xffbfd980]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #3 fault_setup + 0xae [ip=0x203e82ee] [sp=0xffbfd994]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #4 _pthread_sigmask + 0x5c9 [ip=0x21162db9] [sp=0xffbfda2c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #5 _pthread_setschedparam + 0x969 [ip=0x21162289] [sp=0xffbfdd40]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #6 <unknown symbol> [ip=0xffbff004] [sp=0xffbfdd80]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #7 vfs_readdirname + 0x56 [ip=0x2019ea46] [sp=0xffbfe15c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #8 ReadDirName + 0x111 [ip=0x201396f1] [sp=0xffbfe18c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #9 smbd_dirptr_get_entry + 0x161 [ip=0x20137f81] [sp=0xffbfe1c8]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #10 smbd_dirptr_lanman2_entry + 0x151 [ip=0x2016f231] [sp=0xffbfe350]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #11 smbd_smb2_request_process_query_directory + 0xd9b [ip=0x201dbadb] [sp=0xffbfe480]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #12 smbd_smb2_request_process_query_directory + 0x884 [ip=0x201db5c4] [sp=0xffbfe578]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #13 smbd_smb2_request_dispatch + 0xce4 [ip=0x201c4834] [sp=0xffbfe604]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #14 smbd_smb2_process_negprot + 0x244a [ip=0x201c9cba] [sp=0xffbfe678]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #15 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe6cc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #16 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe6f4]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #17 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe740]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #18 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe768]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #19 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe784]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #20 smbd_process + 0x795 [ip=0x201b2205] [sp=0xffbfe798]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #21 main + 0x4670 [ip=0xe690] [sp=0xffbfe7fc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #22 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe8b4]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #23 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe8dc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #24 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe924]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #25 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe94c]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #26 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe968]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #27 main + 0x2ab9 [ip=0xcad9] [sp=0xffbfe97c]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #28 main + 0x15e9 [ip=0xb609] [sp=0xffbfe9a0]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #29 __libc_start1 + 0x155 [ip=0x205e38c5] [sp=0xffbfebcc]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #30 _start + 0x36 [ip=0x9d06] [sp=0xffbfebf0]
  |Dec  4 13:07:01 oxygen smbd[10655]: [2023/12/04 13:07:01.547212,  0] ../../source3/lib/dumpcore.c:310(dump_core)
  |Dec  4 13:07:01 oxygen smbd[10655]:   unable to change to %N.core
  |Dec  4 13:07:01 oxygen smbd[10655]:   refusing to dump core
  +----------------------------

The environment and configuration are as follows
 (the hostname is 'oxygen' and the architecture is i386):
  +----------------------------
  |% uname -a
  |FreeBSD oxygen 14.0-RELEASE FreeBSD 14.0-RELEASE #3: Mon Nov 27 22:32:52 JST 2023     uratan@oxygen:/usr/src/sys/i386/compile/OXYGEN i386
  |
  |% pkg info | grep samba
  |samba416-4.16.11     Free SMB/CIFS and AD/DC server and client for Unix
  +----------------------------

  (essence of) /usr/local/etc/smb4.conf
  +----------------------------
  |[global]
  |   workgroup = METXXXXX
  |   server string = Samba Server
  |    local master = no
  |    security     = user
  |    map to guest = Bad User
  |    load printers = yes
  |    printing      = bsd
  |    printcap cache time = 0
  |    unix charset = cp932
  |    dos  charset = cp932
  |   server role = standalone server
  |    hosts allow = 10.
  |    guest account = nobody
  |   log file = /var/log/samba4/log.%m
  |   max log size = 50
  |   dns proxy = no
  |
  |[printers]
  |   comment = All Printers
  |   path = /var/spool/samba4
  |   browsable = no
  |   guest ok = no
  |   writeable = no
  |   printable = yes
  |
  |[maindish]
  |    comment = exported for PC
  |    path    = /home
  |   vfs objects = cap
  |    browseable = yes
  |    guest ok   = yes
  |    guest only = yes
  |    writeable  = yes
  |    create    mask = 0775
  |    directory mask = 0775
  +----------------------------

  (essence of) /etc/rc.conf
  +----------------------------
  |samba_server_enable="YES"
  |nmbd_enable="NO"
  |smbd_enable="YES"
  |winbindd_enable="NO"
  +----------------------------

See the attached file dmesg-today.txt for other details of the hardware environment.
Note: the re0 driver has been replaced with Realtek's own driver to avoid 're0: watchdog timeout'.
 (rtl_bsd_drv_v199.04.tgz from
  https://www.realtek.com/ja/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software )
The driver replacement should have nothing to do with this Samba problem,
because with the GENERIC kernel in the first phase, the Samba PANIC and
're0: watchdog timeout' were both seen in the kernel messages.
At first I suspected the watchdog timeout was the root cause,
but apparently it wasn't (probably).

I am using Windows 7 (also the x86/32-bit version) as the test client,
and in the reports below I use diff.exe, a native Win32 executable
from UnxUtils.
 (see https://unxutils.sourceforge.net/ for UnxUtils)
  +----------------------------
  |U:\> ver
  |Microsoft Windows [Version 6.1.7601]
  |
  |U:\> diff.exe --version
  |diff - GNU diffutils version 2.7
  |
  |U:\> net use
  |--------------------------------------------------------
  |OK  I:  \\silver\maindish  Microsoft Wi...  <== 10.2R server (samba36)
  |OK  J:  \\oxygen\maindish  Microsoft Wi...  <== 14.0R (samba416)
  |
  +----------------------------

The host 'silver' is my current home server,
running FreeBSD 10.2R (i386).
  +----------------------------
  |% uname -a
  |FreeBSD silver 10.2-RELEASE FreeBSD 10.2-RELEASE #1: Sun Mar 12 09:07:49 JST 2017     uratan@silver:/usr/src/sys/i386/compile/OXYGEN  i386
  |
  |% pkg info | grep samba
  |samba36-3.6.25_1     Free SMB and CIFS client and server for Unix
  +----------------------------

 (10.2R? Win7? i386? Everything is this outdated because I am an)
 (old man, outdated myself, who strongly prefers stability.)

 - * - * -

First, here is the error situation.
 (Please note the drive letters i: and j:.)
"x:\win\Hardwares" contains 1454 dirs and 9644 files, 144GB in total.

The errors reported below are caused by the smbd PANIC abort.
The files/dirs in error differ from run to run.
There is NO error saying that file contents differ.
  +----------------------------
  |U:\> diff -rq i:\win\Hardwares j:\win\Hardwares
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or directory
  |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument
  |
  |U:\> diff -rq j:\win\Hardwares i:\win\Hardwares
  |diff: j:\win\Hardwares\ATerm\someDir_88: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or directory
  |diff: j:\win\Hardwares\ScanSnap\someDir_6666: Invalid argument
  |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument
  +----------------------------

 - * - * -

I also tried another version of Samba, samba413-4.13.17_6.pkg,
but the situation was the same: smbd aborts there, too.

I analyzed the PANIC abort logs left in /var/log/messages
from the beginning of the evaluation until today.
There are 54 logs in total across both versions of Samba.
In the summary of BACKTRACEs below, the intermediate flow differs
slightly, but the path to the PANIC abort seems to be almost the same.

The <unknown symbol> frame is very suspicious, and I think
the failure in ReadDirName() or vfs_readdirname() is closely
related to the errors reported by diff.exe:
"Invalid argument" and "No such file or directory".
  -----------------------------------------------------------------------------
  |    25 times by samba416                  |    29 times by samba413
  ----25times----------------COMMON FLOW----------29times---------------------
  | _start + 0x36                            | _start + 0x36
  | __libc_start1 + 0x155                    | __libc_start1 + 0x155
  | main + 0x15e9                            | main + 0x199d
  | main + 0x2ab9                            | main + 0x2db9
  | _tevent_loop_wait + 0x1c                 | _tevent_loop_wait + 0x1c
  | tevent_common_loop_wait + 0x2f           | tevent_common_loop_wait + 0x2f
  | _tevent_loop_once + 0xcf                 | _tevent_loop_once + 0xcf
  | tevent_context_same_loop + 0xdd2         | tevent_context_same_loop + 0xdd2
  | tevent_common_invoke_fd_handler + 0x8b   | tevent_common_invoke_fd_handler + 0x8b
  | main + 0x4670                            | main + 0x4925
  | smbd_process + 0x795                     | smbd_process + 0x79b
  | _tevent_loop_wait + 0x1c                 | _tevent_loop_wait + 0x1c
  | tevent_common_loop_wait + 0x2f           | tevent_common_loop_wait + 0x2f
  | _tevent_loop_once + 0xcf                 | _tevent_loop_once + 0xcf
  | tevent_context_same_loop + 0xdd2         | tevent_context_same_loop + 0xdd2
  | tevent_common_invoke_fd_handler + 0x8b   | tevent_common_invoke_fd_handler + 0x8b
  | smbd_smb2_process_negprot + 0x244a       | smbd_smb2_process_negprot + 0x237a
  | smbd_smb2_request_dispatch + 0xXXX       | smbd_smb2_request_dispatch + 0xXXX
  ----15times----------------FLOW-CASE-A-----------6times---------------------
  A smbd_smb2_request_process_create         ( smbd_smb2_request_process_create
  A smbd_smb2_request_process_create         ( smbd_smb2_request_process_create
  A filename_convert                         ( filename_convert
  A                                          ( filename_convert
  A unix_convert        2                    ( unix_convert
  A get_real_filename_full_scan              ( get_real_filename_full_scan
  ----10times----------------FLOW-CASE-B----------23times---------------------
  B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory
  B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory
  B smbd_dirptr_lanman2_entry                ( smbd_dirptr_lanman2_entry
  B smbd_dirptr_get_entry                    ( smbd_dirptr_get_entry
  B                                          ( can_delete_directory_fsp
  ----25times----------------COMMON FLOW----------29times---------------------
  | ReadDirName + 0x111                      | ReadDirName + 0x10b
  | vfs_readdirname + 0x56                   | vfs_readdirname + 0x53
  | <unknown symbol>                         | <unknown symbol>
  | _pthread_setschedparam + 0x969           | _pthread_setschedparam + 0x969
  | _pthread_sigmask + 0x5c9                 | _pthread_sigmask + 0x5c9
  | fault_setup + 0xae                       | fault_setup + 0xae
  | smb_panic + 0x1a                         | smb_panic + 0x1a
  | smb_panic_log + 0x6c                     | smb_panic_log + 0x6c
  | log_stack_trace + 0x43                   | log_stack_trace + 0x43
  -----------------------------------------------------------------------------

 - * - * -

To narrow down the cause, I added one more share
for debugging to /usr/local/etc/smb4.conf.
This share is the same as [maindish] except for the
"vfs objects" setting (and the "comment").
  +----------------------------
  |[m2-test]
  |    comment = test
  |    path    = /home
  |;   vfs objects = cap
  |    browseable = yes
  |    guest ok   = yes
  |    guest only = yes
  |    writeable  = yes
  |    create    mask = 0775
  |    directory mask = 0775
  +----------------------------

From the Win7 client, they now look like this:
  +----------------------------
  |U:\> net use
  |--------------------------------------------------------
  |OK  I:  \\silver\maindish  Microsoft Wi...  <== 10.2R server (samba36)
  |OK  J:  \\oxygen\maindish  Microsoft Wi...  <== 14.0R (samba416, vfs object=cap)
  |OK  K:  \\oxygen\m2-test   Microsoft Wi...  <== 14.0R (samba416, NO vfs object)
  +----------------------------

See the test results below.
"x:\win\Hardwares\N3x50B" contains 622 dirs and 3356 files, 3.1GB in total.
(I chose an area that did not contain Japanese file names.)
 (Please note the drive letters i:, j: and k:.)

No error is reported when using drive k:.
So I think 'vfs objects = cap' is the key to this BUG.
I got the expected result, and also...
the unfortunate fact that an error was reported from drive i:!!!
"WAO NANTTE KOTTAI" (Japanese for "Oh my god!" or "What the hell?")
  +----------------------------
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_22: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_333: Invalid argument
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
**|diff: i:\win\Hardwares\N3x50B\someDir_22: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_333\libmfxhw32.dll: No such file or directory
  |diff: j:\win\Hardwares\N3x50B\someDir_55555\igdusc32.dll: No such file or directory
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |U:\>
  +----------------------------

 - * - * -

I trusted my current home server silver (10.2R) because
it has worked well for about 8 years.
However, I found 74 smbd abort logs in /var/log/messages
and in the "Subject: silver daily security run output" mails to root.
But I have never felt that my files were lost.
  +----------------------
  |2016-03-21  03:01:25 (smbd) on signal 6
    :             :
**|2023-12-05  22:21:02 silver kernel: pid 90073 (smbd), uid 0: exited on signal 6
  +----------------------

The configuration of samba36 is almost the same as samba416's, because
smb4.conf inherited samba36's smb.conf, except mainly for the following:
  +---------------------- smb4.conf
  |    security     = user
  |    map to guest = Bad User
  +---------------------- smb.conf for samba36 on silver
  |    security = share
  +----------------------

Also, silver runs on the same motherboard as oxygen
 (except for the number of CPU cores: silver has 2 and oxygen has 4).

 - * - * -

Aside from that, I ran the same test with
the charset configuration replaced by:
  +----------------------------
  |    unix charset = ascii
  |    dos  charset = ascii
  +----------------------------
The result is the same: smbd aborts only when "vfs objects = cap".

 - * - * -

CONCLUSION

I think...
Samba's smbd with the configuration 'vfs objects = cap' has a latent
bug near vfs_readdirname(), possibly since version 3.x,
and it is exposed when some timing condition is met.


The cap function is necessary for my home server.
Someone, please help me!

 - * - * -

Finally, I would like to add some personal impressions, etc.

(a) Why "I have never felt that my files were lost." ?
  Because Explorer.exe seems to retry very persistently after
  any error in a copy operation, in either direction.
   (So this problem does not become very critical because of this.)
      (Actually, it may not do me any real harm either... except)
      (that I am surprised whenever I look at /var/log/messages...)

(b) This PANIC abort occurs rather frequently:
  (b1) just after starting a copy from Explorer.exe
  (b2) when issuing "Properties" from Explorer.exe
        (Unlike the copy case, NO retries seem to be made during  )
        ("Properties", so it reports a smaller total size or fewer)
        (files if the PANIC abort occurs while it is counting.)

  In both cases, Explorer.exe scans only directories
  to gather information, without accessing the file contents.
  I think a high density of repeated requests to the directories
  is what triggers this problem.

  For example... like this...
  +----------------------------
  | initial_jobs();
  | while(1) {
  |   /* ready */
  |   wait_for_request();
  |   exec_the_request();
  |   answer_the_request();   /* the client gets its answer here, so  */
  |   post_jobs_for_ready();  /* the next request may arrive while we */
  | }                         /* are still here, before "ready"...    */
  +----------------------------

p.s.
 I noticed just NOW that the depth of the directory hierarchy might have
 something to do with the problem, so I have also attached a file:
   list-of-err-files.txt
 which lists the actual full path names where the errors occurred.
Comment 1 uratan 2023-12-07 10:57:12 UTC
Created attachment 246853
actual full path names of error dirs/files
Comment 2 uratan 2023-12-18 01:45:12 UTC
Created attachment 247109
a script to make a set of test files that is used by the debugging

I noticed that I could do printf-style debugging by setting the
debug level in /usr/local/etc/smb4.conf, ...

  (modification to) /usr/local/etc/smb4.conf
  +-----------------------------------------
  |;   max log size = 50
  |   max log size = 9999
  |   log level = 6
  +-----------------------------------------

... so I decided to start tracking down the cause by myself.

I built samba-4.16.11 (the same version as the package) from ports
and gathered information with the DEBUG() macro,
whose output goes to /var/log/samba4/log.<client-hostname>.

 - * - * -

First, I confirmed where the 'Segmentation fault' occurred:
it was in the function cap_readdir() shown below.

  (/usr/ports/net/samba416/work/samba-4.16.11/source3/modules/vfs_cap.c)
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
   88 |                                  struct files_struct *dirfsp,
   89 |                                  DIR *dirp,
   90 |                                  SMB_STRUCT_STAT *sbuf)
   91 |{
   92 |        struct dirent *result;
   93 |        struct dirent *newdirent;
   94 |        char *newname;
   95 |        size_t newnamelen;
   96 |        DEBUG(3,("cap: cap_readdir\n"));
   97 |
   98 |        result = SMB_VFS_NEXT_READDIR(handle, dirfsp, dirp, NULL);
   99 |        if (!result) {
  100 |                return NULL;
  101 |        }
  102 |
  103 |        newname = capdecode(talloc_tos(), result->d_name);
  104 |        if (!newname) {
  105 |                return NULL;
  106 |        }
  107 |        DEBUG(3,("cap: cap_readdir: %s\n", newname));
  108 |        newnamelen = strlen(newname)+1;
  109 |        newdirent = talloc_size(
  110 |                talloc_tos(), sizeof(struct dirent) + newnamelen);
  111 |        if (!newdirent) {
  112 |                return NULL;
  113 |        }
  114 |        talloc_set_name_const(newdirent, "struct dirent");
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------

In cap_readdir(), there are two memcpy() calls just before
the return statement. Using DEBUG() as shown below, I confirmed
that the 'Segmentation fault' occurs only at the first memcpy();
the second memcpy() causes no problem.
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
  114 |        talloc_set_name_const(newdirent, "struct dirent");
      |DEBUG(3,("@@u@@ 456\n"));
      |memcpy(&newdirent->d_name, newname, newnamelen);       //// dup of #116
      |DEBUG(3,("@@u@@ 888\n"));
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      |DEBUG(3,("@@u@@ 789\n"));
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
      |DEBUG(3,("@@u@@ 999\n"));
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------
       |    |
       V    V
  (excerpt of /var/log/samba4/log.argon near the PANIC abort, with the code above)
  +------------------------------------------------------------------
  |    :
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCPL.cpl
  |  @@u@@ 456
  |  @@u@@ 888
  |  @@u@@ 789
  |  @@u@@ 999
  |  smbd_dirptr_get_entry: dirptr 0x2af766c0 now at offset 7585
  |  fsp_new: allocated files structure (508 used)
  |  fget_ea_dos_attribute: Cannot get attribute from EA on file win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCPL.cpl: Error = Attribute not found
  |  dos_mode_debug_print: fdos_mode returning (0x80): ""
  |  smbd_dirptr_get_entry mask=[*] found win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCPL.cpl fname=igfxCPL.cpl (igfxCPL.cpl)
  |  file_free: freed files structure 0 (507 used)
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCUIService.exe
  |  @@u@@ 456
  |  @@u@@ 888
  |  @@u@@ 789
  |  @@u@@ 999
  |  smbd_dirptr_get_entry: dirptr 0x2af766c0 now at offset 7681
  |  fsp_new: allocated files structure (508 used)
  |  fget_ea_dos_attribute: Cannot get attribute from EA on file win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCUIService.exe: Error = Attribute not found
  |  dos_mode_debug_print: fdos_mode returning (0x80): ""
  |  smbd_dirptr_get_entry mask=[*] found win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCUIService.exe fname=igfxCUIService.exe (igfxCUIService.exe)
  |  file_free: freed files structure 0 (507 used)
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCUIServicePS.dll
  |  @@u@@ 456
  |  @@u@@ 888
  |  ===============================================================
  |  INTERNAL ERROR: Signal 11: Segmentation fault in pid 16090 (4.16.11)
  |  If you are running a recent Samba version, and if you think ....
  |  ===============================================================
  |  PANIC (pid 16090): Signal 11: Segmentation fault in 4.16.11
  |  BACKTRACE:
  |   #0 log_stack_trace + 0x43 [ip=0x203e8643] [sp=0xffbfd35c]
  |    :
  +------------------------------------------------------------------

 - * - * -

For about a week, I tried various things (only within cap_readdir())
hoping something would change the situation.
Finally, after more than 20 builds and tests, I reached a solution:
replacing the first memcpy() with simple member-by-member
assignments (see below) made the problem disappear.
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
  114 |        talloc_set_name_const(newdirent, "struct dirent");
      |#define Q_HACK_U  1
      |#if Q_HACK_U
      |        newdirent->d_pad1   = result->d_pad1   ;
      |        newdirent->d_namlen = result->d_namlen ;
      |        newdirent->d_pad0   = result->d_pad0   ;
      |        newdirent->d_type   = result->d_type   ;
      |        newdirent->d_reclen = result->d_reclen ;
      |        newdirent->d_off    = result->d_off    ;
      |        newdirent->d_fileno = result->d_fileno ;
      |#else /* Q_HACK_U */
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      |#endif /* Q_HACK_U */
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------
  note:
    I considered the end of the memory area (the d_name side) to be safer
    because the second memcpy() did not cause any problems;
    that is why the structure member assignments are
    in reverse order (so that further DEBUG()s could be inserted if needed).
    The member d_name[] is not copied by hand because
    the second memcpy() does it.
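The member-by-member assignments above copy exactly the members that precede d_name in FreeBSD 14's struct dirent. The same idea can be written more compactly by copying offsetof(struct dirent, d_name) bytes, so the first memcpy() never reads the source entry's d_name area at all. The sketch below is a standalone illustration under stated assumptions, not the upstream Samba fix: the function name copy_dirent_with_name() is hypothetical, calloc() stands in for talloc_size(), and newname is assumed to be already decoded by capdecode().

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone version of the tail of cap_readdir():
 * returns a freshly allocated dirent carrying the fixed members of `src`
 * and the (already decoded) name `newname`. calloc() stands in for
 * Samba's talloc_size(); the caller frees the result.
 */
struct dirent *copy_dirent_with_name(const struct dirent *src,
                                     const char *newname)
{
    /* Bytes that precede d_name: on FreeBSD 14 these are exactly the
     * members the Q_HACK_U workaround assigns one by one. */
    size_t header_len = offsetof(struct dirent, d_name);
    size_t newnamelen;
    struct dirent *dst;

    assert(src != NULL && newname != NULL);
    newnamelen = strlen(newname) + 1;              /* include the NUL */

    /* Same allocation size as the original cap_readdir(). */
    dst = calloc(1, sizeof(struct dirent) + newnamelen);
    if (dst == NULL)
        return NULL;

    /* Read only the fixed header from `src` instead of the whole
     * sizeof(struct dirent), so the read stays inside src's record. */
    memcpy(dst, src, header_len);
    memcpy(dst->d_name, newname, newnamelen);
    return dst;
}
```

Like the workaround, this keeps the original allocation size (sizeof(struct dirent) + newnamelen), so only the source side of the first copy changes.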

 - * - * -

Since nothing looked unnatural about the pointer values, etc.
in another investigation with DEBUG() like below, ...
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
      |DEBUG(3,("@u@ %lx %lx %x\n", newdirent, result, sizeof(struct dirent)));
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      +------------------------------------------------------------------
       |    |
       V    V
  (excerpt of /var/log/samba4/log.argon near the PANIC abort, with the code above)
  +------------------------------------------------------------------
  |    :
  |  cap: cap_readdir: HDXRtGi.inf
  |  @u@ 2a7b6130 22e82e90 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSBCH.inf
  |  @u@ 2a7b6430 22e82eb8 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSEDS.inf
  |  @u@ 2a7b65b0 22e82ee0 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSF.inf
  |  @u@ 2a7b6730 22e82f08 118
  |  ===============================================================
  |  INTERNAL ERROR: Signal 11: Segmentation fault in pid 94386 (4.16.11)
  |    :
  +------------------------------------------------------------------

... I can only assume that something in the implementation
of memcpy() is incompatible with smbd's behavior... (maybe)
 (multi-core? or threading?)
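The logged values also fit another explanation, offered here only as an editorial hypothesis, not as a finding of this report. readdir() returns a pointer into a buffer where each entry occupies just d_reclen bytes (the result pointers above are 0x28 bytes apart), while the first memcpy() always reads sizeof(struct dirent) = 0x118 bytes from it. For the last entry, 0x22e82f08 + 0x118 = 0x22e83020, which runs past the page boundary at 0x22e83000; the three earlier reads all end below it. The sketch below only checks that arithmetic, assuming 4096-byte pages on i386; read_crosses_page() and print_logged_cases() are hypothetical helpers, and crossing a boundary can only fault if the next page happens to be unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of Samba): returns 1 if reading `len` bytes
 * starting at `addr` touches a page other than the one containing `addr`. */
static int read_crosses_page(uintptr_t addr, size_t len, uintptr_t page_size)
{
    uintptr_t first_page, last_page;

    assert(len > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0); /* power of two */
    first_page = addr & ~(page_size - 1);
    last_page = (addr + len - 1) & ~(page_size - 1);
    return first_page != last_page;
}

/* `result` pointers from the log excerpt above; 0x118 is the logged
 * sizeof(struct dirent); 4096 is an assumed i386 page size. */
void print_logged_cases(void)
{
    static const uintptr_t logged_result[] = {
        0x22e82e90, 0x22e82eb8, 0x22e82ee0, 0x22e82f08
    };
    size_t i;

    for (i = 0; i < sizeof logged_result / sizeof logged_result[0]; i++) {
        uintptr_t r = logged_result[i];
        printf("result=%#lx last byte read=%#lx crosses page=%d\n",
               (unsigned long)r, (unsigned long)(r + 0x118 - 1),
               read_crosses_page(r, 0x118, 4096));
    }
}
```

With the four logged addresses, print_logged_cases() reports a page crossing only for 0x22e82f08, the entry whose memcpy() was followed by the PANIC. If this hypothesis holds, it would also explain why the member-by-member workaround helps: it reads only the members before d_name.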

Anyway, I feel very, very, very happy right now
because diff.exe works well now (and /var/log/messages is quiet).

 - * - * -

ADDITIONAL INFO

[a] I also modified the version message to distinguish my smbd.

  (/usr/ports/net/samba416/work/samba-4.16.11/source3/include/smb.h)
      +------------------------------------------------------------------
   31 |
   32 |/* logged when starting the various Samba daemons */
      |#define Q_HACK_U  1
      |#if Q_HACK_U
      |#define COPYRIGHT_STARTUP_MESSAGE       "Copyright Andrew Tridgell and the Samba Team 1992-2023 +u"
      |#else /* Q_HACK_U */
   33 |#define COPYRIGHT_STARTUP_MESSAGE       "Copyright Andrew Tridgell and the Samba Team 1992-2022"
      |#endif /* Q_HACK_U */
   34 |
      +------------------------------------------------------------------

[b] The 14.0R system in question uses the UFS file system. (/home is shared by Samba)
  +-----------------------------------------
  | % mount
  |/dev/ada0p3 on / (ufs, local, noatime)
  |devfs on /dev (devfs)
  |/dev/ada0p4 on /usr (ufs, local, noatime, soft-updates)
  |/dev/ada0p5 on /home (ufs, NFS exported, local, noatime, soft-updates)
  |/dev/md1 on /mnt (ufs, local)
  |fdescfs on /var/run/samba4/fd (fdescfs)
  +-----------------------------------------

[c] The depth of the directory hierarchy might not be related
    to this problem after all. The attached sh script z2-make-test-files.sh
     makes the set of test files that was used in the debugging above.
  +-----------------------------------------
  |% sh z2-make-test-files.sh
  |.......
            :
  |          ...... done
  |
  |% du -d2 -m teHWs/
  |620     teHWs/N3x50B-ITX/7464
  |621     teHWs/N3x50B-ITX/7492
  |322     teHWs/N3x50B-ITX/v15.38.2.64.4189
  |261     teHWs/N3x50B-ITX/v15.40.0.4177_BSW
  |1822    teHWs/N3x50B-ITX
  |1822    teHWs/
  |
  |% find teHWs/ -type f | wc -l
  |    1591
  |% find teHWs/ -type d | wc -l
  |     135
  +-----------------------------------------

    Since this problem appears at most once (if at all) per comparison
    by diff.exe below (about 1 min), the frequency of occurrence
    in my environment is at most about 1/3182 (1591 files * 2 trials),
    and often even less...
  +-----------------------------------------
  |U:\> diff.exe -rq j:\win\tst\teHWs u:\tst\teHWs     (u: is win7 local disk)
  +-----------------------------------------
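If the crashes come from the first memcpy() in cap_readdir() reading past the end of readdir()'s buffer (an assumption, not established in this report), how often they can happen depends on where entries land relative to page boundaries. The hypothetical diagnostic below, which is not one of the attachments to this report, walks one directory with readdir() and counts the entries for which a full sizeof(struct dirent) read from the returned pointer would run into the next page. A non-zero count is a precondition for such a fault, not proof of one: whether the following page is actually unmapped depends on how libc allocated its directory buffer.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic (not one of this report's attachments):
 * walks `path` with readdir() and counts entries for which reading a
 * full sizeof(struct dirent) from the returned pointer would run into
 * the next page. Returns the number of entries seen, or -1 on error;
 * the page-crossing count is stored in *crossing.
 */
long count_page_crossing_entries(const char *path, long *crossing)
{
    long page_size = sysconf(_SC_PAGESIZE);
    DIR *dir;
    struct dirent *e;
    long total = 0;

    assert(crossing != NULL);
    *crossing = 0;
    if (page_size <= 0)
        return -1;
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((e = readdir(dir)) != NULL) {
        /* Offset of this entry inside its page. */
        uintptr_t off = (uintptr_t)e % (uintptr_t)page_size;

        total++;
        if (off + sizeof(struct dirent) > (uintptr_t)page_size)
            (*crossing)++;
    }
    closedir(dir);
    return total;
}
```

For example, calling it on each directory of the teHWs/ tree created by z2-make-test-files.sh would give a rough per-directory count to compare with the low frequency of occurrence estimated above.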
Comment 3 uratan 2023-12-23 14:37:02 UTC
I will close this topic with this.

/*
 * how to make samba4 from ports  (memorandum for me in the future)
 */                                 (and for those who got here)

[1] Extract files from freebsd-dist/ports.txz
  +------------------------------------------------------
  |# cd /
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/net/samba416
  +------------------------------------------------------
    These files are also needed to build samba416:
  +------------------------------------------------------
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/Keywords
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/Mk
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/Templates
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/lang
  |# tar xzf ...somewhere/freebsd-dist/ports.txz  usr/ports/ports-mgmt
  +------------------------------------------------------

[2] Other preparations

    In my environment, these packages were additionally required for the build:
      | p5-Parse-Yapp-1.21.pkg   | libtextstyle-0.22.pkg
      | cmocka-1.1.5.pkg         | bison-3.8.2,1.pkg
      | p5-JSON-4.10.pkg         | m4-1.4.19,1.pkg
      | pkgconf-2.0.3,1.pkg
    (I wonder whether 'make' would have added them automatically during the build?)

    The distfile samba-4.16.11.tar.gz is downloaded automatically,
    but if you already have it, place it in /usr/ports/distfiles/.

[3] Run make (the first time)
  +------------------------------------------------------
  |# cd /usr/ports/net/samba416
  |# make
  +------------------------------------------------------
    A configuration menu then appears;
    simply select <Cancel> to keep the default settings.
  +------------------------------------------------------
  |+--------------------------- samba416-4.16.11 -----------------------------+
  || +----------------------------------------------------------------------+ |
  || |+[x] ADS             Active Directory client(implies LDAP)            | |
  || |+[x] AD_DC           Active Directory Domain Controller(implies PYTHON| |
  || |+[ ] CLUSTER         Clustering support                               | |
  || |+[ ] CUPS            CUPS printing system support                     | |
  || |+[x] DOCS            Build and/or install documentation               | |
  || |+[x] FAM             File Alteration Monitor                          | |
  || |+[ ] GPGME           GpgME support                                    | |
  || |+[x] LDAP            LDAP client                                      | |
  || |+[ ] MANDOC          Build manpages from DOCBOOK templates            | |
  || |+[x] PROFILE         Profiling data                                   | |
  || |+[x] PYTHON3         Python 3.x bindings or support                   | |
  || |+[x] QUOTAS          Disk quota support                               | |
  || |+[ ] SPOTLIGHT       Spotlight server-side search support             | |
  || |+[x] SYSLOG          Syslog logging support                           | |
  || |+[x] UTMP            UTMP accounting                                  | |
  || |----------------------------- VFS modules ----------------------------| |
  || +-----v(+)-----------------------------------------------------64%-----+ |
  |+--------------------------------------------------------------------------+
  ||                       <  OK  >            <Cancel>                       |
  |+--------------------------------------------------------------------------+
  +------------------------------------------------------

    The build takes about 30 min in my environment, and the resulting
    smbd is identical to the one from the samba416-4.16.11.pkg package:
  +------------------------------------------------------
  |# pwd
  |/usr/ports/net/samba416
  |
  |# ls -F
  |Makefile                pkg-descr               pkg-plist.cluster
  |distinfo                pkg-plist               pkg-plist.python
  |files/                  pkg-plist.ad_dc         work/
  |
  |# ls -F work/
  |.PLIST.mktmp                            .patch_done.samba416._usr_local
  |.build_done.samba416._usr_local         .stage_done.samba416._usr_local
  |.configure_done.samba416._usr_local     README.FreeBSD
  |.extract_done.samba416._usr_local       pkg-message
  |.license-catalog.mk                     samba-4.16.11/
  |.license-report                         samba_server
  |.license_done.samba416._usr_local       stage/
  |
  |# md5 -r work/stage/usr/local/sbin/smbd \
  |                   /usr/local/sbin/smbd
  |e281293697932aff5d3c32c211da3a4b work/stage/usr/local/sbin/smbd
  |e281293697932aff5d3c32c211da3a4b /usr/local/sbin/smbd
  +------------------------------------------------------

[4] How to rebuild after changing the source code
  +------------------------------------------------------
  |# pwd
  |/usr/ports/net/samba416
  |
  |# cd work/samba-4.16.11/
  |# vim source3/modules/vfs_cap.c
  |# cd ../../
  |
  |# make
  |  ...                        <-- do <Cancel> for the configuration menu
  |===> Options unchanged
  |#                            ... ???? NOTHING is DONE !!!!
  |
  |# ls -lrt work/
  |total 1
  |-rw-r--r--   1 root wheel     0 Dec 23 13:45 .extract_done.samba416._usr_local
  |-rw-r--r--   1 root wheel   215 Dec 23 13:45 .license-catalog.mk
  |-rw-r--r--   1 root wheel    93 Dec 23 13:45 .license-report
  |-rw-r--r--   1 root wheel     0 Dec 23 13:45 .license_done.samba416._usr_local
  |-rw-r--r--   1 root wheel     0 Dec 23 13:45 .patch_done.samba416._usr_local
  |-rw-r--r--   1 root wheel   958 Dec 23 13:45 pkg-message
  |-rw-r--r--   1 root wheel  3094 Dec 23 13:45 README.FreeBSD
  |-rw-r--r--   1 root wheel  8412 Dec 23 13:45 samba_server
  |drwxr-xr-x  36 root wheel  2048 Dec 23 13:49 samba-4.16.11
  |-rw-r--r--   1 root wheel     0 Dec 23 13:49 .configure_done.samba416._usr_local
  |-rw-r--r--   1 root wheel     0 Dec 23 14:13 .build_done.samba416._usr_local
  |drwxr-xr-x   4 root wheel   512 Dec 23 14:13 stage
  |-rw-r--r--   1 root wheel 52433 Dec 23 14:17 .PLIST.mktmp
  |-rw-r--r--   1 root wheel     0 Dec 23 14:17 .stage_done.samba416._usr_local
  |
  |# rm  work/.stage_done.samba416._usr_local \
  |      work/.build_done.samba416._usr_local
  |
  |# make
  |  ...                        <-- do <Cancel> for the configuration menu
  |===> Options unchanged
  |                             ... rebuild starts (takes about 5 min to finish)
  +------------------------------------------------------

[5] How to install samba4 from ports
  +------------------------------------------------------
  |# pkg info | fgrep samba4
  |samba416-4.16.11        Free SMB/CIFS and AD/DC server and client for Unix
  |# pkg delete samba416-4.16.11
  |
  |# pwd
  |/usr/ports/net/samba416
  |# make install
  |  ...                        <-- do <Cancel> for the configuration menu
  +------------------------------------------------------

[6] How to roll back to the samba4 package
  +------------------------------------------------------
  |# pwd
  |/usr/ports/net/samba416
  |# make deinstall
  |
  |# pkg install samba416-4.16.11
  +------------------------------------------------------

    You can confirm the installation status in /var/log/messages:
     | "pkg-static" means install/deinstall from ports
     | "pkg"        means install/delete    by pkg
  +------------------------------------------------------
  |# bzcat /var/log/messages.0.bz2 | fgrep pkg
           :                :
  |Dec 16 17:34:38 oxygen pkg-static[11504]: samba416-4.16.11 installed
  |Dec 16 18:27:13 oxygen pkg-static[39133]: samba416-4.16.11 deinstalled
  |Dec 16 18:27:27 oxygen pkg-static[39645]: samba416-4.16.11 installed
  |Dec 16 22:23:23 oxygen pkg-static[65811]: samba416-4.16.11 deinstalled
  |Dec 16 22:26:15 oxygen pkg[65987]: pkg upgraded: 1.20.8 -> 1.20.9
  |Dec 16 22:27:52 oxygen pkg[66699]: samba416-4.16.11 installed
  |Dec 16 23:00:40 oxygen pkg[81824]: samba416-4.16.11 deinstalled
  |Dec 16 23:01:17 oxygen pkg-static[82502]: samba416-4.16.11 installed
  |Dec 16 23:24:09 oxygen pkg-static[96335]: samba416-4.16.11 deinstalled
  |Dec 16 23:24:40 oxygen pkg-static[96976]: samba416-4.16.11 installed
  |Dec 17 00:01:49 oxygen pkg-static[13207]: samba416-4.16.11 deinstalled
  |Dec 17 00:02:39 oxygen pkg[13431]: samba416-4.16.11 installed
  |Dec 17 00:32:43 oxygen pkg[27009]: samba416-4.16.11 deinstalled
  |Dec 17 00:33:14 oxygen pkg-static[27678]: samba416-4.16.11 installed
  +------------------------------------------------------
            ... My smbd has been running great ever since!

 (Please let me know if there are any problems with the workflow)
Comment 4 Kirk McKusick freebsd_committer freebsd_triage 2023-12-23 19:41:23 UTC
(In reply to uratan from comment #3)
Thanks for your detailed report and final comment #3 that details how to make it all work.

For completeness, I suggest that you close this bug report as you have figured out how to make it work.
Comment 5 Mark Millard 2023-12-23 21:14:22 UTC
(In reply to Kirk McKusick from comment #4)

I'm confused. Comment #2 includes a patch that has not been
applied to the port:

QUOTE
      |#define Q_HACK_U  1
      |#if Q_HACK_U
      |        newdirent->d_pad1   = result->d_pad1   ;
      |        newdirent->d_namlen = result->d_namlen ;
      |        newdirent->d_pad0   = result->d_pad0   ;
      |        newdirent->d_type   = result->d_type   ;
      |        newdirent->d_reclen = result->d_reclen ;
      |        newdirent->d_off    = result->d_off    ;
      |        newdirent->d_fileno = result->d_fileno ;
      |#else /* Q_HACK_U */
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      |#endif /* Q_HACK_U */
END QUOTE

While a more directly useful variant of the patch would be
appropriate as an attachment, this submittal is reporting
a bug and an example source code fix/workaround. It seems
early to close this bug report.
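For illustration only: the Q_HACK_U block quoted above copies every field that precedes d_name, one by one, and skips d_name itself. The quote does not show how the name is copied afterwards. A more compact way to express the same idea is a single memcpy() of offsetof(struct dirent, d_name) bytes, followed by a separate copy of the name. The sketch below is not the committed port patch. The helper dirent_with_new_name() is hypothetical, and it uses calloc() where smbd would use talloc.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the committed port patch): return a heap copy
 * of `src` (an entry returned by readdir()) whose name is `newname`.
 * Only the fixed header in front of d_name is read from `src`, so no byte
 * past src->d_reclen is ever touched.
 */
struct dirent *dirent_with_new_name(const struct dirent *src,
                                    const char *newname)
{
    size_t hdr = offsetof(struct dirent, d_name); /* fields before d_name */
    size_t namelen = strlen(newname) + 1;         /* include the NUL */
    size_t total = hdr + namelen;
    struct dirent *dst;

    assert(src != NULL && newname != NULL);
    if (total < sizeof(struct dirent))
        total = sizeof(struct dirent); /* callers may expect a full struct */
    dst = calloc(1, total);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, hdr);                 /* metadata only: always readable */
    memcpy(dst->d_name, newname, namelen); /* then the new, decoded name */
    return dst;
}
```

The copied header still carries the old d_reclen (and, on FreeBSD, the old d_namlen). Those values describe the original name, so a caller that relies on them would need to update them.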
Comment 6 Kirk McKusick freebsd_committer freebsd_triage 2023-12-23 22:41:15 UTC
(In reply to Mark Millard from comment #5)
I obviously did not read it closely enough, as I thought the patch had been applied.

I have poked some FreeBSD folks to try and find out who needs to take action to move this forward.
Comment 7 uratan 2023-12-24 12:38:43 UTC
(In reply to Kirk McKusick from comment #6)

It is OK with me whether this bug report is closed here or not,
because I have provided enough information for:
 [a] anyone who may have the same problem as me.
 [b] anyone who may try to reproduce the problem.
 (My role ends here, I think)

 - * - * -

(fully off topic)
By the way, are you really THE Kirk McKusick-san?! One of the authors of
"The Design and Implementation of the 4.4BSD Operating System"?
I have the hardcover English edition of the book! (bought new, not second-hand)
 (but I haven't read it yet; I'm saving it to enjoy in old age, after retirement ;-)

May I use my second name as "A man commented on by Kirk McKusick" ?

And..., if you or anyone else knows, please tell me whether my report
from 8 years ago about disk Command Queuing settings and CAM sort_io_queue:
 https://forums.freebsd.org/threads/bad-response-while-reading-long-long-files.54768/
was helpful/meaningful or not.
 (I confirmed that the problem does not appear on 14.0R)
Comment 8 Joseph Mingrone freebsd_committer freebsd_triage 2023-12-24 15:23:33 UTC
Thanks for the thorough description of the problem and proposed solution.

I extracted your changes, created an i386-only patch for net/samba416, and added the maintainer, timur@FreeBSD.org.

https://reviews.freebsd.org/D43171

I also reassigned the bug here to timur@, but I believe he's been unavailable for the past few months, so we might have to wait two weeks for a maintainer timeout before committing.  

Ideally, we would understand why the first memcpy() is failing.  Maybe there is something to submit upstream.

Finally, unless there is a compelling reason to use i386, you might consider using amd64 for your home server if your current hardware supports it.  Starting with FreeBSD 13, i386 was demoted to a tier 2 architecture, and it's projected to be unsupported as of FreeBSD 15. 

https://www.freebsd.org/platforms/
Comment 9 martin 2023-12-29 18:41:31 UTC
Created attachment 247338 [details]
direntcpy.c test case

The memcpy() will fail if result+sizeof(struct dirent) is in unmapped memory.  E.g. the output from the attached direntcpy.c ends with this for me on amd64 12.4-RELEASE-p9:

19 ...................................................................................................................Segmentation fault (core dumped)

It seems that readdir() only guarantees that memory up to result+result->d_reclen is readable.

It crashes on Linux as well (but after more iterations).
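Here is a small sketch that makes the d_reclen point concrete. It is not the attached direntcpy.c, and the function names are hypothetical. For each entry it checks that the header, the name, and the terminating NUL all fit within d_reclen. It also counts the entries whose record is shorter than sizeof(struct dirent). For every such entry, memcpy(dst, e, sizeof(struct dirent)) reads past the end of the record, into memory that readdir() does not vouch for.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if header + name + NUL lie within the first d_reclen bytes. */
int dirent_fits_reclen(const struct dirent *e)
{
    size_t used = offsetof(struct dirent, d_name) + strlen(e->d_name) + 1;
    return used <= (size_t)e->d_reclen;
}

/*
 * Counts entries of `path` whose record is shorter than sizeof(struct
 * dirent), i.e. entries for which a sizeof-sized memcpy() over-reads.
 * Returns -1 if the directory cannot be opened.
 */
long count_short_records(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *e;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((e = readdir(dir)) != NULL) {
        assert(dirent_fits_reclen(e));
        if ((size_t)e->d_reclen < sizeof(struct dirent))
            n++;
    }
    closedir(dir);
    return n;
}
```

On a typical directory nearly every entry is a "short record". Usually the over-read is harmless, because the bytes that follow are still mapped. It only faults when the record sits near the end of mapped memory.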
Comment 10 uratan 2023-12-31 12:46:48 UTC
Created attachment 247374 [details]
readdir.c - readdir() test code

(In reply to martin from comment #9)

> It seems that readdir() only guarantees that
> memory up to result+result->d_reclen is readable.

I had a blind spot: the problem was on the source (read) side!
However, I could not reproduce the crash with your test code
on either 10.2R (i386) or 14.0R (amd64), so I confirmed the layout with printf() instead.

Here is an example result of the attached readdir.c on 14.0R (amd64).
There is no extra space between entries: each entry starts exactly
the previous entry's d_reclen bytes after it.
  +------------------------------------------------------------------
  |% ls -lf      (-f specifies no-sort)
  |total 1
  |drwxr-xr-x  2 uratan nobody   512 Dec 30 21:14 ./
  |drwxr-xr-x  5 uratan nobody   512 Dec 30 21:13 ../
  |-rw-r--r--  1 uratan nobody     0 Dec 30 21:07 x12345678901234567890
  |-rw-r--r--  1 uratan nobody     0 Dec 30 21:08 y123456789
  |-rw-r--r--  1 uratan nobody     0 Dec 30 21:08 z123
  |-rwxr--r--  1 uratan nobody   579 Dec 30 21:08 readdir.c*
  |-rw-r--r--  1 uratan nobody    43 Dec 30 21:14 typescript
  |-rwxr-xr-x  1 uratan nobody 10488 Dec 30 21:08 readdir*
  |
  |% ./readdir
  |---- dp @ 0x20d69ba0c000 (+36106106093568)
  | name:   .
  | reclen: 32
  |---- dp @ 0x20d69ba0c020 (+32)
  | name:   ..
  | reclen: 32
  |---- dp @ 0x20d69ba0c040 (+32)
  | name:   x12345678901234567890
  | reclen: 48             -------------+  the difference of pointer dp
  |---- dp @ 0x20d69ba0c070 (+48)   <---+  is identical to d_reclen
  | name:   y123456789
  | reclen: 40
  |---- dp @ 0x20d69ba0c098 (+40)
  | name:   z123
  | reclen: 32
  |---- dp @ 0x20d69ba0c0b8 (+32)
  | name:   readdir.c
  | reclen: 40
  |---- dp @ 0x20d69ba0c0e0 (+40)
  | name:   typescript
  | reclen: 40
  |---- dp @ 0x20d69ba0c108 (+40)
  | name:   readdir
  | reclen: 32
  |%
  +------------------------------------------------------------------
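The attached readdir.c is not reproduced in this report. As a hedged sketch of the same kind of check, the hypothetical function report_dirent_layout() walks a directory and prints each entry's address, name, and d_reclen in a similar format. It also counts how many entries start exactly the previous entry's d_reclen bytes after it. Addresses are compared as uintptr_t values, which avoids subtracting pointers that may not point into the same object.

```c
#include <assert.h>
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Prints the in-memory layout of readdir() results for `path` and returns
 * the number of entries that start exactly prev->d_reclen bytes after the
 * previous entry (packed back to back), or -1 if `path` cannot be opened.
 */
long report_dirent_layout(const char *path)
{
    DIR *dir;
    struct dirent *e;
    uintptr_t prev_addr = 0;
    unsigned prev_reclen = 0;
    long packed = 0;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((e = readdir(dir)) != NULL) {
        uintptr_t addr = (uintptr_t)e;

        printf("---- dp @ %p", (void *)e);
        if (prev_addr != 0 && addr - prev_addr == prev_reclen) {
            printf(" (+%u)", prev_reclen);
            packed++;
        }
        printf("\n name:   %s\n reclen: %u\n", e->d_name, (unsigned)e->d_reclen);
        /* Save these now: the next readdir() may refill the same buffer. */
        prev_addr = addr;
        prev_reclen = e->d_reclen;
    }
    closedir(dir);
    return packed;
}
```

Whenever the delta matches, the next record begins right at the end of the current one. Reading sizeof(struct dirent) bytes from an entry therefore reads into the following entries, or past the end of the buffer for the last one.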

The mystery that remains is ...  (In my comment #0)

> The error files/dirs are different for each run.

There is certainly a trend in which files and dirs the errors occur on,
but that does not mean they occur in the same place every time.
 (One could say that is exactly the signature of a memory bug...)
Comment 11 martin 2024-01-09 13:42:11 UTC
I can make my test case crash on i386 10.4-RELEASE-p13 if the malloc size is changed from 4096 to 5000, but there is no guarantee, because it depends on the layout of memory.

Also, it will only crash for large directories, because readdir() calls getdirentries() with a buffer of 4096 bytes and the crashing entry has to be near the end of that buffer.  That is why I chose /usr/bin for the test.
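The "near the end of that buffer" condition can be worked out with simple arithmetic. Suppose an entry starts at byte offset entry_off inside the 4096-byte buffer mentioned above, and the code copies sizeof(struct dirent) bytes from it. The copy then runs past the buffer by entry_off + sizeof(struct dirent) - 4096 bytes whenever that value is positive. The helper names below are hypothetical. Even a positive over-read only faults if the memory right after the buffer is unmapped, which is the layout dependence described in this comment.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Bytes read past the end of a buf_size buffer when copy_len bytes are
 * copied from an entry starting at entry_off (0 if the copy fits). */
size_t overread_bytes(size_t entry_off, size_t buf_size, size_t copy_len)
{
    assert(entry_off < buf_size);
    if (entry_off + copy_len <= buf_size)
        return 0;
    return entry_off + copy_len - buf_size;
}

/* Smallest entry offset at which a sizeof(struct dirent) copy over-reads. */
size_t first_risky_offset(size_t buf_size)
{
    assert(buf_size >= sizeof(struct dirent));
    return buf_size - sizeof(struct dirent) + 1;
}
```

For example, with a 4096-byte buffer only entries starting in roughly the last sizeof(struct dirent) bytes are at risk. This is why small directories, whose entries never come close to the end of the buffer, do not trigger the crash.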

Your test code is unlikely to crash because, without the loop and malloc(), it doesn't produce the bad memory layout.

The trend in smbd will be difficult to see because of other memory allocation.
Comment 12 Joseph Mingrone freebsd_committer freebsd_triage 2024-01-09 14:57:49 UTC
Does the latest patch at https://reviews.freebsd.org/D43171 look good to everyone?
Comment 13 martin 2024-01-09 19:05:26 UTC
It looks like it will work to me.  Style-wise, I'm not sure copying each field is better than using memcpy() with d_reclen bytes.
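The memcpy()-with-d_reclen alternative mentioned here might look like the sketch below. It is a hedged illustration, not the patch that was reviewed or committed, and copy_dirent_reclen() is a hypothetical name. The copy reads at most src->d_reclen bytes, the span that readdir() guarantees to be readable. It is also clamped to the destination size, and the rest of the destination is zeroed.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Copy a readdir() entry into caller-owned storage, reading only the
 * src->d_reclen bytes that readdir() guarantees are readable.
 */
void copy_dirent_reclen(struct dirent *dst, const struct dirent *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = src->d_reclen;
    if (n > sizeof(*dst))
        n = sizeof(*dst); /* never write past the destination struct */
    memset(dst, 0, sizeof(*dst)); /* bytes beyond the record become zero */
    memcpy(dst, src, n);
}
```

Compared with copying each field, this keeps the code independent of the platform's struct dirent field list: FreeBSD has d_namlen and padding fields that Linux lacks. It relies on the record, name, and NUL all lying within d_reclen.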
Comment 14 martin 2024-01-09 20:07:59 UTC
FTR, I've just created https://bugzilla.samba.org/show_bug.cgi?id=15554 about this issue upstream.
Comment 15 Joseph Mingrone freebsd_committer freebsd_triage 2024-01-10 18:07:42 UTC
(In reply to martin from comment #13)
Review updated.  Could a samba user test and confirm we are all set?
Comment 16 martin 2024-01-23 16:20:48 UTC
Perhaps uratan can test it?
Comment 17 commit-hook freebsd_committer freebsd_triage 2024-02-11 04:24:39 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=3fb51f85c5f397a427eca02936c935cba048a06e

commit 3fb51f85c5f397a427eca02936c935cba048a06e
Author:     Joseph Mingrone <jrm@FreeBSD.org>
AuthorDate: 2023-12-24 14:41:30 +0000
Commit:     Joseph Mingrone <jrm@FreeBSD.org>
CommitDate: 2024-02-11 04:20:00 +0000

    net/samba416: Patch to prevent abnormal smbd abort

    Update a call to memcpy() because readdir() only guarantees memory up to
    result+result->d_reclen is readable.  Under certain conditions,
    result+sizeof(struct dirent) landed in unmapped memory.

    Most of the legwork to pinpoint the problem, as well as a solution
    similar to the one applied here, was submitted by uratan@miomio.jp.
    Martin Simmons <martin@lispworks.com> contributed to understanding the
    problem and wrote a useful test case.

    PR:             275597
    Approved by:    maintainer timeout
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D43171

 net/samba416/Makefile                                     |  2 +-
 net/samba416/files/patch-source3_modules_vfs__cap.c (new) | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
Comment 18 Joseph Mingrone freebsd_committer freebsd_triage 2024-02-11 04:26:07 UTC
Thanks everyone.  Please re-open if there are still problems.