Bug 245715 - net/samba413: INTERNAL ERROR: Signal 11: Segmentation fault in pid xxxxx; problem with vfs_dirsort
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Timur I. Bakeyev
Reported: 2020-04-18 04:09 UTC by Joshua Kinard
Modified: 2021-01-19 16:54 UTC
bugzilla: maintainer-feedback? (timur)


Attachments
Samba config file for net/samba413 (2.62 KB, text/plain)
2020-11-02 01:51 UTC, Joshua Kinard
Crash log snippet from net/samba413 (4.30 KB, text/plain)
2020-11-02 01:55 UTC, Joshua Kinard

Description Joshua Kinard 2020-04-18 04:09:26 UTC
net/samba410 built from ports with most features turned off (just AESNI, SYSLOG, UTMP, GSSAPI_BUILTIN, and ZEROCONF_NONE) will periodically crash with a SIGSEGV.  The trigger seems to be storing a couple of icon (*.ico) files on the exported Samba share, using those icons in shortcut objects (*.lnk), and then forcing Windows to refresh its icon cache (by executing "C:\Windows\System32\ie4uinit.exe -show").  Furthermore, these shortcut objects have to be stored in a "quick launch" toolbar folder that I have added to my Windows taskbar (right-click -> Toolbars -> New Toolbar -> ...).

Under that setup, re-running the icon cache refresh command causes one or more of the shortcut objects to lose their icons and default to the blank document icon stored in "C:\Windows\System32\shell32.dll".  When that happens, if I then try to manually reset the icon by reselecting it, I may, or may not, get an error stating that Windows can no longer find the icon on the remote Samba share path.  Sometimes, when re-selecting the icon file in the file dialog, Windows will then claim that it cannot read the icon file.

When one of these two events happens, it MAY indicate that the smbd process on the FreeBSD server has SIGSEGV'ed.  But not always.  You kinda have to play around with it to trigger it.  It seems running the icon cache refresh command several times, with a small pause between each run, has the best chance of triggering it.

When it does happen, this is the output I get (hostname scrubbed):

Apr 17 21:59:12 foo smbd[97026]: [2020/04/17 21:59:12.993900,  0] ../../lib/util/become_daemon.c:136(daemon_ready)
Apr 17 21:59:12 foo smbd[97026]:   daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.766807,  0] ../../lib/util/fault.c:79(fault_report)
Apr 17 22:00:37 foo smbd[164]:   ===============================================================
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.767238,  0] ../../lib/util/fault.c:80(fault_report)
Apr 17 22:00:37 foo smbd[164]:   INTERNAL ERROR: Signal 11 in pid 164 (4.10.14)
Apr 17 22:00:37 foo smbd[164]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.767282,  0] ../../lib/util/fault.c:86(fault_report)
Apr 17 22:00:37 foo smbd[164]:   ===============================================================
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.767304,  0] ../../source3/lib/util.c:824(smb_panic_s3)
Apr 17 22:00:37 foo smbd[164]:   PANIC (pid 164): internal error
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.767983,  0] ../../lib/util/fault.c:265(log_stack_trace)
Apr 17 22:00:37 foo smbd[164]:   BACKTRACE: 6 stack frames:
Apr 17 22:00:37 foo smbd[164]:    #0 0x36b07d6eaa77 <log_stack_trace+0x37> at /usr/local/lib/samba4/libsamba-util.so.0
Apr 17 22:00:37 foo smbd[164]:    #1 0x36b085ee512d <smb_panic_s3+0x4d> at /usr/local/lib/samba4/libsmbconf.so.0
Apr 17 22:00:37 foo smbd[164]:    #2 0x36b07d6ea867 <smb_panic+0x17> at /usr/local/lib/samba4/libsamba-util.so.0
Apr 17 22:00:37 foo smbd[164]:    #3 0x36b07d6eac4e <log_stack_trace+0x20e> at /usr/local/lib/samba4/libsamba-util.so.0
Apr 17 22:00:37 foo smbd[164]:    #4 0x36b07d6ea849 <fault_setup+0x59> at /usr/local/lib/samba4/libsamba-util.so.0
Apr 17 22:00:37 foo smbd[164]:    #5 0x36b0b6a593c0 <_pthread_sigmask+0x530> at /lib/libthr.so.3
Apr 17 22:00:37 foo smbd[164]: [2020/04/17 22:00:37.768103,  0] ../../source3/lib/dumpcore.c:310(dump_core)
Apr 17 22:00:37 foo smbd[164]:   unable to change to %N.core
Apr 17 22:00:37 foo smbd[164]:   refusing to dump core
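In case it helps anyone sift through a flood of these dumps, here is a rough Python sketch that groups the fault lines and backtrace frames by crashing PID.  The regular expressions are only my guess at the format, based on the 4.10.14 lines pasted above; the function name `summarize_crashes` is made up for illustration, and other Samba versions may log differently (frames without symbol names are simply skipped).

```python
import re

# Syslog prefix as it appears in the log above, e.g. "Apr 17 22:00:37 foo smbd[164]: "
PREFIX = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+smbd\[(?P<pid>\d+)\]:\s?(?P<msg>.*)$"
)
# "INTERNAL ERROR: Signal 11 in pid 164 (4.10.14)"; the lazy ".*?" tolerates
# extra text such as ": Segmentation fault" between the number and " in pid".
FAULT = re.compile(
    r"INTERNAL ERROR: Signal (?P<sig>\d+).*? in pid \d+ \((?P<ver>[^)]+)\)"
)
# "#0 0x36b07d6eaa77 <log_stack_trace+0x37> at /usr/local/lib/samba4/libsamba-util.so.0"
FRAME = re.compile(
    r"#\d+\s+0x[0-9a-f]+\s+<(?P<sym>[^>+]+)\+0x[0-9a-f]+>\s+at\s+(?P<lib>\S+)"
)


def summarize_crashes(lines):
    """Return {pid: {"signal", "version", "frames"}} for each crashing smbd."""
    crashes = {}
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # not an smbd syslog line
        pid, msg = int(m["pid"]), m["msg"]
        fault = FAULT.search(msg)
        if fault:
            crashes[pid] = {
                "signal": int(fault["sig"]),
                "version": fault["ver"],
                "frames": [],
            }
            continue
        frame = FRAME.search(msg)
        if frame and pid in crashes:
            # Keep only symbol and library; addresses differ between runs.
            crashes[pid]["frames"].append((frame["sym"], frame["lib"]))
    return crashes
```

Feeding it the lines of the log file (for example `summarize_crashes(open(path))`, with `path` being wherever syslog writes on your system) gives one entry per crashed child, which makes it easier to see whether every crash has the same frames.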

It also seems that I can trigger this crash more often under the newly-minted net/samba411 port, but I did not extensively test that before falling back to net/samba410.

I understand that this is somewhat convoluted, and it WILL require a Windows system to actually try and test for this, but I don't have much else to offer.  Browsing the share normally from Windows Explorer works fine, and icons cache and load fine from virtually any other place in Windows EXCEPT from a shortcut object (*.lnk) in a toolbar added to the taskbar.  I don't know whether shortcuts in a toolbar folder added to the taskbar fetch their icon resources differently, or what.

I've tried turning the Samba logging level up to 7, but that did not add any additional context to the above crash log.  I also tried attaching gdb to the parent process, but the crashing process is some child process that is spawned only when doing file operations, and I am not sure how to trace that.
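One idea I have not verified: pick out the smbd children of the parent process and attach gdb to one of those instead.  The Python sketch below only does the picking, by parsing `ps -axo pid,ppid,comm` output.  It assumes the per-connection worker is a direct child of the main smbd and that its command name is plain "smbd"; the helper names `child_pids` and `live_child_pids` are made up for illustration.

```python
import subprocess


def child_pids(ps_output, parent_pid, comm="smbd"):
    """Return PIDs of `comm` processes whose parent is `parent_pid`.

    `ps_output` is the text printed by `ps -axo pid,ppid,comm`.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, ppid, name = fields
        if ppid == str(parent_pid) and name.strip() == comm:
            pids.append(int(pid))
    return pids


def live_child_pids(parent_pid):
    """Run ps and return the current smbd children of `parent_pid`."""
    out = subprocess.run(
        ["ps", "-axo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return child_pids(out, parent_pid)
```

Each PID returned could then be tried with `gdb -p <pid>` before repeating the steps that trigger the crash.  Whether the child that serves the Windows client is the one that ends up crashing is something I can't confirm.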
Comment 1 Joshua Kinard 2020-11-02 01:45:06 UTC
Adding an update here and changing the bug title, because I can reproduce a similar error (or the same one, not sure) in net/samba413 (4.13.0) as well as Samba-4.12.8.  And I believe I have reproduction steps that are easier to simulate than what I originally described.  I have further isolated the problem to what I believe is an issue in Samba's vfs_dirsort module.

First, does anyone know how I can rebuild the Samba port with full debugging symbols turned on?  Current backtrace dumps from the crashing processes are missing debug information.  Getting that turned on and reproducing the crash would probably help a lot here.

Steps to reproduce:

Requirements:
  - Windows 10-based machine
  - 7-Zip 20.00 alpha x64
  - Snort.org subscriber ruleset tar.gz archive (snortrules-snapshot-xxxxx.tar.gz)
  - Samba share on ZFS
  - 'dirsort' loaded in "vfs objects"

Steps:
  - On the Windows machine, map a drive letter to the Samba share
  - Using 7-zip, open the snortrules-snapshot-xxxxx.tar.gz archive
  - Drag and drop the "rules" subfolder directly into the share
  - Attempt to double-click on the folder
  - Crash should be triggered (may have to repeat or F5 a bunch of times)
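The "repeat or F5 a bunch of times" part can probably be automated.  Below is a minimal Python sketch that lists a directory on the mapped drive over and over with a short pause, and stops at the first error.  I have not confirmed that a plain directory listing from Python sends the same requests that Explorer does, so treat it as a starting point; the function name `refresh_listing` and the example drive letter are hypothetical.

```python
import os
import time


def refresh_listing(path, rounds=20, pause=0.5):
    """List `path` up to `rounds` times.

    Returns (completed_rounds, error), where error is None if every
    listing succeeded, or the OSError raised by the first failure.
    """
    for completed in range(rounds):
        try:
            with os.scandir(path) as entries:
                # Consume the whole listing so the full directory is read.
                sum(1 for _ in entries)
        except OSError as exc:
            return completed, exc
        time.sleep(pause)
    return rounds, None
```

On the Windows machine this would be called as something like `refresh_listing(r"Z:\rules")`, where `Z:` stands for whatever drive letter the share was mapped to.  An early return with an error, together with a fresh crash dump in the server log, would suggest the listing alone is enough to trigger the problem.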

As far as I can tell, some content in one of the Snort rule files may be a trigger, but I cannot work out if it's a specific file, combination of files, file sizes, number of files, etc.  I tried stripping off any and all ACLs via 'setfacl -b *', and removed execute permissions via 'chmod -x *', but that has had no effect.  I tried moving files elsewhere and refreshing the folder, but still couldn't isolate it to a specific file or files.

After I gave up on that, I started removing vfs modules, starting with the zfsacl and freebsd ones, but that didn't work.  After removing dirsort, however, I have been unable to reproduce the crash via the new method described here.  I can reproduce the icon issue, but apparently without the crash described in my original comment.  So likely, the crash issue is tied to something in the dirsort module on FreeBSD platforms, while the icon issue is something else.  Fixing the crash is more important, so changing the bug to match that is best.

I also suspect that the use of 7-zip to unpack the Snort rules archive is important.  I tried unpacking that archive directly on the NAS machine's shell using tar zfx, and was unable to get it to crash via Samba afterward.  When you drag and drop files from an archive to their destination, 7-zip will first extract to %TMPDIR% on the Windows host, then move the files to the final destination.  Something in this process apparently sets up conditions that enable the crash to happen, but I cannot determine what these conditions are.
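To try the same two-stage pattern without 7-Zip, something like the Python sketch below could be run on the Windows host: unpack the archive into a local temporary directory first, then move the "rules" folder onto the share.  This only mimics the extract-then-move sequence described above; I do not know which other details of 7-Zip's behavior matter, and the function name `extract_then_move` is made up for illustration.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_then_move(archive, member_dir, dest_root):
    """Unpack `archive` locally, then move `member_dir` into `dest_root`.

    Returns the final path of the moved directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            # Extracts everything; only use this on archives you trust.
            tar.extractall(tmp)
        src = Path(tmp) / member_dir
        dest = Path(dest_root) / member_dir
        # Across drives, shutil.move copies the tree and then deletes the
        # source, so the files arrive on the share as newly written copies.
        return Path(shutil.move(str(src), str(dest)))
```

Here `archive` would be the local path to the snortrules-snapshot tarball, `member_dir` would be "rules" (assuming that folder sits at the top level of the archive), and `dest_root` would be the mapped drive.  If the crash shows up after this but not after unpacking with tar on the server, that would point at how the files are written over SMB rather than at their contents.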
Comment 2 Joshua Kinard 2020-11-02 01:51:20 UTC
Created attachment 219294
Samba config file for net/samba413

"hosts allow" and "workgroup" have been redacted.  Everything else is as-is.
Comment 3 Joshua Kinard 2020-11-02 01:55:28 UTC
Created attachment 219295
Crash log snippet from net/samba413

One specific set of crash dump output when the problem is triggered.  Multiple such dumps get generated when the error is reproduced, eventually flooding 'dmesg' out.  And "foo" is the redacted name for the machine.  Unlike the net/samba410 crash, the backtrace info is missing the symbol names, so it's impossible to work out exactly what happened.