Bug 250906 - net/samba419: "samba-tool domain backup offline" hangs
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: FreeBSD Samba Team
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-06 16:36 UTC by ml
Modified: 2025-01-26 17:26 UTC

See Also:
bugzilla: maintainer-feedback? (timur)


Description ml 2020-11-06 16:36:51 UTC
# samba-tool domain backup offline --targetdir .
running backup on dirs: /var/db/samba4/private /var/db/samba4 /usr/local/etc
Starting transaction on /var/db/samba4/private/secrets
(...)

What actually hangs is a subprocess that samba-tool starts:
/usr/local/bin/tdbbackup -s .copy.tdb /var/db/samba4/private/secrets.ldb 



This is a long-standing issue, present since Samba 4.10 (which introduced this command).
I have now tried upgrading to 4.12, but nothing changed.

A discussion on Samba's mailing list suggested this might be caused by an older version of TDB and that the library should be bundled.

Building (in Poudriere) with SAMBA4_BUNDLED_TDB=yes, however, produces the following:
# samba-tool domain backup offline --targetdir .
running backup on dirs: /var/db/samba4/private /var/db/samba4 /usr/local/etc
Starting transaction on /var/db/samba4/private/secrets
ERROR(<class 'FileNotFoundError'>): uncaught exception - [Errno 2] No such file or directory: '/root/bin/tdbbackup': '/root/bin/tdbbackup'
  File "/usr/local/lib/python3.7/site-packages/samba/netcmd/__init__.py", line 186, in _run
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/samba/netcmd/domain_backup.py", line 1061, in run
    self.backup_secrets(paths.private_dir, lp, logger)
  File "/usr/local/lib/python3.7/site-packages/samba/netcmd/domain_backup.py", line 954, in backup_secrets
    self.offline_tdb_copy(secrets_path + '.ldb')
  File "/usr/local/lib/python3.7/site-packages/samba/netcmd/domain_backup.py", line 928, in offline_tdb_copy
    tdb_copy(path, backup_path, readonly=True)
  File "/usr/local/lib/python3.7/site-packages/samba/tdb_util.py", line 40, in tdb_copy
    status = subprocess.check_call(tdbbackup_cmd, close_fds=True, shell=False)
  File "/usr/local/lib/python3.7/subprocess.py", line 358, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/local/lib/python3.7/subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
A transaction is still active in ldb context [0x800a3cae0] on /var/db/samba4/private/secrets.ldb
Comment 1 Rene Ladan freebsd_committer freebsd_triage 2022-12-18 12:37:31 UTC
samba412 expired and got removed, can you retry this with a later version?
Comment 2 ml 2022-12-18 15:03:30 UTC
(In reply to Rene Ladan from comment #1)

The problem persists with 4.16.7.
Comment 3 Michael Osipov freebsd_committer freebsd_triage 2025-01-10 16:53:14 UTC
Can someone retry?
Comment 4 ml 2025-01-12 11:57:20 UTC
(In reply to Michael Osipov from comment #3)
Same hang as before with Samba 4.19, compiled with SAMBA4_BUNDLED_LDB=yes, but still not bundling TDB.
Note that this is still on 2024Q4; I'll have packages for 2025Q1 in a few days.
I might then try bundling TDB too.
Comment 5 Michael Osipov freebsd_committer freebsd_triage 2025-01-12 18:44:25 UTC
(In reply to ml from comment #4)

Thanks for the confirmation, updating title.
Comment 6 ml 2025-01-21 09:49:20 UTC
(In reply to ml from comment #4)

I tried Samba from 2025Q1, with and without bundled TDB: it still hangs.
Comment 7 Michael Osipov freebsd_committer freebsd_triage 2025-01-21 09:51:28 UTC
(In reply to ml from comment #6)
Can you try to trace again and see which process exactly hangs?
Comment 8 ml 2025-01-22 18:24:57 UTC
(In reply to Michael Osipov from comment #7)

The process that hangs is still "/usr/local/bin/tdbbackup -s .copy.tdb /var/db/samba4/private/secrets.ldb -r" and it's stuck in fcntl.

I tried "truss"ing it, but, as expected for a process blocked in a system call, it gives no output.
If needed, I can compile tdb with debug info and try to attach gdb to it.
Anything else that might be useful?
Comment 9 Michael Osipov freebsd_committer freebsd_triage 2025-01-22 18:41:25 UTC
(In reply to ml from comment #8)
Better than nothing. We should at least see what is passed to fcntl...
Comment 10 ml 2025-01-26 17:26:39 UTC
I think I found out what the problem is.

As I said, the process that hangs is "/usr/local/bin/tdbbackup -s .copy.tdb /var/db/samba4/private/secrets.ldb -r".
Looking at its main(), it uses getopt to interpret all the options; however, getopt stops parsing at the first non-option argument, "/var/db/samba4/private/secrets.ldb", so "-r" is never considered.

Since "-r" is needed to open the databases in read-only mode, the process tries to open them read/write and locks up since they are used by the Samba processes.
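
As a rough illustration of why a writer can stall in fcntl, the sketch below shows a second process being refused an exclusive lock while another process holds it. It assumes TDB relies on fcntl locks, which is consistent with the hang being observed in fcntl but is not shown in this report. The helper name is_locked_by_other_process and the temporary file are hypothetical stand-ins for the real database. With a blocking request (no LOCK_NB) the second process would simply wait, which is what the hang looks like.

```python
import fcntl
import os
import tempfile


def is_locked_by_other_process(path):
    """Return True if another process holds a conflicting fcntl lock on path."""
    with open(path, "r+b") as f:
        try:
            # Non-blocking attempt: fails immediately instead of hanging.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return True
        fcntl.lockf(f, fcntl.LOCK_UN)
        return False


def demo():
    """Probe a scratch file before and while a child process locks it."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = is_locked_by_other_process(path)

        ready_r, ready_w = os.pipe()  # child -> parent: "lock acquired"
        done_r, done_w = os.pipe()    # parent -> child: "you may exit"
        pid = os.fork()
        if pid == 0:
            # Child: plays the role of a running Samba process holding the lock.
            os.close(ready_r)
            os.close(done_w)
            holder = open(path, "r+b")
            fcntl.lockf(holder, fcntl.LOCK_EX)
            os.write(ready_w, b"1")
            os.read(done_r, 1)
            os._exit(0)

        os.close(ready_w)
        os.close(done_r)
        os.read(ready_r, 1)  # wait until the child really holds the lock
        during = is_locked_by_other_process(path)
        os.write(done_w, b"1")
        os.waitpid(pid, 0)
        return before, during
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

A probe like this only reports that some lock is held; it says nothing about which byte ranges TDB actually locks.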

The correct way to call tdbbackup would be either
"/usr/local/bin/tdbbackup -s .copy.tdb -r /var/db/samba4/private/secrets.ldb"
or
"/usr/local/bin/tdbbackup -r -s .copy.tdb /var/db/samba4/private/secrets.ldb"
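
The parsing difference can be reproduced with Python's getopt module, which offers both behaviours: getopt.getopt stops at the first non-option argument, while getopt.gnu_getopt keeps scanning, as GNU libc's getopt does by default. That default may be why the original argument order goes unnoticed on Linux. The option string "s:r" below is an assumption covering only the two flags used in this report, not tdbbackup's real option string.

```python
import getopt
import os

# Assumed option string: only the two flags seen in this report.
OPTSTRING = "s:r"
DB = "/var/db/samba4/private/secrets.ldb"


def parse_posix(argv):
    """POSIX-style parsing: stop at the first non-option argument."""
    return getopt.getopt(argv, OPTSTRING)


def parse_gnu(argv):
    """GNU-style parsing: options may follow non-option arguments."""
    # gnu_getopt falls back to POSIX behaviour if this variable is set.
    os.environ.pop("POSIXLY_CORRECT", None)
    return getopt.gnu_getopt(argv, OPTSTRING)


failing = ["-s", ".copy.tdb", DB, "-r"]  # order used by tdb_util.py
fixed = ["-s", ".copy.tdb", "-r", DB]    # options before the database path

if __name__ == "__main__":
    print("POSIX, failing order:", parse_posix(failing))
    print("POSIX, fixed order:  ", parse_posix(fixed))
    print("GNU,   failing order:", parse_gnu(failing))
```

In the POSIX-style parse of the failing order, "-r" ends up in the list of leftover arguments rather than among the recognised options.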




So the problem lies in /usr/local/lib/python3.11/site-packages/samba/tdb_util.py, where we find:
    tdbbackup_cmd = [toolpath, "-s", ".copy.tdb", file1]
    if readonly:
        tdbbackup_cmd.append("-r")

This should be changed to something like:
    if readonly:
        tdbbackup_cmd = [toolpath, "-r", "-s", ".copy.tdb", file1]
    else:
        tdbbackup_cmd = [toolpath, "-s", ".copy.tdb", file1]

With this change, "samba-tool domain backup offline" proceeds and possibly succeeds.
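
An equivalent way to keep the ordering robust is to collect all options first and append the database path last, so that any option added later still precedes the positional argument. The helper below is only a sketch: build_tdbbackup_cmd is a hypothetical name, not a function that exists in tdb_util.py.

```python
def build_tdbbackup_cmd(toolpath, file1, readonly=False):
    """Build the tdbbackup argument list with every option before the file."""
    cmd = [toolpath, "-s", ".copy.tdb"]
    if readonly:
        # Must come before the file name for a getopt that stops
        # at the first non-option argument.
        cmd.append("-r")
    cmd.append(file1)
    return cmd


if __name__ == "__main__":
    print(build_tdbbackup_cmd(
        "/usr/local/bin/tdbbackup",
        "/var/db/samba4/private/secrets.ldb",
        readonly=True,
    ))
```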



Now, before I provide a patch, I have just one doubt...
We change neither tdb_util.py nor tdbbackup.c in our ports, so, unless our getopt works differently from Linux's, this is not a FreeBSD problem and should be reported upstream instead.