Bug 269883 - net/samba416: macOS Time Machine backups broken after contrib/tzcode update in 13.2
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Timur I. Bakeyev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-28 20:34 UTC by Dimitry Andric
Modified: 2023-04-18 07:16 UTC

See Also:
bugzilla: maintainer-feedback? (timur)


Description Dimitry Andric freebsd_committer freebsd_triage 2023-02-28 20:34:49 UTC
TL;DR: the "fruit:zero_file_id" setting should be "yes" by default and we should apply upstream Samba patches for this, otherwise existing Time Machine backups over SMB can get messed up.

Long story:

I recently encountered problems with Apple's Time Machine backing up to a FreeBSD server with samba416-4.16.8 installed. These problems started after I upgraded the base system from 13.1-STABLE (of ~3 months ago) to 13.2-STABLE (as of a few days ago) and then rebuilt all my ports, including the samba416-4.16.8 package.

The problems initially showed up as hanging or aborting Time Machine backups. If you then attempted a "verify backups" action, it would run fsck_apfs on the disk image mounted over SMB, which showed errors similar to:

/dev/disk5s1: fsck_apfs started at Mon Feb 27 00:20:36 2023
/dev/disk5s1: ** Checking the container superblock.
/dev/disk5s1:    Checking the checkpoint with transaction ID 2199286.
/dev/disk5s1: ** Checking the space manager.
/dev/disk5s1: ** Checking the space manager free queue trees.
/dev/disk5s1: ** Checking the object map.
/dev/disk5s1: ** Checking volume /dev/rdisk5s1.
/dev/disk5s1: ** Checking the APFS volume superblock.
/dev/disk5s1:    The volume Backups of mac was formatted by newfs_apfs (1677.81.1) and last modified by apfs_kext (2142.81.1).
/dev/disk5s1: ** Checking the object map.
/dev/disk5s1: warning: (oid 0x2126b29c) om: btn: invalid o_cksum (0x1700608e55352f42)
/dev/disk5s1:    Object map is invalid.
/dev/disk5s1: ** The volume /dev/rdisk5s1 was found to be corrupt and cannot be repaired.
/dev/disk5s1: ** Verifying allocated space.
/dev/disk5s1: ** The volume /dev/disk5s1 could not be verified completely.
/dev/disk5s1: fsck_apfs completed at Mon Feb 27 00:20:44 2023

Even after rolling back the share (with zfs rollback) to a known-good state, i.e. one that had verified OK in the past, fsck_apfs would *still* report errors like the ones above.

However, when I reinstalled the samba416-4.16.8 package from the old poudriere packages directory, which had been built with 13.1-STABLE, it all worked fine again, and full fsck_apfs runs were completely OK!

So what caused the difference, even though the port versions were exactly the same? It turned out to be quite a deep rabbit hole!

After a *lot* of experimentation, swapping back .so files from "good" and "bad" packages, and even going so far as to swap .o files from "good" and "bad" builds and re-linking them, I found that the culprit was in libsamba-util.so.0, specifically the lib/util/time.c.26.o file.

This file got compiled differently on 13.2-STABLE than on 13.1-STABLE:

On 13.2-STABLE, the TIME_T_MAX define would have been set by the configure script, to the value 67768036191676799ll.

On 13.1-STABLE, the TIME_T_MAX define would *not* have been set by the configure script, and time.h would then define it as:

  #define TIME_T_MAX MIN(INT32_MAX,_TYPE_MAXIMUM(time_t))

which effectively becomes INT32_MAX, i.e. 0x7fffffff.
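
As a minimal sketch of why that fallback collapses to INT32_MAX on a system with a 64-bit time_t: the helper names below (SKETCH_MIN, sketch_time_t_type_max, sketch_fallback_time_t_max) are hypothetical stand-ins written for illustration, not Samba's own macros, and the code assumes time_t is a signed integer type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for MIN(); renamed to avoid clashing with system headers. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Largest value representable in time_t, assuming time_t is a signed integer. */
long long sketch_time_t_type_max(void)
{
    assert((time_t)-1 < 0);     /* the sketch only covers a signed time_t */

    uintmax_t top_bit = (uintmax_t)1 << (sizeof(time_t) * CHAR_BIT - 1);
    return (long long)(top_bit - 1);
}

/* Value of the fallback define: the smaller of INT32_MAX and the type maximum. */
long long sketch_fallback_time_t_max(void)
{
    return SKETCH_MIN((long long)INT32_MAX, sketch_time_t_type_max());
}
```

With a 64-bit time_t the type maximum is far larger than INT32_MAX, so the smaller operand always wins and the result is 0x7fffffff.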

This was also visible in one of the changed lines in the build logs (I had logs from both the poudriere run with 13.1 world, and with 13.2 world):

1606c1606
< Checking for the maximum value of the 'time_t' type                                             : not found 
---
> Checking for the maximum value of the 'time_t' type                                             : ok 

So on 13.1 it could not find the maximum value, while on 13.2 it could. The reason for this is a recent contrib/tzcode update, <https://cgit.freebsd.org/src/commit/?id=93cc70bf9ca7>, which now makes gmtime(0x7fffffffffffffffll) fail, whereas it succeeded before. Samba uses this check in its configure script.
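
To see how the local C library treats a given timestamp, a small probe along these lines can help. This is only a sketch and not Samba's actual configure test; sketch_gmtime_year is a hypothetical helper name introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper, not Samba's configure test: returns 1 and stores the
 * calendar year if gmtime() can convert `value`, or returns 0 if it cannot.
 */
int sketch_gmtime_year(long long value, long long *year_out)
{
    assert(year_out != NULL);

    time_t t = (time_t)value;
    if ((long long)t != value) {
        return 0;               /* value does not fit in this platform's time_t */
    }

    struct tm *tm = gmtime(&t);
    if (tm == NULL) {
        return 0;               /* the C library rejected the timestamp */
    }

    *year_out = (long long)tm->tm_year + 1900;
    return 1;
}
```

Calling the helper with 0x7fffffffffffffffLL shows whether gmtime() accepts the largest 64-bit value; according to the description above, that call started failing on FreeBSD after the tzcode update, while it succeeded before.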

It turns out that Samba uses this TIME_T_MAX value in all kinds of places, but most importantly, in some cases it is used to generate SMB file IDs! If TIME_T_MAX has a different value, some files might get completely different file ID numbers, and apparently this greatly confuses the Apple SMB client.

The code deriving file IDs from timestamps was added for Samba bug 14928 in <https://git.samba.org/?p=samba.git;a=commitdiff;h=23fbf0bad03>, around the Samba 4.15.4 release.

But later, after talking to Apple people, the Samba developers ripped this whole thing out again, in <https://git.samba.org/?p=samba.git;a=commitdiff;h=643da37fd13>:

    smbd: remove itime and file_id logic and code

    This bases File-Ids on the inode numbers again. The whole stuff was
    added because at that time Apple clients

    1. would be upset by inode number reusage and

    2. had a client side bug in their fallback implemetentation that
    assigns File-Ids on the client side in case the server provides
    File-Ids of 0.

    After discussion with folks at Apple it should be safe these days to
    rely on the Mac to generate its own File-Ids and let Samba return 0
    File-Ids.

and its follow-up, <https://git.samba.org/?p=samba.git;a=commitdiff;h=24f4bea5b8e>:

    vfs_fruit: change default for "fruit:zero_file_id" option to yes

    After discussion with folks at Apple it should be safe these days to rely on the
    Mac to generate its own File-Ids and let Samba return 0 File-Ids.

    Signed-off-by: Ralph Boehme <slow@samba.org>
    Reviewed-by: Jeremy Allison <jra@samba.org>

For now, if you encounter this bug with Apple's Time Machine, you can work around it by setting "fruit:zero_file_id = yes" in your smb4.conf, either in the [global] section, or in the specific shares for Time Machine (i.e. those with "fruit:time machine = yes").
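
A per-share version of that workaround might look roughly like the fragment below. The share name and path are hypothetical, and the "vfs objects" line is an assumption about a typical vfs_fruit setup rather than something taken from my configuration; only the two fruit: options come from the text above.

```
[timemachine]
    # hypothetical share name and path
    path = /tank/timemachine
    # assumed module stacking; the fruit: options need vfs_fruit loaded
    vfs objects = catia fruit streams_xattr
    fruit:time machine = yes
    fruit:zero_file_id = yes
```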

But it would be nice if we could import the two above Samba commits:

  <https://git.samba.org/?p=samba.git;a=commit;h=643da37fd13>
  <https://git.samba.org/?p=samba.git;a=commit;h=24f4bea5b8e>

because that seems a lot safer. At the very least we should import the last one, which is trivial because it only sets the default for "fruit:zero_file_id" to "yes".
Comment 1 thomas 2023-04-18 07:16:14 UTC
Thank you for investigating this. I can confirm that the suggested workaround works for me. My two Time Machine clients started doing backups again. So far I cannot see any unintended effects.