Bug 259093

Summary: Panic: NULL pointer dereference on "zfs send --raw" of encrypted filesystem
Product: Base System Reporter: Peter Jeremy <peterj>
Component: kern Assignee: freebsd-fs (Nobody) <fs>
Status: Open ---    
Severity: Affects Only Me CC: grahamperrin, pi, sgubdsbeerf
Priority: --- Keywords: crash, needs-qa
Version: 13.0-STABLE   
Hardware: Any   
OS: Any   
See Also: https://github.com/openzfs/zfs/pull/12438
https://github.com/openzfs/zfs/issues/12275#event-6888665864

Description Peter Jeremy freebsd_committer freebsd_triage 2021-10-12 08:23:36 UTC
I'm running 13-stable fdbbd118faab but the code is identical in HEAD.

Looking at the backtrace:
#16 <signal handler called>
#17 dmu_dump_write (dscp=dscp@entry=0xfffffe02501abc30, type=<optimized out>,
    object=<optimized out>, offset=<optimized out>, offset@entry=0,
    lsize=<optimized out>, lsize@entry=131072, psize=psize@entry=131072,
    bp=0x0, data=0xfffffe02d94a6000)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_send.c:493
#18 0xffffffff80410a3c in do_dump (dscp=dscp@entry=0xfffffe02501abc30,
    range=range@entry=0xfffff805fd82d900)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_send.c:1016
#19 0xffffffff8040ead3 in dmu_send_impl (dspp=<optimized out>,
    dspp@entry=0xfffffe02501abdf0)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_send.c:2537
#20 0xffffffff8040d8fd in dmu_send_obj (pool=<optimized out>,
    pool@entry=0xfffffe02d3b61000 "tank/compat@20210604bu", tosnap=10690,
    fromsnap=11065, embedok=<optimized out>, embedok@entry=1,
    large_block_ok=<optimized out>, large_block_ok@entry=2,
    compressok=<optimized out>, compressok@entry=4, rawok=8, savedok=0,
    outfd=1, off=0xfffffe02501ac070, dsop=0xfffffe02501ac058)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_send.c:2695

dmu_send.c:493 is "ASSERT(!BP_IS_EMBEDDED(bp));" which dereferences bp
with no checks for NULL, whereas dmu_send.c:1016 explicitly passes NULL
to dmu_dump_write() as bp.  This is obviously a bug somewhere.
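
To illustrate the failure pattern described above, here is a minimal userland sketch, not the OpenZFS code. The type sketch_bp_t, the macro SKETCH_BP_IS_EMBEDDED and both functions are hypothetical stand-ins; the only thing assumed about BP_IS_EMBEDDED(bp) is what the backtrace shows, namely that it reads through bp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block pointer; NOT the real blkptr_t. */
typedef struct sketch_bp {
	uint64_t props;
} sketch_bp_t;

/*
 * Hypothetical flag test.  Like BP_IS_EMBEDDED(bp) it reads a field
 * through bp, so evaluating it with bp == NULL dereferences NULL.
 */
#define	SKETCH_BP_IS_EMBEDDED(bp)	(((bp)->props & 1ULL) != 0)

/* NULL-tolerant form: a missing bp is treated as "not embedded". */
static bool
sketch_bp_is_embedded(const sketch_bp_t *bp)
{
	return (bp != NULL && SKETCH_BP_IS_EMBEDDED(bp));
}

/* Guarded assertion: safe to call even when the caller passes NULL. */
static void
sketch_assert_not_embedded(const sketch_bp_t *bp)
{
	assert(!sketch_bp_is_embedded(bp));
}
```

Whether a NULL bp should be tolerated at that point or rejected earlier is a design question for the real code; the sketch only shows that the check has to happen before the macro is evaluated.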

Looking at the comment at lines 1006-1008, it seems the code expects
that raw sends will always have large block sends enabled, avoiding the
problematic code block.  And zfs-send(8) says that --raw implies
--large-block if the source is not encrypted.  But even if I explicitly
specify --large-block then the code panics in the same way.  (And
--large-block as an option doesn't actually make sense with --raw
because the send stream must match what's on local disk by definition).
Comment 1 Peter Jeremy freebsd_committer freebsd_triage 2021-10-15 11:40:50 UTC
I've modified my kernel to return an error, instead of panicking, and done some investigating:
* The problem affects 3 (out of 81) filesystems I have.
* Those 3 filesystems don't have any unusual configuration.
* The encrypted pool reports no errors on a scrub.
* I still have the original pool (from before I did the encryption) and I can do a "zfs send" of those filesystems from that pool without error.
* I created a third pool, with encryption enabled, and copied/encrypted those 3 filesystems from the original pool to the third pool.  Doing a send from that pool fails in the same way.
* The error reports that it can't send an intermediate snapshot within the filesystem (the snapshots are different for each filesystem).  If I delete the snapshot that reports the error then the error moves to the next most recent snapshot.
* Creating the third pool with a different encryption key has no effect (unsurprising but I thought I'd check).
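
For readers wondering what "return an error instead of panicking" can look like, here is a rough userland sketch. It is not the actual kernel change: sketch_dump_write, sketch_dump_split, the needs_bp flag and the choice of EINVAL are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block pointer; NOT the real blkptr_t. */
typedef struct sketch_bp {
	uint64_t props;
} sketch_bp_t;

/*
 * Hypothetical write step.  needs_bp models a code path (for example a
 * raw send) that must inspect the block pointer.  EINVAL is an assumed
 * error code, not necessarily the one used in the modified kernel.
 */
static int
sketch_dump_write(const sketch_bp_t *bp, int needs_bp, uint64_t len)
{
	if (needs_bp && bp == NULL)
		return (EINVAL);	/* report instead of dereferencing NULL */
	(void)len;			/* a real writer would emit len bytes */
	return (0);
}

/*
 * Hypothetical caller that splits a block into chunks and passes
 * bp == NULL for each one, stopping at the first error.
 */
static int
sketch_dump_split(int needs_bp, uint64_t total, uint64_t chunk)
{
	assert(chunk != 0);
	while (total > 0) {
		uint64_t n = total < chunk ? total : chunk;
		int err = sketch_dump_write(NULL, needs_bp, n);

		if (err != 0)
			return (err);
		total -= n;
	}
	return (0);
}
```

With needs_bp set, sketch_dump_split(1, 262144, 131072) returns EINVAL on the first chunk instead of faulting, so the failure surfaces as an error on the send rather than a panic.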

I'm doing a scrub of the original pool but would be surprised if it reports any errors.

At this point, it looks like there's something in the original filesystems that doesn't cause any issue with an unencrypted filesystem but breaks once that filesystem is encrypted.

If anyone wants to investigate, I'm happy to share send streams from 2 of the filesystems - with xz, one shrinks to 39.5MB and the other to 35.8MB.
Comment 2 Peter Jeremy freebsd_committer freebsd_triage 2021-10-17 04:18:40 UTC
I've had a rummage around OpenZFS and this is https://github.com/openzfs/zfs/issues/12275, which is, unfortunately, still open without a fix.
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2022-11-01 23:35:41 UTC
Is this reproducible with recent 13.1-STABLE?

(In reply to Peter Jeremy from comment #2)

That issue was closed on 2022-06-27 (PR <https://github.com/openzfs/zfs/pull/12438> merged), with subsequent commits.