Bug 204661 - zdb isn't able to examine root dataset of a pool
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Andriy Gapon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-18 16:50 UTC by Christopher Forgeron
Modified: 2016-08-23 07:47 UTC
Description Christopher Forgeron 2015-11-18 16:50:31 UTC
Hi,

 This is an issue on FreeBSD, but not Solaris. 

 It works as I expect on a Solaris 11.3-BETA machine. 

 A file in the root dataset of a pool doesn't seem to pick up the configured recordsize, yet a file in a sub-dataset does.

 Perhaps I am not using zdb correctly to examine the file in the root dataset, as it shows the object as a DSL directory rather than a ZFS plain file. If that's the case, then zdb's switches differ between FreeBSD and Solaris.

 This behaviour has existed for a while (for all of the 10.x releases, I'm fairly sure) and still happens today on a 10.2-RELEASE-p7 machine.

 Example:

# zpool create pool92_1 da1 da11 da9 da12 da13 da14
# zfs set recordsize=64k pool92_1
# zfs get recordsize pool92_1
NAME      PROPERTY    VALUE    SOURCE
pool92_1  recordsize  64K      local

# cd /pool92_1
# dd if=/dev/random of=./test_file bs=1M count=12
12+0 records in
12+0 records out
12582912 bytes transferred in 0.325594 secs (38645997 bytes/sec)

# ls -i
8 test_file

# zdb -dd pool92_1 8  
Dataset mos [META], ID 0, cr_txg 4, 144K, 45 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         8    1    16K    512      0    512    0.00  DSL directory

# zfs get recordsize pool92_1
NAME      PROPERTY    VALUE    SOURCE
pool92_1  recordsize  64K      local

# zfs create pool92_1/folder
# zfs get recordsize pool92_1/folder
NAME             PROPERTY    VALUE    SOURCE
pool92_1/folder  recordsize  64K      inherited from pool92_1


# cd folder
# dd if=/dev/random of=./test_file bs=1M count=12
12+0 records in
12+0 records out
12582912 bytes transferred in 0.384501 secs (32725305 bytes/sec)

# ls -i
8 test_file
# zdb -dd pool92_1/folder 8
Dataset pool92_1/folder [ZPL], ID 49, cr_txg 44, 12.1M, 8 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         8    3    16K    64K  12.0M  12.0M  100.00  ZFS plain file


The full dumps, if you're interested:

# zdb -dddd pool92_1 8
Dataset mos [META], ID 0, cr_txg 4, 126K, 52 objects, rootbp DVA[0]=<2:446000:1000> DVA[1]=<3:443800:200> DVA[2]=<4:408000:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique triple size=800L/200P birth=60L/60P fill=52 cksum=be53af6aa:46998543bc9:d8c388544dcd:1cb577de44fcab

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         8    1    16K    512      0    512    0.00  DSL directory
                                        256   bonus  DSL directory
        dnode flags: 
        dnode maxblkid: 0
                creation_time = Wed Nov 18 11:59:47 2015
                head_dataset_obj = 0
                parent_dir_obj = 2
                origin_obj = 0
                child_dir_zapobj = 10
                used_bytes = 0
                compressed_bytes = 0
                uncompressed_bytes = 0
                quota = 0
                reserved = 0
                props_zapobj = 9
                deleg_zapobj = 0
                flags = 1
                used_breakdown[HEAD] = 0
                used_breakdown[SNAP] = 0
                used_breakdown[CHILD] = 0
                used_breakdown[CHILD_RSRV] = 0
                used_breakdown[REFRSRV] = 0

# zdb -dddd pool92_1/folder 8
Dataset pool92_1/folder [ZPL], ID 49, cr_txg 44, 12.1M, 8 objects, rootbp DVA[0]=<2:443000:1000> DVA[1]=<3:442800:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=60L/60P fill=8 cksum=b55f2d725:46568e61db8:e07315731014:1eacd8bf5cb44a

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         8    3    16K    64K  12.0M  12.0M  100.00  ZFS plain file
                                        168   bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED 
        dnode maxblkid: 191
        path    /test_file
        uid     0
        gid     0
        atime   Wed Nov 18 12:03:10 2015
        mtime   Wed Nov 18 12:03:10 2015
        ctime   Wed Nov 18 12:03:10 2015
        crtime  Wed Nov 18 12:03:10 2015
        gen     47
        mode    100644
        size    12582912
        parent  4
        links   1
        pflags  40800000004
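
As a cross-check on the dump above, the reported maxblkid of 191 is what a 12582912-byte file should produce if it really is stored in 64K records. The following is a minimal sketch of that arithmetic; last_blkid is a hypothetical helper name used only for illustration, and it assumes a non-empty file whose blocks are all of the given record size.

```shell
#!/bin/sh
# Hypothetical helper (not part of zdb or zfs): highest block ID of a
# non-empty file of $1 bytes stored in records of $2 bytes.
# Block IDs start at 0, so this is ceil(size / recordsize) - 1.
last_blkid() {
    echo $(( ($1 + $2 - 1) / $2 - 1 ))
}

# FreeBSD test_file: 12582912 bytes with recordsize=64k
last_blkid 12582912 65536

# Solaris test_file further down: 124800 bytes with recordsize=64k
last_blkid 124800 65536
```

The first call matches "dnode maxblkid: 191" in the pool92_1/folder dump. The second gives a last block ID of 1, that is, two 64K blocks, which agrees with the 128K lsize in the Solaris output.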


On a Solaris 11.3-BETA machine, here's what I see:

root@solaris175:~# zpool create pool175 c2t1d0
root@solaris175:~# zfs set recordsize=64k pool175
root@solaris175:/pool175# dd if=/dev/random of=./test_file bs=1073741824 count=120
root@solaris175:/pool175# ls -all
total 267
drwxr-xr-x   2 root     root           3 Nov 18 16:39 .
drwxr-xr-x  27 root     sys           30 Nov 18 16:37 ..
-rw-r--r--   1 root     root      124800 Nov 18 16:39 test_file

root@solaris175:/pool175# ls -i
        10 test_file

root@solaris175:/pool175# zdb -dd pool175 10
Dataset pool175 [ZPL], ID 18, cr_txg 1, 161K, 8 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        10    2    16K    64K   130K   128K  100.00  ZFS plain file


Yes, there are two differences here (I couldn't use bs=1M for dd, and I only used one disk) between the FreeBSD and Solaris machines, but I feel it's still valid enough to illustrate my point.

I do believe something more than just the reporting is off, as workloads that benefit from recordsize=64k are slower in the root dataset than in a sub-dataset.
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2015-11-18 19:17:36 UTC
zdb in Solaris seems to have diverged from the historical behaviour.
zdb -d $pool $N dumps information on the object with ID $N in the MOS (meta object set) of the pool $pool.
Currently there seems to be no way to dump an object in the root dataset of the pool.  The same issue exists on illumos as well.
I've seen a proposal for zdb -d $pool/ $N to do what you want (note the trailing slash), but I can't find it right now.
Comment 2 Christopher Forgeron 2015-11-18 19:24:31 UTC
Ah yes, I tried 'zdb -d $pool/ $N' when my first attempt didn't work, so it does seem a natural way to find the info.

I would suggest the most error-free way would be to follow Solaris' lead on this, as my command syntax shouldn't differ depending on whether I'm in the root dataset or not.

Of course this is probably a non-trivial update. 

Any recommendations for an alternate way to watch IO size for a file? A DTrace script that targets an inode, perhaps?
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2015-11-18 21:13:25 UTC
(In reply to Christopher Forgeron from comment #2)
Well, on the other hand the way of examining the MOS would have to change then.
We are not following Oracle ZFS since it became closed source (again).
We are following OpenZFS, so we might be incompatible with the other ZFS.

recordsize should be honoured for all datasets.  I am not aware of any bug in that area.
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-07-20 11:15:49 UTC
A commit references this bug:

Author: avg
Date: Wed Jul 20 11:15:33 UTC 2016
New revision: 303086
URL: https://svnweb.freebsd.org/changeset/base/303086

Log:
  MFV r303083: 7164 zdb should be able to open the root dataset

  Note: conversion of the manual page change from roff to mdoc is mine.

  illumos/illumos-gate@b702644a6eb66615d67b492fd73ecd9efa11fc7d
  https://github.com/illumos/illumos-gate/commit/b702644a6eb66615d67b492fd73ecd9efa11fc7d

  https://www.illumos.org/issues/7164
    If the pool/dataset command-line argument is specified with a trailing
    slash, for example, "tank/", we should interpret it as the topmost
    dataset (rather than the whole pool)

  Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
  Reviewed by: Matthew Ahrens <mahrens@delphix.com>
  Approved by: Robert Mustacchi <rm@joyent.com>
  Author: Tim Chase <tim@chase2k.com>
  PR:		204661
  MFC after:	1 week
  Relnotes:	yes

Changes:
_U  head/cddl/contrib/opensolaris/cmd/zdb/
  head/cddl/contrib/opensolaris/cmd/zdb/zdb.8
  head/cddl/contrib/opensolaris/cmd/zdb/zdb.c
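
For scripted use, the trailing-slash form described in the commit log above can be produced by a small wrapper, so the same call works for the root dataset and for child datasets. The following is a minimal sketch: zdb_dataset_arg is a hypothetical helper name, not part of zdb, and the commented usage line assumes a zdb that includes the change from r303086.

```shell
#!/bin/sh
# Hypothetical helper (not part of zdb): build the dataset argument for zdb.
# A bare pool name selects the MOS, so a trailing slash is appended to make
# zdb open the pool's root dataset instead. Names that already contain a
# slash (pool/ or pool/child) are passed through unchanged.
zdb_dataset_arg() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   printf '%s/\n' "$1" ;;
    esac
}

# Usage, assuming a zdb with the r303086 change:
#   zdb -dd "$(zdb_dataset_arg pool92_1)" 8          # root dataset, object 8
#   zdb -dd "$(zdb_dataset_arg pool92_1/folder)" 8   # child dataset, object 8
```

To examine the MOS itself, the bare pool name would still be passed to zdb directly, without going through the helper.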