280884 – wrong media size for 4k disks

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	15.0-CURRENT
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	Roger Pau Monné

URL:
Keywords:

Depends on:
Blocks:

Reported:	2024-08-17 14:57 UTC by Christian Kujau
Modified:	2024-10-17 22:08 UTC
CC List:	2 users

See Also:

Attachments
xenstore-ls -fp when freebsd is running with 4k disk attached (74.03 KB, text/plain) 2024-08-18 22:38 UTC, Christian Kujau
Proposed fix v1 (5.16 KB, patch) 2024-08-26 16:14 UTC, Roger Pau Monné
Proposed fix v2 (7.34 KB, patch) 2024-08-28 13:41 UTC, Roger Pau Monné

Description Christian Kujau 2024-08-17 14:57:31 UTC

Booting 15.0-CURRENT (main-n271670-bd4f2023bb05) as a Xen DomU (hvm mode) prints:

================================
# dmesg | grep xbd
xbd0: 49152MB <Virtual Block Device> at device/vbd/51712ata1: reset tp1 mask=03 ostat0=50 ostat1=00
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.

xbd1: 30720MB <Virtual Block Device> at device/vbd/51728xbd0: disk error  on xenbusb_front0
xbd1: features: flush, write_barrier
xbd1: synchronize cache commands enabled.
================================

But xbd0 (the first disk configured) is supposed to be 6 GB in size. The same disks from the Xen Dom0 (running a Linux distribution):

================================
# lsblk -o NAME,SIZE,FSTYPE,TYPE,LOG-SEC,PHY-SEC /dev/vg0/freebsd-disk0 /dev/vgteamlite/freebsd-disk1
NAME                      SIZE FSTYPE     TYPE LOG-SEC PHY-SEC
vg0-freebsd--disk0          6G            lvm     4096    4096
vgteamlite-freebsd--disk1  30G zfs_member lvm      512     512
================================
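A quick arithmetic check on the two sizes, offered as an observation rather than a diagnosis: the size the DomU reports for xbd0 is an exact multiple of the size Dom0 reports, and that multiple equals the ratio between the 4096-byte logical sector size and 512 bytes. The variable names below are illustrative only.

```python
# Sizes taken from the dmesg and lsblk output above.
reported_mb = 49152        # xbd0 size as seen by the FreeBSD DomU
expected_mb = 6 * 1024     # 6G logical volume as seen by the Linux Dom0
logical_sector = 4096      # LOG-SEC of vg0-freebsd--disk0
legacy_sector = 512        # LOG-SEC of the disk whose size is reported correctly

size_ratio = reported_mb // expected_mb
sector_ratio = logical_sector // legacy_sector

print(size_ratio, sector_ratio)
```

If both ratios agree, that would point toward a sector-unit mix-up somewhere between backend and frontend rather than a random size corruption.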

NB: while a "disk error" is reported for xbd1, the disk can be used in FreeBSD just fine. zpool import works and the correct size is reported.

Trying to partition xbd0 (the 6 GB disk) results in all kinds of errors:

================================
# gpart create -s gpt xbd0
xbd0: disk error cmd=write 1-8 status: ffffffff
gpart: Input/output error

# gpart list xbd0
Geom name: xbd0
modified: true
state: OK
fwheads: 255
fwsectors: 63
last: 12582906
first: 6
entries: 128
scheme: GPT
Consumers:
1. Name: xbd0
   Mediasize: 51539607552 (48G)
   Sectorsize: 4096
   Mode: r1w1e1

# gnop create -S 4k xbd0
GEOM_NOP: Device xbd0.nop created.
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582849-12582880 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 1-8 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff

# file -Ls /dev/xbd0*
xbd0: disk error cmd=read 0-2039 status: ffffffff
xbd0: disk error cmd=read 255-262 status: ffffffff
/dev/xbd0:     ERROR: cannot read `/dev/xbd0' (Ixbd0: disk error cmd=read 0-2039nput/output erro status: ffffffff
xbd0: disk error cmd=read 2r)
55-262 status: ffffffff
/dev/xbd0.nop: ERROR: cannot read `/dev/xbd0.nop' (Input/output error)
================================


Trying to play tricks with sector sizes doesn't work either:

================================
# gnop destroy xbd0.nop
GEOM_NOP: Device xbd0.nop removed.

# gnop create -S 4096 /dev/xbd0
GEOM_NOP: Device xbd0.nop created.
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582849-12582880 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 1-8 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff

# zpool create -o ashift=12 zroot /dev/xbd0.nop
xbd0: disk error cmd=read 4-227 status: ffffffff
xbd0: disk error cmd=read 68-291 status: ffffffff
xbd0: disk error cmd=read 12582788-12583011 status: ffffffff
xbd0: disk error cmd=read 12582852-12583075 status: ffffffff
cannot create 'zroot': no such pool or dataset
================================

The same happens on FreeBSD 14.1; switching to 15.0-CURRENT did not help.

Comment 1 Roger Pau Monné freebsd_committer 2024-08-18 08:53:06 UTC

I'm currently on PTO and won't be able to look into this until the 26th.  Can you paste the output of `xenstore-ls -fp` from dom0 when the FreeBSD guest is running with the 4K disk attached?

Thanks.

Comment 2 Christian Kujau 2024-08-18 22:38:51 UTC

Created attachment 252901
xenstore-ls -fp when freebsd is running with 4k disk attached

Comment 3 Christian Kujau 2024-08-18 22:44:48 UTC

Output attached, but other disks show up as well. The important disks for the FreeBSD DomU would be:

# xl block-list freebsd
Vdev  BE  handle state evt-ch ring-ref BE-path
51712 0   36     4     40     -1       /local/domain/0/backend/vbd/36/51712
51728 0   36     4     41     -1       /local/domain/0/backend/vbd/36/51728
51744 0   36     1     -1     -1       /local/domain/0/backend/qdisk/36/51744

...with:

* "vbd/36/51712" being the 4k disk (xbd0 in FreeBSD, 48 GB reported instead of 6 GB)
* "vbd/36/51728" being the 512 sector disk (xbd1 in FreeBSD, 30 GB in size)
* "qdisk/36/51744" the FreeBSD-15 ISO image

Comment 4 Christian Kujau 2024-08-18 22:51:11 UTC

Full disclosure: a while ago I discussed this topic on netbsd-users[0] and although the thread appears to be inconclusive, Manuel Bouyer was able to fix this in NetBSD. Afterwards I opened an OpenBSD bug[1] for the same issue and summarized the NetBSD story there, but nothing came of it.

So, thanks for taking a stab at this, maybe these pointers are helpful!

[0] https://mail-index.netbsd.org/netbsd-users/2023/07/20/msg029875.html
[1] https://marc.info/?l=openbsd-bugs&m=169274922517463&w=4

Comment 5 Roger Pau Monné freebsd_committer 2024-08-26 16:14:26 UTC

Created attachment 253106
Proposed fix v1

Can you please give the following patch a try?  I don't have a setup with a 4K logical sector disk right now, so it's a bit hard for me to test the fix.

You will need to apply the patch to CURRENT (or maybe a 14 source), rebuild the kernel (make -jX kernel) and reboot the guest.

Thanks, Roger.

Comment 6 Christian Kujau 2024-08-28 01:37:26 UTC

Great, that looks much better:

====================================================
# dmesg | grep xbd2                        
xbd2: 6144MB <Virtual Block Device> at device/vbd/51744 on xenbusb_front0
xbd2: features: flush, write_barrier
xbd2: synchronize cache commands enabled.

# gpart show xbd2                     
=>     63  1572801  xbd2  MBR  (6.0G)
       63  1572801        - free -  (6.0G)
====================================================


I.e. the disk is now 6 GB in size from within the FreeBSD DomU, exactly as it should be. But zpool creation is still not working, or I'm holding it wrong:


====================================================
# gpart create -s gpt xbd2       
xbd2 created

# gpart add -t freebsd-zfs xbd2
xbd2p1 added

# zpool create foobar /dev/xbd2p1
cannot zero first 4096 bytes of '/dev/xbd2p1': Input/output error

# dmesg                          
xbd2: disk error cmd=write 6-13 status: ffffffff

# gpart list xbd2
Geom name: xbd2
modified: false
state: OK
fwheads: 32
fwsectors: 63
last: 1572858
first: 6
entries: 128
scheme: GPT
Providers:
1. Name: xbd2p1
   Mediasize: 6442405888 (6.0G)
   Sectorsize: 4096
   Stripesize: 0
   Stripeoffset: 24576
   Mode: r0w0e0
   efimedia: HD(1,GPT,d56a1317-64dc-11ef-843e-00163eabcd00,0x6,0x17fff5)
   rawuuid: d56a1317-64dc-11ef-843e-00163eabcd00
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 6442405888
   offset: 24576
   type: freebsd-zfs
   index: 1
   end: 1572858
   start: 6
Consumers:
1. Name: xbd2
   Mediasize: 6442450944 (6.0G)
   Sectorsize: 4096
   Mode: r0w0e0
====================================================
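As a cross-check of the `gpart list xbd2` output above, the partition's byte offset and length can be recomputed from its start and end sectors. This is only a sketch of the arithmetic; the variable names are illustrative, and it assumes `start` and `end` are inclusive and counted in `Sectorsize` units.

```python
# Values copied from the `gpart list xbd2` output above.
sectorsize = 4096
start, end = 6, 1572858
consumer_mediasize = 6442450944   # Mediasize of the xbd2 consumer

offset = start * sectorsize                   # byte offset of xbd2p1
length = (end - start + 1) * sectorsize       # byte length of xbd2p1
total_sectors = consumer_mediasize // sectorsize

print(offset, length, total_sectors)
```

The recomputed offset and length match the `offset` and `length` fields gpart prints, so the geometry seen by GEOM is internally consistent with 4096-byte sectors after the v1 patch.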

Comment 7 Christian Kujau 2024-08-28 01:56:21 UTC

I spoke too soon; executing "zpool create" works, *sometimes*:

========================================================
# gpart create -s gpt xbd2  
xbd2 created

# gpart add -t freebsd-zfs xbd2
xbd2p1 added

# zpool create foobar /dev/xbd2p1 
cannot zero first 4096 bytes of '/dev/xbd2p1': Input/output error

# zpool create foobar /dev/xbd2p1

 --- ??

# zfs create foobar/test
# zfs list foobar
NAME     USED  AVAIL  REFER  MOUNTPOINT
foobar   408K  5.33G    96K  /foobar

# pv -Ss 5300m /dev/random | tee /foobar/test | md5
9912421b2c9e344851b30164bbbc98a9

# md5 /foobar/test 
MD5 (/foobar/test) = 9912421b2c9e344851b30164bbbc98a9

# zpool scrub foobar
.
.
.
# zpool status foobar
  pool: foobar
 state: ONLINE
  scan: scrub repaired 0B in 00:00:11 with 0 errors on Wed Aug 28 03:51:27 2024
config:

	NAME        STATE     READ WRITE CKSUM
	foobar      ONLINE       0     0     0
	  xbd2p1    ONLINE       0     0     0

errors: No known data errors
========================================================

I don't really know what to make of this, i.e. why the zpool create only works on the 2nd attempt, or not at all.

Comment 8 Roger Pau Monné freebsd_committer 2024-08-28 13:41:29 UTC

Created attachment 253146
Proposed fix v2

Can you please give this updated patch a try?  I think the previous patch was missing one change that was likely causing your issues with `zpool create`.

Thanks, Roger.

Comment 9 Christian Kujau 2024-09-01 22:01:06 UTC

Thanks, that looks even better, and gpart and zpool operations now seem to work every time, not just sometimes:

=====================
# dmesg | grep xbd2
xbd2: 6144MB <Virtual Block Device> at device/vbd/51744 on xenbusb_front0
xbd2: features: flush, write_barrier
xbd2: synchronize cache commands enabled.

# gpart create -s gpt xbd2
xbd2 created

# gpart add -t freebsd-zfs xbd2
xbd2p1 added

# zpool create foobar /dev/xbd2p1
# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
foobar        516K  5.33G    96K  /foobar

# pv -Ss 512m /dev/random | tee /foobar/test | md5
1ce0e20d7e47832875d8490b7d7f7675

# zfs unmount foobar 
# zpool export foobar 
# sync
# zpool import foobar
# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
foobar       513M  4.83G   512M  /foobar


# md5 /foobar/test | grep 1ce0e20d7e47832875d8490b7d7f7675
MD5 (/foobar/test) = 1ce0e20d7e47832875d8490b7d7f7675
# zpool scrub foobar
...

# zpool status foobar | grep scrub
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sun Sep  1 23:57:22 2024
=====================

With that it looks like this report can be closed then? Does FreeBSD have a filesystem testsuite, something like xfstests for Linux maybe?

Thanks for your quick help here, this is really awesome!

Comment 10 Roger Pau Monné freebsd_committer 2024-09-02 07:34:37 UTC

The issue comes from a misinterpretation of the block specification when using 4K sector sizes.  Every frontend and backend has implemented this slightly differently, creating the incompatibilities that you saw.  It's currently under discussion on xen-devel which components should be adjusted and how:

https://lore.kernel.org/xen-devel/ZtBUnzH4sIrFAo0f@macbook.local/

The patch I've provided to you made FreeBSD blkfront match the implementation in Linux blkback, but it's still not clear we want to go that route.

Thanks, Roger.

Comment 11 commit-hook freebsd_committer 2024-10-08 07:30:27 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7fe85643735ffdcf18ebef81343eaac9b8d2584

commit e7fe85643735ffdcf18ebef81343eaac9b8d2584
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-08 07:29:13 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always 512b,
    regardless of the value of the 'sector-size' node.  The sector offsets in
    the ring requests are also always based on 512b sectors, regardless of the
    'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required fields.

    The blkif.h public header has been recently updated in upstream Xen repository
    to fix the regressions in the specification introduced by later modifications,
    and clarify the base units of xenstore and shared ring fields.

    PR: 280884
    Reported by: Christian Kujau
    MFC after: 1 week
    Sponsored by: Cloud Software Group
    Reviewed by: markj
    Differential revision: https://reviews.freebsd.org/D46756

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)
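The unit rule stated in the commit message can be sketched as follows. This is not the code from blkfront.c or blkback.c; the function names and the constant are hypothetical, and the `sectors` value is an assumption derived from the 6442450944-byte Mediasize reported earlier in this thread, not read from the xenstore attachment.

```python
XEN_SECTOR_SIZE = 512  # fixed base unit, per the commit message above


def media_size_bytes(sectors: int) -> int:
    """Size in bytes from the 'sectors' xenbus node (always 512-byte units)."""
    return sectors * XEN_SECTOR_SIZE


def ring_sector_number(native_lba: int, sector_size: int) -> int:
    """Convert an LBA counted in sector_size units to 512-byte ring units."""
    if sector_size % XEN_SECTOR_SIZE != 0:
        raise ValueError("sector_size must be a multiple of 512")
    return native_lba * (sector_size // XEN_SECTOR_SIZE)


# Assumed 'sectors' value for the 6 GB disk (hypothetical, see lead-in).
sectors = 6442450944 // XEN_SECTOR_SIZE

print(media_size_bytes(sectors))     # 'sectors' read as 512-byte units
print(sectors * 4096)                # 'sectors' misread as 4096-byte units
print(ring_sector_number(6, 4096))   # first sector of xbd2p1 in ring units
```

Under these assumptions the misread product equals the 51539607552-byte (48G) Mediasize from the original report, and the 512-byte reading gives the 6.0G Mediasize seen with the patches applied.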

Comment 12 commit-hook freebsd_committer 2024-10-15 08:26:00 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=11432d8f076579adbfad6363f0440ebafc5971e5

commit 11432d8f076579adbfad6363f0440ebafc5971e5
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-15 08:12:19 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always 512b,
    regardless of the value of the 'sector-size' node.  The sector offsets in
    the ring requests are also always based on 512b sectors, regardless of the
    'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required fields.

    The blkif.h public header has been recently updated in upstream Xen repository
    to fix the regressions in the specification introduced by later modifications,
    and clarify the base units of xenstore and shared ring fields.

    PR: 280884
    Reported by: Christian Kujau
    MFC after: 1 week
    Sponsored by: Cloud Software Group
    Reviewed by: markj
    Differential revision: https://reviews.freebsd.org/D46756

    (cherry picked from commit e7fe85643735ffdcf18ebef81343eaac9b8d2584)

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)

Comment 13 commit-hook freebsd_committer 2024-10-15 08:27:01 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9da7b206f9d673a81574d6d08ec430ab49599735

commit 9da7b206f9d673a81574d6d08ec430ab49599735
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-15 08:14:59 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always 512b,
    regardless of the value of the 'sector-size' node.  The sector offsets in
    the ring requests are also always based on 512b sectors, regardless of the
    'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required fields.

    The blkif.h public header has been recently updated in upstream Xen repository
    to fix the regressions in the specification introduced by later modifications,
    and clarify the base units of xenstore and shared ring fields.

    PR: 280884
    Reported by: Christian Kujau
    MFC after: 1 week
    Sponsored by: Cloud Software Group
    Reviewed by: markj
    Differential revision: https://reviews.freebsd.org/D46756

    (cherry picked from commit e7fe85643735ffdcf18ebef81343eaac9b8d2584)

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)

Comment 14 Mark Johnston freebsd_committer 2024-10-17 22:08:50 UTC

I believe the underlying bug is resolved now; please re-open if I missed something.