Booting 15.0-CURRENT (main-n271670-bd4f2023bb05) as a Xen DomU (HVM mode) prints:

================================
# dmesg | grep xbd
xbd0: 49152MB <Virtual Block Device> at device/vbd/51712
ata1: reset tp1 mask=03 ostat0=50 ostat1=00
xbd0: features: flush, write_barrier
xbd0: synchronize cache commands enabled.
xbd1: 30720MB <Virtual Block Device> at device/vbd/51728
xbd0: disk error on xenbusb_front0
xbd1: features: flush, write_barrier
xbd1: synchronize cache commands enabled.
================================

But xbd0 (the first disk configured) is supposed to be 6 GB in size. The same disks, seen from the Xen Dom0 (running a Linux distribution):

================================
# lsblk -o NAME,SIZE,FSTYPE,TYPE,LOG-SEC,PHY-SEC /dev/vg0/freebsd-disk0 /dev/vgteamlite/freebsd-disk1
NAME                      SIZE FSTYPE     TYPE LOG-SEC PHY-SEC
vg0-freebsd--disk0          6G            lvm     4096    4096
vgteamlite-freebsd--disk1  30G zfs_member lvm      512     512
================================

NB: while a "disk error" is reported for xbd1, that disk can be used in FreeBSD just fine: zpool import works and the correct size is reported.

Trying to partition xbd0 (the 6 GB disk) results in all kinds of errors:

================================
# gpart create -s gpt xbd0
xbd0: disk error cmd=write 1-8 status: ffffffff
gpart: Input/output error

# gpart list xbd0
Geom name: xbd0
modified: true
state: OK
fwheads: 255
fwsectors: 63
last: 12582906
first: 6
entries: 128
scheme: GPT
Consumers:
1. Name: xbd0
   Mediasize: 51539607552 (48G)
   Sectorsize: 4096
   Mode: r1w1e1

# gnop create -S 4k xbd0
GEOM_NOP: Device xbd0.nop created.
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582849-12582880 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 1-8 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff

# file -Ls /dev/xbd0*
xbd0: disk error cmd=read 0-2039 status: ffffffff
xbd0: disk error cmd=read 255-262 status: ffffffff
/dev/xbd0: ERROR: cannot read `/dev/xbd0' (Input/output error)
xbd0: disk error cmd=read 0-2039 status: ffffffff
xbd0: disk error cmd=read 255-262 status: ffffffff
/dev/xbd0.nop: ERROR: cannot read `/dev/xbd0.nop' (Input/output error)
================================

Trying to play tricks with sector sizes doesn't work either:

================================
# gnop destroy xbd0.nop
GEOM_NOP: Device xbd0.nop removed.
# gnop create -S 4096 /dev/xbd0
GEOM_NOP: Device xbd0.nop created.
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582849-12582880 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 12582910-12582917 status: ffffffff
xbd0: disk error cmd=read 1-8 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 2-17 status: ffffffff
xbd0: disk error cmd=read 12582911-12582918 status: ffffffff

# zpool create -o ashift=12 zroot /dev/xbd0.nop
xbd0: disk error cmd=read 4-227 status: ffffffff
xbd0: disk error cmd=read 68-291 status: ffffffff
xbd0: disk error cmd=read 12582788-12583011 status: ffffffff
xbd0: disk error cmd=read 12582852-12583075 status: ffffffff
cannot create 'zroot': no such pool or dataset
================================

The same happens on FreeBSD 14.1; switching to 15.0-CURRENT did not help.
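As a side note, the bogus 48G Mediasize reported by gpart is exactly 8x the real 6 GiB, which is consistent with the sector count being scaled by a 4096-byte sector size instead of a 512-byte unit. A quick arithmetic sanity check (a sketch; the 12582912 sector count is inferred from the sizes above, not read from xenstore):

```shell
# A 6 GiB disk expressed as a count of 512-byte units:
sectors=$(( 6 * 1024 * 1024 * 1024 / 512 ))          # 12582912
# Multiplying by 512 gives the real size; multiplying by the 4096-byte
# logical sector size instead gives the inflated value gpart reports.
echo "correct size:  $(( sectors * 512 )) bytes"      # 6442450944 (6 GiB)
echo "inflated size: $(( sectors * 4096 )) bytes"     # 51539607552, the 48G Mediasize
```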
I'm currently on PTO and won't be able to look into this until the 26th. Can you paste the output of `xenstore-ls -fp` from dom0 while the FreeBSD guest is running with the 4K disk attached? Thanks.
Created attachment 252901 [details] xenstore-ls -fp when freebsd is running with 4k disk attached
Output attached, but other disks show up as well. The important disks for the FreeBSD DomU are:

# xl block-list freebsd
Vdev  BE  handle  state  evt-ch  ring-ref  BE-path
51712 0   36      4      40      -1        /local/domain/0/backend/vbd/36/51712
51728 0   36      4      41      -1        /local/domain/0/backend/vbd/36/51728
51744 0   36      1      -1      -1        /local/domain/0/backend/qdisk/36/51744

...with:

* "vbd/36/51712" being the 4K disk (xbd0 in FreeBSD; 48 GB reported instead of 6 GB)
* "vbd/36/51728" being the 512-byte-sector disk (xbd1 in FreeBSD, 30 GB in size)
* "qdisk/36/51744" being the FreeBSD-15 ISO image
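For readers unfamiliar with the Vdev column: these numbers follow Xen's virtual-device numbering, where an xvd-style device encodes vdev = (202 << 8) | (disk << 4) | partition for disks 0-15. A small sketch decoding the three devices above:

```shell
# Decode Xen vdev numbers in the xvd namespace:
# vdev = (202 << 8) | (disk << 4) | partition, for disk indexes 0-15.
letters=abcdefghijklmnop
for vdev in 51712 51728 51744; do
    disk=$(( (vdev >> 4) & 0xf ))
    part=$(( vdev & 0xf ))
    echo "$vdev -> xvd${letters:$disk:1} (disk $disk, partition $part)"
done
# prints:
# 51712 -> xvda (disk 0, partition 0)
# 51728 -> xvdb (disk 1, partition 0)
# 51744 -> xvdc (disk 2, partition 0)
```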
Full disclosure: a while ago I discussed this topic on netbsd-users[0], and although the thread appeared to be inconclusive, Manuel Bouyer was able to fix this in NetBSD. Afterwards I opened an OpenBSD bug[1] for the same issue and summarized the NetBSD story there, but nothing came of it. So, thanks for taking a stab at this; maybe these pointers are helpful!

[0] https://mail-index.netbsd.org/netbsd-users/2023/07/20/msg029875.html
[1] https://marc.info/?l=openbsd-bugs&m=169274922517463&w=4
Created attachment 253106 [details] Proposed fix v1 Can you please give the following patch a try? I don't have a setup with a 4K logical sector disk right now, so it's a bit hard for me to test the fix. You will need to apply the patch to CURRENT (or maybe a 14 source), rebuild the kernel (make -jX kernel) and reboot the guest. Thanks, Roger.
Great, that looks much better:

====================================================
# dmesg | grep xbd2
xbd2: 6144MB <Virtual Block Device> at device/vbd/51744 on xenbusb_front0
xbd2: features: flush, write_barrier
xbd2: synchronize cache commands enabled.

# gpart show xbd2
=>     63  1572801  xbd2  MBR  (6.0G)
       63  1572801        - free -  (6.0G)
====================================================

I.e. the disk is now 6 GB in size from within the FreeBSD DomU, exactly as it should be. But zpool creation is still not working, or I'm holding it wrong:

====================================================
# gpart create -s gpt xbd2
xbd2 created
# gpart add -t freebsd-zfs xbd2
xbd2p1 added
# zpool create foobar /dev/xbd2p1
cannot zero first 4096 bytes of '/dev/xbd2p1': Input/output error
# dmesg
xbd2: disk error cmd=write 6-13 status: ffffffff

# gpart list xbd2
Geom name: xbd2
modified: false
state: OK
fwheads: 32
fwsectors: 63
last: 1572858
first: 6
entries: 128
scheme: GPT
Providers:
1. Name: xbd2p1
   Mediasize: 6442405888 (6.0G)
   Sectorsize: 4096
   Stripesize: 0
   Stripeoffset: 24576
   Mode: r0w0e0
   efimedia: HD(1,GPT,d56a1317-64dc-11ef-843e-00163eabcd00,0x6,0x17fff5)
   rawuuid: d56a1317-64dc-11ef-843e-00163eabcd00
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 6442405888
   offset: 24576
   type: freebsd-zfs
   index: 1
   end: 1572858
   start: 6
Consumers:
1. Name: xbd2
   Mediasize: 6442450944 (6.0G)
   Sectorsize: 4096
   Mode: r0w0e0
====================================================
I spoke too soon; executing "zpool create" works, *sometimes*:

========================================================
# gpart create -s gpt xbd2
xbd2 created
# gpart add -t freebsd-zfs xbd2
xbd2p1 added
# zpool create foobar /dev/xbd2p1
cannot zero first 4096 bytes of '/dev/xbd2p1': Input/output error
# zpool create foobar /dev/xbd2p1
--- ??
# zfs create foobar/test
# zfs list foobar
NAME     USED  AVAIL  REFER  MOUNTPOINT
foobar   408K  5.33G    96K  /foobar
# pv -Ss 5300m /dev/random | tee /foobar/test | md5
9912421b2c9e344851b30164bbbc98a9
# md5 /foobar/test
MD5 (/foobar/test) = 9912421b2c9e344851b30164bbbc98a9
# zpool scrub foobar
. . .
# zpool status foobar
  pool: foobar
 state: ONLINE
  scan: scrub repaired 0B in 00:00:11 with 0 errors on Wed Aug 28 03:51:27 2024
config:

        NAME      STATE     READ WRITE CKSUM
        foobar    ONLINE       0     0     0
          xbd2p1  ONLINE       0     0     0

errors: No known data errors
========================================================

I don't really know what to make of this, i.e. why the zpool create only works on the 2nd attempt, or not at all.
Created attachment 253146 [details] Proposed fix v2 Can you please give this updated patch a try? I think the previous patch was missing one change that was likely causing your issues with `zpool create`. Thanks, Roger.
Thanks, that looks even better: gpart and zpool operations now seem to work every time, not just sometimes:

=====================
# dmesg | grep xbd2
xbd2: 6144MB <Virtual Block Device> at device/vbd/51744 on xenbusb_front0
xbd2: features: flush, write_barrier
xbd2: synchronize cache commands enabled.
# gpart create -s gpt xbd2
xbd2 created
# gpart add -t freebsd-zfs xbd2
xbd2p1 added
# zpool create foobar /dev/xbd2p1
# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
foobar   516K  5.33G    96K  /foobar
# pv -Ss 512m /dev/random | tee /foobar/test | md5
1ce0e20d7e47832875d8490b7d7f7675
# zfs unmount foobar
# zpool export foobar
# sync
# zpool import foobar
# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
foobar   513M  4.83G   512M  /foobar
# md5 /foobar/test | grep 1ce0e20d7e47832875d8490b7d7f7675
MD5 (/foobar/test) = 1ce0e20d7e47832875d8490b7d7f7675
# zpool scrub foobar
...
# zpool status foobar | grep scrub
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sun Sep  1 23:57:22 2024
=====================

With that, it looks like this report can be closed? Does FreeBSD have a filesystem test suite, something like xfstests on Linux? Thanks for your quick help here, this is really awesome!
The issue comes from a misinterpretation of the block specification when using 4K sector sizes. Every frontend and backend has implemented this slightly differently, creating the incompatibilities that you saw. It's currently under discussion on xen-devel which components should be adjusted, and how:

https://lore.kernel.org/xen-devel/ZtBUnzH4sIrFAo0f@macbook.local/

The patch I've provided made FreeBSD blkfront match the implementation in Linux blkback, but it's still not clear we want to go that route.

Thanks, Roger.
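To illustrate the ambiguity with assumed numbers (a sketch, not the actual patch): when a frontend turns a bio's byte offset into the ring request's sector number, dividing by the advertised 4096-byte 'sector-size' instead of the fixed 512-byte unit points the backend at a different location on disk:

```shell
# A 16 KiB read starting 1 MiB into the disk, expressed in both units.
offset=$(( 1024 * 1024 ))   # byte offset of the request
length=$(( 16 * 1024 ))     # byte length of the request
echo "512b units:  sectors $(( offset / 512 ))-$(( (offset + length) / 512 - 1 ))"
echo "4096b units: sectors $(( offset / 4096 ))-$(( (offset + length) / 4096 - 1 ))"
# prints:
# 512b units:  sectors 2048-2079
# 4096b units: sectors 256-259
```

If the frontend fills the ring in 4096-byte units while the backend interprets it in 512-byte units (or vice versa), the same request lands 8x off target, which fits the out-of-range "status: ffffffff" errors seen earlier in this report.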
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7fe85643735ffdcf18ebef81343eaac9b8d2584

commit e7fe85643735ffdcf18ebef81343eaac9b8d2584
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-08 07:29:13 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always
    512b, regardless of the value of the 'sector-size' node.  The sector
    offsets in the ring requests are also always based on 512b sectors,
    regardless of the 'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required
    fields.

    The blkif.h public header has been recently updated in upstream Xen
    repository to fix the regressions in the specification introduced by
    later modifications, and clarify the base units of xenstore and shared
    ring fields.

    PR:                     280884
    Reported by:            Christian Kujau
    MFC after:              1 week
    Sponsored by:           Cloud Software Group
    Reviewed by:            markj
    Differential revision:  https://reviews.freebsd.org/D46756

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=11432d8f076579adbfad6363f0440ebafc5971e5

commit 11432d8f076579adbfad6363f0440ebafc5971e5
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-15 08:12:19 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always
    512b, regardless of the value of the 'sector-size' node.  The sector
    offsets in the ring requests are also always based on 512b sectors,
    regardless of the 'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required
    fields.

    The blkif.h public header has been recently updated in upstream Xen
    repository to fix the regressions in the specification introduced by
    later modifications, and clarify the base units of xenstore and shared
    ring fields.

    PR:                     280884
    Reported by:            Christian Kujau
    MFC after:              1 week
    Sponsored by:           Cloud Software Group
    Reviewed by:            markj
    Differential revision:  https://reviews.freebsd.org/D46756

    (cherry picked from commit e7fe85643735ffdcf18ebef81343eaac9b8d2584)

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9da7b206f9d673a81574d6d08ec430ab49599735

commit 9da7b206f9d673a81574d6d08ec430ab49599735
Author:     Roger Pau Monné <royger@FreeBSD.org>
AuthorDate: 2024-08-26 11:57:36 +0000
Commit:     Roger Pau Monné <royger@FreeBSD.org>
CommitDate: 2024-10-15 08:14:59 +0000

    xen/blk{front,back}: fix usage of sector sizes different than 512b

    The units of the size reported in the 'sectors' xenbus node is always
    512b, regardless of the value of the 'sector-size' node.  The sector
    offsets in the ring requests are also always based on 512b sectors,
    regardless of the 'sector-size' reported in xenbus.

    Fix both blkfront and blkback to assume 512b sectors in the required
    fields.

    The blkif.h public header has been recently updated in upstream Xen
    repository to fix the regressions in the specification introduced by
    later modifications, and clarify the base units of xenstore and shared
    ring fields.

    PR:                     280884
    Reported by:            Christian Kujau
    MFC after:              1 week
    Sponsored by:           Cloud Software Group
    Reviewed by:            markj
    Differential revision:  https://reviews.freebsd.org/D46756

    (cherry picked from commit e7fe85643735ffdcf18ebef81343eaac9b8d2584)

 sys/dev/xen/blkback/blkback.c   | 22 ++++++++++++++-------
 sys/dev/xen/blkfront/blkfront.c | 43 ++++++++++++++++++++++++++++++-----------
 2 files changed, 47 insertions(+), 18 deletions(-)
I believe the underlying bug is resolved now, please re-open if I missed something.