Bug 267654 - UFS "cylinder checksum failed" on temporary storage or data disk on arm64 vm in Azure
Summary: UFS "cylinder checksum failed" on temporary storage or data disk on arm64 vm ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Only Me
Assignee: Li-Wen Hsu
URL:
Keywords:
: 272719 (view as bug list)
Depends on:
Blocks: 14.0r
  Show dependency treegraph
 
Reported: 2022-11-08 23:27 UTC by Li-Wen Hsu
Modified: 2023-11-17 22:12 UTC (History)
12 users (show)

See Also:


Attachments
adding printf to newfs(1) (485 bytes, patch)
2023-05-24 22:51 UTC, Li-Wen Hsu
no flags Details | Diff
Disk alignment test (1.88 KB, text/plain)
2023-08-14 11:57 UTC, Andrew Turner
no flags Details
vmbus bus_get_dma_tag change (3.84 KB, patch)
2023-08-18 07:37 UTC, schakrabarti@microsoft.com
no flags Details | Diff
patch to libufs for Azure (670 bytes, patch)
2023-08-20 05:04 UTC, Kirk McKusick
no flags Details | Diff
revised proposed fix (1.28 KB, patch)
2023-09-02 06:19 UTC, Kirk McKusick
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-08 23:27:07 UTC

    
Comment 1 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-08 23:34:46 UTC
dmesg: https://gist.github.com/lwhsu/31b70e6a7f4e8e35707093a1b570f88c

Logs of error like:

UFS /dev/da1p1 (/mnt/resource) cylinder checksum failed: cg 0, cgp: 0x9b739572 != bp: 0xfff99985
UFS /dev/da1p1 (/mnt/resource) cylinder checksum failed: cg 1, cgp: 0x2a6a014 != bp: 0x3e3c2145
UFS /dev/da1p1 (/mnt/resource) cylinder checksum failed: cg 3, cgp: 0x55b3a281 != bp: 0x692923d0

The mounted UFS is not writeable:

lwhsu@arm64test:/mnt/resource $ sudo touch a
/mnt/resource: create/symlink failed, no inodes free
touch: a: No space left on device

Have tried following scenarios:
- Using UFS on "OS disk" (da0): no problem
- Using ZFS on da1: no problem
- Using UFS on md(4) using a file on OS disk: no problem
- Adding another data disk (da2): error
Comment 2 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-08 23:38:49 UTC
Souradeep and Wei, can you help to check if you can reproduce this in hyper-v in local arm64 machine?  Is there any difference between OS disk and Data disk?

Thanks!
Comment 3 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-15 17:27:15 UTC
Kirk, is there anything suggested for us to dig more into this problem? Thanks!
Comment 4 schakrabarti@microsoft.com 2022-11-15 18:05:08 UTC
(In reply to Li-Wen Hsu from comment #2)
Wei had faced similar problem during bring up. This is from the mail :

We are trying to bring up FreeBSD on ARM64 Hyper-V.  We are seeing the 
> iso image booted up fine and we can install the root disk with ZFS.
> However, when choosing to install the root disk with UFS, we are 
> seeing UFS checksum errors as below and the installation just failed.
>
> ...
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 41, cgp: 0x45721f90 
> !=
> bp: 0x23b0cc8
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 42, cgp: 0xbb1ba737 
> !=
> bp: 0xfc52b46f
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 43, cgp: 0x12671d05 
> !=
> bp: 0x552e0e5d
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 44, cgp: 0x4224a088 
> !=
> bp: 0x56db3d0
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 45, cgp: 0xea211974 
> !=
> bp: 0x913217c9
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 0, cgp: 0xae78ec43 
> !=
> bp: 0x3467ffcc
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 1, cgp: 0x7fc7e750 
> !=
> bp: 0x388ef408
> UFS /dev/da0p2 (/mnt) cylinder checksum failed: cg 2, cgp: 0x81ae5ff7 
> !=
> bp: 0xc6e74caf
>
> /mnt: create/symlink failed, no inodes free
Comment 5 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-15 18:08:46 UTC
(In reply to schakrabarti@microsoft.com from comment #4)
Thanks, please note that currently UFS works fine as the OS disk (da0), is there anything different in the OS disk and data disks? And did you do any fix for the UFS on OS disk during bring up?
Comment 6 Wei Hu 2022-11-16 01:44:53 UTC
(In reply to Li-Wen Hsu from comment #5)
We never touched UFS during the entire effort of ARM64 bring-up on Hyper-V. I hit this problem when trying to install the OS from iso image. Not sure if this problem exists on OS disk or not as I couldn't complete installing the OS if choosing ufs as root filesystem.
Comment 7 Kirk McKusick freebsd_committer freebsd_triage 2022-11-18 00:37:18 UTC
Is it possible to run fsck_ffs on the filesystem? If so, it will fix the bad checksums and you should then be good to go.

Can you run tunefs on the filesystem? If so, I could easily add an option to disable cylinder group checksums.
Comment 8 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-18 01:38:47 UTC
(In reply to Kirk McKusick from comment #7)
> Is it possible to run fsck_ffs on the filesystem? If so, it will fix the bad checksums and you should then be good to go.

Thanks! Running fsck_ffs does fix the issue. I attach the log at https://gist.github.com/lwhsu/76e78a680df5f8107039c1a293ec2e42

> Can you run tunefs on the filesystem? If so, I could easily add an option to disable cylinder group checksums.

Yes we can run tunefs on this filesystem. What's more, all this disk is partitioned and formatted after boot. (refer to the above log) It's weird that this filesystem is newly created by gpart and newfs but doesn't function like da0 (OS disk) and mapped md(4) devices.

Is it safe to just disable (checking?) cylinder group checksums?  It seems still weird that we need users to manually turn off cylinder group checksums every time after running newfs.
Comment 9 Kirk McKusick freebsd_committer freebsd_triage 2022-11-18 05:58:51 UTC
Newfs is generating the check hashes using the libufs cgwrite(3) routine. The cgwrite(3) routine calls calculate_crc32c() which comes from sys/libkern/gsb_crc32.c. The kernel check also uses calculate_crc32c() from sys/libkern/gsb_crc32.c so they should be getting the same answer. If the structures were somehow laid out differently that might explain it, but then a lot of other things would also break. I also note that the same calculate_crc32c() function is used for the superblock checksum and the inode checksum which are both working fine which makes it even more of a mystery why the cylinder group checksums are broken.

Another option is to disable cylinder group checksums in newfs(8) using something like this patch:

diff --git a/sbin/newfs/mkfs.c b/sbin/newfs/mkfs.c
index 48091d7882d0..5000adff138f 100644
--- a/sbin/newfs/mkfs.c
+++ b/sbin/newfs/mkfs.c
@@ -499,8 +499,10 @@ mkfs(struct partition *pp, char *fsys)
         */
        if (Oflag > 1) {
                sblock.fs_flags |= FS_METACKHASH;
+#ifndef AZURE
                if (getosreldate() >= P_OSREL_CK_CYLGRP)
                        sblock.fs_metackhash |= CK_CYLGRP;
+#endif
                if (getosreldate() >= P_OSREL_CK_SUPERBLOCK)
                        sblock.fs_metackhash |= CK_SUPERBLOCK;
                if (getosreldate() >= P_OSREL_CK_INODE)
Comment 10 Li-Wen Hsu freebsd_committer freebsd_triage 2022-11-19 08:47:07 UTC
I tried to newfs a partition which is directly on da0 (OS disk), it works and doesn't have any checksum error.  I think this means that there should be something different in os disk (da0) and data disks (da1, 2, ...).

I guess what newfs wants to write and actually on the disk may be different, is there any easier way to debug this? Or it needs to modify newfs to print out the data before writing?
Comment 11 Kirk McKusick freebsd_committer freebsd_triage 2022-11-20 17:24:46 UTC
I do not have any better idea than the debugging idea that you are suggesting. Of interest is verifying that the seek offset is correct and that the checksum of what is written and what ends up on the disk match. Note that the checksum is being calculated and the block written in the libufs routine lib/libufs/cgroup.c. See functions cgwrite(), cgwrite1(), and where the work is actually done cgput().
Comment 12 Gleb Popov freebsd_committer freebsd_triage 2022-12-20 17:02:52 UTC
I bumped into this problem on my RockPro64 with SSD disk attached via an adapter. The symptoms are all the same:

# OK
newfs /dev/nda0
# OK
mount /dev/nda0 /mnt
# spits a ton of "cylinder checksum failed" messages
echo asdasd > /mnt/asd

If I do instead

newfs /dev/nda0
# fixes a ton of cylinder groups
fsck_ffs -y /dev/nda0
mount /dev/nda0 /mnt
# OK
echo asdasd > /mnt/asd
Comment 13 Gleb Popov freebsd_committer freebsd_triage 2022-12-20 17:05:55 UTC
(In reply to Gleb Popov from comment #12)
Forgot to mention, I'm running 13.1-RELEASE.
Comment 14 Kirk McKusick freebsd_committer freebsd_triage 2023-02-27 19:16:05 UTC
Is this still an unresolved issue? If so, I am not sure how to reproduce it which I need to do to try and find a fix for it.
Comment 15 Gleb Popov freebsd_committer freebsd_triage 2023-02-28 04:26:21 UTC
(In reply to Kirk McKusick from comment #14)
Can check again if this is the case for me as RockPro is currenly being used in production.
Comment 16 Wei Hu 2023-02-28 05:07:23 UTC
(In reply to Kirk McKusick from comment #14)
We are still seeing this on ufs data disk on ARM64 Hyper-V guests in Azure. Wonder if Li-Wen (FreeBSD Foundation) can provide you access to a VM in Azure for further debugging.
Comment 17 Li-Wen Hsu freebsd_committer freebsd_triage 2023-05-24 22:51:42 UTC
Created attachment 242385 [details]
adding printf to newfs(1)
Comment 18 Kyle Evans freebsd_committer freebsd_triage 2023-07-24 16:54:46 UTC
I pointed this bug out to Andrew Gierth and he pointed out that it sounds a bit like an alignment issue; the varying printfs you were experimenting with at could have certainly tweaked the alignment of different buffers.

Slapping d_sbunion and d_cgunion in libufs' struct uufsd with an `__attribute__ ((aligned(512)))` seems to be sufficient to alleviate the bug, though that's not really a good long-term solution. Presumably there's something related/hinky in storvsc or vmbus.
Comment 19 Andrew "RhodiumToad" Gierth 2023-07-24 17:00:55 UTC
(In reply to Kyle Evans from comment #18)

There are some test programs in bug #271766 and the commits that fix it that could perhaps be adapted to help narrow down the failure without needing to involve filesystem code.
Comment 20 schakrabarti@microsoft.com 2023-07-25 20:05:32 UTC
*** Bug 272719 has been marked as a duplicate of this bug. ***
Comment 21 Mark Millard 2023-07-25 22:41:26 UTC
None of the comments give enough context to replicate the
same partition start and size or other such to try newfs
with for replication attempts.

Comment #12 indicates a replication without needing Azure
involved. But, again, without enough detail to make an
approximate simulation of the failure context.

It would be nice to have details from a known example that
one could try to use to get a replication, in my context
avoiding requiring Azure be involved but still being a
reasonable replication of various other aspects.

Details from Azure contexts that would not require Azure
to be a reasonable basis for testing could well serve.
Comment 22 Andrew Turner freebsd_committer freebsd_triage 2023-08-07 14:50:56 UTC
On a local arm64 Hyper-V instance on an Windows Dev Kit 2023 I tried the test program from bug #271766. I see a difference with a stride size of 32, but not when it's 64.

I will often, bot not always receive similar data to the following with a stride of 32:

% sudo ./a.out /dev/da1 32
Discrepancies found with page offset 32:
  found at 1fe5:  1f ff ff
  expected 1fe5:  00 00 00
  found at 1fea:  02
  expected 1fea:  00
  found at 1fed:  03 80
  expected 1fed:  00 00
Discrepancies found with page offset 96:
  found at 1fa5:  1f ff ff
  expected 1fa5:  00 00 00
  found at 1faa:  02
  expected 1faa:  00
  found at 1fad:  03 80
  expected 1fad:  00 00
Discrepancies found with page offset 160:
  found at 1f65:  1f ff ff
  expected 1f65:  00 00 00
  found at 1f6a:  02
  expected 1f6a:  00
  found at 1f6d:  03 80
  expected 1f6d:  00 00
Discrepancies found with page offset 224:
  found at 1f25:  1f ff ff
  expected 1f25:  00 00 00
  found at 1f2a:  02
  expected 1f2a:  00
  found at 1f2d:  03 80
  expected 1f2d:  00 00
...
Comment 23 Andrew "RhodiumToad" Gierth 2023-08-07 19:27:31 UTC
(In reply to Andrew Turner from comment #22)

That test program isn't ideal in this case (it could be adapted to be far more comprehensive), but those results do show that (a) corruption is indeed occuring, in this case on reads, and (b) those corruptions are consistently happening just after hitting a page boundary.
Comment 24 Andrew Turner freebsd_committer freebsd_triage 2023-08-07 19:57:00 UTC
I think this is due to how the Hyper-V storvsc driver is using bus_dma.

It looks like it needs physically contiguous memory which may not be the case when bus_dma needs to create a bounce buffer. This can happen when the destination is not cache-line aligned.

A workaround to pass the BUS_DMA_COHERENT flag bus_dma_tag_create, however this should only be done when we know the bus is cache-coherent. It would be better to move this to the vmbus driver and implement bus_get_dma_tag. Under ACPI we can check the _CCA value to set this flag, however due to how the vmbus driver attaches we don't have the acpi handle to get this.
Comment 25 schakrabarti@microsoft.com 2023-08-08 11:08:35 UTC
(In reply to Andrew Turner from comment #24)
I have tested with the suggested change:
diff --git a/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c b/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
index dff2ee709bed..7c908495c6c3 100644
--- a/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
+++ b/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
@@ -949,9 +949,14 @@ storvsc_init_requests(device_t dev)
        struct storvsc_softc *sc = device_get_softc(dev);
        struct hv_storvsc_request *reqp;
        int error, i;
+       unsigned int dma_flag = 0;

        LIST_INIT(&sc->hs_free_list);

+#if defined(__aarch64__)
+       dmag_flag |= BUS_DMA_COHERENT;
+#endif
+
        error = bus_dma_tag_create(
                bus_get_dma_tag(dev),           /* parent */
                1,                              /* alignment */
@@ -962,7 +967,7 @@ storvsc_init_requests(device_t dev)
                STORVSC_DATA_SIZE_MAX,          /* maxsize */
                STORVSC_DATA_SEGCNT_MAX,        /* nsegments */
                STORVSC_DATA_SEGSZ_MAX,         /* maxsegsize */
-               0,                              /* flags */
+               dma_flag,                       /* flags */
                NULL,                           /* lockfunc */
                NULL,                           /* lockfuncarg */
                &sc->storvsc_req_dtag);

But it has not solved the issue:

FS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 811, cgp: 0x4a9c4e5 != bp: 0xb51bda9f
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 812, cgp: 0x54ea7968 != bp: 0xe5586712
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 813, cgp: 0xfd96c35a != bp: 0x4c24dd20
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 814, cgp: 0x3ff7bfd != bp: 0xb24d6587
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 815, cgp: 0xaa83c1cf != bp: 0x1b31dfb5
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 816, cgp: 0x1008f9ad != bp: 0xa1bae7d7
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 817, cgp: 0xb974439f != bp: 0x8c65de5
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 818, cgp: 0x97f71033 != bp: 0x8c2b9d5e

/datadrive: create/symlink failed, no inodes free

[schakrabarti@freebsd-arm-schakrabarti /datadrive]$ Aug  8 09:18:18 freebsd-arm-schakrabarti kernel: pid 961 (touch), uid 1001 inumber 2 on /datadrive: out of inodes
^C
[schakrabarti@freebsd-arm-schakrabarti /datadrive]$ uname -a
FreeBSD freebsd-arm-schakrabarti 14.0-CURRENT FreeBSD 14.0-CURRENT aarch64 1400093 #0 main-n264325-65f8467e3351-dirty: Tue Aug  8 08:43:39 UTC 2023     root@freebsd-arm-schakrabarti:/usr/obj/home/schakrabarti/sandbox/arm64.aarch64/sys/GENERIC arm64
Comment 26 schakrabarti@microsoft.com 2023-08-08 11:13:25 UTC
(In reply to schakrabarti@microsoft.com from comment #25)
I will implement b vmbus_get_dma_tag() and will share the patch
Comment 27 Andrew "RhodiumToad" Gierth 2023-08-08 12:42:25 UTC
(In reply to Andrew Turner from comment #24)

I'm tempted to submit a patch to dd(1) to give it options to use intentionally-misaligned buffers, so that we have a tool for more convenient testing of this stuff. (Between this bug and the crypto one, obviously more testing is needed.)

Any interest?
Comment 28 schakrabarti@microsoft.com 2023-08-09 11:25:55 UTC
(In reply to Andrew Turner from comment #24)
Hi Andrew

Hyper-V is already having _CCA in acpi .

Device (\_SB.VMOD.VMBS)
    {
        Name (STA, 0x0F)
        Name (_ADR, Zero)  // _ADR: Address
        Name (_CCA, One)  // _CCA: Cache Coherency Attribute
        Name (_DDN, "VMBUS")  // _DDN: DOS Device Name
        Name (_HID, "VMBus")  // _HID: Hardware ID
        Name (_UID, Zero)  // _UID: Unique ID
        Method (_DIS, 0, NotSerialized)  // _DIS: Disable Device
        {
            STA &= 0x0D
        }

So, for testing purpose set the storvsc bus_dma_tag as BUS_DMA_COHERENT, but still
the problem is happening.

diff --git a/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c b/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
index dff2ee709bed..7c908495c6c3 100644
--- a/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
+++ b/sys/dev/hyperv/storvsc/hv_storvsc_drv_freebsd.c
@@ -949,9 +949,14 @@ storvsc_init_requests(device_t dev)
        struct storvsc_softc *sc = device_get_softc(dev);
        struct hv_storvsc_request *reqp;
        int error, i;
+       unsigned int dma_flag = 0;

        LIST_INIT(&sc->hs_free_list);

+#if defined(__aarch64__)
+       dmag_flag |= BUS_DMA_COHERENT;
+#endif
+
        error = bus_dma_tag_create(
                bus_get_dma_tag(dev),           /* parent */
                1,                              /* alignment */
@@ -962,7 +967,7 @@ storvsc_init_requests(device_t dev)
                STORVSC_DATA_SIZE_MAX,          /* maxsize */
                STORVSC_DATA_SEGCNT_MAX,        /* nsegments */
                STORVSC_DATA_SEGSZ_MAX,         /* maxsegsize */
-               0,                              /* flags */
+               dma_flag,                       /* flags */
                NULL,                           /* lockfunc */
                NULL,                           /* lockfuncarg */
                &sc->storvsc_req_dtag);

Can you help me here to understand why bus_get_dma_tag() in vmbus is required?
Even if setting the flat in stovsc is not solving the checksum issue.

Also if we need to implement it, where should we get the parameter for bus_dma_tag_create() in vmbus ?
Comment 29 Mark Millard 2023-08-12 18:42:09 UTC
My 8 GiByte RPi4B v1.5 boot logs show:

Aug 11 13:37:06 generic kernel: bcm2835_cpufreq0: <CPU Frequency Control> on cpu0
Aug 11 13:37:06 generic kernel: bcm2835_cpufreq0: Unable to find firmware device
Aug 11 13:37:06 generic kernel: device_attach: bcm2835_cpufreq0 attach returned 6

That is this code that has not changed:

static int
bcm2835_cpufreq_attach(device_t dev)
{
        struct bcm2835_cpufreq_softc *sc;
        struct sysctl_oid *oid;
 
        /* set self dev */
        sc = device_get_softc(dev);
        sc->dev = dev;
        sc->firmware = devclass_get_device(
            devclass_find("bcm2835_firmware"), 0);
        if (sc->firmware == NULL) {
                device_printf(dev, "Unable to find firmware device\n");
                return (ENXIO);
        }
. . .

This does look to me like devclass_find is internal FreeBSD
infrastructure about its own internals, not RPi* specific.
If so, the problem seems likely more general than RPi*'s.
Comment 30 Mark Millard 2023-08-12 18:43:25 UTC
(In reply to Mark Millard from comment #29)

IGNORE: wrong bugzilla entry. Sorry for the noise. No ability to delete.
Comment 31 Andrew Turner freebsd_committer freebsd_triage 2023-08-14 11:57:07 UTC
Created attachment 244092 [details]
Disk alignment test
Comment 32 Andrew Turner freebsd_committer freebsd_triage 2023-08-14 12:10:03 UTC
bus_get_dma_tag is needed as the parent bus may have restrictions on the DMA of a child, e.g. to limit the usable memory.

To know if the BUS_DMA_COHERENT flag can be used in an acpi attachment you can read _CCA. There is an example of reading it in sys/dev/pci/pci_host_generic_acpi.c
Comment 33 schakrabarti@microsoft.com 2023-08-18 07:34:33 UTC
(In reply to Andrew Turner from comment #32)
I have done the change and tested with the test code and no bad alignment in both boot disk data disk, but still the ufs checksum failure is happening.

# ./a.out /dev/da1 bad

# ./a.out /dev/da bad
/dev/da0    /dev/da0p1  /dev/da0p2  /dev/da1    /dev/da1p1
# ./a.out /dev/da bad
/dev/da0    /dev/da0p1  /dev/da0p2  /dev/da1    /dev/da1p1
# ./a.out /dev/da0 bad

#


# geom disk list
Geom name: da0
Providers:
1. Name: da0
   Mediasize: 32246857728 (30G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e6
   descr: Msft Virtual Disk
   lunname: 4d534654202020208bc8faba34d8cb468624ca4d553eaac1
   lunid: 600224808bc8faba34d8ca4d553eaac1
   ident: (null)
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255

Geom name: da1
Providers:
1. Name: da1
   Mediasize: 536870912000 (500G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   descr: Msft Virtual Disk
   lunname: 4d534654202020201c9c312fabf3254f8826b1ed5d388e7a
   lunid: 600224801c9c312fabf3b1ed5d388e7a
   ident: (null)
   rotationrate: unknown
   fwsectors: 63
   fwheads: 255
Comment 34 schakrabarti@microsoft.com 2023-08-18 07:37:43 UTC
Created attachment 244186 [details]
vmbus bus_get_dma_tag change
Comment 35 Kirk McKusick freebsd_committer freebsd_triage 2023-08-20 05:04:21 UTC
Created attachment 244220 [details]
patch to libufs for Azure

As suggested in Comment #18 this patch to the structure that libufs uses to hold disk blocks works around a bug in Azure where it is unable to correctly read disk blocks into buffers that are not aligned to 512-byte boundaries.
Comment 36 Kirk McKusick freebsd_committer freebsd_triage 2023-08-20 05:08:13 UTC
(In reply to schakrabarti@microsoft.com from comment #33)
See my comment #35 to work around a bug in Azure. Note that you must make this patch and then run `make buildworld' so that libufs and all the programs that use it get recompiled with the aligned disk buffers.

Ideally the problem should be fixed in Azure as the POSIX standard requires disk I/O to be possible in unaligned buffers.
Comment 37 schakrabarti@microsoft.com 2023-08-21 07:28:51 UTC
(In reply to Kirk McKusick from comment #36)
I have tried the patch mentioned in comment #35 but the problem persists.
Also this problem is only seen in arm64 not in x86.
Comment 38 schakrabarti@microsoft.com 2023-08-21 07:36:21 UTC
While trying to do touch in da1p1.
After bus_get_dma_tag patch and libufs patch also the error is happeing:
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 816, cgp: 0x1008f9ad != bp: 0xa1bae7d7
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 817, cgp: 0xb974439f != bp: 0x8c65de5
UFS /dev/da1p1 (/datadrive) cylinder checksum failed: cg 818, cgp: 0x97f71033 != bp: 0x8c2b9d5e
pid 1013 (touch), uid 1001 inumber 2 on /datadrive: out of inodes
Comment 39 schakrabarti@microsoft.com 2023-08-21 08:14:20 UTC
Please ignore my previous comment #38 and comment #37.
It seems the patch works, I had to create the partition and format it with ufs again to take the patch into affect.
Thanks. Please commit it.
Comment 40 Kirk McKusick freebsd_committer freebsd_triage 2023-08-21 20:14:56 UTC
(In reply to schakrabarti@microsoft.com from comment #39)
Thanks for confirming that the patch works.

I will need to run it through the FreeBSD Phabricator review process. I will include you in folks to review it so if any Azure questions arise you can answer them.
Comment 41 schakrabarti@microsoft.com 2023-08-30 06:03:31 UTC
(In reply to Kirk McKusick from comment #40)
Hi Kirk, can please let me know when you are planning to put it in phabricator?
Comment 42 Kirk McKusick freebsd_committer freebsd_triage 2023-09-02 06:19:30 UTC
Created attachment 244573 [details]
revised proposed fix

(In reply to schakrabarti@microsoft.com from comment #41)
Sorry for the delay. I have been on vacation and not keeping up with work email.

Offline comment I got was that using 512-byte alignment is probably overkill. Rather using 64-byte alignment should be sufficient. To that end, could you please test the attached revised patch to see if it also solves your issue.
Comment 43 schakrabarti@microsoft.com 2023-09-04 07:08:23 UTC
(In reply to Kirk McKusick from comment #42)
I have tested with 64 bit alignment and it has worked.
Comment 44 Andrew Turner freebsd_committer freebsd_triage 2023-09-04 16:11:15 UTC
How are you testing? I'm not seeing any issues on either a local arm64 Hyper-V or Azure instance.
Comment 45 Kirk McKusick freebsd_committer freebsd_triage 2023-09-05 06:36:51 UTC
(In reply to schakrabarti@microsoft.com from comment #43)
See https://reviews.freebsd.org/D41724 for the full fix.
Please confirm that it does resolve the issue in Asure.
Comment 46 schakrabarti@microsoft.com 2023-09-05 08:19:03 UTC
(In reply to Andrew Turner from comment #44) This vmbus bus_get_dma_tag change patch is also solving the issue and the libufs change is also solving the issue.
Comment 47 schakrabarti@microsoft.com 2023-09-05 10:34:13 UTC
(In reply to Andrew Turner from comment #44)
 I have raised it https://reviews.freebsd.org/D41728
Comment 48 schakrabarti@microsoft.com 2023-09-05 10:35:15 UTC
(In reply to Kirk McKusick from comment #45)
Looks like https://reviews.freebsd.org/D41728 this solves the problem.
Also it solves the panic during reboot from add_route page fault. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272666
Comment 49 Kirk McKusick freebsd_committer freebsd_triage 2023-09-06 05:43:34 UTC
(In reply to schakrabarti@microsoft.com from comment #48)
Does https://reviews.freebsd.org/D41728 solving the problem mean that https://reviews.freebsd.org/D41724 is no longer needed?
Comment 50 schakrabarti@microsoft.com 2023-09-06 06:43:08 UTC
(In reply to Kirk McKusick from comment #49)
With this https://reviews.freebsd.org/D41728, the ufs checksum error is not coming. We don't need https://reviews.freebsd.org/D41724 for the ufs checksum failure from Azure ARM64 perpective.
Comment 51 Li-Wen Hsu freebsd_committer freebsd_triage 2023-09-10 16:46:35 UTC
(In reply to Kirk McKusick from comment #49)
Yes I have tested in Azure and can confirm that both https://reviews.freebsd.org/D41728 and https://reviews.freebsd.org/D41724 can solve the problem we observed, independently.
Comment 52 commit-hook freebsd_committer freebsd_triage 2023-09-14 07:22:53 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7a9817b8d328dda04069b65944ce2ed6f54c6f0

commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-14 07:11:25 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those
    corruptions are consistently happening just after hitting a page
    boundary. It is unable to correctly read disk blocks into buffers
    that are not aligned to 512-byte boundaries.

    It happens because storvsc needs physically contiguous memory which
    may not be the case when bus_dma needs to create a bounce buffer.
    This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices
    that are added dynamically via the VMbus protocol and are not
    represented in the ACPI DSDT. Only the top level VMbus node exists
    in the DSDT. As such, on ARM64 these devices don't pick up coherence
    information and default to not hardware coherent.

    PR:             267654, 272666
    Reviewed by:    andrew, whu
    Tested by:      lwhsu
    MFC after:      3 days
    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D41728

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)
Comment 53 commit-hook freebsd_committer freebsd_triage 2023-09-18 10:39:17 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=85bc81352e4b0d0a9da251bacec35eec130eee49

commit 85bc81352e4b0d0a9da251bacec35eec130eee49
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-18 10:26:59 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those
    corruptions are consistently happening just after hitting a page
    boundary. It is unable to correctly read disk blocks into buffers
    that are not aligned to 512-byte boundaries.

    It happens because storvsc needs physically contiguous memory which
    may not be the case when bus_dma needs to create a bounce buffer.
    This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices
    that are added dynamically via the VMbus protocol and are not
    represented in the ACPI DSDT. Only the top level VMbus node exists
    in the DSDT. As such, on ARM64 these devices don't pick up coherence
    information and default to not hardware coherent.

    PR:             267654, 272666
    Reviewed by:    andrew, whu
    Tested by:      lwhsu
    MFC after:      3 days
    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D41728

    (cherry picked from commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0)

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)
Comment 54 commit-hook freebsd_committer freebsd_triage 2023-09-18 15:00:10 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=48a799af88c0c77963ffb23eba797f85f86751dd

commit 48a799af88c0c77963ffb23eba797f85f86751dd
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-18 14:57:57 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those
    corruptions are consistently happening just after hitting a page
    boundary. It is unable to correctly read disk blocks into buffers
    that are not aligned to 512-byte boundaries.

    It happens because storvsc needs physically contiguous memory which
    may not be the case when bus_dma needs to create a bounce buffer.
    This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices
    that are added dynamically via the VMbus protocol and are not
    represented in the ACPI DSDT. Only the top level VMbus node exists
    in the DSDT. As such, on ARM64 these devices don't pick up coherence
    information and default to not hardware coherent.

    Approved by:    re (gjb)
    PR:             267654, 272666
    Reviewed by:    andrew, whu
    Tested by:      lwhsu
    Sponsored by:   Microsoft
    Differential Revision:  https://reviews.freebsd.org/D41728

    (cherry picked from commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0)
    (cherry picked from commit 85bc81352e4b0d0a9da251bacec35eec130eee49)

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)
Comment 55 Ed Maste freebsd_committer freebsd_triage 2023-09-19 17:09:24 UTC
With 48a799af88c0 in releng/14.0 now, should this PR be resolved?
Comment 56 Wei Hu 2023-09-21 10:09:19 UTC
(In reply to Ed Maste from comment #55)
I would love to close this bug. But I can't find the place to change the status to resolved in the bug report.
Comment 57 Ed Maste freebsd_committer freebsd_triage 2023-09-21 14:17:25 UTC
(In reply to Wei Hu from comment #56)
There's a "Status:" dropdown before the first comment, and after changing it to Closed a second dropdown will appear with the resolution. I'll change it to Closed/FIXED.
Comment 58 commit-hook freebsd_committer freebsd_triage 2023-11-17 22:12:30 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=772430dd67955850942d689714ab982da24257ba

commit 772430dd67955850942d689714ab982da24257ba
Author:     Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2023-11-17 22:10:29 +0000
Commit:     Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2023-11-17 22:11:24 +0000

    Ensure I/O buffers in libufs(3) are 128-byte aligned.

    Various disk controllers require their buffers to be aligned to a
    cache-line size (128 bytes). For buffers allocated in structures,
    ensure that they are 128-byte aligned. Use aligned_malloc to allocate
    memory to ensure that the returned memory is 128-byte aligned.

    While we are here, we replace the dynamically allocated inode buffer
    with a buffer allocated in the uufsd structure just as the superblock
    and cylinder group buffers do.

    This can be removed if/when the kernel is fixed. Because this problem
    has existed on one I/O subsystem or another since the 1990's, we
    are probably stuck with dealing with it forever.

    The problem most recent showed up in Azure, see:
        https://reviews.freebsd.org/D41728
        https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267654
    Before these fixes were applied, it was confirmed that the changes
    in this commit also fixed the issue in Azure.

    Reviewed-by: Warner Losh, kib
    Tested-by:   Souradeep Chakrabarti of Microsoft (earlier version)
    PR:          267654
    Differential Revision: https://reviews.freebsd.org/D41724

 lib/libufs/Makefile         |  2 +-
 lib/libufs/block.c          | 45 +++++++++++------------------------
 lib/libufs/inode.c          | 16 ++-----------
 lib/libufs/libufs.h         | 57 ++++++++++++++++++++++++++++++---------------
 lib/libufs/sblock.c         |  3 ++-
 lib/libufs/type.c           | 12 ++++++----
 lib/libufs/ufs_disk_close.3 |  9 ++++++-
 sbin/fsck_ffs/fsck.h        | 15 ++++++++++++
 sbin/fsck_ffs/fsutil.c      | 11 ++++-----
 sbin/fsck_ffs/inode.c       |  3 +--
 sbin/fsck_ffs/main.c        |  1 -
 sbin/fsck_ffs/pass5.c       |  1 -
 sbin/fsck_ffs/setup.c       | 11 ++++-----
 sbin/fsck_ffs/suj.c         |  3 +--
 14 files changed, 98 insertions(+), 91 deletions(-)