Bug 276522 - Setting LUN block size in ctl.conf to 4K causes mismatched block size and crashes in initiators
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-22 11:12 UTC by balchen
Modified: 2024-01-26 22:26 UTC (History)
CC List: 4 users

See Also:


Description balchen 2024-01-22 11:12:58 UTC
Overview:

If the LUN block size in ctl.conf is set to 4K, Windows and ESXi iSCSI initiators will crash while formatting the iSCSI device. With LUN block size in ctl.conf set to 512, everything is fine.

Steps to reproduce:

Mirrored vdevs with ashift=12 on 2 x Samsung 870 QVO 4 TB
Zpool on 2 x said vdevs
Zvol with volblocksize=8K
CTL LUN on the zvol with backend=block, block size=4K

Connect Windows or ESXi initiator to the target and create a partition/datastore to fill the entire device.

Actual result:

ESXi fails during datastore creation saying:

   Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device t10.FreeBSD_xxxxxxxx_vm_datastore__:1

Looking at the ESXi logs, I see this:

   2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11450: Mode Sense cmd reported block size 4096, does not match the current logical block size 512(with physical block size 4096) for device.
   2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11452: The device t10.FreeBSD_xxxxxxxx_vm_datastore__ is marked format corrupt.
   ....
   2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2369: SCSI command (opcode=0x2a) completed successfully without enough data: 65536 < 131072
   2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2370: Sess [ISID: 00023d000001 TARGET: iqn.xxxxxxxxxxxxxxxxxxxxxxxxx:vmdatastore TPGT: 101 TSIH: 0]


The entire disk management service in Windows crashes and is non-responsive until I disconnect the iSCSI target. When I reconnect the iSCSI target, Windows reports a disk with a partition that is much larger than the disk itself.

Expected results:

Creating and formatting the partition/datastore should work without issues.

Build date and hardware:

XigmaNAS 13.2.05 on FreeBSD 13.2-RELEASE-p1 running on Dell PowerEdge R730XD.
Comment 1 balchen 2024-01-22 20:27:32 UTC
Having researched some more, I found this from https://manpages.ubuntu.com/manpages/xenial/en/man8/sg_format.8.html -- the documentation for the sg_format command:

       When this utility is used without options (i.e. it is only given  a  DEVICE  argument)  it
       prints  out  the  existing  block size and block count derived from two sources. These two
       sources are a block descriptor in the response to a MODE SENSE command and the response to
       a  READ CAPACITY command. The reason for this double check is to detect a "format corrupt"
       state (see NOTES section). This usage will not modify the disk.

This describes my scenario perfectly, including the "format corrupt" message which also appears in the ESXi logs.
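
The double check quoted above can be sketched in a few lines of Python. This is only an illustration, not how sg_format or ESXi implement it: the function names are hypothetical, the byte strings are synthetic rather than captured from the target, and the layouts assumed are the 8-byte READ CAPACITY(10) response and a MODE SENSE(6) response carrying one short-LBA block descriptor.

```python
from __future__ import annotations

import struct


def parse_read_capacity_10(data: bytes) -> tuple[int, int]:
    """Return (block_count, block_length) from a READ CAPACITY(10) response.

    Assumed layout: bytes 0-3 = last LBA, bytes 4-7 = block length, big-endian.
    """
    if len(data) < 8:
        raise ValueError("READ CAPACITY(10) response must be 8 bytes")
    last_lba, block_len = struct.unpack(">II", data[:8])
    return last_lba + 1, block_len


def parse_mode_sense_6_block_descriptor(data: bytes) -> tuple[int, int]:
    """Return (block_count, block_length) from a MODE SENSE(6) response.

    Assumed layout: 4-byte header whose byte 3 is the block descriptor length,
    followed by an 8-byte short-LBA descriptor (bytes 0-3 = number of blocks,
    byte 4 reserved, bytes 5-7 = block length).
    """
    if len(data) < 4:
        raise ValueError("MODE SENSE(6) header must be 4 bytes")
    if data[3] < 8 or len(data) < 12:
        raise ValueError("response carries no block descriptor")
    desc = data[4:12]
    return int.from_bytes(desc[0:4], "big"), int.from_bytes(desc[5:8], "big")


def block_sizes_agree(read_capacity: bytes, mode_sense: bytes) -> bool:
    """True when both sources report the same logical block length."""
    _, rc_len = parse_read_capacity_10(read_capacity)
    _, ms_len = parse_mode_sense_6_block_descriptor(mode_sense)
    return rc_len == ms_len


# Synthetic data mirroring the ESXi warning: one source says 512, the other 4096.
rc_data = struct.pack(">II", 999, 512)
ms_data = bytes([11, 0, 0, 8]) + (1000).to_bytes(4, "big") + b"\x00" + (4096).to_bytes(3, "big")

if not block_sizes_agree(rc_data, ms_data):
    print("block size mismatch between READ CAPACITY and MODE SENSE")
```

With the synthetic values above the two sources disagree (512 versus 4096), which is the condition the ESXi log reports before marking the device format corrupt.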
Comment 2 Alexander Motin freebsd_committer freebsd_triage 2024-01-26 20:56:40 UTC
You should not change the logical block size once you have written anything to the disk.  We are not responsible for initiator bugs, but it is expected that most partition tables and file systems won't handle a sector size change well.  Recreate the ZVOL from scratch when changing the logical sector size.
Comment 3 balchen 2024-01-26 22:26:56 UTC
It's not about changing the block size after the volume has been written to. It's about the first time the volume is partitioned and formatted.

But the discussion in #276524 is important. If ESXi is misreading the device block sizes, that will likely be the cause of this error.