Overview:
If the LUN block size in ctl.conf is set to 4K, Windows and ESXi iSCSI initiators will crash while formatting the iSCSI device. With LUN block size in ctl.conf set to 512, everything is fine.

Steps to reproduce:
1. Mirrored vdevs with ashift=12 on 2 x Samsung 870 QVO 4 TB
2. Zpool on 2 x said vdevs
3. Zvol with volblocksize=8K
4. CTL LUN on the zvol with backend=block, block size=4K
5. Connect Windows or ESXi initiator to the target and create a partition/datastore to fill the entire device.

Actual result:
ESXi fails during datastore creation saying:

  Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device t10.FreeBSD_xxxxxxxx_vm_datastore__:1

Looking at the ESXi logs, I see this:

  2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11450: Mode Sense cmd reported block size 4096, does not match the current logical block size 512(with physical block size 4096) for device.
  2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11452: The device t10.FreeBSD_xxxxxxxx_vm_datastore__ is marked format corrupt.
  ....
  2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2369: SCSI command (opcode=0x2a) completed successfully without enough data: 65536 < 131072
  2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2370: Sess [ISID: 00023d000001 TARGET: iqn.xxxxxxxxxxxxxxxxxxxxxxxxx:vmdatastore TPGT: 101 TSIH: 0]

The entire disk management service in Windows crashes and is non-responsive until I disconnect the iSCSI target. When I reconnect the iSCSI target, Windows reports a disk with a partition that is much larger than the disk itself.

Expected results:
Creating and formatting the partition/datastore should work without issues.

Build date and hardware:
XigmaNAS 13.2.05 on FreeBSD 13.2-RELEASE-p1 running on Dell PowerEdge R730XD.
Having researched some more, I found this in the documentation for the sg_format command (https://manpages.ubuntu.com/manpages/xenial/en/man8/sg_format.8.html):

  When this utility is used without options (i.e. it is only given a DEVICE argument) it prints out the existing block size and block count derived from two sources. These two sources are a block descriptor in the response to a MODE SENSE command and the response to a READ CAPACITY command. The reason for this double check is to detect a "format corrupt" state (see NOTES section). This usage will not modify the disk.

This describes my scenario perfectly, including the "format corrupt" message which also appears in the ESXi logs.
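To make the double check quoted above concrete, here is a rough Python sketch of comparing the block length in a MODE SENSE (6) short block descriptor against the one in a READ CAPACITY (10) response. It is not how sg_format or ESXi actually implement the check; the function names are hypothetical, and the sample bytes are hand-built to mirror the 4096 vs 512 values in the ESXi log rather than captured from the real device.

```python
import struct


def read_capacity10_block_size(data: bytes) -> int:
    """Return the block length from a READ CAPACITY (10) response.

    The response is 8 bytes, big-endian: last LBA, then block length.
    """
    _last_lba, block_len = struct.unpack(">II", data[:8])
    return block_len


def mode_sense6_block_size(data: bytes):
    """Return the block length from the first short block descriptor
    of a MODE SENSE (6) response, or None if no descriptor is present.

    Header is 4 bytes; byte 3 is the block descriptor length.
    In the 8-byte short descriptor, bytes 5..7 hold the block length.
    """
    descriptor_len = data[3]
    if descriptor_len < 8:
        return None
    descriptor = data[4:12]
    return int.from_bytes(descriptor[5:8], "big")


def is_format_corrupt(mode_sense: bytes, read_capacity: bytes) -> bool:
    """True when both sources report a block size and the sizes disagree."""
    ms = mode_sense6_block_size(mode_sense)
    rc = read_capacity10_block_size(read_capacity)
    return ms is not None and ms != rc


# Made-up sample data (assumption, not taken from the device in this report):
# MODE SENSE descriptor says 4096, READ CAPACITY says 512.
mode_sense = bytes([11, 0, 0, 8]) + bytes(4) + bytes([0]) + (4096).to_bytes(3, "big")
read_capacity = struct.pack(">II", 0x000FFFFF, 512)

print(mode_sense6_block_size(mode_sense),
      read_capacity10_block_size(read_capacity),
      is_format_corrupt(mode_sense, read_capacity))
```

If the two sources agree, is_format_corrupt returns False, which is the state a freshly created LUN would be expected to present.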
You should not change the logical block size once you have written anything to the disk. We are not responsible for initiator bugs, but it is expected that most partition tables and file systems won't handle a sector size change well. Recreate the ZVOL from scratch when changing the logical sector size.
It's not about changing the block size after the volume has been written to. It's about the first time the volume is partitioned and formatted. But the discussion in #276524 is important. If ESXi is misreading the device block sizes, that will likely be the cause of this error.