Bug 276522 - Setting LUN block size in ctl.conf to 4K causes mismatched block size and crashes in initiators
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-22 11:12 UTC by balchen
Modified: 2025-01-19 23:30 UTC
CC: 6 users

See Also:


Description balchen 2024-01-22 11:12:58 UTC
Overview:

If the LUN block size in ctl.conf is set to 4K, Windows and ESXi iSCSI initiators will crash while formatting the iSCSI device. With LUN block size in ctl.conf set to 512, everything is fine.

Steps to reproduce:

Mirrored vdevs with ashift=12 on 2 x Samsung 870 QVO 4 TB
Zpool on 2 x said vdevs
Zvol with volblocksize=8K
CTL LUN on the zvol with backend=block, block size=4K

Connect Windows or ESXi initiator to the target and create a partition/datastore to fill the entire device.
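
For reference, the LUN definition in /etc/ctl.conf for this setup looks roughly like the following (target name, portal group, and zvol path are placeholders rather than the actual values from this system):

   target iqn.2012-06.com.example:vmdatastore {
       portal-group pg0
       lun 0 {
           backend block
           path /dev/zvol/tank/vm_datastore
           blocksize 4096
       }
   }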

Actual result:

ESXi fails during datastore creation saying:

   Operation failed, diagnostics report: Unable to create Filesystem, please see VMkernel log for more details: Failed to create VMFS on device t10.FreeBSD_xxxxxxxx_vm_datastore__:1

Looking at the ESXi logs, I see this:

   2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11450: Mode Sense cmd reported block size 4096, does not match the current logical block size 512(with physical block size 4096) for device.
   2024-01-21T22:11:02.389Z cpu2:2098049)WARNING: ScsiDeviceIO: 11452: The device t10.FreeBSD_xxxxxxxx_vm_datastore__ is marked format corrupt.
   ....
   2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2369: SCSI command (opcode=0x2a) completed successfully without enough data: 65536 < 131072
   2024-01-21T22:11:36.079Z cpu36:2098584)WARNING: iscsi_vmk: iscsivmk_ConnCommandResponse:2370: Sess [ISID: 00023d000001 TARGET: iqn.xxxxxxxxxxxxxxxxxxxxxxxxx:vmdatastore TPGT: 101 TSIH: 0]


The entire disk management service in Windows crashes and is non-responsive until I disconnect the iSCSI target. When I reconnect the iSCSI target, Windows reports a disk with a partition that is much larger than the disk itself.

Expected results:

Creating and formatting the partition/datastore should work without issues.

Build date and hardware:

XigmaNAS 13.2.05 on FreeBSD 13.2-RELEASE-p1 running on Dell PowerEdge R730XD.
Comment 1 balchen 2024-01-22 20:27:32 UTC
Having researched some more, I found this from https://manpages.ubuntu.com/manpages/xenial/en/man8/sg_format.8.html -- the documentation for the sg_format command:

       When this utility is used without options (i.e. it is only given a DEVICE argument) it
       prints out the existing block size and block count derived from two sources. These two
       sources are a block descriptor in the response to a MODE SENSE command and the response
       to a READ CAPACITY command. The reason for this double check is to detect a "format
       corrupt" state (see NOTES section). This usage will not modify the disk.

This describes my scenario perfectly, including the "format corrupt" message which also appears in the ESXi logs.
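
For what it's worth, the same double check can be run by hand with sg3_utils on a Linux initiator; roughly like this (the device node is only an example, so treat these as a sketch rather than exact commands):

   sg_format /dev/sdX          # no options: prints block size from both MODE SENSE and READ CAPACITY, does not modify the disk
   sg_readcap --long /dev/sdX  # block size according to READ CAPACITY(16) only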
Comment 2 Alexander Motin 2024-01-26 20:56:40 UTC
You should not change the logical block size once you have written anything to the disk.  We are not responsible for initiator bugs, but it is expected that most partition tables and file systems won't handle a sector size change well.  Recreate the ZVOL from scratch when changing the logical sector size.
Comment 3 balchen 2024-01-26 22:26:56 UTC
It's not about changing the block size after the volume has been written to. It's about the first time the volume is partitioned and formatted.

But the discussion in #276524 is important. If ESXi is misreading the device block sizes, that will likely be the cause of this error.
Comment 4 Alan Somers 2025-01-17 16:44:30 UTC
Can you run sg_format or a similar command on the ESXi initiator and share the results?  And also share the output of "ctladm devlist -v"?
Comment 5 rm@richardmay.net 2025-01-18 07:46:05 UTC
To the best of my knowledge, ESXi 4Kn support encompasses only local disks.

For remote (SAN) storage, ESXi appears tolerant of varying physical block sizes (pblocksize) but is *not* tolerant of anything but 512-byte logical blocks -- in my experience.

Article ID: 327012 at https://knowledge.broadcom.com/external/article?legacyId=2091600 touches on this but doesn't seem to make an explicit declaration here.  Over the years I've spotted other verbiage within VMware docs that seems to corroborate ESXi + remote storage only working with 512e or 512n.

FWIW I'm playing with this using 14.2-RELEASE and ESXi 8.0.3 build 24414501.  I've set:

blocksize 4096
option pblocksize 8192

...and ESXi is unhappy about it.  It sees the empty LUN and appears to work until you begin the format process (from within vCenter).  Then the storage path drops.  I can't find anything in hostd.log, vmkernel.log, or anywhere else that's intuitive or descriptive -- just a dead path as if someone pulled a cable.

It seems fine with:

blocksize 512
option pblocksize 8192

...though "esxcli storage core device capacity list" worryingly lists the Format Type as Unknown.

I'm thinking the best plan here is:

blocksize 512
option pblocksize 4096

...the latter effectively hides zvol volblocksize rather than passing it through. This keeps ESXi logfiles clear and the disk sector format listed as a comforting 512e.
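
In ctl.conf terms that LUN stanza would be something like this (the zvol path is a placeholder):

   lun 0 {
       backend block
       path /dev/zvol/tank/vm_datastore
       blocksize 512
       option pblocksize 4096
   }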

Let me know if I can help test anything. ESXi seems to be a small but persistent pain point when people point it toward *nix block storage targets.  I've got it working fairly well over here.
Comment 6 Alan Somers 2025-01-18 20:29:46 UTC
I'm going to close this, because so far it sounds like the bugs are only on the initiators' side.  If there is a bug in our server, it would be if we were reporting two different values for logical block size, from two different commands.  If that is the case, then please reopen the bug.  And include the output of some sg3_utils command that shows the discrepancy.
Comment 7 balchen 2025-01-19 12:20:48 UTC
(In reply to Alan Somers from comment #4)

If you look at #276524, you will see an extensive discussion on this. It seems likely that ESXi has misinterpreted the disks, perhaps because of mismatched info in outdated iSCSI pages.

I don't know how this would have affected the Windows initiator, but it seems it has affected it somehow. If ESXi has never touched the target, Windows seems to have no issues with it. If ESXi has touched the target, Windows also fails. Perhaps the incorrect info ESXi derives is stored in MBR/GPT and used by Windows.

In either case, it seems there is not much else to do here apart from what was already done in #276524.
Comment 8 balchen 2025-01-19 12:32:49 UTC
(In reply to rm@richardmay.net from comment #5)

Thank you for the input.

When I configure my iSCSI target with something other than a 512 block size, my ESXi initiator fails with the log output I quoted in the first post. This is ESXi 7.0.3. Maybe something has changed in 8.0.3 so that it no longer logs that, or maybe it's because I'm running ESXi standalone, not using vCenter.

Like you say, running with 512 block size seems the safest bet for ESXi.
Comment 9 rm@richardmay.net 2025-01-19 20:24:12 UTC
(In reply to balchen from comment #8)

I see the [possible] discrepancy after reading your bug report more carefully.

Your ESXi host was somehow detecting "current logical block size 512" and yet "Mode Sense cmd reported block size 4096" per vmkernel.log.

So ctl's response to the mode sense query was correct ("4k logical") and yet ESXi still thought the "current logical block size" was 512 -- the implication being that ctl *could* have given a wrong answer to an earlier, different query.  That raises the question of what gave ESXi that impression.

ESXi detecting an existing on-disk format from a prior boot could be one source.  ESXi is notoriously "sticky" about block devices -- it seems to cache details about said devices and their filesystems and throws toys out of the crib when something different shows up purporting to carry the same volume.  Or when a different volume arrives on a familiar device.  Hence the existence of hacks like LVM.EnableResignature, SCSI.CompareLUNNumber, and LVM.DisallowSnapshotLUN.
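
(Those knobs are normally flipped with esxcli on the host, roughly along these lines -- syntax from memory, so double-check it before use:)

   esxcli system settings advanced set -o /LVM/EnableResignature -i 1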

512e (i.e. blocksize 512 with option pblocksize 4096) is almost certainly the best path forward.  I've checked the kernel logs from my experiments and found the exact same error as yours on five occasions.  An explicit 512e setup "just works", thankfully.
Comment 10 balchen 2025-01-19 23:30:03 UTC
(In reply to rm@richardmay.net from comment #9)

Yeah, it's hard to figure out exactly what's going on. The varying and inconsistent terminology (block size, sector size, physical, logical, etc.) makes it even harder. It does seem possible that the ESXi iSCSI driver is conflating parameters that shouldn't be conflated, and that this happens to work for 512 but not for other sizes.

To be honest, I stopped looking a year ago. I will stay with 512 for ESXi targets to be safe.

When this commit (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276524#c20) makes it into a XigmaNAS version, it'd be fun to test again.