Bug 267539 - Bhyve virtio-scsi warnings about illegal SCSI tag reuse
Summary: Bhyve virtio-scsi warnings about illegal SCSI tag reuse
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve (show other bugs)
Version: 13.1-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-03 12:57 UTC by crest
Modified: 2022-12-05 06:33 UTC (History)
3 users (show)

See Also:


Attachments
Replace the virtio_scsi tags with an atomic 32 bit counter. (2.97 KB, patch)
2022-11-03 12:57 UTC, crest
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description crest 2022-11-03 12:57:14 UTC
Created attachment 237838 [details]
Replace the virtio_scsi tags with an atomic 32 bit counter.

I'm running FreeBSD 13.1/amd64 both as bhyve host and guest. The guests access to block storage is provided via virtio-scsi backed by local ZVOLs. Under heavy disk I/O (e.g. fio benchmarks on a UFS file system of ~30GB) I get these warnings every few seconds inside the guest:

(da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 cc 17 88 00 00 08 00
(da1:vtscsi0:0:0:1): CAM status: SCSI Status Error
(da1:vtscsi0:0:0:1): SCSI status: Check Condition
(da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,af ((null))
(da1:vtscsi0:0:0:1): Retrying command (per sense data)

(da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 cc f6 30 00 00 08 00
(da1:vtscsi0:0:0:1): CAM status: SCSI Status Error
(da1:vtscsi0:0:0:1): SCSI status: Check Condition
(da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,20 ((null))
(da1:vtscsi0:0:0:1): Retrying command (per sense data)

(da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 30 ad 20 00 00 08 00
(da1:vtscsi0:0:0:1): CAM status: SCSI Status Error
(da1:vtscsi0:0:0:1): SCSI status: Check Condition
(da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,f6 ((null))
(da1:vtscsi0:0:0:1): Retrying command (per sense data)

The SCSI sense code reported (ILLEGAL REQUEST asc:4d,XX) decodes to the illegal reuse of a tag value that's already pending. I looked into the virtio_scsi guest driver and it uses the pointer a struct csio (casted to a 64bit uintptr_t) as SCSI tag, but on the host side inside CTL SCSI tags are truncated to 32 bit. I suspect this causes pointers with identical lower 32 bits to map to the same tag as seen by CTL, but an attempt to patch the virtio_scsi guest driver to use an atomic 32 bit counter (cargo culting the iSCSI tag handling) instead didn't solve the problem. I still attached the experimental patch.
Comment 1 Michael Dexter freebsd_triage 2022-11-18 21:19:27 UTC
My test within a Release Engineering 13.1 VM image to reproduce this:

root@freebsd:~ # dd if=/dev/random of=/dev/da0 bs=1m ; dmesg | grep vtscsi
(da0:vtscsi0:0:0:0): WRITE(10). CDB: 2a 00 02 00 1d b8 00 01 e8 00
(da0:vtscsi0:0:0:0): CAM status: SCSI Status Error
(da0:vtscsi0:0:0:0): SCSI status: Check Condition
(da0:vtscsi0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:4d,94 ((null))
(da0:vtscsi0:0:0:0): Retrying command (per sense data)
dd: /dev/da0: end of device
30721+0 records in
30720+0 records out

Setup on the host:

Create a disk image that will be attached to the VM using CTL:

# truncate -s 30G ctl-test.raw

CTL Setup script:

cat ctl-setup.sh

echo ctladm portlist before anything
ctladm portlist -l -v

echo creating port
ctladm port -c -O pp=1 -O vp=0

echo ctladm portlist before after port creation
ctladm portlist -l -v

echo creating LUN type block
ctladm create -b block -d foo -S bar -o file=ctl-test.raw

echo ctladm portlist before after LUN creation
ctladm portlist -l -v

echo listing /dev/cam
ls /dev/cam/

echo running ctladm devlist
ctladm devlist


Download a Release Engineering VM images to /root/vm0.raw or use the full name


VM boot script:

cat ctl-boot.sh
#!/bin/sh

echo Loading vmm.ko if not loaded
kldstat -q -m vmm || kldload vmm.ko || { echo vmm failed to load ; exit 1 ; }

echo Destroying VM vm0 if present
if [ -e "/dev/vmm/vm0" ] ; then
	echo Found /dev/vmm/vm0 and destroying it
	bhyvectl --destroy --vm=vm0 || \
		{ echo VM vm0 failed to destroy ; exit 1 ; }
fi

echo Loading VM vm0
bhyveload -m 4g -d /root/vm0.raw \
	-e beastie_disable=YES \
	-e splash_bmp_load=NO \
	-e loader_logo=NO \
	-e autoboot_delay=3 \
	vm0 || \
	{ echo VM vm0 failed to load ; exit 1 ; }

echo Booting VM vm0
bhyve -m 4g -c 2 -A -H -s 0,hostbridge -s 1,lpc -l com1,stdio \
	-s 3,virtio-blk,/root/vm0.raw \
	-s 4,virtio-scsi,/dev/cam/ctl1.0 \
	vm0
Comment 2 commit-hook freebsd_committer freebsd_triage 2022-12-03 15:38:49 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0acc026dda9e7405f7a88993d1b51e5fc846cbc5

commit 0acc026dda9e7405f7a88993d1b51e5fc846cbc5
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-12-03 15:23:29 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2022-12-03 15:23:29 +0000

    CTL: Increase maximum SCSI tag size from 32 to 64 bits.

    SAM-5 specification states maximum size of command identifier (tag),
    defined by specific transports, should not be larger than 64 bits.
    While most of supported transports use 32 bits or less, it was
    reported that virtio-scsi uses 64 bits.  Truncation to 32 bits in
    bhyve code caused false tag conflict errors reported and possibly
    other issues.

    This changes CTL ABI and HA protocol, so CTL_HA_VERSION is bumped.

    While we make HA protocol incompatible, increase default maximum
    number of ports in CTL from 256 to 1024, matching number of LUNs.
    There are many reports from people who need many iSCSI targets with
    only one LUN each.  Increased memory consumption should be less of
    a problem these days.

    PR:     267539

 share/man/man4/ctl.4             |  2 +-
 sys/cam/ctl/ctl.c                | 13 +++++++------
 sys/cam/ctl/ctl_frontend_ioctl.c |  4 ++--
 sys/cam/ctl/ctl_io.h             | 11 +++++------
 sys/cam/ctl/ctl_ioctl.h          |  3 ++-
 sys/cam/ctl/ctl_tpc_local.c      |  2 +-
 sys/cam/ctl/ctl_util.c           |  6 +++---
 sys/cam/ctl/scsi_ctl.c           |  2 +-
 usr.sbin/bhyve/pci_virtio_scsi.c |  4 ++--
 9 files changed, 24 insertions(+), 23 deletions(-)
Comment 3 Alexander Motin freebsd_committer freebsd_triage 2022-12-03 15:43:01 UTC
Since it breaks CTL ABI, I am not sure whether I merge it to stable, but for head host the problem should be fixed now.
Comment 4 commit-hook freebsd_committer freebsd_triage 2022-12-03 17:15:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7467a6953683b439f7b31c2b42533cb893ed6be3

commit 7467a6953683b439f7b31c2b42533cb893ed6be3
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-12-03 17:05:05 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2022-12-03 17:05:05 +0000

    CTL: Allow userland supply tags via ioctl frontend.

    Before this ioctl frontend always replaced tags with sequential ones.
    It was done for ctladm, that can not keep track of global tag list.
    But in case of virtio-scsi in bhyve we can pass provided tags as-is.
    It should be on virtio-scsi initiator to provide us valid tags.  It
    should allow proper task management, error reporting, etc.  In case
    of several virtio-scsi devices, they should use different CTL ports
    or initiator IDs to avoid conflicts, but this is expected by design.

    PR:     267539

 sys/cam/ctl/ctl_frontend_ioctl.c | 5 +++--
 sys/cam/ctl/ctl_io.h             | 2 +-
 usr.sbin/bhyve/pci_virtio_scsi.c | 2 ++
 3 files changed, 6 insertions(+), 3 deletions(-)
Comment 5 Michael Dexter freebsd_triage 2022-12-05 06:33:47 UTC
The specified test passes with this fix and I will verify CTL/iSCSI for regressions.

Excellent work!