Created attachment 237838 [details] Replace the virtio_scsi tags with an atomic 32 bit counter. I'm running FreeBSD 13.1/amd64 both as bhyve host and guest. The guests access to block storage is provided via virtio-scsi backed by local ZVOLs. Under heavy disk I/O (e.g. fio benchmarks on a UFS file system of ~30GB) I get these warnings every few seconds inside the guest: (da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 cc 17 88 00 00 08 00 (da1:vtscsi0:0:0:1): CAM status: SCSI Status Error (da1:vtscsi0:0:0:1): SCSI status: Check Condition (da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,af ((null)) (da1:vtscsi0:0:0:1): Retrying command (per sense data) (da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 cc f6 30 00 00 08 00 (da1:vtscsi0:0:0:1): CAM status: SCSI Status Error (da1:vtscsi0:0:0:1): SCSI status: Check Condition (da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,20 ((null)) (da1:vtscsi0:0:0:1): Retrying command (per sense data) (da1:vtscsi0:0:0:1): WRITE(10). CDB: 2a 00 00 30 ad 20 00 00 08 00 (da1:vtscsi0:0:0:1): CAM status: SCSI Status Error (da1:vtscsi0:0:0:1): SCSI status: Check Condition (da1:vtscsi0:0:0:1): SCSI sense: ILLEGAL REQUEST asc:4d,f6 ((null)) (da1:vtscsi0:0:0:1): Retrying command (per sense data) The SCSI sense code reported (ILLEGAL REQUEST asc:4d,XX) decodes to the illegal reuse of a tag value that's already pending. I looked into the virtio_scsi guest driver and it uses the pointer a struct csio (casted to a 64bit uintptr_t) as SCSI tag, but on the host side inside CTL SCSI tags are truncated to 32 bit. I suspect this causes pointers with identical lower 32 bits to map to the same tag as seen by CTL, but an attempt to patch the virtio_scsi guest driver to use an atomic 32 bit counter (cargo culting the iSCSI tag handling) instead didn't solve the problem. I still attached the experimental patch.
My test within a Release Engineering 13.1 VM image to reproduce this: root@freebsd:~ # dd if=/dev/random of=/dev/da0 bs=1m ; dmesg | grep vtscsi (da0:vtscsi0:0:0:0): WRITE(10). CDB: 2a 00 02 00 1d b8 00 01 e8 00 (da0:vtscsi0:0:0:0): CAM status: SCSI Status Error (da0:vtscsi0:0:0:0): SCSI status: Check Condition (da0:vtscsi0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:4d,94 ((null)) (da0:vtscsi0:0:0:0): Retrying command (per sense data) dd: /dev/da0: end of device 30721+0 records in 30720+0 records out Setup on the host: Create a disk image that will be attached to the VM using CTL: # truncate -s 30G ctl-test.raw CTL Setup script: cat ctl-setup.sh echo ctladm portlist before anything ctladm portlist -l -v echo creating port ctladm port -c -O pp=1 -O vp=0 echo ctladm portlist before after port creation ctladm portlist -l -v echo creating LUN type block ctladm create -b block -d foo -S bar -o file=ctl-test.raw echo ctladm portlist before after LUN creation ctladm portlist -l -v echo listing /dev/cam ls /dev/cam/ echo running ctladm devlist ctladm devlist Download a Release Engineering VM images to /root/vm0.raw or use the full name VM boot script: cat ctl-boot.sh #!/bin/sh echo Loading vmm.ko if not loaded kldstat -q -m vmm || kldload vmm.ko || { echo vmm failed to load ; exit 1 ; } echo Destroying VM vm0 if present if [ -e "/dev/vmm/vm0" ] ; then echo Found /dev/vmm/vm0 and destroying it bhyvectl --destroy --vm=vm0 || \ { echo VM vm0 failed to destroy ; exit 1 ; } fi echo Loading VM vm0 bhyveload -m 4g -d /root/vm0.raw \ -e beastie_disable=YES \ -e splash_bmp_load=NO \ -e loader_logo=NO \ -e autoboot_delay=3 \ vm0 || \ { echo VM vm0 failed to load ; exit 1 ; } echo Booting VM vm0 bhyve -m 4g -c 2 -A -H -s 0,hostbridge -s 1,lpc -l com1,stdio \ -s 3,virtio-blk,/root/vm0.raw \ -s 4,virtio-scsi,/dev/cam/ctl1.0 \ vm0
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0acc026dda9e7405f7a88993d1b51e5fc846cbc5 commit 0acc026dda9e7405f7a88993d1b51e5fc846cbc5 Author: Alexander Motin <mav@FreeBSD.org> AuthorDate: 2022-12-03 15:23:29 +0000 Commit: Alexander Motin <mav@FreeBSD.org> CommitDate: 2022-12-03 15:23:29 +0000 CTL: Increase maximum SCSI tag size from 32 to 64 bits. SAM-5 specification states maximum size of command identifier (tag), defined by specific transports, should not be larger than 64 bits. While most of supported transports use 32 bits or less, it was reported that virtio-scsi uses 64 bits. Truncation to 32 bits in bhyve code caused false tag conflict errors reported and possibly other issues. This changes CTL ABI and HA protocol, so CTL_HA_VERSION is bumped. While we make HA protocol incompatible, increase default maximum number of ports in CTL from 256 to 1024, matching number of LUNs. There are many reports from people who need many iSCSI targets with only one LUN each. Increased memory consumption should be less of a problem these days. PR: 267539 share/man/man4/ctl.4 | 2 +- sys/cam/ctl/ctl.c | 13 +++++++------ sys/cam/ctl/ctl_frontend_ioctl.c | 4 ++-- sys/cam/ctl/ctl_io.h | 11 +++++------ sys/cam/ctl/ctl_ioctl.h | 3 ++- sys/cam/ctl/ctl_tpc_local.c | 2 +- sys/cam/ctl/ctl_util.c | 6 +++--- sys/cam/ctl/scsi_ctl.c | 2 +- usr.sbin/bhyve/pci_virtio_scsi.c | 4 ++-- 9 files changed, 24 insertions(+), 23 deletions(-)
Since it breaks CTL ABI, I am not sure whether I merge it to stable, but for head host the problem should be fixed now.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7467a6953683b439f7b31c2b42533cb893ed6be3 commit 7467a6953683b439f7b31c2b42533cb893ed6be3 Author: Alexander Motin <mav@FreeBSD.org> AuthorDate: 2022-12-03 17:05:05 +0000 Commit: Alexander Motin <mav@FreeBSD.org> CommitDate: 2022-12-03 17:05:05 +0000 CTL: Allow userland supply tags via ioctl frontend. Before this ioctl frontend always replaced tags with sequential ones. It was done for ctladm, that can not keep track of global tag list. But in case of virtio-scsi in bhyve we can pass provided tags as-is. It should be on virtio-scsi initiator to provide us valid tags. It should allow proper task management, error reporting, etc. In case of several virtio-scsi devices, they should use different CTL ports or initiator IDs to avoid conflicts, but this is expected by design. PR: 267539 sys/cam/ctl/ctl_frontend_ioctl.c | 5 +++-- sys/cam/ctl/ctl_io.h | 2 +- usr.sbin/bhyve/pci_virtio_scsi.c | 2 ++ 3 files changed, 6 insertions(+), 3 deletions(-)
The specified test passes with this fix and I will verify CTL/iSCSI for regressions. Excellent work!