VirtualBox FBSD guest (on a FBSD host) 11.1-RELEASE FreeBSD 11.1-RELEASE #0 r324971M: Wed Oct 25 08:01:10 CDT 2017 root@fbsd11.vboxnet:/usr/obj/usr/src/sys/GENERIC amd64 Using iscsi connection to a Synology NAS/SAN providing multiple Targets with one or more LUNs results in periodic error messages such as: kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 4 kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 8 iSCSI messages observed on Synology SAN/Nas: 2017-10-25T08:43:03-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION. 2017-10-25T08:43:04-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION. 2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION. 2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION. 2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION. 2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION. After some period of time the system hangs/locks up (kernel panic?) and has to be "killed" and rebooted.
Do you have logs or crash dumps? Any chance you could obtain a packet trace?
Created attachment 187478 [details] Segment of messages log file with iscsi errors
Created attachment 187480 [details] Relevant wireshark capture file during iscsi errors
No kernel core dump file found.
Created attachment 187481 [details] Screenshot of locked-up system
The warnings on the initiator (FreeBSD) side mean that the NAS doesn't correctly mark underflows. It's a common bug and not something to worry about (though still a protocol error). The "Unsupported SCSI Opcode 0xb7" messages on the target (Synology) side mean it doesn't support READ DEFECT DATA command. That's ok too, and shouldn't break anything. I'm not sure what makes FreeBSD send the READ DEFECT DATA command to the target. Do you have some disk monitoring software enabled on the initiator side? As for the panic message - looking at the traceback it looks like panic in mpt(4) driver, unrelated to iscsi(4). It might be triggered by the same software that triggers the READ DEFECT DATA on iSCSI. Can you check which process triggered the panic? Basically, when you see the backtrace and panic messages, press Scroll Lock and use cursor keys to scroll up; there should be some more information there.
On the target side, I know that the Synology SAN/NAS has some disk monitoring. There are scheduled smart tests, but they are weeks to months apart. All Synology administrative console smartctl resuls show "Normal" for the affected target HDD. There are also several Synology applications running related to "cloud services" (thumbd,syncd,cloud-cleand,storaged). There is also an ecryptfs-kthread task and a synoindexd task. There have been a couple of Synology SAN/NAS firwmware updates over the past month or so. On the initiator side - I am occassionally running Zabbix to query hard drives (client script: ls,dd,fsck,fdisk,iostat,smartctl,diskutil,system_profiler) every 10-15 minutes. On some targets I also run ZFS queries and iSCSI (lsscsi,iscsiadm,iscsictl) queries. But nothing that does presistent monitoring (like every few seconds or non-stop). I have at times 8-10 client hosts (physical and virtual) monitored by either a physical Zabbix or a virtual Zabbix, all of which have iSCSI targets.Right now I have 12 targets with 19 LUNs defined attached to 6 different *nix OS's/versions. I am unable to see any more screen output when the FBSD VirtualBox machine locks up. Neither the VBox console nor the SSH terminal are responsive. I might be able to run a trace to a file, but don't know if that will capture what you are interested in - the data might never get written to local (VBox) disk even if there was something worth seeing. When I look at locked-up-client-host messages, there is nothing but "last timed entry" and the next startup entry. I have enabled the /etc/syslog.conf configuration for the "console log". Perhaps that will capture some additional information. Past experience has shown, however, that open log files during a system crash/lock up may not receive all of the possible data in the log file.
OH -BTW. The screen shot shows the last system message, the iSCSI message, which is a repeat of the same message several times before the panic so there may not be any additional information of interest prior to the panic messages.
It might be the Zabbix, but the messages are about five minutes apart, not fifteen. As you're running on VirtualBox, could you try to change the virtual HBA from the one handled by mpt(4) to something else, for example ahci(4), and see if it still panics?
Created attachment 187623 [details] Synology CHECK_CONDITION on more than one Target/Client There are the same messagse for other clients as already noted for FreeBSD, but I don't see any result of them on the clients involved.
Created attachment 187624 [details] HDD assignments for VBOX FreeBSD in question iscsi warnings are related, obviously, to devices assigned to the iscsi driver.
The FreeBSD host running VirtualBox has one 15Krpm SAS drive for the "root/main" file systems - assigned to "mps". Two 10Krpm SATA drives contain the various VirtualBox guests which are assigned to "ahci". The "HDD assignments ..." screen shot shows the virtual drive layout for the FreeBSD VBOX guest (1 IDE, 1 SATA, 2 SCSI). I am not sure what you are asking for regarding changing the "virtual HBA", nor how to "assign it to ahci".
FWIW - VirtualBox version on hosting FBSD is 5.1.10 and I am not able at this time to upgrade to a newer version.
The iSCSI messages are obviously related to the iscsi(4) driver, sure, but they are not fatal - in fact they should be completely harmless. The panic in the screenshot doesn't seem to be related to iscsi at all - it's panicing in the mpt(4) driver. Thus my suggestion to switch the emulated hardware to something else, to see if the workaround helps. If it does, the logical next step would be to file a bug for mpr(4).
Ah, and regarding the "virtual HBA" - what I'm suggesting is to switch the virtual "SCSI" disks to virtual SATA (AHCI). It's been a while since I've used VirtualBox, but there should be just a dropdown widget to change the disk type, right?
Deleted the VBOX FBSD SCSI drives and controller. Can't change drive type once defined. Not a problem as they were dummy drives just to test VBOX SCSI alongside iSCSI. Deleted ZFS pool, which used one of the VBOX SCSI drives, and recreated it using repurposed SATA VBOX drive. Synology/iSCSI WARNING messages still flow but the VBOX FBSD guest has been stable for 9 hours now. Lots of Zabbix queries flow to and from the VBOX FBSD guest without any hiccups. FWIW - the VBOX SCSI controller is based on either an LSIlogic or BUSlogic driver. I will leave everything running for awhile longer.
VBOX FBSD-11 with iSCSI and ZFS (on SATA) still operational after 28 hours of run time. So it appears that there is an issue with VirtualBox SCSI (LSIlogic/BUSlogic) drives in conjunction with iSCSI. I am not able at this point to test this with a physical FBSD.
Thanks! Yeah, it looks like Zabbix (in particular sending the READ DEFECT DATA) triggers a bug in the mpt(4) driver. Would you file a PR for that, with that information and the backtrace in a screenshot? Thanks!
MPT related bug report #223381 has been created.