Bug 223238 - Persistent iscsi underflow mismatch errors followed by apparent kernel panic
Summary: Persistent iscsi underflow mismatch errors followed by apparent kernel panic
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Edward Tomasz Napierala
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-25 13:45 UTC by Jim D.
Modified: 2017-11-02 16:40 UTC (History)
2 users (show)

See Also:


Attachments
Segment of messages log file with iscsi errors (956 bytes, application/x-bzip2)
2017-10-25 23:17 UTC, Jim D.
no flags Details
Relevant wireshark capture file during iscsi errors (7.85 KB, application/x-bzip2)
2017-10-25 23:18 UTC, Jim D.
no flags Details
Screenshot of locked-up system (29.94 KB, image/png)
2017-10-25 23:46 UTC, Jim D.
no flags Details
Synology CHECK_CONDITION on more than one Target/Client (2.87 KB, text/plain)
2017-10-31 19:06 UTC, Jim D.
no flags Details
HDD assignments for VBOX FreeBSD in question (45.18 KB, image/png)
2017-10-31 19:08 UTC, Jim D.
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jim D. 2017-10-25 13:45:43 UTC
VirtualBox FBSD guest (on a FBSD host)

11.1-RELEASE FreeBSD 11.1-RELEASE #0 r324971M: Wed Oct 25 08:01:10 CDT 2017     root@fbsd11.vboxnet:/usr/obj/usr/src/sys/GENERIC  amd64


Using iscsi connection to a Synology NAS/SAN providing multiple Targets with one or more LUNs results in periodic error messages such as:

kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 4

kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 8


iSCSI messages observed on Synology SAN/Nas:

2017-10-25T08:43:03-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:04-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.


After some period of time the system hangs/locks up (kernel panic?) and has to be "killed" and rebooted.
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-10-25 21:38:22 UTC
Do you have logs or crash dumps?  Any chance you could obtain a packet trace?
Comment 2 Jim D. 2017-10-25 23:17:58 UTC
Created attachment 187478 [details]
Segment of messages log file with iscsi errors
Comment 3 Jim D. 2017-10-25 23:18:37 UTC
Created attachment 187480 [details]
Relevant wireshark capture file during iscsi errors
Comment 4 Jim D. 2017-10-25 23:19:20 UTC
No kernel core dump file found.
Comment 5 Jim D. 2017-10-25 23:46:03 UTC
Created attachment 187481 [details]
Screenshot of locked-up system
Comment 6 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-10-29 13:17:22 UTC
The warnings on the initiator (FreeBSD) side mean that the NAS doesn't correctly mark underflows.  It's a common bug and not something to worry about (though still a protocol error).

The "Unsupported SCSI Opcode 0xb7" messages on the target (Synology) side mean it doesn't support READ DEFECT DATA command.  That's ok too, and shouldn't break anything.  I'm not sure what makes FreeBSD send the READ DEFECT DATA command to the target.  Do you have some disk monitoring software enabled on the initiator side?

As for the panic message - looking at the traceback it looks like panic in mpt(4) driver, unrelated to iscsi(4).  It might be triggered by the same software that triggers the READ DEFECT DATA on iSCSI.  Can you check which process triggered the panic?  Basically, when you see the backtrace and panic messages, press Scroll Lock and use cursor keys to scroll up; there should be some more information there.
Comment 7 Jim D. 2017-10-29 15:30:00 UTC
On the target side, I know that the Synology SAN/NAS has some disk monitoring. There are scheduled smart tests, but they are weeks to months apart. All Synology administrative console smartctl resuls show "Normal" for the affected target HDD. There are also several Synology applications running related to "cloud services" (thumbd,syncd,cloud-cleand,storaged). There is also an ecryptfs-kthread task and a synoindexd task. There have been a couple of Synology SAN/NAS firwmware updates over the past month or so.

On the initiator side - I am occassionally running Zabbix to query hard drives (client script: ls,dd,fsck,fdisk,iostat,smartctl,diskutil,system_profiler) every 10-15 minutes. On some targets I also run ZFS queries and iSCSI (lsscsi,iscsiadm,iscsictl) queries. But nothing that does presistent monitoring (like every few seconds or non-stop). I have at times 8-10 client hosts (physical and virtual) monitored by either a physical Zabbix or a virtual Zabbix, all of which have iSCSI targets.Right now I have 12 targets with 19 LUNs defined attached to 6 different *nix OS's/versions.

I am unable to see any more screen output when the FBSD VirtualBox machine locks up. Neither the VBox console nor the SSH terminal are responsive. I might be able to run a trace to a file, but don't know if that will capture what you are interested in - the data might never get written to local (VBox) disk even if there was something worth seeing. When I look at locked-up-client-host messages, there is nothing but "last timed entry" and the next startup entry.

I have enabled the /etc/syslog.conf configuration for the "console log". Perhaps that will capture some additional information. Past experience has shown, however, that open log files during a system crash/lock up may not receive all of the possible data in the log file.
Comment 8 Jim D. 2017-10-29 15:33:40 UTC
OH -BTW. The screen shot shows the last system message, the iSCSI message, which is a repeat of the same message several times before the panic so there may not be any additional information of interest prior to the panic messages.
Comment 9 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-10-30 12:48:45 UTC
It might be the Zabbix, but the messages are about five minutes apart, not fifteen.

As you're running on VirtualBox, could you try to change the virtual HBA from the one handled by mpt(4) to something else, for example ahci(4), and see if it still panics?
Comment 10 Jim D. 2017-10-31 19:06:05 UTC
Created attachment 187623 [details]
Synology CHECK_CONDITION on more than one Target/Client

There are the same messagse for other clients as already noted for FreeBSD, but I don't see any result of them on the clients involved.
Comment 11 Jim D. 2017-10-31 19:08:53 UTC
Created attachment 187624 [details]
HDD assignments for VBOX FreeBSD in question

iscsi warnings are related, obviously, to devices assigned to the iscsi driver.
Comment 12 Jim D. 2017-10-31 19:14:21 UTC
The FreeBSD host running VirtualBox has one 15Krpm SAS drive for the "root/main" file systems - assigned to "mps". Two 10Krpm SATA drives contain the various VirtualBox guests which are assigned to "ahci". The "HDD assignments ..." screen shot shows the virtual drive layout for the FreeBSD VBOX guest (1 IDE, 1 SATA, 2 SCSI).

I am not sure what you are asking for regarding changing the "virtual HBA", nor how to "assign it to ahci".
Comment 13 Jim D. 2017-10-31 19:18:23 UTC
FWIW - VirtualBox version on hosting FBSD is 5.1.10 and I am not able at this time to upgrade to a newer version.
Comment 14 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-10-31 20:03:29 UTC
The iSCSI messages are obviously related to the iscsi(4) driver, sure, but they are not fatal - in fact they should be completely harmless.  The panic in the screenshot doesn't seem to be related to iscsi at all - it's panicing in the mpt(4) driver.  Thus my suggestion to switch the emulated hardware to something else, to see if the workaround helps.  If it does, the logical next step would be to file a bug for mpr(4).
Comment 15 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-10-31 20:13:04 UTC
Ah, and regarding the "virtual HBA" - what I'm suggesting is to switch the virtual "SCSI" disks to virtual SATA (AHCI).  It's been a while since I've used VirtualBox, but there should be just a dropdown widget to change the disk type, right?
Comment 16 Jim D. 2017-11-01 09:12:05 UTC
Deleted the VBOX FBSD SCSI drives and controller. Can't change drive type once defined. Not a problem as they were dummy drives just to test VBOX SCSI alongside iSCSI.

Deleted ZFS pool, which used one of the VBOX SCSI drives, and recreated it using repurposed SATA VBOX drive.

Synology/iSCSI WARNING messages still flow but the VBOX FBSD guest has been stable for 9 hours now. Lots of Zabbix queries flow to and from the VBOX FBSD guest without any hiccups.

FWIW - the VBOX SCSI controller is based on either an LSIlogic or BUSlogic driver.

I will leave everything running for awhile longer.
Comment 17 Jim D. 2017-11-02 04:10:54 UTC
VBOX FBSD-11 with iSCSI and ZFS (on SATA) still operational after 28 hours of run time. So it appears that there is an issue with VirtualBox SCSI (LSIlogic/BUSlogic) drives in conjunction with iSCSI. I am not able at this point to test this with a physical FBSD.
Comment 18 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-11-02 10:59:27 UTC
Thanks!  Yeah, it looks like Zabbix (in particular sending the READ DEFECT DATA) triggers a bug in the mpt(4) driver.  Would you file a PR for that, with that information and the backtrace in a screenshot?  Thanks!
Comment 19 Jim D. 2017-11-02 14:52:51 UTC
MPT related bug report #223381 has been created.