223238 – Persistent iscsi underflow mismatch errors followed by apparent kernel panic

Bug 223238 - Persistent iscsi underflow mismatch errors followed by apparent kernel panic

Summary: Persistent iscsi underflow mismatch errors followed by apparent kernel panic

Status:	Closed Works As Intended

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	11.1-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	Edward Tomasz Napierala

URL:
Keywords:

Depends on:
Blocks:

Reported:	2017-10-25 13:45 UTC by Jim D.
Modified:	2017-11-02 16:40 UTC (History)
CC List:	2 users (show)

See Also:

Attachments
Segment of messages log file with iscsi errors (956 bytes, application/x-bzip2) 2017-10-25 23:17 UTC, Jim D.	no flags	Details
Relevant wireshark capture file during iscsi errors (7.85 KB, application/x-bzip2) 2017-10-25 23:18 UTC, Jim D.	no flags	Details
Screenshot of locked-up system (29.94 KB, image/png) 2017-10-25 23:46 UTC, Jim D.	no flags	Details
Synology CHECK_CONDITION on more than one Target/Client (2.87 KB, text/plain) 2017-10-31 19:06 UTC, Jim D.	no flags	Details
HDD assignments for VBOX FreeBSD in question (45.18 KB, image/png) 2017-10-31 19:08 UTC, Jim D.	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jim D. 2017-10-25 13:45:43 UTC

VirtualBox FBSD guest (on a FBSD host)

11.1-RELEASE FreeBSD 11.1-RELEASE #0 r324971M: Wed Oct 25 08:01:10 CDT 2017     root@fbsd11.vboxnet:/usr/obj/usr/src/sys/GENERIC  amd64


Using iscsi connection to a Synology NAS/SAN providing multiple Targets with one or more LUNs results in periodic error messages such as:

kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 4

kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-12.c16e0895b7): underflow mismatch: target indicates 0, we calculated 8


iSCSI messages observed on Synology SAN/Nas:

2017-10-25T08:43:03-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:04-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:05-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.
2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0xb7, sending CHECK_CONDITION.
2017-10-25T08:43:06-05:00 ds216j kernel: target_core_transport.c:1783:target_setup_cmd_from_cdb iSCSI/iqn.1994-09.org.freebsd:fbsd11.vboxnet: Unsupported SCSI Opcode 0x37, sending CHECK_CONDITION.


After some period of time the system hangs/locks up (kernel panic?) and has to be "killed" and rebooted.

Comment 1 Edward Tomasz Napierala freebsd_committer

2017-10-25 21:38:22 UTC

Do you have logs or crash dumps?  Any chance you could obtain a packet trace?

Comment 2 Jim D. 2017-10-25 23:17:58 UTC

Created attachment 187478 [details]
Segment of messages log file with iscsi errors

Comment 3 Jim D. 2017-10-25 23:18:37 UTC

Created attachment 187480 [details]
Relevant wireshark capture file during iscsi errors

Comment 4 Jim D. 2017-10-25 23:19:20 UTC

No kernel core dump file found.

Comment 5 Jim D. 2017-10-25 23:46:03 UTC

Created attachment 187481 [details]
Screenshot of locked-up system

Comment 6 Edward Tomasz Napierala freebsd_committer

2017-10-29 13:17:22 UTC

The warnings on the initiator (FreeBSD) side mean that the NAS doesn't correctly mark underflows.  It's a common bug and not something to worry about (though still a protocol error).

The "Unsupported SCSI Opcode 0xb7" messages on the target (Synology) side mean it doesn't support READ DEFECT DATA command.  That's ok too, and shouldn't break anything.  I'm not sure what makes FreeBSD send the READ DEFECT DATA command to the target.  Do you have some disk monitoring software enabled on the initiator side?

As for the panic message - looking at the traceback it looks like panic in mpt(4) driver, unrelated to iscsi(4).  It might be triggered by the same software that triggers the READ DEFECT DATA on iSCSI.  Can you check which process triggered the panic?  Basically, when you see the backtrace and panic messages, press Scroll Lock and use cursor keys to scroll up; there should be some more information there.

Comment 7 Jim D. 2017-10-29 15:30:00 UTC

On the target side, I know that the Synology SAN/NAS has some disk monitoring. There are scheduled smart tests, but they are weeks to months apart. All Synology administrative console smartctl resuls show "Normal" for the affected target HDD. There are also several Synology applications running related to "cloud services" (thumbd,syncd,cloud-cleand,storaged). There is also an ecryptfs-kthread task and a synoindexd task. There have been a couple of Synology SAN/NAS firwmware updates over the past month or so.

On the initiator side - I am occassionally running Zabbix to query hard drives (client script: ls,dd,fsck,fdisk,iostat,smartctl,diskutil,system_profiler) every 10-15 minutes. On some targets I also run ZFS queries and iSCSI (lsscsi,iscsiadm,iscsictl) queries. But nothing that does presistent monitoring (like every few seconds or non-stop). I have at times 8-10 client hosts (physical and virtual) monitored by either a physical Zabbix or a virtual Zabbix, all of which have iSCSI targets.Right now I have 12 targets with 19 LUNs defined attached to 6 different *nix OS's/versions.

I am unable to see any more screen output when the FBSD VirtualBox machine locks up. Neither the VBox console nor the SSH terminal are responsive. I might be able to run a trace to a file, but don't know if that will capture what you are interested in - the data might never get written to local (VBox) disk even if there was something worth seeing. When I look at locked-up-client-host messages, there is nothing but "last timed entry" and the next startup entry.

I have enabled the /etc/syslog.conf configuration for the "console log". Perhaps that will capture some additional information. Past experience has shown, however, that open log files during a system crash/lock up may not receive all of the possible data in the log file.

Comment 8 Jim D. 2017-10-29 15:33:40 UTC

OH -BTW. The screen shot shows the last system message, the iSCSI message, which is a repeat of the same message several times before the panic so there may not be any additional information of interest prior to the panic messages.

Comment 9 Edward Tomasz Napierala freebsd_committer

2017-10-30 12:48:45 UTC

It might be the Zabbix, but the messages are about five minutes apart, not fifteen.

As you're running on VirtualBox, could you try to change the virtual HBA from the one handled by mpt(4) to something else, for example ahci(4), and see if it still panics?

Comment 10 Jim D. 2017-10-31 19:06:05 UTC

Created attachment 187623 [details]
Synology CHECK_CONDITION on more than one Target/Client

There are the same messagse for other clients as already noted for FreeBSD, but I don't see any result of them on the clients involved.

Comment 11 Jim D. 2017-10-31 19:08:53 UTC

Created attachment 187624 [details]
HDD assignments for VBOX FreeBSD in question

iscsi warnings are related, obviously, to devices assigned to the iscsi driver.

Comment 12 Jim D. 2017-10-31 19:14:21 UTC

The FreeBSD host running VirtualBox has one 15Krpm SAS drive for the "root/main" file systems - assigned to "mps". Two 10Krpm SATA drives contain the various VirtualBox guests which are assigned to "ahci". The "HDD assignments ..." screen shot shows the virtual drive layout for the FreeBSD VBOX guest (1 IDE, 1 SATA, 2 SCSI).

I am not sure what you are asking for regarding changing the "virtual HBA", nor how to "assign it to ahci".

Comment 13 Jim D. 2017-10-31 19:18:23 UTC

FWIW - VirtualBox version on hosting FBSD is 5.1.10 and I am not able at this time to upgrade to a newer version.

Comment 14 Edward Tomasz Napierala freebsd_committer

2017-10-31 20:03:29 UTC

The iSCSI messages are obviously related to the iscsi(4) driver, sure, but they are not fatal - in fact they should be completely harmless.  The panic in the screenshot doesn't seem to be related to iscsi at all - it's panicing in the mpt(4) driver.  Thus my suggestion to switch the emulated hardware to something else, to see if the workaround helps.  If it does, the logical next step would be to file a bug for mpr(4).

Comment 15 Edward Tomasz Napierala freebsd_committer

2017-10-31 20:13:04 UTC

Ah, and regarding the "virtual HBA" - what I'm suggesting is to switch the virtual "SCSI" disks to virtual SATA (AHCI).  It's been a while since I've used VirtualBox, but there should be just a dropdown widget to change the disk type, right?

Comment 16 Jim D. 2017-11-01 09:12:05 UTC

Deleted the VBOX FBSD SCSI drives and controller. Can't change drive type once defined. Not a problem as they were dummy drives just to test VBOX SCSI alongside iSCSI.

Deleted ZFS pool, which used one of the VBOX SCSI drives, and recreated it using repurposed SATA VBOX drive.

Synology/iSCSI WARNING messages still flow but the VBOX FBSD guest has been stable for 9 hours now. Lots of Zabbix queries flow to and from the VBOX FBSD guest without any hiccups.

FWIW - the VBOX SCSI controller is based on either an LSIlogic or BUSlogic driver.

I will leave everything running for awhile longer.

Comment 17 Jim D. 2017-11-02 04:10:54 UTC

VBOX FBSD-11 with iSCSI and ZFS (on SATA) still operational after 28 hours of run time. So it appears that there is an issue with VirtualBox SCSI (LSIlogic/BUSlogic) drives in conjunction with iSCSI. I am not able at this point to test this with a physical FBSD.

Comment 18 Edward Tomasz Napierala freebsd_committer

2017-11-02 10:59:27 UTC

Thanks!  Yeah, it looks like Zabbix (in particular sending the READ DEFECT DATA) triggers a bug in the mpt(4) driver.  Would you file a PR for that, with that information and the backtrace in a screenshot?  Thanks!

Comment 19 Jim D. 2017-11-02 14:52:51 UTC

MPT related bug report #223381 has been created.