Bug 223381 - mpt driver hangs with iscsi devices and scsi drives defined in VBOX FBSD guest
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-scsi (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-02 14:50 UTC by Jim D.
Modified: 2017-11-19 18:36 UTC
CC List: 1 user

See Also:


Attachments
screenshot-2017-11-19 (panic: vm_fault) (28.53 KB, image/png)
2017-11-19 07:59 UTC, Jim D.

Description Jim D. 2017-11-02 14:50:37 UTC
Related iSCSI bug report: bug 223238

VirtualBox (VBOX) guest system: FBSD-11.1-Release, Oct 25 2017
Host system: FBSD-9.3-STABLE
iSCSI server: Synology d216+II SAN/NAS delivering multiple targets and multiple LUNs to guest system.

Reference attachments from above bug report:
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=187481
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=187624

Problem description:

When VBox SCSI drives (LSIlogic/BUSlogic controller) are combined with iSCSI targets, the guest system eventually locks up, as shown in the screenshot in attachment 187481. The other attachment shows the FBSD VBox storage assignments at the time of the problem. The problem recurs.

Temporary solution: remove VBOX SCSI drives.

Note that I am not able to test this on a physical FBSD system at this time.
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-11-02 16:40:00 UTC
One more data point: judging by the errors sent by the iSCSI target at roughly the time mpt panics, it might be caused by the READ DEFECT DATA command sent by disk monitoring software on the initiator side (Zabbix).
Comment 2 Jim D. 2017-11-10 02:29:07 UTC
I am not sure about the Zabbix connection. Now that I have completed one project development cycle, I will try adding a VirtualBox SCSI drive to the FreeBSD-11 guest and see what happens with and without the Zabbix client active.
Comment 3 Jim D. 2017-11-18 05:42:49 UTC
Update on situation (11/17/2017, 11:20pm CST)

1) created VirtualBox (VBox) SCSI controller and added VBox 3GB drive to it
2) modified FBSD-11.1 "fstab" to mount SCSI drive at boot (/SCSIhdd)
3) created shell script to run through a series of system commands on 3 iSCSI drives, 1 zfs drive, 1 SCSI drive every 10 minutes; FBSD11 Zabbix client not running
  a) dd (15 times, count=1), fsck -Cn, zpool status
  b) tail /var/log/messages
4) no notable issues for several hours
5) started FBSD11 Zabbix client with Zabbix server active but Zabbix queries disabled
6) no notable issues
7) modified Zabbix configuration for FBSD11 client
  a) removed Zabbix FreeBSD template which included various file system checks
  b) left special locally developed template to perform HDD discovery, status check, and other miscellaneous HDD information reporting
8) activated Zabbix server queries to FBSD11 client
9) script above continues to run without any notable issues
10) after some time, kernel WARNING messages related to iSCSI appeared

kernel: WARNING: 192.168.1.19:3260 (iqn.2000-01.com.synology:ds216j.Target-7.c16e0895b7): underflow mismatch: target indicates 0, we calculated 4

11) stopped Zabbix server queries
12) modified special local Zabbix template to exclude extensive HDD status testing (ls, dd, mount, fsck, smartctl, ZFS pool status, iSCSI Target status)
13) reconfigured Zabbix definition for FBSD11 client to use modified template
14) re-started Zabbix server queries to FBSD11 client; above script still running
15) after 1.5 hours, no iSCSI related kernel WARNING messages have appeared
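The test script from steps 3 and 4 is not included in the report. As a rough illustration only, it might look like the POSIX sh sketch below. The device list (da1 and da2 are hypothetical; da3 is the iSCSI disk named later in this report), the use of the /SCSIhdd mount point from step 2 as the fsck target, and the `run` argument guard are all assumptions, not the reporter's actual script.

```sh
#!/bin/sh
# Hypothetical reconstruction of the periodic storage test from Comment 3.
# DEVICES, FILESYSTEMS and INTERVAL are assumptions; override via environment.
DEVICES="${DEVICES:-/dev/da1 /dev/da2 /dev/da3}"  # assumed iSCSI disks
FILESYSTEMS="${FILESYSTEMS:-/SCSIhdd}"            # VBox SCSI disk from step 2
INTERVAL="${INTERVAL:-600}"                       # 10 minutes between passes

# Read one 512-byte block from $1, $2 times; fail on the first bad read.
read_probe() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        dd if="$dev" of=/dev/null bs=512 count=1 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# One pass of the checks listed in step 3.
one_pass() {
    date
    for dev in $DEVICES; do
        read_probe "$dev" 15 || echo "dd failed on $dev"
    done
    for fs in $FILESYSTEMS; do
        fsck -Cn "$fs"
    done
    zpool status
    tail /var/log/messages
}

# Loop forever only when invoked as "sh <script> run".
if [ "${1:-}" = "run" ]; then
    while :; do
        one_pass
        sleep "$INTERVAL"
    done
fi
```

Guarding the loop behind an explicit `run` argument keeps the helper functions usable on their own, for example to check a single device with `read_probe /dev/da3 15`.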

Initial conclusion: something in the extensive HDD status testing is probably triggering the kernel WARNING messages. It may, or may not, be the iSCSI Target query using "iscsictl".

More testing is required to nail down the specific command(s) related to the iSCSI issue which, as reported earlier, would eventually cause the FBSD11 client host to hang.
Comment 4 Jim D. 2017-11-18 05:59:04 UTC
Note that the same locally developed Zabbix template for extensive HDD discovery, testing, and reporting works fine, with OS-specific modifications, on Solaris-10/11, CentOS-6/7, RHEL-7, OREL-7, and MacOS X Sierra (10.12). Although the Synology SAN logs the same "underflow" warning messages for most or all of the other OSes, none of them hang, even when I have all 10 or so servers and hosts up and active with Zabbix (one physical for 3 physical hosts, one virtual for all virtual hosts). One physical host is not included in this process.
Comment 5 Jim D. 2017-11-18 07:24:59 UTC
I have found the problematic commands on the VBox FBSD-11.1 client host that trigger the previously noted "kernel WARNING/iSCSI underflow" messages:

smartctl -d scsi -a -T permissive /dev/da3
smartctl -d scsi -A -T permissive /dev/da3

Same effect if "-d scsi" is not included.
"smartctl --scan-open" reports da3, one of the 3 attached iSCSI Targets, as a "scsi" device type.

Interestingly enough, the following smartctl commands from FBSD-11.1 host do NOT cause any kernel messages:

smartctl -d scsi -H -T permissive /dev/da3
smartctl -d scsi -i -T permissive /dev/da3

The installed FBSD-11.1 smartmontools is version 6.5 (2016-05-07).

It might be that the Synology SAN's handling of the smartctl request for the SMART attributes of its installed hard drive is problematic, but I don't understand why that would cause FBSD iSCSI underflow messages.
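To narrow this down more systematically, one could count the "underflow mismatch" lines in the system log before and after each smartctl variant. This is a hypothetical sketch that assumes syslogd writes kernel messages to /var/log/messages and that /dev/da3 is the iSCSI disk; the 5-second wait is an arbitrary guess at syslog latency, and the script itself is not something the reporter ran.

```sh
#!/bin/sh
# Sketch: find which smartctl option triggers the iSCSI "underflow mismatch"
# warning. DEV and LOG are assumptions; override via environment.
DEV="${DEV:-/dev/da3}"
LOG="${LOG:-/var/log/messages}"

# Print how many underflow warnings the log file $1 contains (0 if unreadable).
count_underflow() {
    if [ -r "$1" ]; then
        grep -c 'underflow mismatch' "$1" || :   # grep -c prints 0, exits 1 on no match
    else
        echo 0
    fi
}

# Run one smartctl variant and report how many new warnings appeared.
try_option() {
    opt=$1
    before=$(count_underflow "$LOG")
    smartctl -d scsi "$opt" -T permissive "$DEV" >/dev/null 2>&1
    sleep 5   # assumed time for the kernel message to reach the log
    after=$(count_underflow "$LOG")
    echo "smartctl $opt: $((after - before)) new underflow warning(s)"
}

if [ "${1:-}" = "run" ]; then
    for opt in -i -H -A -a; do
        try_option "$opt"
    done
fi
```

Run as root with `sh <script> run`. Based on the observations in this comment, -A and -a would be expected to report new warnings while -i and -H would not; log rotation during a run would skew the counts.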

==========================
Results from Synology SAN
same as obtained from FBSD-11.1 host
==========================
smartctl --scan-open
/dev/hda -d ata # /dev/hda, ATA device
/dev/sda -d scsi # /dev/sda, SCSI device

--------------------------------

smartctl -d scsi -i -T permissive /dev/sda
smartctl 6.5 (build date Oct 25 2017) [x86_64-linux-3.10.102] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD6000HLHX-01JJP
Revision:             04.0
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        10000 rpm
Logical Unit id:      0x50014ee0ae06dfb2
Serial number:        WD-WX71E71WZ063
Device type:          disk
Local Time is:        Sat Nov 18 01:13:52 2017 CST
SMART support is:     Unavailable - device lacks SMART capability.

--------------------------------

smartctl -d scsi -a -T permissive /dev/sda
smartctl 6.5 (build date Oct 25 2017) [x86_64-linux-3.10.102] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WD6000HLHX-01JJP
Revision:             04.0
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        10000 rpm
Logical Unit id:      0x50014ee0ae06dfb2
Serial number:        WD-WX71E71WZ063
Device type:          disk
Local Time is:        Sat Nov 18 01:16:00 2017 CST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

--------------------------------

smartctl -d scsi -A -T permissive /dev/sda;echo $?
smartctl 6.5 (build date Oct 25 2017) [x86_64-linux-3.10.102] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
255

--------------------------------

smartctl -d scsi -H -T permissive /dev/sda;echo $?
smartctl 6.5 (build date Oct 25 2017) [x86_64-linux-3.10.102] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

255
Comment 6 Jim D. 2017-11-18 07:41:53 UTC
Again: other OSes accessing the same Synology SAN with the same commands have no problem with the "iSCSI underflow" messages. None of them lock or hang up. I don't understand enough about how or why the "iSCSI underflow" messages are generated.
Comment 7 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-11-18 20:49:05 UTC
The "iSCSI underflow" is a warning that the other side uses a buggy iSCSI implementation. It's harmless and shouldn't cause any problems. If you don't want to see it, just silence it by appending kern.iscsi.debug=0 to /etc/sysctl.conf.
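For reference, the same setting can be applied immediately with sysctl(8) and persisted in /etc/sysctl.conf. A minimal sketch, run as root and assuming the iscsi(4) initiator module is already loaded (otherwise the kern.iscsi.debug OID does not exist yet). Setting it to 0 suppresses the initiator's other warnings as well, not just "underflow mismatch".

```sh
# Silence iSCSI initiator warnings (including "underflow mismatch") now.
sysctl kern.iscsi.debug=0
# Keep the setting across reboots.
echo 'kern.iscsi.debug=0' >> /etc/sysctl.conf
```

As Comment 8 points out, this only hides the messages; it does not address the guest hang itself.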
Comment 8 Jim D. 2017-11-19 05:33:56 UTC
This bug report is concerned with the hang/lock-up/freeze of FBSD-11.1 in VirtualBox where the iSCSI Targets were initially assigned to the VBox SCSI Controller (LSIlogic/BUSlogic driver), as noted in the two attachments in the first comment block. When iSCSI Targets were attached to the VBox SCSI controller, the FBSD-11.1 VBox guest would freeze/lock up within a handful of hours, or less, of idle operation with iSCSI Targets attached. It just so happened that the last system messages were of the "underflow mismatch" type, as noted in attachment 187481. The same attachment also shows a "vm_fault" followed by KDB backtrace output. At this point, the FBSD host was non-responsive and had to be manually shut down and restarted (i.e., pull the plug).

The first part of my examination of this situation was to determine and isolate any command(s) that preceded or induced the "underflow mismatch" messages. I completed this examination in my previous comment. Now that I know which processes were involved with the affected commands, I can disable them, which, with everything else still in operation, should prevent the situation that caused the "underflow mismatch" messages.

Since I changed the VBox configuration to move the iSCSI Targets to the VBox SATA (ahci) controller, the FBSD-11.1 VBox guest never locked up or froze again. It remained stable for the whole time it was in operation (> 24 hours).

Looking at the KDB output it is possible that the "mpt" driver is involved.

My next step is to reassign one or more iSCSI Targets BACK to the VBox SCSI controller WITHOUT the offending smartctl commands and see if anything changes: will the FBSD guest still lock up/freeze without the "underflow mismatch" messages or not?

The suggestion for suppressing the "underflow mismatch" messages is welcome, but these messages alone are not what this bug report is really concerned about.
Comment 9 Jim D. 2017-11-19 05:57:00 UTC
Minor clarification: it wasn't that iSCSI Targets were assigned to the VBox SCSI controller; it was that several virtual disks were attached to the VBox SCSI controller (at guest system boot) before the iSCSI Targets were connected and attached. So I misspoke when I said I would be reassigning the iSCSI Targets to the VBox SCSI controller; rather, I will be re-adding the VBox SCSI controller with one or more virtual drives attached to it, as was done before when the FBSD guest would lock up/freeze, while still disabling the problematic smartctl commands. I apologize for any confusion.
Comment 10 Jim D. 2017-11-19 07:59:07 UTC
Created attachment 188112
screenshot-2017-11-19 (panic: vm_fault)

Panic with no iSCSI messages after 1:57:44 of runtime.
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2017-11-19 10:18:54 UTC
Comment on attachment 188112
screenshot-2017-11-19 (panic: vm_fault)

I think that this bug could be related to, if not the same as, bug 222066.
Comment 12 Jim D. 2017-11-19 18:36:41 UTC
Changed the VBox SCSI controller "type" to BUSlogic, and there have been no more panics/vm_faults. The FBSD-11.1 VBox guest has been up for just under 10.5 hours without any of the problems encountered when the VBox SCSI controller is defined to use LSIlogic.
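The controller-type change can also be made from the host command line with `VBoxManage storagectl`. A sketch, assuming the VM is powered off and that the VM and controller are named FBSD11 and SCSI (both hypothetical names). The VBOXMANAGE override exists only so the generated command line can be checked without VirtualBox installed.

```sh
#!/bin/sh
# Sketch: switch the guest's VirtualBox SCSI controller from LSI Logic
# (handled by mpt(4) in the FreeBSD guest) to BusLogic. Run on the host
# with the VM powered off. VM and controller names are hypothetical.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# set_scsi_type VM CONTROLLER TYPE
set_scsi_type() {
    vm=$1
    ctl=$2
    ctype=$3
    case "$ctype" in
        LSILogic|BusLogic) ;;
        *) echo "unsupported SCSI controller type: $ctype" >&2; return 2 ;;
    esac
    "$VBOXMANAGE" storagectl "$vm" --name "$ctl" --controller "$ctype"
}

if [ "${1:-}" = "run" ]; then
    set_scsi_type "${VM:-FBSD11}" "${CTL:-SCSI}" BusLogic
fi
```

For example, `VM=myvm CTL=SCSI sh <script> run`; `VBoxManage showvminfo myvm` can then be used to confirm the new controller type before booting the guest.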