Bug 215235 - Can't use iSCSI targets on AWS storage gateway
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-STABLE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Edward Tomasz Napierala
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-11 23:08 UTC by rouslan
Modified: 2019-08-28 20:44 UTC
CC List: 4 users

See Also:


Attachments
Packet Dump (156.78 KB, application/octet-stream)
2017-06-14 12:46 UTC, tewner
Proposed patch. (2.02 KB, patch)
2017-06-15 09:39 UTC, Edward Tomasz Napierala

Description rouslan 2016-12-11 23:08:44 UTC
Hi,

I'm trying to mount a volume from an AWS Storage Gateway and am getting the following errors (dmesg):

WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): connection error; reconnecting
WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): truncated data segment (20 bytes, should be 22)
WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): underflow mismatch: target indicates 0, we calculated 512
(da1:iscsi1:0:0:0): READ(6). CDB: 08 00 00 00 01 00
(da1:iscsi1:0:0:0): CAM status: CCB request completed with an error
(da1:iscsi1:0:0:0): Retrying command
WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): connection error; reconnecting
WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): truncated data segment (20 bytes, should be 22)
WARNING: 172.29.16.70 (iqn.1997-05.com.amazon:myvolume): underflow mismatch: target indicates 0, we calculated 512
(da1:iscsi1:0:0:0): READ(6). CDB: 08 00 00 00 01 00
(da1:iscsi1:0:0:0): CAM status: CCB request completed with an error
(da1:iscsi1:0:0:0): Error 5, Retries exhausted

The target was tested from Windows and Linux and works fine there.

/etc/iscsi.conf:
myiscsi {
  targetaddress = 172.29.16.70;
}

iscsictl output:
root@backup:~ # iscsictl -L
Target name                          Target portal    State
iqn.1997-05.com.amazon:myvolume      172.29.16.70     Connected: da1

camcontrol output:
root@backup:~ # camcontrol devlist
<NECVMWar VMware IDE CDR10 1.00>   at scbus1 target 0 lun 0 (pass0,cd0)
<VMware Virtual disk 1.0>          at scbus2 target 0 lun 0 (pass1,da0)
<Amazon Storage Gateway 1.0>       at scbus3 target 0 lun 0 (da1,pass2)

Thanks in advance,
Rouslan
Comment 1 Edward Tomasz Napierala freebsd_committer 2017-01-14 10:39:26 UTC
Could you make a packet dump, e.g. using "tcpdump -w"?  Remember to use the "-s0" option to prevent it from truncating packets.
Comment 2 Arendtsen 2017-03-05 19:32:32 UTC
Is a packet dump still needed?
Comment 3 Edward Tomasz Napierala freebsd_committer 2017-04-11 19:49:06 UTC
Yes, I can't really do anything without it.
Comment 4 tewner 2017-06-14 12:46:47 UTC
Created attachment 183480 [details]
Packet Dump
Comment 5 tewner 2017-06-14 12:52:38 UTC
I'm having the same problem.

A slightly unusual setup:
VMware with an AWS Storage Gateway and a FreeNAS.
The Storage Gateway has 2 volumes exported. These are large VMDK files on the underlying VMware (the appliance wouldn't accept the entire raw disks, but that's a different issue).


FreeNAS sees the targets:

[root@freenas] /tmp# iscsictl 
Target name                          Target portal    State
iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t2l0 10.20.22.203     Connected: da0 
iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0 10.20.22.203     Connected: da1 

...but all IO to the disks is hanging, i.e.:
[root@freenas] /tmp# iostat -x 1
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
da1        1.0   0.0     0.0     0.0    1 10216.2 999 

I'm getting errors similar to Rouslan's:

Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00 
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): Retrying command
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00 
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): Retrying command
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00 
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): Retrying command
Comment 6 Edward Tomasz Napierala freebsd_committer 2017-06-15 09:23:30 UTC
From the packet trace it looks like there's a bug in the AWS iSCSI target implementation.  Basically, they are sending a 20-byte data segment whose first two bytes contain the "sense length" value 20.  From the specification's point of view this is nonsense.  I have no idea why it works with other systems.  Would it be possible for you to make a similar trace with e.g. Linux?  Thanks!
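
For illustration: per the iSCSI specification (RFC 3720), the data segment of a SCSI Response PDU that carries sense data begins with a two-byte big-endian SenseLength field, followed by that many bytes of sense data. The Python sketch below only demonstrates that length check; it is not the FreeBSD initiator code, and the function name `check_sense_segment` and the zero-filled sample payload are hypothetical.

```python
import struct


def check_sense_segment(data_segment: bytes):
    """Return (sense_length, expected_segment_length) or raise on truncation.

    Assumed layout: 2-byte big-endian SenseLength, then the sense data.
    """
    if len(data_segment) < 2:
        raise ValueError("data segment too short to hold SenseLength")
    # The first two bytes announce how many bytes of sense data follow.
    (sense_length,) = struct.unpack_from(">H", data_segment, 0)
    expected = 2 + sense_length
    if len(data_segment) < expected:
        raise ValueError(
            f"truncated data segment ({len(data_segment)} bytes, "
            f"should be {expected})"
        )
    return sense_length, expected


if __name__ == "__main__":
    # Hypothetical payload shaped like the one described above:
    # 20 bytes in total, with the first two bytes encoding the value 20.
    segment = struct.pack(">H", 20) + bytes(18)
    try:
        check_sense_segment(segment)
    except ValueError as err:
        print(err)
```

A 20-byte segment that announces 20 bytes of sense data is two bytes short of the 22 bytes the layout requires, which matches the numbers in the "truncated data segment" warnings.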

Also, do you know the right way to report bugs to Amazon?
Comment 7 Edward Tomasz Napierala freebsd_committer 2017-06-15 09:39:42 UTC
Created attachment 183493 [details]
Proposed patch.
Comment 8 Edward Tomasz Napierala freebsd_committer 2017-06-15 09:40:23 UTC
Could you test the patch to see if it fixes the problem?  If it doesn't - could you also provide a packet trace?  Thanks!
Comment 9 tewner 2017-06-19 15:48:40 UTC
I don't think I'll be able to test the patch - I'm not set up to rebuild FreeBSD.

I opened a ticket with Amazon under our support contract.
Comment 10 tewner 2017-07-03 10:14:50 UTC
I received an answer from AWS:

...FreeBSD (FreeNAS) are not supported by SGW.

SGW analyzed the tcpdump from FreeBSD ticket and believe that even with the sense length/data segment fix deployed, FreeBSD will still not be supported.

They have submitted a feature request internally and included support for FreeBSD based initiators on product's road-map.
Comment 11 Edward Tomasz Napierala freebsd_committer 2017-07-03 11:02:22 UTC
Well, even if not officially supported, it should be able to talk to AWS just fine as soon as AWS properly implements the iSCSI protocol standard :-)
Comment 12 rouslan 2017-08-28 21:15:28 UTC
Hi,

I installed 11.1-RELEASE and am seeing the same problem. Do you need any additional information? Can I test the patch on 11.1?

Thanks in advance,
Rouslan
Comment 13 rouslan 2017-08-28 21:17:31 UTC
(In reply to tewner from comment #10)
I got the same reply from Amazon.
Comment 14 rouslan 2017-08-28 21:22:09 UTC
In addition, running camcontrol devlist produces the following messages in dmesg:
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072

Thanks,
Rouslan
Comment 15 Edward Tomasz Napierala freebsd_committer 2017-09-02 10:13:34 UTC
(In reply to rouslan from comment #12)

Yes, testing the patch on 11.1-RELEASE would be quite useful.
Comment 16 rouslan 2017-09-06 20:43:31 UTC
Hi,
I applied the patch, but nothing changed. Should I rebuild world as well? I only rebuilt the kernel.

shell command:
iscsictl -A -p awssg.liantech.local -t iqn.1997-05.com.amazon:sgw-f23dd99b-mediachanger

dmesg output:
WARNING: awssg.liantech.local (iqn.1997-05.com.amazon:sgw-f23dd99b-mediachanger): underflow mismatch: target indicates 0, we calculated 24
ch0 at iscsi1 bus 0 scbus33 target 0 lun 0
ch0: <STK L700 0103> Fixed Changer SPC-3 SCSI device
ch0: Serial Number AMZN_SGW-F23DD99B_MC_00001
ch0: 150.000MB/s transfers
ch0: Command Queueing enabled
ch0: 1600 slots, 10 drives, 1 picker, 1600 portals

Any operation with mtx (the media changer command) triggers the following:

dmesg output:
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072

Thanks in advance,
Rouslan
Comment 17 rouslan 2017-09-06 21:07:44 UTC
Hi,
In addition, I tried to mount a volume:

shell:
iscsictl -A -p 172.16.0.22 -t iqn.1997-05.com.amazon:z

dmesg:
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 24
da1 at iscsi2 bus 0 scbus34 target 0 lun 0
da1: <Amazon Storage Gateway 1.0> Fixed Direct Access SPC-3 SCSI device
da1: Serial Number 000be6f0aabc12d6d3
da1: 150.000MB/s transfers
da1: Command Queueing enabled
da1: 102400MB (209715200 512 byte sectors)
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 512
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 01 01 00 
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)

shell:
newfs /dev/da1

dmesg:
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 8192
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 10 10 00 
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 8192
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 10 10 00 
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)
root@bacula:~ # 
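
As a cross-check on the log above, the READ(6) CDBs can be decoded by hand. The Python sketch below is a hypothetical helper (`decode_read6` is not part of any FreeBSD tool) that follows the standard READ(6) layout: opcode 0x08, a 21-bit LBA in bytes 1 to 3, and a one-byte transfer length in byte 4, where 0 means 256 blocks. The 512-byte block size is taken from the da1 probe message.

```python
def decode_read6(cdb: bytes, block_size: int = 512) -> dict:
    """Decode a 6-byte READ(6) CDB into LBA, block count and byte count."""
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    # The LBA is 21 bits: the low 5 bits of byte 1, then bytes 2 and 3.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # A transfer length of 0 is defined to mean 256 blocks.
    blocks = cdb[4] or 256
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


if __name__ == "__main__":
    # The two CDBs that appear in the dmesg output above.
    for text in ("08 00 00 01 01 00", "08 00 00 10 10 00"):
        print(text, "->", decode_read6(bytes.fromhex(text)))
```

The two requests decode to 512 and 8192 bytes, the same values the initiator reports as its calculated residual in the corresponding "underflow mismatch" warnings. That would be consistent with the initiator not having accounted for any read data for those commands, though this is an interpretation of the log and not something confirmed in this thread.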

Best regards,
Rouslan
Comment 18 Edward Tomasz Napierala freebsd_committer 2017-09-08 12:34:41 UTC
(In reply to rouslan from comment #17)

Hm, that's weird.  What's the value of kern.iscsi.aws_workaround sysctl?
Comment 19 Edward Tomasz Napierala freebsd_committer 2017-09-08 12:39:11 UTC
Ah, never mind the previous comment; the output message I've been expecting to see doesn't show up if you don't have debug enabled (like, sysctl kern.iscsi.debug=10).  Still, enabling it and trying again might shed some more light.
Comment 20 Edward Tomasz Napierala freebsd_committer 2017-09-08 12:43:13 UTC
Also, it would be useful to do a packet dump with the patch applied, and to enable debug mode in iscsid (kill the daemon, then run it by hand as "iscsid -d" and copy/paste the output).