Summary: | Can't use iSCSI targets on AWS storage gateway | |
---|---|---|---
Product: | Base System | Reporter: | rouslan
Component: | kern | Assignee: | Bugmeister <bugmeister>
Status: | Closed Overcome By Events | |
Severity: | Affects Many People | CC: | emaste, me, mga, tewner, trasz
Priority: | --- | |
Version: | 10.3-STABLE | |
Hardware: | amd64 | |
OS: | Any | |

Description
rouslan
2016-12-11 23:08:44 UTC
Could you make a packet dump, e.g. using "tcpdump -w"? Remember to use the "-s0" option, to prevent it from truncating packets.

Is there still a need for a packet dump?

Yes, I can't really do anything without it.

Created attachment 183480 [details]
Packet Dump
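
A capture of the kind requested above might be taken along the lines of the sketch below. The interface name `em0`, the portal address `192.0.2.10`, and the output file name are placeholders, not values from this report, and port 3260 (the default iSCSI port) is an assumption about the setup. The helper only prints the command; it still has to be run as root.

```shell
#!/bin/sh
# Hypothetical helper: builds a tcpdump command line for capturing iSCSI traffic.
# -s0 disables snap-length truncation, -w writes the raw packets to a file.
iscsi_capture_cmd() {
    iface=$1    # network interface facing the target (placeholder)
    portal=$2   # target portal address (placeholder)
    out=$3      # capture file to write
    echo "tcpdump -i $iface -s0 -w $out host $portal and port 3260"
}

# Print the command for review; run it as root to start the capture.
iscsi_capture_cmd em0 192.0.2.10 iscsi.pcap
```

Filtering on the portal address and port keeps the capture small enough to attach to the report.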
I'm having the same problem.

A bit of a crazy setup: VMware with an AWS Storage Gateway and a FreeNAS.
The Storage Gateway has 2 volumes exported. These are large VMDK files on the underlying VMware (the appliance wouldn't accept the entire raw disks, but that's a different issue).

FreeNAS sees the targets:

[root@freenas] /tmp# iscsictl
Target name                                          Target portal    State
iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t2l0   10.20.22.203     Connected: da0
iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0   10.20.22.203     Connected: da1

...but all I/O to the disks is hanging, i.e.:

[root@freenas] /tmp# iostat -x 1
extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
da1        1.0   0.0     0.0     0.0    1 10216.2 999

I'm getting errors similar to Rouslan's:

Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:48 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:48 freenas (da1:iscsi6:0:0:0): Retrying command
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:50 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:50 freenas (da1:iscsi6:0:0:0): Retrying command
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): connection error; reconnecting
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): truncated data segment (20 bytes, should be 22)
Jun 14 05:49:52 freenas WARNING: 10.20.22.203 (iqn.1997-05.com.amazon:storagegw-vmdkbacked-c2t3l0): underflow mismatch: target indicates 0, we calculated 512
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): READ(6). CDB: 08 00 00 00 01 00
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): CAM status: CCB request completed with an error
Jun 14 05:49:52 freenas (da1:iscsi6:0:0:0): Retrying command

From the packet trace it looks like there's a bug in the AWS iSCSI target implementation. Basically, they are sending a 20-byte data segment containing, in the first two bytes, the "sense length" value 20. From the specification point of view this is nonsense. I have no idea why it works with other systems. Would it be possible for you to make a similar trace with e.g. Linux? Thanks!

Also, do you know what's the right way to report bugs to Amazon?

Created attachment 183493 [details]
Proposed patch.
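
As an aside, the length mismatch described above can be illustrated with a few lines of shell. This is only a sketch of the arithmetic, not the attached patch, whose contents are not reproduced in this report. It assumes the layout of the iSCSI SCSI Response data segment: a two-byte SenseLength field followed by that many bytes of sense data, so a segment announcing 20 bytes of sense data should be 22 bytes long. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: checks an iSCSI SCSI Response data segment length
# against the SenseLength value carried in its first two bytes.
check_sense_segment() {
    segment_len=$1                # DataSegmentLength from the PDU header
    sense_len=$2                  # value of the 2-byte SenseLength field
    expected=$((sense_len + 2))   # sense data plus the length field itself
    if [ "$segment_len" -lt "$expected" ]; then
        echo "truncated data segment ($segment_len bytes, should be $expected)"
        return 1
    fi
    echo "ok ($segment_len bytes)"
}

# Values reported in this thread: 20-byte segment, SenseLength 20.
check_sense_segment 20 20 || true
```

With the values from the trace, the check reproduces the figures in the initiator's warning: 20 bytes received where 22 were expected.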
Could you test the patch to see if it fixes the problem? If it doesn't, could you also provide a packet trace? Thanks!

I don't think I'll be able to test the patch; I'm not set up to rebuild FreeBSD. I opened a ticket with Amazon under our support contract.

I received an answer from AWS:

...FreeBSD (FreeNAS) are not supported by SGW. SGW analyzed the tcpdump from FreeBSD ticket and believe that even with the sense length/data segment fix deployed, FreeBSD will still not be supported. They have submitted a feature request internally and included support for FreeBSD based initiators on product's road-map.

Well, even if not officially supported, it should be able to talk to AWS just fine as soon as AWS properly implements the iSCSI protocol standard :-)

Hi,
I installed 11.1-RELEASE and am suffering from the same problem. Do you need any additional information? Can I test the patch on 11.1?
Thanks in advance,
Rouslan

(In reply to tewner from comment #10)
I got the same reply from Amazon.
In addition, camcontrol devlist triggers the following messages in dmesg:

cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072

Thanks,
Rouslan

(In reply to rouslan from comment #12)
Yes, testing the patch on 11.1-RELEASE would be quite useful.

Hi,
I applied the patch; nothing changed. Should I rebuild world? I rebuilt only the kernel.
shell command:
iscsictl -A -p awssg.liantech.local -t iqn.1997-05.com.amazon:sgw-f23dd99b-mediachanger

dmesg output:
WARNING: awssg.liantech.local (iqn.1997-05.com.amazon:sgw-f23dd99b-mediachanger): underflow mismatch: target indicates 0, we calculated 24
ch0 at iscsi1 bus 0 scbus33 target 0 lun 0
ch0: <STK L700 0103> Fixed Changer SPC-3 SCSI device
ch0: Serial Number AMZN_SGW-F23DD99B_MC_00001
ch0: 150.000MB/s transfers
ch0: Command Queueing enabled
ch0: 1600 slots, 10 drives, 1 picker, 1600 portals

Any operation with mtx (media changer command) gives this dmesg output:
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072
cam_periph_mapmem: attempt to map 271748 bytes, which is greater than 131072

Thanks in advance,
Rouslan

Hi,
In addition, I tried to mount a volume:

shell:
iscsictl -A -p 172.16.0.22 -t iqn.1997-05.com.amazon:z

dmesg:
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 24
da1 at iscsi2 bus 0 scbus34 target 0 lun 0
da1: <Amazon Storage Gateway 1.0> Fixed Direct Access SPC-3 SCSI device
da1: Serial Number 000be6f0aabc12d6d3
da1: 150.000MB/s transfers
da1: Command Queueing enabled
da1: 102400MB (209715200 512 byte sectors)
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 512
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 01 01 00
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)

shell:
newfs /dev/da1

dmesg:
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 8192
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 10 10 00
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): connection error; reconnecting
WARNING: 172.16.0.22 (iqn.1997-05.com.amazon:z): underflow mismatch: target indicates 0, we calculated 8192
(da1:iscsi2:0:0:0): READ(6). CDB: 08 00 00 10 10 00
(da1:iscsi2:0:0:0): CAM status: SCSI Status Error
(da1:iscsi2:0:0:0): SCSI status: Check Condition
(da1:iscsi2:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:iscsi2:0:0:0): Retrying command (per sense data)
root@bacula:~ #

Best regards,
Rouslan

(In reply to rouslan from comment #17)
Hm, that's weird. What's the value of the kern.iscsi.aws_workaround sysctl?

Ah, never mind the previous comment; the output message I've been expecting to see doesn't show up if you don't have debug enabled (like, sysctl kern.iscsi.debug=10). Still, enabling it and trying again might shed some more light.

Also, it would be useful to do a packet dump with the patch applied, and also to enable the debug mode in iscsid (kill the daemon and then run it by hand as "iscsid -d" and copy/paste the output).

^Triage: I'm sorry that this PR did not get addressed in a timely fashion.

By now, the version that it was created against is long out of support. Please re-open if it is still a problem on a supported version.
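
For anyone revisiting the logs in this thread, the "we calculated N" figures in the underflow warnings can be cross-checked by decoding the logged READ(6) CDBs by hand. The sketch below assumes the 512-byte sector size reported for da1 and uses a hypothetical helper name. In READ(6), the LBA is the low 5 bits of byte 1 followed by bytes 2 and 3, and byte 4 is the transfer length in blocks (0 meaning 256).

```shell
#!/bin/sh
# Hypothetical helper: decodes a READ(6) CDB given as six hex bytes.
# Assumes 512-byte sectors, as reported for da1 in this thread.
decode_read6() {
    if [ "$1" != "08" ]; then
        echo "not a READ(6) CDB" >&2
        return 1
    fi
    # 21-bit LBA: low 5 bits of byte 1, then bytes 2 and 3.
    lba=$(( ((0x$2 & 0x1f) << 16) | (0x$3 << 8) | 0x$4 ))
    blocks=$(( 0x$5 ))
    if [ "$blocks" -eq 0 ]; then
        blocks=256   # a transfer length of 0 encodes 256 blocks
    fi
    echo "lba=$lba blocks=$blocks bytes=$((blocks * 512))"
}

# CDBs taken from the dmesg output quoted in this thread.
decode_read6 08 00 00 00 01 00
decode_read6 08 00 00 10 10 00
```

The two CDBs decode to 512 and 8192 bytes, matching the amounts the initiator says it calculated, while the target reported 0 bytes transferred.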