Bug 208798 - [iscsi] MaxRecvDataSegmentLength and sockbufsize ignored
Summary: [iscsi] MaxRecvDataSegmentLength and sockbufsize ignored
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Edward Tomasz Napierala
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-14 12:31 UTC by Ben RUBSON
Modified: 2016-09-20 07:43 UTC
CC List: 1 user

See Also:


Attachments

Description Ben RUBSON 2016-04-14 12:31:31 UTC
Hello,

I would like to increase the PDU size / DataSegmentLength in order to increase iSCSI throughput over a high-latency Ethernet link (several ms).
I therefore tuned the following two parameters in /etc/iscsi.conf:
MaxRecvDataSegmentLength=2097152
sockbufsize=2048

However, iscsictl simply reports "obsolete statement ignored" for both of them.
iscsictl -v confirms this, showing DataSegmentLen stuck at 131072.
The output below demonstrates the problem.

Could we consider supporting these two parameters so that the PDU size can be increased, please?

Thank you very much,

Best regards,

Ben



# man iscsi.conf
MaxRecvDataSegmentLength
    the maximum data segment length in bytes it can receive
    in an iSCSI PDU, default is 8192.
sockbufsize
    sets the receiver and transmitter socket buffer size to
    size, in kilobytes.  The default is 128.

# cat /etc/iscsi.conf 
t0 {
	TargetAddress = 192.168.2.3
	TargetName = iqn.2012-06.com.example:target0
	MaxRecvDataSegmentLength = 2097152
	sockbufsize = 2048
}

# iscsictl -An t0
iscsictl: obsolete statement ignored at line 4
iscsictl: obsolete statement ignored at line 5

# iscsictl -v
Session ID:       9
Initiator name:   iqn.1994-09.org.freebsd:myinitiator.com
Initiator portal: 
Initiator alias:  
Target name:      iqn.2012-06.com.example:target0
Target portal:    192.168.2.3
Target alias:     8T-500-P4G69SW0
User:             
Secret:           
Mutual user:      
Mutual secret:    
Session type:     Normal
Session state:    Connected
Failure reason:   
Header digest:    None
Data digest:      None
DataSegmentLen:   131072
ImmediateData:    Yes
iSER (RDMA):      No
Device nodes:     da8
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-05-10 10:19:03 UTC
They are ignored, because the defaults are already reasonably high.  The FreeBSD block layer never issues IOs larger than 128k, so increasing MaxRecvDataSegmentLength above that value wouldn't change anything.  The socket buffer for iSCSI defaults to 1048576.
Comment 2 Ben RUBSON 2016-05-10 20:20:33 UTC
Hello,

Thank you for your answer.

My goal is to achieve the maximum throughput of my iSCSI targets.
Each one consists of a single disk, with a locally tested maximum throughput of 180 MB/s.

I ran several iPerf tests across my network link:
several Gb/s available (perfect), 10 ms between source and target.

Then on to the iSCSI tests: I can't read (or write) more than a few MB/s on my iSCSI targets (each one of course tested individually).

I went back to iPerf and tuned it to use the same buffer size as the FreeBSD iSCSI initiator.
It then achieved exactly the same poor numbers as the iSCSI traffic!

Gradually increasing the iPerf buffer size up to 2 MB allowed me to reach 180 MB/s.

This is why I would like to be able to tune the iSCSI sockbufsize: it would make it possible to keep TCP links with fairly high latency full.
MaxRecvDataSegmentLength could also help reduce iSCSI overhead, which could be worthwhile across high-latency links.
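
For what it's worth, here is the back-of-the-envelope bandwidth-delay product calculation behind these numbers; a rough sketch with illustrative values only, assuming ~180 MB/s of target throughput and a ~10 ms round-trip time:

/* bdp.c - rough bandwidth-delay product estimate, illustrative values only */
#include <stdio.h>

int main(void)
{
	double rtt = 0.010;            /* ~10 ms round-trip time on this link */
	double rate = 180e6;           /* target throughput: ~180 MB/s (the disk limit) */
	double window = 128 * 1024;    /* a 128 kB socket buffer / TCP window */

	/* In-flight data needed to sustain the target rate: */
	printf("BDP:     %.1f MB\n", rate * rtt / 1e6);      /* ~1.8 MB */
	/* Throughput ceiling imposed by the fixed 128 kB window: */
	printf("ceiling: %.1f MB/s\n", window / rtt / 1e6);  /* ~13.1 MB/s */
	return 0;
}

The ~1.8 MB figure lines up with the ~2 MB iPerf buffer that reached 180 MB/s, while a 128 kB window would cap such a link at roughly 13 MB/s.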

Thank you very much for considering this!

Best regards,

Ben
Comment 3 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-05-11 08:49:48 UTC
You are right in that larger buffers could help improve the throughput.  But again: MaxRecvDataSegmentLength larger than 128kB wouldn't be used anyway, because the rest of the kernel doesn't issue IO requests larger than that, and the default socket buffer size is 1MB.  You might try to increase the socket buffer by tweaking the SOCKBUF_SIZE value in usr.sbin/iscsid/iscsid.h and rebuilding it (cd usr.sbin/iscsid && make clean all install).  I'm not sure if it will actually improve anything in a measurable way, though.  Let me know if it does.
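
For reference, a minimal sketch of what that edit could look like (the exact default in your source tree may differ, so check iscsid.h first; the 8 MB value below is only an illustration, not a tested recommendation):

/* usr.sbin/iscsid/iscsid.h -- the default is currently 1 MB (1048576);
 * for a high-RTT link one could try something larger, for example: */
#define SOCKBUF_SIZE (8 * 1024 * 1024)

followed by the same "make clean all install" rebuild as above.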
Comment 4 Ben RUBSON 2016-05-11 10:03:21 UTC
Thank you very much; I will try this as soon as possible and get back to you.

What about the default values reported in the manual?

# man iscsi.conf
MaxRecvDataSegmentLength
    the maximum data segment length in bytes it can receive
    in an iSCSI PDU, default is 8192.
sockbufsize
    sets the receiver and transmitter socket buffer size to
    size, in kilobytes.  The default is 128.

Sounds like the default for MaxRecvDataSegmentLength is 131072 (according to iscsictl -v), not 8192, right?
And for sockbufsize, how can I display its current value? You say its default is 1 MB, not 128 kB (although my iPerf tests would suggest a value around 128 kB).

Thank you again!

Ben
Comment 5 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-06-01 12:08:28 UTC
Hi.  The manual page for iscsi.conf(5) was rewritten some time ago; you can see the current version here: https://www.freebsd.org/cgi/man.cgi?query=iscsi.conf&apropos=0&sektion=0&manpath=FreeBSD+11-current&arch=default&format=html.  For MaxRecvDataSegmentLength: just trust what "iscsictl -v" tells you :-)

As for the socket buffer size - it's in the sources, in usr.sbin/iscsid/iscsid.h.
Comment 6 Ben RUBSON 2016-07-05 16:35:38 UTC
So, the goal is to increase iSCSI throughput on high-latency links.

My test link :
- latency around 10ms
- bandwidth around 8 Gbps (iPerf tested, one thread)

My test disks :
- SSD, throughput around 400 MBps

Modification : socket size :
- usr.sbin/iscsid/iscsid.h : #define SOCKBUF_SIZE 10485760
- usr.sbin/ctld/ctld.h     : #define SOCKBUF_SIZE 10485760

Modification : MaxDataSegmentLength :
- sys/dev/iscsi/iscsi_ioctl.h     : #define ISCSI_MAX_DATA_SEGMENT_LENGTH (1024 * 10240)
- sys/dev/iscsi/icl.h             : #define ICL_MAX_DATA_SEGMENT_LENGTH (1024 * 10240)
- sys/dev/iscsi_initiator/iscsi.c : sp->opt.maxRecvDataSegmentLength = 10485760;
- sys/dev/iscsi_initiator/iscsi.c : sp->opt.maxXmitDataSegmentLength = 10485760;
Of course I booted this newly configured kernel (on both sides).

Test :
- With ZFS (recordsize=1M) : dd if=bigfilein of=/dev/null bs=1M
- With ZFS (recordsize=1M) : dd if=bigfilein of=bigfileot bs=1M
- Without ZFS : dd if=/dev/<iscsi_dev> of=/dev/null bs=10M
- Without ZFS : dd if=/dev/zero of=/dev/<iscsi_dev> bs=10M

Results :
With or without the modifications, read or write, same results: only a few MBps!
Without the modifications, on a no-latency link, I can go up to the disks' throughput.

Interesting document:
http://www.cs.unh.edu/~rdr/pdcn2005.pdf
Reading it, I think that increasing MaxOutstandingR2T could help:
"The larger the number of outstanding R2Ts, the smaller the data transfer time because the initiator does not have to wait for another R2T to arrive at the end of each sequence."
It sounds like we have MaxOutstandingR2T=1 by default, which would force the initiator to wait 10 ms after each sequence, a performance killer.

Where could I tune this?
Any other thoughts?

Many thanks!

Ben
Comment 7 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-07-06 09:34:50 UTC
Well, as I've said, there's nothing to tune, because all of those are already maxed out.  MaxOutstandingR2T wouldn't make any difference either, because with ImmediateData it's simply not used - we push data in SCSI Command PDUs, without waiting for R2T PDUs.
Comment 8 Ben RUBSON 2016-07-11 15:17:27 UTC
So, I ran some deeper tests to understand the throughput limitation.



### With default MAXPHYS=128KB :

Max throughput never goes above 13 MBps, regardless of the dd_bs (dd block size), DataSegmentLen, MaxBurstLen and FirstBurstLen used.



### With MAXPHYS=1MB :

# With dd_bs=1MB :

DataSegmentLen : 1048576
FirstBurstLen  : 1048576
MaxBurstLen    : 10485760
ImmediateData  : Yes
Max throughput : 70.4 MBps

DataSegmentLen : 1048576
FirstBurstLen  : 1048576
MaxBurstLen    : 1048576
ImmediateData  : Yes
Max throughput : 70.4 MBps

DataSegmentLen : 1048576
FirstBurstLen  : 1048576
MaxBurstLen    : 1048576
ImmediateData  : No
Max throughput : 44.5 MBps

DataSegmentLen : 131072
FirstBurstLen  : 131072
MaxBurstLen    : 1310720
ImmediateData  : Yes
Max throughput : 45.5 MBps

DataSegmentLen : 131072
FirstBurstLen  : 1310720
MaxBurstLen    : 1310720
ImmediateData  : Yes
Max throughput : 45.5 MBps

DataSegmentLen : 131072
FirstBurstLen  : 65536
MaxBurstLen    : 1310720
ImmediateData  : Yes
Max throughput : 45.2 MBps

DataSegmentLen : 131072
FirstBurstLen  : 131072
MaxBurstLen    : 1310720
ImmediateData  : Yes
Max throughput : 13.0 MBps (with dd_bs=128K)

DataSegmentLen : 131072
FirstBurstLen  : 131072
MaxBurstLen    : 131072
ImmediateData  : Yes
Max throughput : 13.0 MBps

DataSegmentLen : 131072
FirstBurstLen  : 65536
MaxBurstLen    : 131072
ImmediateData  : Yes
Max throughput : 12.0 MBps



### Conclusion of these tests :

So MAXPHYS was a bottleneck; increasing the amount of data carried by each iSCSI command (from 128KB to 1MB) helped a lot.
Regarding the results themselves, a good setting seems to be DataSegmentLen = FirstBurstLen = MaxBurstLen = MAXPHYS, with of course ImmediateData=yes.
By default we have MAXPHYS=128KB and ZFS recordsize=128KB, so setting the three *Len parameters to 128KB is then a good choice.
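As a rough sanity check (assuming, as with single-threaded dd, that only about one I/O is in flight at a time): 128 KB per command / 10 ms RTT ≈ 12.8 MB/s, which matches the ~13 MBps plateau above, while 1 MB / 10 ms ≈ 100 MB/s, an upper bound consistent with the ~70 MBps results.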



### Comments / Questions :

As we can set ZFS recordsize=1MB, I think MAXPHYS should not be limited to 128KB by default but rather to 1MB, so that we could improve ZFS throughput on top of iSCSI.
The three *Len parameters would then have to be tuned accordingly (set to 1MB).
Has increasing the default MAXPHYS value already been considered?

According to this paper: http://www.cs.unh.edu/~rdr/pdcn2005.pdf
other things could help improve throughput by keeping the network channel full:
- MaxConnections
- Number of outstanding SCSI commands
I did not manage to tune them.
Are these features implemented?

I also tried to set InitialR2T=No during test number 5
(in usr.sbin/ctld/login.c and usr.sbin/iscsid/login.c).
I would have expected throughput to go up to 70.4 MBps, but it did not change anything.
Is the InitialR2T=no feature implemented?
(Not that important, because like MaxOutstandingR2T, InitialR2T is irrelevant if we have DataSegmentLen=FirstBurstLen=MaxBurstLen and ImmediateData=yes; just asking.)

Many thanks for your support!

Best regards,

Ben
Comment 9 Edward Tomasz Napierala freebsd_committer freebsd_triage 2016-09-19 09:17:23 UTC
Thanks for doing the benchmarks.

Regarding MAXPHYS - I remember there was _some_ talk a few years ago, but I don't remember the reasons it wasn't changed.

Regarding the number of connections and SCSI commands - we only do one connection per session.  The number of commands (tags) is already maxed out at 128.  The InitialR2T is not implemented, because, as you noted, there's no point in implementing it when we have ImmediateData.
Comment 10 Ben RUBSON 2016-09-20 07:43:40 UTC
OK, let's close this for the moment, then.
Thank you for all your details!