Hello,

I would like to increase the PDU size / DataSegmentLength in order to improve iSCSI throughput over a high-latency Ethernet link (several ms). I therefore tuned the following two parameters in /etc/iscsi.conf: MaxRecvDataSegmentLength=2097152 and sockbufsize=2048. However, iscsictl simply reports "obsolete statement ignored" for both of them, and iscsictl -v confirms that DataSegmentLen stays at 131072. The demonstration is below.

Could support for these two parameters be considered, so that the PDU size can be increased?

Thank you very much, best regards,
Ben

# man iscsi.conf
     MaxRecvDataSegmentLength
             the maximum data segment length in bytes it can receive in an
             iSCSI PDU, default is 8192.
     sockbufsize
             sets the receiver and transmitter socket buffer size to size,
             in kilobytes.  The default is 128.

# cat /etc/iscsi.conf
t0 {
        TargetAddress            = 192.168.2.3
        TargetName               = iqn.2012-06.com.example:target0
        MaxRecvDataSegmentLength = 2097152
        sockbufsize              = 2048
}

# iscsictl -An t0
iscsictl: obsolete statement ignored at line 4
iscsictl: obsolete statement ignored at line 5

# iscsictl -v
Session ID:       9
Initiator name:   iqn.1994-09.org.freebsd:myinitiator.com
Initiator portal:
Initiator alias:
Target name:      iqn.2012-06.com.example:target0
Target portal:    192.168.2.3
Target alias:     8T-500-P4G69SW0
User:
Secret:
Mutual user:
Mutual secret:
Session type:     Normal
Session state:    Connected
Failure reason:
Header digest:    None
Data digest:      None
DataSegmentLen:   131072
ImmediateData:    Yes
iSER (RDMA):      No
Device nodes:     da8
They are ignored because the defaults are already reasonably high. The FreeBSD block layer never issues I/Os larger than 128 kB, so increasing MaxRecvDataSegmentLength above that value wouldn't change anything. The socket buffer for iSCSI defaults to 1048576 bytes.
Hello,

Thank you for your answer. My goal is to reach the maximum throughput of my iSCSI targets. Each target is backed by a single disk, with a locally measured maximum throughput of 180 MB/s. I ran several iPerf tests across my network link: several GB/s available (perfect), with 10 ms of latency between source and target.

Then on to the iSCSI tests: I can't read (or write) more than a few MB/s from my iSCSI targets (each one tested individually, of course). I went back to iPerf and configured it with the same buffer size as used by the FreeBSD iSCSI stack; it then achieved exactly the same poor numbers as the iSCSI traffic! Gradually increasing the iPerf buffer size up to 2 MB allowed me to reach 180 MB/s.

This is why I would like to be able to tune the iSCSI sockbufsize: it would make it possible to keep a TCP link with fairly high latency full. A larger MaxRecvDataSegmentLength could also help reduce iSCSI overhead, which would be interesting across high-latency links.

Thank you very much for considering this!

Best regards,
Ben
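(For reference, these observations are consistent with a simple bandwidth-delay-product estimate: a single TCP connection cannot carry more than roughly window / RTT. Taking the 10 ms figure above as the round-trip time:

    131072 B (128 kB window) / 0.010 s ≈  13 MB/s
    2097152 B (2 MB window)  / 0.010 s ≈ 210 MB/s

so a buffer of roughly 2 MB is about what is needed to sustain the disk's 180 MB/s on this link. The iPerf runs described above can be reproduced with its socket buffer option, along these lines (hosts and window sizes are illustrative only):

    target#    iperf -s -w 128K
    initiator# iperf -c <target_ip> -w 128K     # then repeat with -w 2M

Whether the quoted 10 ms is one-way or round-trip is not stated, so treat the numbers as an order-of-magnitude check.)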
You are right that larger buffers could help improve the throughput. But again: a MaxRecvDataSegmentLength larger than 128 kB wouldn't be used anyway, because the rest of the kernel doesn't issue I/O requests larger than that, and the default socket buffer size is 1 MB. You might try increasing the socket buffer by tweaking the SOCKBUF_SIZE value in usr.sbin/iscsid/iscsid.h and rebuilding it (cd usr.sbin/iscsid && make clean all install). I'm not sure it will actually improve anything in a measurable way, though. Let me know if it does.
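(A minimal sketch of that change, assuming the header still carries a single SOCKBUF_SIZE define with the 1 MB default mentioned earlier; the 2 MB value is only an example, picked to match the iPerf result above:

    usr.sbin/iscsid/iscsid.h : #define SOCKBUF_SIZE 1048576  ->  #define SOCKBUF_SIZE 2097152
    # cd usr.sbin/iscsid && make clean all install            (run from the top of the source tree)

If the target side is FreeBSD's ctld, the corresponding define appears to live in usr.sbin/ctld/ctld.h, as the tests later in this thread show.)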
Thank you very much, I will try this as soon as possible and come back to you.

What about the default values reported in the manual?

# man iscsi.conf
     MaxRecvDataSegmentLength
             the maximum data segment length in bytes it can receive in an
             iSCSI PDU, default is 8192.
     sockbufsize
             sets the receiver and transmitter socket buffer size to size,
             in kilobytes.  The default is 128.

It sounds like the default for MaxRecvDataSegmentLength is actually 131072 (according to iscsictl -v), not 8192, right? And for sockbufsize, how do you display its current value? You say its default is 1 MB, not 128 kB (although my iPerf tests would suggest a value around 128 kB).

Thank you again!
Ben
Hi. The manual page for iscsi.conf(5) was rewritten some time ago; you can see the current version here: https://www.freebsd.org/cgi/man.cgi?query=iscsi.conf&apropos=0&sektion=0&manpath=FreeBSD+11-current&arch=default&format=html

For MaxRecvDataSegmentLength: just trust what "iscsictl -v" tells you :-)

As for the socket buffer size - it's in the sources, in usr.sbin/iscsid/iscsid.h.
So, the goal is to increase iSCSI throughput on high-latency links.

My test link:
- latency around 10 ms
- bandwidth around 8 Gbps (iPerf tested, one thread)

My test disks:
- SSD, throughput around 400 MB/s

Modification: socket size:
- usr.sbin/iscsid/iscsid.h : #define SOCKBUF_SIZE 10485760
- usr.sbin/ctld/ctld.h : #define SOCKBUF_SIZE 10485760

Modification: MaxDataSegmentLength:
- sys/dev/iscsi/iscsi_ioctl.h : #define ISCSI_MAX_DATA_SEGMENT_LENGTH (1024 * 10240)
- sys/dev/iscsi/icl.h : #define ICL_MAX_DATA_SEGMENT_LENGTH (1024 * 10240)
- sys/dev/iscsi_initiator/iscsi.c : sp->opt.maxRecvDataSegmentLength = 10485760;
- sys/dev/iscsi_initiator/iscsi.c : sp->opt.maxXmitDataSegmentLength = 10485760;

Of course I booted the newly built kernel (on both sides).

Tests:
- With ZFS (recordsize=1M) : dd if=bigfilein of=/dev/null bs=1M
- With ZFS (recordsize=1M) : dd if=bigfilein of=bigfileot bs=1M
- Without ZFS : dd if=/dev/<iscsi_dev> of=/dev/null bs=10M
- Without ZFS : dd if=/dev/zero of=/dev/<iscsi_dev> bs=10M

Results: with or without the modifications, read or write, same results - only a few MB/s! Without the modifications, on a no-latency link, I can reach the disks' full throughput.

Interesting document: http://www.cs.unh.edu/~rdr/pdcn2005.pdf

Reading it, I think that increasing MaxOutstandingR2T could help: "The larger the number of outstanding R2Ts, the smaller the data transfer time because the initiator does not have to wait for another R2T to arrive at the end of each sequence." It sounds like we have MaxOutstandingR2T=1 by default, which would force the initiator to wait 10 ms after each sequence - a performance killer. Where could I tune this? Any other thoughts?

Many thanks!
Ben
Well, like I've said, there's nothing to tune, because all of those are maxed out already. MaxOutstandingR2T wouldn't make any difference either, because with ImmediateData it's just not used - we push the data in SCSI Command PDUs, without waiting for R2T PDUs.
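(A rough latency accounting for writes, based on a simplified reading of the RFC 3720 data flow rather than on the exact PDU sequence the FreeBSD stack emits:

    ImmediateData=Yes, whole I/O fits in the immediate data
    (i.e. within both FirstBurstLen and the negotiated data segment length):
        SCSI Command PDU carrying the data ->   <- SCSI Response PDU
        => roughly one round trip per I/O

    no immediate data, MaxOutstandingR2T=1:
        SCSI Command ->  <- R2T  Data-Out burst ->  <- R2T  ...  <- SCSI Response
        => roughly one extra round trip per MaxBurstLen-sized burst

This sketch is only meant to illustrate the point above about why immediate data sidesteps the outstanding-R2T limit.)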
So, I made some deeper tests to understand the throughput limitation.

### With the default MAXPHYS=128KB:

Max throughput never goes above 13 MB/s, whatever dd_bs (dd block size), DataSegmentLen, MaxBurstLen and FirstBurstLen are used.

### With MAXPHYS=1MB:

With dd_bs=1MB:

   #   DataSegmentLen   FirstBurstLen   MaxBurstLen   ImmediateData   Max throughput
   1   1048576          1048576         10485760      Yes             70.4 MB/s
   2   1048576          1048576         1048576       Yes             70.4 MB/s
   3   1048576          1048576         1048576       No              44.5 MB/s
   4   131072           131072          1310720       Yes             45.5 MB/s
   5   131072           1310720         1310720       Yes             45.5 MB/s
   6   131072           65536           1310720       Yes             45.2 MB/s
   7   131072           131072          1310720       Yes             13.0 MB/s (with dd_bs=128K)
   8   131072           131072          131072        Yes             13.0 MB/s
   9   131072           65536           131072        Yes             12.0 MB/s

### Conclusion of these tests:

MAXPHYS was a bottleneck; increasing the amount of data carried by each iSCSI command (from 128 KB to 1 MB) helped a lot. Regarding the results themselves, a good setting seems to be DataSegmentLen = FirstBurstLen = MaxBurstLen = MAXPHYS, with of course ImmediateData=Yes. By default we have MAXPHYS=128KB and ZFS recordsize=128KB, so having the three *Len parameters set to 128 KB is then a good setting.

### Comments / Questions:

As we can set ZFS recordsize=1MB, I think MAXPHYS should not be limited to 128 KB by default, but to 1 MB, so that we could improve ZFS throughput on top of iSCSI. The three *Len parameters would then have to be tuned accordingly (set to 1 MB). Has increasing the default MAXPHYS value already been considered?

According to this paper: http://www.cs.unh.edu/~rdr/pdcn2005.pdf other things could help improve throughput by keeping the network channel full:
- MaxConnections
- the number of outstanding SCSI commands
I did not manage to tune them. Are these features implemented?

I also tried to set InitialR2T=No during test number 5 (in usr.sbin/ctld/login.c and usr.sbin/iscsid/login.c). I would have expected throughput to go up to 70.4 MB/s, but it did not change anything. Is InitialR2T=No implemented? (Not so important, because like MaxOutstandingR2T, InitialR2T is irrelevant if we have DataSegmentLen=FirstBurstLen=MaxBurstLen and ImmediateData=Yes, but just to know.)

Many thanks for your support!

Best regards,
Ben
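(A back-of-the-envelope check on these numbers, assuming that a sequential dd keeps roughly one I/O command in flight at a time and that each command costs about one 10 ms turnaround on this link:

    131072 B per command  / 0.010 s ≈  13 MB/s   (matches the MAXPHYS=128KB ceiling)
    1048576 B per command / 0.010 s ≈ 105 MB/s   (upper bound for the MAXPHYS=1MB runs;
                                                  the measured 70.4 MB/s sits below it,
                                                  presumably due to per-command overhead)

The single-outstanding-command assumption and the 10 ms turnaround are assumptions, not measurements, so treat this only as an order-of-magnitude explanation of why MAXPHYS mattered.)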
Thanks for doing the benchmarks. Regarding MAXPHYS - I remember there was _some_ talk a few years ago, but I don't remember the reasons it wasn't changed. Regarding the number of connections and SCSI commands - we only do one connection per session. The number of commands (tags) is already maxed out at 128. The InitialR2T is not implemented, because, as you noted, there's no point in implementing it when we have ImmediateData.
OK, let's close this for the moment, then. Thank you for all the details!