I am not sure whether this bug is related to net/isboot-kmod or to ctld(8). I am successfully running a diskless TrueOS desktop (TrueOS is a FreeBSD 12-CURRENT based OS) against the net/istgt port installed on a FreeBSD 11-RELEASE server. This setup works without any errors, but it is a bit slow. I want to test ctld on my server, but when the diskless PC loads the isboot driver, it loops with many of these error messages:

soreceive BHS is not complete, remaining byte(s)=48
do login failed
soreceive BHS is not complete, remaining byte(s)=48
do login failed

The source code of the isboot-kmod driver that prints the error message is:

    /* BHS */
    flags = MSG_WAITALL;
    uio.uio_resid = ISCSI_BHS_LEN;
    error = soreceive(sess->so, NULL, &uio, &mp, NULL, &flags);
    if (error) {
        ISBOOT_ERROR("soreceive BHS error %d\n", error);
        return (error);
    }
    if (uio.uio_resid != 0) {
        ISBOOT_ERROR("soreceive BHS is not complete, remaining "
            "byte(s)=%d\n", (int) uio.uio_resid);
        return (EIO);
    }

source file: iscsi.c, proc: isboot_recv_pdu()

The ISCSI_BHS_LEN constant is 48, so it seems that no bytes are read at all. On the server, /var/log/messages contains many messages like:

Mar 28 16:32:39 clover-nas2 ctld[59634]: 192.168.0.164: protocol error: received invalid opcode 0x83
Mar 28 16:32:39 clover-nas2 ctld[40784]: child process 59634 terminated with exit status 1
Mar 28 16:32:40 clover-nas2 ctld[59637]: 192.168.0.164: protocol error: received invalid opcode 0x83
Mar 28 16:32:40 clover-nas2 ctld[40784]: child process 59637 terminated with exit status 1

The isboot driver sends an unrecognized opcode 0x83 to the ctld daemon..., but I cannot get any further; I need some help from the community. :-)

Note: the patch for compiling and running isboot-kmod on FreeBSD 12 is at: bug #226982
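A remaining count of exactly 48 means the socket delivered nothing before the read gave up, which is consistent with the ctld log above: the child process exits after the protocol error, so the initiator sees end-of-file instead of a Login Response. The following is a minimal userland analogue, not the kernel code: it uses recv(2) with MSG_WAITALL instead of soreceive(9), and the helper name read_bhs() is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>     /* close(), write() used by callers */

/* Length of the iSCSI Basic Header Segment (ISCSI_BHS_LEN in isboot). */
#define BHS_LEN 48

/*
 * Hypothetical userland analogue of the BHS read in isboot_recv_pdu():
 * try to read exactly BHS_LEN bytes and report how many are missing.
 * Returns -1 on a socket error, otherwise the number of bytes still
 * missing (0 means a complete BHS was read, BHS_LEN means nothing
 * arrived before the peer closed the connection).
 */
static ssize_t read_bhs(int fd, unsigned char *bhs)
{
    size_t got = 0;

    while (got < BHS_LEN) {
        ssize_t n = recv(fd, bhs + got, BHS_LEN - got, MSG_WAITALL);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry the read */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: the target closed the socket */
        got += (size_t)n;
    }
    return (ssize_t)(BHS_LEN - got);
}
```

With a peer that closes right away, as a ctld child exiting on "invalid opcode" would, read_bhs() reports 48 missing bytes, which matches the console message.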
I can confirm I am experiencing this same bug between two FreeBSD 11.1-RELEASE-p10/amd64 systems. I had to apply the same patch to isboot, or else I'd get a fatal trap 12 during boot at the point where isboot does its thing. Both happen to be VIMAGE kernels, though I am running ctld/istgt on the host for this testing, not in a jail. The ctld and istgt daemons are configured as identically as I can get them (CHAP required, no restriction on initiator name or IP address, one portal group, one target, one LUN with a 512-byte blocksize, backed in both cases by the same zvol with a 4096-byte ZFS blocksize). I can confirm CHAP is indeed authenticating in both cases.

To be explicit, istgt allows isboot to succeed and ctld doesn't. I get exactly the same console and log messages as Maurizio. Furthermore, I then enabled DEBUG in the isboot module and got the following console output when booting through ctld:

...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
iSCSI boot driver version 0.2.13
IS: Initiator name: iqn.2011-11.org.nuos:08-00-27-a7-cb-a7
NIC0: IP address: 192.168.1.71
NIC0: Prefix: 24
NIC0: Gateway: 192.168.1.1
NIC0: MAC address: 08:00:27:a7:cb:a7
TGT0: Target IP address: 23.111.168.34
TGT0: Target Port: 3260
TGT0: Target LUN: 0
TGT0: Target name: iqn.2011-11.org.nuos:target0
Boot NIC: em0
Configure IPv4 by NIC0
CHAP Type: CHAP
isboot start, thread id=186a0
kproc_start
Attempting to login to iSCSI target and scan all LUNs.
isboot kproc start, thread id=186d9
isboot iscsi start, thread id=186d9
main loop, thread id=186d9
initialize session, thread id=186d9
Initiator: iqn.2011-11.org.nuos:08-00-27-a7-cb-a7
Target: iqn.2011-11.org.nuos:target0
Target IP=23.111.168.34, Port=3260, LUN=0
strdup(joe)3
strdup(passwordhere)12
strdup(iqn.2011-11.org.nuos:08-00-27-a7-cb-a7)38
strdup(iqn.2011-11.org.nuos:target0)28
strdup(CHAP,None)9
strdup(None,CRC32C)11
strdup(None,CRC32C)11
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006424360
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (59)
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006424000
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (58)
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006423a20
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (57)
...repeats...

I haven't gone so far as to dig into what's happening on the wire, but if asked I'll be glad to follow up with any assistance or testing.
Created attachment 194065: Let isboot-kmod work with ctld(8)
In https://jelmer.uk/klaus/wireshark/blob/4a812d4ad5e61143d7a18b52f1f32a2a369784f6/packet-iscsi.c the iSCSI opcodes are defined as:

{0x03, "Login Command"},
{0x83, "Login Command (Retry)"},

In my opinion there are two bugs, one in isboot-kmod and one in ctld: isboot-kmod sends the "Login Command (Retry)" (0x83) opcode instead of the "Login Command" (0x03) opcode, and ctld accepts only the "Login Command" (0x03) opcode, not the "Login Command (Retry)" (0x83) opcode. The above patch makes the isboot-kmod driver send the "Login Command" (0x03) opcode first and then, if that fails, the "Login Command (Retry)" (0x83) opcode.
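For comparison, RFC 3720 (section 10.2.1.1) lays out byte 0 of every BHS as a reserved high bit (0x80), the I "immediate" bit (0x40) and a 6-bit opcode, and it defines no separate retry opcode. Under that layout 0x83 decodes as Login Request (0x03) with the reserved bit set. The sketch below decodes byte 0 that way; is_login_request() is a hypothetical strict check that ignores only the I bit, and whether ctld validates exactly like this is an assumption here, though it would produce the "invalid opcode 0x83" error seen above.

```c
#include <stdint.h>

/* Byte 0 of an iSCSI BHS, as laid out in RFC 3720 section 10.2.1.1. */
#define BHS_RESERVED_BIT  0x80  /* reserved, expected to be 0 */
#define BHS_IMMEDIATE_BIT 0x40  /* the "I" (immediate delivery) bit */
#define BHS_OPCODE_MASK   0x3f  /* low 6 bits carry the opcode */

#define OP_LOGIN_REQUEST  0x03

struct bhs_byte0 {
    int reserved;    /* 1 if the reserved high bit is set */
    int immediate;   /* 1 if the I bit is set */
    uint8_t opcode;  /* the 6-bit opcode */
};

/* Split byte 0 into its RFC 3720 fields. */
static struct bhs_byte0 decode_byte0(uint8_t b)
{
    struct bhs_byte0 d;

    d.reserved = (b & BHS_RESERVED_BIT) != 0;
    d.immediate = (b & BHS_IMMEDIATE_BIT) != 0;
    d.opcode = (uint8_t)(b & BHS_OPCODE_MASK);
    return d;
}

/*
 * Hypothetical strict target-side check: mask off only the I bit and
 * require what is left to be the Login Request opcode, so any stray
 * reserved bit (as in 0x83) makes the PDU invalid.
 */
static int is_login_request(uint8_t b)
{
    return (b & (uint8_t)~BHS_IMMEDIATE_BIT) == OP_LOGIN_REQUEST;
}
```

With this check, 0x03 and 0x43 are accepted and 0x83 is rejected, even though decode_byte0(0x83) still yields opcode 0x03.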
Hi, I tested this on an 11.1 ESXi VM guest and it works great. BTW, when compiling on 12 it should probably default to VIMAGE being on, since that is in GENERIC. Better yet would be to get it into base, and then have the loader detect the iBFT table and load it automatically. That would allow unmodified images to be booted, which would be Pretty Neat (tm) IMO.
Created attachment 196934: Patch loader to load isboot.ko if iBFT is present.
I added a patch for the loader which sets isboot_load to YES if it finds the iBFT. I am going to see how much work it is to create a patch to merge isboot into base as well.
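The detection step can be pictured as a scan for the table signature. The attached loader patch is not reproduced here; the sketch below is a hypothetical C helper that works on a caller-supplied copy of the legacy BIOS search area. It assumes, per our reading of the iBFT specification, that the table starts on a 16-byte boundary with the ASCII signature "iBFT", carries a little-endian 32-bit length at bytes 4..7 of a 48-byte ACPI-style header, and that all of its bytes sum to zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IBFT_ALIGN   16  /* assumed search granularity */
#define IBFT_HDR_LEN 48  /* assumed size of the ACPI-style header */

/*
 * Hypothetical helper: return the offset of a valid iBFT in buf,
 * or -1 if no table with a matching signature and checksum is found.
 */
static long find_ibft(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + IBFT_HDR_LEN <= len; off += IBFT_ALIGN) {
        if (memcmp(buf + off, "iBFT", 4) != 0)
            continue;
        /* Table length: little-endian 32-bit value at bytes 4..7. */
        uint32_t tlen = (uint32_t)buf[off + 4] |
            ((uint32_t)buf[off + 5] << 8) |
            ((uint32_t)buf[off + 6] << 16) |
            ((uint32_t)buf[off + 7] << 24);
        if (tlen < IBFT_HDR_LEN || tlen > len - off)
            continue;   /* implausible length: keep scanning */
        /* Every byte of a valid table, checksum included, sums to 0. */
        uint8_t sum = 0;
        for (uint32_t i = 0; i < tlen; i++)
            sum = (uint8_t)(sum + buf[off + i]);
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

A loader would then only need to set isboot_load to YES when such a scan succeeds; validating the checksum avoids acting on a stray "iBFT" string.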
(In reply to darius from comment #4) Oops, I spoke too soon. isboot_load gets set, but the loader doesn't seem to notice it for some reason.
(In reply to Maurizio from comment #3) If this is still relevant, please submit a PR against https://github.com/jnielsendotnet/isboot
(In reply to John Nielsen from comment #8) Done
(In reply to darius from comment #9) What I meant was: if someone could go through the patch on this ticket, apply the parts that are still needed, and submit a Git pull request against the above repo, that would be helpful.
(In reply to John Nielsen from comment #10) Ah, sorry, wrong meaning for 'PR' :) Unfortunately my iSCSI booting stuff is currently broken, so I'm not sure when I'll be able to test any patches.
Thank you @DanielO for the PRs. This issue should be addressed in the next version of the port.
The new version of the port is ready and should be committed soon: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255578

This version includes a patch to "Flip I bit each time as a cheap way to toggle between Login and Login (retry)". Based on all the comments above, I believed that would resolve this issue. At the very least, the update makes the port usable on FreeBSD 11, 12, 13 and 14. Unfortunately, on a test system I am unable to log in to a ctld target with isboot; I get a repeating error similar to the original poster's: "soreceive BHS is not complete / do login failed". I will continue to look into this, but if anyone can test with 0.2.14 and let me know the results, I'd appreciate it.
With some help from wireshark I was able to determine that isboot 0.2.14 is in fact alternating between opcodes 0x83 and 0x03, and it succeeds and logs in with 0x03. It then does some device identification (test unit ready, report luns, supported VPDs, device identification, extended inquiry, serial number) which all looks good. But then I get three nearly-identical packets from ctld of opcode 0x20 (NOP in), and then the target side (ctld) sends a FIN, closing the TCP connection. Will troubleshoot more this weekend. Again, other testers of 0.2.14 welcome.
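The alternation seen in the capture can be pictured as an XOR on byte 0 of each login attempt. This is a sketch, not the isboot source: the bit value 0x80 is inferred from the observed 0x83/0x03 pair, and LOGIN_TOGGLE_BIT and next_login_opcode() are hypothetical names.

```c
#include <stdint.h>

#define OP_LOGIN_REQUEST 0x03  /* the value ctld accepted in the capture */
#define LOGIN_TOGGLE_BIT 0x80  /* hypothetical name; inferred from 0x83/0x03 */

/* Opcode byte for the next login attempt: flip the bit on every retry. */
static uint8_t next_login_opcode(uint8_t prev)
{
    return (uint8_t)(prev ^ LOGIN_TOGGLE_BIT);
}
```

Whichever value the first attempt uses, every second attempt goes out as 0x03, which is why login eventually succeeds against ctld at the cost of one rejected connection.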
It looks like isboot is failing to respond to the ping requests (NOP In) sent by the target (ctld) so the target drops the connection. From https://datatracker.ietf.org/doc/html/rfc3720#section-10.18 : Upon receipt of a NOP-In with the Target Transfer Tag set to a valid value (not the reserved 0xffffffff), the initiator MUST respond with a NOP-Out. In this case, the NOP-Out Target Transfer Tag MUST contain a copy of the NOP-In Target Transfer Tag. That is not happening. Not sure if other targets don't use this ping behavior but I don't see anything in the isboot code that would meet this requirement.
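The RFC requirement quoted above amounts to building one NOP-Out header per ping. The sketch below shows that header construction only; it is not the fix later committed to isboot. It assumes the RFC 3720 BHS byte offsets (LUN at bytes 8..15, Initiator Task Tag at 16..19, Target Transfer Tag at 20..23, CmdSN at 24..27, ExpStatSN at 28..31). The function name is hypothetical, and cmdsn/expstatsn stand in for whatever the initiator's session state holds.

```c
#include <stdint.h>
#include <string.h>

#define BHS_LEN      48
#define OP_NOP_OUT   0x00
#define OP_NOP_IN    0x20
#define BHS_I_BIT    0x40         /* immediate delivery */
#define BHS_F_BIT    0x80         /* Final bit, always set for NOP-Out */
#define RESERVED_TAG 0xffffffffu

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: if nop_in is a target ping (NOP-In with a valid
 * Target Transfer Tag), fill nop_out with the matching NOP-Out header
 * and return 1. Return 0 when no reply is required.
 */
static int build_nop_out_reply(const uint8_t nop_in[BHS_LEN],
    uint8_t nop_out[BHS_LEN], uint32_t cmdsn, uint32_t expstatsn)
{
    if ((nop_in[0] & 0x3f) != OP_NOP_IN)
        return 0;
    uint32_t ttt = get_be32(nop_in + 20);
    if (ttt == RESERVED_TAG)
        return 0;                         /* not a ping: no answer needed */

    memset(nop_out, 0, BHS_LEN);
    nop_out[0] = BHS_I_BIT | OP_NOP_OUT;  /* immediate: ITT is reserved */
    nop_out[1] = BHS_F_BIT;
    memcpy(nop_out + 8, nop_in + 8, 8);   /* LUN copied with a valid TTT */
    put_be32(nop_out + 16, RESERVED_TAG); /* ITT: no NOP-In wanted back */
    put_be32(nop_out + 20, ttt);          /* copy of the NOP-In TTT */
    put_be32(nop_out + 24, cmdsn);
    put_be32(nop_out + 28, expstatsn);
    return 1;
}
```

Sending the resulting header back on the session socket is what keeps the target from treating the connection as dead.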
Odd, I don't have problems using isboot with ctld. Perhaps there is a change in ctld (or in the kernel, I suppose) that makes it send NOPs now.
(In reply to darius from comment #16) If you have a working ctld setup will you share your FreeBSD version and ctl.conf?
(In reply to John Nielsen from comment #17) Sure, I have this:

portal-group pg0 {
    discovery-auth-group no-authentication
    listen 0.0.0.0
    listen [::]
}

target iqn.2018-09.au.com.gsoft:target0 {
    auth-group no-authentication
    portal-group pg0

    lun 0 {
        backend block
        path /usr/local/tftp/FreeBSD-12.2-RELEASE-amd64-memstick.img
        blocksize 512
        device-id FREEBSD
        serial 00010001
    }
}

System is 12.2:

FreeBSD vm11.gsoft.com.au 12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 GENERIC amd64

Note the 12.2 image above has been modified to include isboot.ko; I load that, and then it mounts root fine.
(In reply to darius from comment #18) I hacked together a fix to allow isboot to send NOP-Out PDUs as ping responses, which seems to make my version of ctld (13-STABLE) happier. My primary issue was a typo in /etc/fstab :( Once I fixed that, I was able to mount root via isboot even without the ping responses, but ctld would time out the connection after a period of no activity and isboot would have to reconnect. That slowed things down and generated a lot of noise on the console. With my fix that goes away: https://github.com/jnielsendotnet/isboot/commit/3db550642e03a041450a8cf29891ec6044e49a42 Let me know if you are able to test. I plan to update the port again in the near-ish future.
Is it still relevant?
(In reply to Zsolt Udvari from comment #20) I think it is but I no longer use it so I can't test the changes.
(In reply to Zsolt Udvari from comment #20) No, this was resolved with version 0.2.15 which has been in the ports tree since last year.
(In reply to John Nielsen from comment #22) Thanks. Closing it.