Bug 227313

Summary: net/isboot-kmod works with net/istgt but not with ctld(8)
Product: Ports & Packages
Component: Individual Port(s)
Reporter: Maurizio <maurizio1018>
Assignee: freebsd-ports-bugs (Nobody) <ports-bugs>
Status: Closed Overcome By Events
Severity: Affects Some People
CC: darius, dpetrov67, emaste, john, marek, maurizio1018, milios, uzsolt
Priority: ---
Version: Latest
Hardware: Any
OS: Any
Attachments (Description, Flags):
Let isboot-kmod working with ctld(8) (flags: none)
Patch loader to load isboot.ko if iBFT is present. (flags: none)

Description Maurizio 2018-04-06 08:34:57 UTC
I am not sure whether this bug is in net/isboot-kmod or in ctld(8).

I am successfully running a diskless TrueOS desktop (TrueOS is based on FreeBSD 12-CURRENT) against the net/istgt port installed on a FreeBSD 11-RELEASE server. This setup works without any errors, but it is a bit slow.
I want to test ctld on my server, but when the diskless PC loads the isboot driver it loops, printing these error messages over and over:
“soreceive BHS is not complete, remaining byte(s)=48
do login failed
soreceive BHS is not complete, remaining byte(s)=48
do login failed”
The isboot-kmod driver code that prints the error message is:

    /* BHS */
    flags = MSG_WAITALL;
    uio.uio_resid = ISCSI_BHS_LEN;
    error = soreceive(sess->so, NULL, &uio, &mp, NULL, &flags);
    if (error) {
        ISBOOT_ERROR("soreceive BHS error %d\n", error);
        return (error);
    }
    if (uio.uio_resid != 0) {
        ISBOOT_ERROR("soreceive BHS is not complete, remaining "
            "byte(s)=%d\n", (int) uio.uio_resid);
        return (EIO);
    }

Source file: iscsi.c, function: isboot_recv_pdu()
The ISCSI_BHS_LEN constant is 48, so a remainder of 48 bytes means that no bytes were read at all.
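
A userland sketch of the same "read exactly 48 bytes" check may help in reading the message. It is not the driver's code: recv_exact() and BHS_LEN are hypothetical names, and it uses recv(2) on an ordinary socket instead of the kernel's soreceive(). It illustrates one plausible reading of the symptom: when no error is reported and the full length is still outstanding, the peer closed the connection before sending a single byte.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define BHS_LEN 48 /* iSCSI Basic Header Segment length, as in the report */

/*
 * Hypothetical helper: try to read exactly len bytes from fd.
 * Returns the number of bytes still missing (0 means complete),
 * or -1 on a socket error. A return value equal to len means the
 * peer closed the connection before sending anything.
 */
static ssize_t
recv_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t resid = len;

    while (resid > 0) {
        ssize_t n = recv(fd, p, resid, MSG_WAITALL);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return (-1);
        }
        if (n == 0)             /* orderly shutdown by the peer */
            break;
        p += n;
        resid -= (size_t)n;
    }
    return ((ssize_t)resid);
}
```

With this helper, a result of 0 corresponds to a complete header, a result between 1 and 47 to a truncated header, and a result of 48 to the "remaining byte(s)=48" case above.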

On the server, /var/log/messages contains many messages like:
“Mar 28 16:32:39 clover-nas2 ctld[59634]: 192.168.0.164: protocol error: received invalid opcode 0x83
Mar 28 16:32:39 clover-nas2 ctld[40784]: child process 59634 terminated with exit status 1
Mar 28 16:32:40 clover-nas2 ctld[59637]: 192.168.0.164: protocol error: received invalid opcode 0x83
Mar 28 16:32:40 clover-nas2 ctld[40784]: child process 59637 terminated with exit status 1”

The isboot driver sends opcode 0x83, which the ctld daemon does not recognize, but I cannot get any further and need some help from the community. :-)

Note: the patch for compiling and running isboot-kmod on FreeBSD 12 is in bug #226982
Comment 1 Chad Jacob Milios 2018-06-01 15:55:41 UTC
I can confirm I am experiencing this same bug between two FreeBSD 11.1-RELEASE-p10/amd64 systems. I had to apply the same patch to isboot or else I'd get fatal trap 12 during boot at the point isboot does its thing. Both happen to be VIMAGE kernels though I am running ctld/istgt on the host for this testing, not a jail.

The ctld and istgt daemons are configured as identically as I can get them (CHAP required, no restriction on initiator name or IP address, one portal group, one target, one LUN with a 512-byte block size, backed by the same zvol with a 4096-byte ZFS block size). I can confirm CHAP is indeed authenticating in both cases. To be explicit, istgt allows isboot to succeed and ctld doesn't.

I get exactly the same console and log messages as Maurizio. Furthermore, I then enabled DEBUG in the isboot module and got the following console output when booting through ctld:

...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
iSCSI boot driver version 0.2.13
IS: Initiator name: iqn.2011-11.org.nuos:08-00-27-a7-cb-a7
NIC0: IP address: 192.168.1.71
NIC0: Prefix: 24
NIC0: Gateway: 192.168.1.1
NIC0: MAC address: 08:00:27:a7:cb:a7
TGT0: Target IP address: 23.111.168.34
TGT0: Target Port: 3260
TGT0: Target LUN: 0
TGT0: Target name: iqn.2011-11.org.nuos:target0
Boot NIC: em0
Configure IPv4 by NIC0
CHAP Type: CHAP
isboot start, thread id=186a0
kproc_start
Attempting to login to iSCSI target and scan all LUNs.
isboot kproc start, thread id=186d9
isboot iscsi start, thread id=186d9
main loop, thread id=186d9
initialize session, thread id=186d9
Initiator: iqn.2011-11.org.nuos:08-00-27-a7-cb-a7
Target: iqn.2011-11.org.nuos:target0
Target IP=23.111.168.34, Port=3260, LUN=0
strdup(joe)3
strdup(passwordhere)12
strdup(iqn.2011-11.org.nuos:08-00-27-a7-cb-a7)38
strdup(iqn.2011-11.org.nuos:target0)28
strdup(CHAP,None)9
strdup(None,CRC32C)11
strdup(None,CRC32C)11
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006424360
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (59)
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006424000
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (58)
isboot_connect
open socket
try connect...(186d9)
wait connect...
em0: link state changed to UP
old so=0, new so=0xfffff80006423a20
isboot_do_login
login start
xmit PDU
recv PDU
isboot_free_mbufext
soreceive BHS is not complete, remaining byte(s)=48
do login failed
boot retry (57)
...repeats...

I haven't gone so far as to dig into what's happening on the wire but if I'm asked I'll be glad to follow up with any assistance or testing.
Comment 2 Maurizio 2018-06-07 12:52:34 UTC
Created attachment 194065
Let isboot-kmod working with ctld(8)
Comment 3 Maurizio 2018-06-07 13:05:24 UTC
The iSCSI opcodes are defined in https://jelmer.uk/klaus/wireshark/blob/4a812d4ad5e61143d7a18b52f1f32a2a369784f6/packet-iscsi.c:
 {0x03, "Login Command"},
 {0x83, "Login Command (Retry)"},

In my opinion there are two bugs, one in isboot-kmod and one in ctld:
isboot-kmod sends the "Login Command (Retry)" (0x83) opcode instead of the "Login Command" (0x03) opcode, and ctld accepts only the "Login Command" (0x03) opcode, not the "Login Command (Retry)" (0x83) opcode.

The patch above makes the isboot-kmod driver send the "Login Command" (0x03) opcode first and then, if that fails, the "Login Command (Retry)" (0x83) opcode.
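
For reference, RFC 3720 lays out the first byte of the 48-byte header as one reserved bit, the I (immediate) bit, and a six-bit opcode. The sketch below decodes a byte that way. The helper names are made up for illustration, and strict_target_accepts_login() is only an assumed model of a target that ignores the I bit and compares the rest of the byte to 0x03; it is not taken from the ctld source.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHS_OPCODE_MASK   0x3f /* low six bits: opcode (RFC 3720) */
#define BHS_IMMEDIATE_BIT 0x40 /* I bit (RFC 3720) */
#define BHS_RESERVED_BIT  0x80 /* reserved in RFC 3720 */
#define OP_LOGIN_REQUEST  0x03

/* Opcode part of the first header byte. */
static uint8_t
bhs_opcode(uint8_t b0)
{
    return (b0 & BHS_OPCODE_MASK);
}

/* True if the I (immediate) bit is set. */
static bool
bhs_immediate(uint8_t b0)
{
    return ((b0 & BHS_IMMEDIATE_BIT) != 0);
}

/* True if the bit that RFC 3720 reserves is set. */
static bool
bhs_reserved_set(uint8_t b0)
{
    return ((b0 & BHS_RESERVED_BIT) != 0);
}

/*
 * Assumption, for illustration only: a strict target masks off the
 * I bit and requires the remainder to equal the login opcode.
 */
static bool
strict_target_accepts_login(uint8_t b0)
{
    return ((b0 & (uint8_t)~BHS_IMMEDIATE_BIT) == OP_LOGIN_REQUEST);
}
```

Under this reading, 0x83 decodes to opcode 0x03 with the reserved bit set and the I bit clear, so a target that checks the whole byte strictly would reject it, while 0x03 and 0x43 would both pass.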
Comment 4 Daniel O'Connor 2018-09-06 13:05:12 UTC
Hi,
I tested this on an 11.1 ESXi VM guest and it works great.

BTW if compiling on 12 it should probably default to VIMAGE being on since that is in GENERIC.

Better yet would be to get it into base, and then have the loader detect the iBFT table and load it automatically.

That would allow unmodified images to be booted which would be Pretty Neat (tm) IMO.
Comment 5 Daniel O'Connor 2018-09-07 04:14:29 UTC
Created attachment 196934
Patch loader to load isboot.ko if iBFT is present.
Comment 6 Daniel O'Connor 2018-09-07 04:17:02 UTC
I added a patch for the loader which sets isboot_load to YES if it finds the iBFT.


I am going to see how much work it is to create a patch to merge isboot into base as well.
Comment 7 Daniel O'Connor 2018-09-07 04:28:11 UTC
(In reply to darius from comment #4)
Oops, I spoke too soon.. 
isboot_load gets set but the loader doesn't seem to notice for some reason.
Comment 8 John Nielsen 2021-05-03 20:16:56 UTC
(In reply to Maurizio from comment #3)
If this is still relevant please submit a PR against https://github.com/jnielsendotnet/isboot
Comment 9 Daniel O'Connor 2021-05-04 01:23:16 UTC
(In reply to John Nielsen from comment #8)
Done
Comment 10 John Nielsen 2021-05-04 04:27:03 UTC
(In reply to darius from comment #9)
What I meant was if someone could go through the patch on this ticket, apply the parts that are still needed and submit a Git pull request against the above repo that would be helpful.
Comment 11 Daniel O'Connor 2021-05-04 12:35:09 UTC
(In reply to John Nielsen from comment #10)
Ahh sorry, wrong meaning for 'PR' :)

Unfortunately my iSCSI booting stuff is currently broken so I'm not sure when I'll be able to test any patches..
Comment 12 John Nielsen 2021-05-10 22:20:18 UTC
Thank you @DanielO for the PRs. This issue should be addressed in the next version of the port.
Comment 13 John Nielsen 2021-05-20 18:46:48 UTC
The new version of the port is ready and should be committed soon: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255578

This version includes a patch to "Flip I bit each time as a cheap way to toggle between Login and Login (retry)". From all the above comments I believed that would resolve this issue. At the very least the update makes the port usable on FreeBSD 11, 12, 13 and 14.

Unfortunately on a test system I am unable to log in to a ctld target with isboot; I get a repeating error similar to the original poster: "soreceive BHS is not complete / do login failed"

I will continue to look into this, but if anyone can test with 0.2.14 and let me know the results I'd appreciate it.
Comment 14 John Nielsen 2021-05-21 13:47:27 UTC
With some help from wireshark I was able to determine that isboot 0.2.14 is in fact alternating between opcodes 0x83 and 0x03, and it succeeds and logs in with 0x03. It then does some device identification (test unit ready, report luns, supported VPDs, device identification, extended inquiry, serial number) which all looks good. But then I get three nearly-identical packets from ctld of opcode 0x20 (NOP in), and then the target side (ctld) sends a FIN, closing the TCP connection.

Will troubleshoot more this weekend.

Again, other testers of 0.2.14 welcome.
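
The alternation seen in the capture can be sketched as follows. This is not the code from the port: the names are hypothetical, the toggled bit is assumed to be 0x80 because that is the only bit that differs between 0x03 and 0x83, and the target is modeled as one that ignores the I bit (0x40 in RFC 3720) and otherwise requires the byte to equal 0x03.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOGIN_TOGGLE_BIT  0x80 /* assumption: bit that differs between 0x03 and 0x83 */
#define BHS_IMMEDIATE_BIT 0x40 /* I bit per RFC 3720 */
#define OP_LOGIN_REQUEST  0x03

/* First header byte used on the given (zero-based) login attempt. */
static uint8_t
login_byte_for_attempt(uint8_t first, unsigned attempt)
{
    return ((attempt % 2 == 0) ? first : (uint8_t)(first ^ LOGIN_TOGGLE_BIT));
}

/* Assumed model of a strict target; not taken from any real target. */
static bool
strict_target_accepts(uint8_t b0)
{
    return ((b0 & (uint8_t)~BHS_IMMEDIATE_BIT) == OP_LOGIN_REQUEST);
}

/* Number of attempts needed before the modeled target accepts, or -1. */
static int
attempts_until_accepted(uint8_t first, unsigned max_attempts)
{
    for (unsigned i = 0; i < max_attempts; i++) {
        if (strict_target_accepts(login_byte_for_attempt(first, i)))
            return ((int)i + 1);
    }
    return (-1);
}
```

Starting from 0x83, the modeled target rejects the first attempt and accepts the second, which matches the one failed login followed by a successful one described above.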
Comment 15 John Nielsen 2021-05-22 18:14:23 UTC
It looks like isboot is failing to respond to the ping requests (NOP In) sent by the target (ctld) so the target drops the connection. From https://datatracker.ietf.org/doc/html/rfc3720#section-10.18 :

   Upon receipt of a NOP-In with the Target Transfer Tag set to a valid
   value (not the reserved 0xffffffff), the initiator MUST respond with
   a NOP-Out.  In this case, the NOP-Out Target Transfer Tag MUST
   contain a copy of the NOP-In Target Transfer Tag.

That is not happening. Not sure if other targets don't use this ping behavior but I don't see anything in the isboot code that would meet this requirement.
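
A minimal sketch of what such a reply could look like, working on raw 48-byte headers. It is not the isboot fix: build_nop_out_reply() and the helpers are hypothetical names, the field offsets follow the RFC 3720 header layout (LUN at bytes 8-15, Initiator Task Tag at 16-19, Target Transfer Tag at 20-23, CmdSN at 24-27, ExpStatSN at 28-31), and sending the reply with an empty data segment is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BHS_LEN           48
#define OP_NOP_OUT        0x00
#define OP_NOP_IN         0x20
#define BHS_OPCODE_MASK   0x3f
#define BHS_IMMEDIATE_BIT 0x40
#define RESERVED_TAG      0xffffffffu

/* Read a big-endian 32-bit value. */
static uint32_t
get_be32(const uint8_t *p)
{
    return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Write a big-endian 32-bit value. */
static void
put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Fill nop_out with a reply to nop_in. Returns false when no reply is
 * required (not a NOP-In, or Target Transfer Tag is 0xffffffff).
 */
static bool
build_nop_out_reply(const uint8_t nop_in[BHS_LEN], uint8_t nop_out[BHS_LEN],
    uint32_t cmdsn, uint32_t expstatsn)
{
    uint32_t ttt;

    if ((nop_in[0] & BHS_OPCODE_MASK) != OP_NOP_IN)
        return (false);
    ttt = get_be32(&nop_in[20]);
    if (ttt == RESERVED_TAG)
        return (false);                     /* target did not ask for a reply */

    memset(nop_out, 0, BHS_LEN);            /* data segment length stays 0 */
    nop_out[0] = OP_NOP_OUT | BHS_IMMEDIATE_BIT;
    nop_out[1] = 0x80;                      /* first bit of byte 1 is always 1 */
    memcpy(&nop_out[8], &nop_in[8], 8);     /* copy the LUN from the NOP-In */
    put_be32(&nop_out[16], RESERVED_TAG);   /* no NOP-In wanted in return */
    put_be32(&nop_out[20], ttt);            /* copy of the Target Transfer Tag */
    put_be32(&nop_out[24], cmdsn);
    put_be32(&nop_out[28], expstatsn);
    return (true);
}
```

The Initiator Task Tag is set to 0xffffffff because the reply does not itself request an answer; per the same RFC section that also requires the I bit to be set, and CmdSN is not advanced for such a PDU.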
Comment 16 Daniel O'Connor 2021-05-24 00:48:40 UTC
Odd, I don't have problems with isboot using ctld, perhaps there is a change in ctld (or the kernel I suppose?) which is doing NOPs now.
Comment 17 John Nielsen 2021-05-25 14:38:53 UTC
(In reply to darius from comment #16)
If you have a working ctld setup will you share your FreeBSD version and ctl.conf?
Comment 18 Daniel O'Connor 2021-05-25 23:24:29 UTC
(In reply to John Nielsen from comment #17)
Sure, I have this:

portal-group pg0 {
    discovery-auth-group no-authentication
    listen 0.0.0.0
    listen [::]
}

target iqn.2018-09.au.com.gsoft:target0 {
    auth-group no-authentication
    portal-group pg0

    lun 0 {
        backend block
        path /usr/local/tftp/FreeBSD-12.2-RELEASE-amd64-memstick.img
        blocksize 512
        device-id FREEBSD
        serial 00010001
    }
}

System is 12.2:
FreeBSD vm11.gsoft.com.au 12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 GENERIC  amd64

Note that the 12.2 image above has been modified to include isboot.ko; I load that and then it mounts root fine.
Comment 19 John Nielsen 2021-06-03 20:55:24 UTC
(In reply to darius from comment #18)
I hacked together a fix to allow isboot to send NOP OUT as ping responses, which seems to make my version of ctld (13-STABLE) happier.

My primary issue was a typo in /etc/fstab :( Once I fixed that I was able to mount root via isboot even without the ping responses, but ctld would time out the connection after a period of inactivity and isboot would have to reconnect. That slowed things down and generated a lot of noise on the console.

With my fix that goes away:
https://github.com/jnielsendotnet/isboot/commit/3db550642e03a041450a8cf29891ec6044e49a42

Let me know if you are able to test. I plan to update the port again in the near-ish future.
Comment 20 Zsolt Udvari freebsd_committer freebsd_triage 2024-09-29 18:47:28 UTC
Is it still relevant?
Comment 21 Daniel O'Connor 2024-09-30 05:20:31 UTC
(In reply to Zsolt Udvari from comment #20)
I think it is but I no longer use it so I can't test the changes.
Comment 22 John Nielsen 2024-09-30 15:58:04 UTC
(In reply to Zsolt Udvari from comment #20)
No, this was resolved with version 0.2.15 which has been in the ports tree since last year.
Comment 23 Zsolt Udvari freebsd_committer freebsd_triage 2024-09-30 18:37:52 UTC
(In reply to John Nielsen from comment #22)
Thanks. I'll close it.