Created attachment 230154 [details] Cause iscsi target to crash due to incorrect PRIV_REQUEST() uses. I suspect this line in cfiscsi_done(): cs = PRIV_REQUEST(io); should be request = PRIV_REQUEST(io); cs = PDU_SESSION(request); And this line in cfiscsi_session_terminate_tasks() looks odd: PRIV_REQUEST(io) = cs; I'd expect ... = request, not ... = cs. I've attached a program that (on my machine) produces panic from the first problem, and (once the first is fixed) a kernel page fault due to the second. panic: refcount 0xffffffd0023eb7b4 wraparound panic() at panic+0x2a _refcount_update_saturated() at _refcount_update_saturated+0x16 refcount_releasen() at refcount_releasen+0x4c refcount_release() at refcount_release+0xc cfiscsi_done() at cfiscsi_done+0x3e ctl_process_done() at ctl_process_done+0x460 ctl_work_thread() at ctl_work_thread+0x13e fork_exit() at fork_exit+0x80 fork_trampoline() at fork_trampoline+0xa panic: Fatal page fault at 0xffffffc0000468c4: 0x000000000000ba --- exception 15, tval = 0xba cfiscsi_done() at cfiscsi_done+0x5e ctl_process_done() at ctl_process_done+0x460 ctl_work_thread() at ctl_work_thread+0x13e fork_exit() at fork_exit+0x80 fork_trampoline() at fork_trampoline+0xa FreeBSD 14.0-CURRENT FreeBSD 14.0-CURRENT #160 main-n250912-e4746deeda02-dirty: Wed Dec 15 14:36:14 EST 2021 rtm@xxx:/usr/obj/usr/rtm/symbsd/src/riscv.riscv64/sys/RTM riscv
So I think the issue is that in cfiscsi_terminate_tasks() there is no request to store in PRIV_REQUEST() so it instead stores the session directly. However, it is possible to send a request on the wire that also results in CTL_IO_TASK and and CTL_TASK_I_T_NEXUS_RESET in cfiscsi_pdu_handle_task_request: case BHSTMR_FUNCTION_I_T_NEXUS_RESET: #if 0 CFISCSI_SESSION_DEBUG(cs, "BHSTMR_FUNCTION_I_T_NEXUS_RESET"); #endif io->taskio.task_action = CTL_TASK_I_T_NEXUS_RESET; break; And in this case, PRIV_REQUEST() is the request instead. Your test program constructs a normal CDB inquiry but then xors the first two bytes with 0x4b03 to generate 0x8b02 (little endian). 0x2 == ISCSI_BHS_OPCODE_TASK_REQUEST and 0x8b as the function == 0x80 | BHSTMR_FUNCTION_I_T_NEXUS_RESET. I think the bug is we need a clean way to differentiate between an on-the-wire reset vs the internally-initiatired reset. One way may be to allocate a new CTL_TASK_INTERNAL_RESET or some such that terminate_tasks would use instead of of CTL_TASK_I_T_NEXUS_RESET, and ctl_run_task would treat as identical to CTL_TASK_I_T_NEXUS_RESET. The only thing I don't yet understand for that is cfumass_done() in the usb storage driver checking for this opcode. Nothing in the cfumass driver schedules a CTL_IO_TASK, so I suspect the code in cfumass_done is just stale copy/paste from cfiscsi_done?
Candidate fix up for review at https://reviews.freebsd.org/D34055
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=2e8d1a55258d39f7315fa4f2164c0fce96e79802 commit 2e8d1a55258d39f7315fa4f2164c0fce96e79802 Author: John Baldwin <jhb@FreeBSD.org> AuthorDate: 2022-01-28 21:07:04 +0000 Commit: John Baldwin <jhb@FreeBSD.org> CommitDate: 2022-01-28 21:07:04 +0000 iscsi: Allocate a dummy PDU for the internal nexus reset task. When an iSCSI target session is terminated, an internal nexus reset task is posted to abort existing tasks belonging to the session. Previously, the ctl_io for this internal nexus reset stored a pointer to the session in the slot that normally holds a pointer to the PDU from the initiator that triggered the I/O request. The completion handler then assumed that any nexus reset I/O was due to an internal request and fetched the session pointer (instead of the PDU pointer) from the ctl_io. However, it is possible to trigger a nexus reset via an on-the-wire task management PDU. If such a PDU were sent to the target, then the completion handler would incorrectly treat this request as an internal request and treat the pointer to the received PDU as a pointer to the session instead. To fix, allocate a dummy PDU for the internal reset task and use an invalid opcode to differentiate internal nexus resets from resets requested by the initiator. PR: 260449 Reported by: Robert Morris <rtm@lcs.mit.edu> Reviewed by: mav Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D34055 sys/cam/ctl/ctl_frontend_iscsi.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-)
Any plans to MFC @jhb?
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7f8f8202edc4ae61a65a766de6ec3420a64d3f7d commit 7f8f8202edc4ae61a65a766de6ec3420a64d3f7d Author: John Baldwin <jhb@FreeBSD.org> AuthorDate: 2022-01-28 21:07:04 +0000 Commit: John Baldwin <jhb@FreeBSD.org> CommitDate: 2022-04-29 23:05:55 +0000 iscsi: Allocate a dummy PDU for the internal nexus reset task. When an iSCSI target session is terminated, an internal nexus reset task is posted to abort existing tasks belonging to the session. Previously, the ctl_io for this internal nexus reset stored a pointer to the session in the slot that normally holds a pointer to the PDU from the initiator that triggered the I/O request. The completion handler then assumed that any nexus reset I/O was due to an internal request and fetched the session pointer (instead of the PDU pointer) from the ctl_io. However, it is possible to trigger a nexus reset via an on-the-wire task management PDU. If such a PDU were sent to the target, then the completion handler would incorrectly treat this request as an internal request and treat the pointer to the received PDU as a pointer to the session instead. To fix, allocate a dummy PDU for the internal reset task and use an invalid opcode to differentiate internal nexus resets from resets requested by the initiator. PR: 260449 Reported by: Robert Morris <rtm@lcs.mit.edu> Reviewed by: mav Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D34055 (cherry picked from commit 2e8d1a55258d39f7315fa4f2164c0fce96e79802) sys/cam/ctl/ctl_frontend_iscsi.c | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-)