Created attachment 212383 [details] iscsi_ioctl_list panic + debug info I found an issue which leads to kernel panic. Test setup: Machine 1 - ISCSI target. Machine 2 - ISCSI initiator. Disable ctl(4) port on target: Machine 1# ctladm port -o off -p 3 Front End Ports disabled After that, initiator trying to reconnect: Machine 2# dmesg ... (da23:iscsi4:0:0:5): Periph destroyed (da22:iscsi4:0:0:4): Periph destroyed (da19:iscsi4:0:0:3): Periph destroyed (da17:iscsi4:0:0:2): Periph destroyed WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting ... If I try to list iscsi sessions on target side - kernel panics. Machine 1# ctladm islist Fatal trap 12: page fault while in kernel mode cpuid = 11; apic id = 11 fault virtual address = 0x17c fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff831bb8c3 stack pointer = 0x28:0xfffffe01c358f780 frame pointer = 0x28:0xfffffe01c358f810 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 27739 (ctladm) trap number = 12 panic: page fault cpuid = 11 time = 1583839216 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c358f3e0 vpanic() at vpanic+0x185/frame 0xfffffe01c358f440 panic() at panic+0x43/frame 0xfffffe01c358f4a0 trap_fatal() at trap_fatal+0x386/frame 0xfffffe01c358f500 trap_pfault() at trap_pfault+0x99/frame 0xfffffe01c358f580 trap() at trap+0x2a7/frame 0xfffffe01c358f6b0 calltrap() at calltrap+0x8/frame 0xfffffe01c358f6b0 --- trap 0xc, rip = 0xffffffff831bb8c3, rsp = 0xfffffe01c358f780, rbp = 0xfffffe01c358f810 --- cfiscsi_ioctl() at cfiscsi_ioctl+0x753/frame 0xfffffe01c358f810 devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe01c358f860 vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe01c358f970 devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe01c358f990 kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe01c358f9f0 sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe01c358fac0 amd64_syscall() at amd64_syscall+0x168/frame 0xfffffe01c358fbf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01c358fbf0 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c19ba, rsp = 0x7fffffffe448, rbp = 0x7fffffffeab0 --- KDB: enter: panic You can see full output with debug in attachment. The panic occurs in function cfiscsi_ioctl_list() https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1718 Due the fact that cs->cs_target pointer is NULL (see attachment). I add some checks to prevent panic: Machine 1# ctladm islist ID Portal Initiator name Target name 1 192.168.101.5 iqn.1994-09.org.freebsd:q1u005.z.vstack.com iqn.2018-11.com.vstack:target4 3 192.168.101.4 iqn.1994-09.org.freebsd:q1u004.z.vstack.com iqn.2018-11.com.vstack:target3 4 192.168.101.3 iqn.1994-09.org.freebsd:q1u003.z.vstack.com iqn.2018-11.com.vstack:target2 74 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 106 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 124 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 130 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 147 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 259 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none 330 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none root@q1u001:~ # ps -l -p 0 -HSwww | grep cfiscsimt 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] 0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00 [kernel/cfiscsimt] As you can see, there are many partially initialized session and corresponding maintenance threads: https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1161. After some investigation I found that cs->cs_target == NULL because session doesn't terminated correctly. A new session is created in the cfiscsi_ioctl_handoff() function: https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1490 Find target: ... 1505 ct = cfiscsi_target_find(softc, cihp->target_name, 1506 cihp->portal_group_tag); 1507 if (ct == NULL) { 1508 ci->status = CTL_ISCSI_ERROR; 1509 snprintf(ci->error_str, sizeof(ci->error_str), 1510 "%s: target not found", __func__); 1511 return; 1512 } ... Create new session: allocate struct 'cs', start manteinance thread. 1539 cs = cfiscsi_session_new(softc, cihp->offload); 1540 if (cs == NULL) { 1541 ci->status = CTL_ISCSI_ERROR; 1542 snprintf(ci->error_str, sizeof(ci->error_str), 1543 "%s: cfiscsi_session_new failed", __func__); 1544 cfiscsi_target_release(ct); 1545 return; 1546 } ... Check if target port is online. In our case target port is offline (ct->ct_online == 0). 1583 if (ct->ct_online == 0) { 1584 mtx_unlock(&softc->lock); 1585 cs->cs_handoff_in_progress = false; Terminate session: Send cv_signal() to mantainance thread, deallocate struct, etc. 1586 cfiscsi_session_terminate(cs); 1587 cfiscsi_target_release(ct); 1588 ci->status = CTL_ISCSI_ERROR; 1589 snprintf(ci->error_str, sizeof(ci->error_str), 1590 "%s: port offline", __func__); 1591 return; 1592 } The main problem is that mantainance thread not always receive cv_signal() and stuck in cv_wait(). So we have many partially initilized sessions and mantanance threads. I see the following problems: 1. Flags cs->cs_handoff_in_progress and cs->cs_terminating which used by mantanance thread is changed without the lock. 2. As I understand cv_signal() must be called under lock: 390 /* 391 * Signal a condition variable, wakes up one waiting thread. Will also wakeup 392 * the swapper if the process is not in memory, so that it can bring the 393 * sleeping process in. Note that this may also result in additional threads 394 * being made runnable. Should be called with the same mutex as was passed to 395 * cv_wait held. 396 */ 397 void 398 cv_signal(struct cv *cvp) With next patch I can't reproduce the panic: diff --git a/sys/cam/ctl/ctl_frontend_iscsi.c b/sys/cam/ctl/ctl_frontend_iscsi.c index d5be20c2a215..1b7837aa8355 100644 --- a/sys/cam/ctl/ctl_frontend_iscsi.c +++ b/sys/cam/ctl/ctl_frontend_iscsi.c @@ -1582,8 +1582,10 @@ cfiscsi_ioctl_handoff(struct ctl_iscsi *ci) mtx_lock(&softc->lock); if (ct->ct_online == 0) { mtx_unlock(&softc->lock); + CFISCSI_SESSION_LOCK(cs); cs->cs_handoff_in_progress = false; cfiscsi_session_terminate(cs); + CFISCSI_SESSION_UNLOCK(cs); cfiscsi_target_release(ct); ci->status = CTL_ISCSI_ERROR; snprintf(ci->error_str, sizeof(ci->error_str), 3. Why wee need at all to start the mantanance thread and than if ct->ct_online == 0 immidiatelly destroy it. Can we check ct->ct_online early?