System info
===========

FreeBSD san01.bsh-ru.playkey.net 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #2 r314026M: Wed Mar 29 06:54:29 UTC 2017 emz@san01.bsh-ru.playkey.net:/usr/obj/usr/src/sys/SAN amd64

I have a bunch of FreeBSD machines that run ctld and are used as iSCSI targets. One of them started to crash periodically after the load on it doubled.

The "M" in the uname output stands for the patches that trasz@FreeBSD.org gave me and for a change that raises the target limit beyond 512. The patches fix another issue and probably aren't related to this one, because the system ran just fine with them for several months until the load increased.

I have a bunch of crashdumps from this system indicating that the crash happens inside the ctl_datamove() call. The system also complains in dmesg pretty often about:

[...]
ctl_datamove: tag 0x293a4d on (23:34:0) aborted
ctl_datamove: tag 0x293a4e on (23:34:0) aborted
ctl_datamove: tag 0x1fe2d9 on (7:34:0) aborted
ctl_datamove: tag 0x293a50 on (23:34:0) aborted
ctl_datamove: tag 0x21e6cd on (15:34:0) aborted
ctl_datamove: tag 0x21e6ce on (15:34:0) aborted
ctl_datamove: tag 0xe453b on (10:34:0) aborted
ctl_datamove: tag 0x21e6cf on (15:34:0) aborted
ctl_datamove: tag 0x61c355 on (9:34:0) aborted
ctl_datamove: tag 0x10cf20 on (2:34:0) aborted
ctl_datamove: tag 0xe453d on (10:34:0) aborted
ctl_datamove: tag 0x174e97 on (28:34:0) aborted
[...]

I don't know whether it's related, but I decided to mention it.

Since these crashdumps are enormous (several gigabytes each), I've put them on a web server (not one hosted on the affected system). They can be found here (autoindexed location): http://files2.enaza.ru/freebsd/
A commit references this bug:

Author: trasz
Date: Thu Mar 15 17:36:14 UTC 2018
New revision: 331013
URL: https://svnweb.freebsd.org/changeset/base/331013

Log:
  Fix iSCSI target crash on session reinstation.

  The crash scenario goes like this: there's a thread waiting on
  "reinstate"; because it doesn't update the timeout counter it gets
  terminated by the callout; at this point the maintenance thread starts
  the termination routine. The first thread finishes waiting, proceeds
  to icl_conn_handoff(), and drops the refcount, which allows the
  maintenance thread to free its resources. At this point another
  thread receives a PDU. Boom.

  PR:           222898, 219866
  Reported by:  Eugene M. Zheganin <emz at norma.perm.ru>
  Tested by:    Eugene M. Zheganin <emz at norma.perm.ru>
  Reviewed by:  mav@ (earlier version)
  MFC after:    2 weeks
  Sponsored by: playkey.net

Changes:
  head/sys/cam/ctl/ctl_frontend_iscsi.c
  head/sys/cam/ctl/ctl_frontend_iscsi.h
A commit references this bug:

Author: trasz
Date: Mon Apr 16 17:24:33 UTC 2018
New revision: 332622
URL: https://svnweb.freebsd.org/changeset/base/332622

Log:
  MFC r331013:

  Fix iSCSI target crash on session reinstation.

  The crash scenario goes like this: there's a thread waiting on
  "reinstate"; because it doesn't update the timeout counter it gets
  terminated by the callout; at this point the maintenance thread starts
  the termination routine. The first thread finishes waiting, proceeds
  to icl_conn_handoff(), and drops the refcount, which allows the
  maintenance thread to free its resources. At this point another
  thread receives a PDU. Boom.

  PR:           222898, 219866
  Sponsored by: playkey.net

Changes:
_U  stable/11/
  stable/11/sys/cam/ctl/ctl_frontend_iscsi.c
  stable/11/sys/cam/ctl/ctl_frontend_iscsi.h
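The crash scenario in the commit log is a use-after-free. Once the last counted reference to the session is dropped, the maintenance thread frees it, yet the receive path still touches the session without holding a reference of its own. Below is a minimal userland C11 sketch of the general refcounting discipline that rules out this class of bug: every user holds its own reference, no new reference can be taken once teardown starts, and memory is freed only when the count reaches zero. Everything here (`struct session`, `session_tryhold()`, `session_terminate()`, `session_release()`) is hypothetical. It is not the code in ctl_frontend_iscsi.c and not the actual change made in r331013.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical session object; NOT the real struct cfiscsi_session. */
struct session {
	atomic_int refcount;     /* one per thread currently using the session */
	atomic_bool terminating; /* set once teardown has started */
};

static struct session *
session_new(void)
{
	struct session *s = malloc(sizeof(*s));

	if (s == NULL)
		return (NULL);
	atomic_init(&s->refcount, 1);          /* the owner's reference */
	atomic_init(&s->terminating, false);
	return (s);
}

/*
 * Take a reference for a new unit of work (e.g. handling a received PDU).
 * The caller must already be able to reach s safely (e.g. it holds a lock
 * or another reference). Never increments from zero, and refuses new users
 * once teardown has begun.
 */
static bool
session_tryhold(struct session *s)
{
	int old = atomic_load(&s->refcount);

	while (old > 0) {
		if (atomic_load(&s->terminating))
			return (false);
		/* On failure, old is reloaded with the current count; retry. */
		if (atomic_compare_exchange_weak(&s->refcount, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Mark the session as going away; existing references stay valid. */
static void
session_terminate(struct session *s)
{
	atomic_store(&s->terminating, true);
}

/* Drop one reference; returns true if this call freed the session. */
static bool
session_release(struct session *s)
{
	if (atomic_fetch_sub(&s->refcount, 1) == 1) {
		free(s);
		return (true);
	}
	return (false);
}
```

In terms of the log: if the thread that "receives a PDU" had taken its own reference with something like `session_tryhold()` before touching the session, the first thread's drop in icl_conn_handoff() could not have let the maintenance thread free it. The free would happen only on that last `session_release()`.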