219866 – [iscsi] ctld crashes inside ctl_datamove()

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	11.0-RELEASE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	Edward Tomasz Napierala

Reported:	2017-06-08 16:48 UTC by emz
Modified:	2018-04-16 17:33 UTC
CC List:	1 user

See Also:

Description emz 2017-06-08 16:48:42 UTC

System info
===========
FreeBSD san01.bsh-ru.playkey.net 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #2 r314026M: Wed Mar 29 06:54:29 UTC 2017     emz@san01.bsh-ru.playkey.net:/usr/obj/usr/src/sys/SAN  amd64

I have a bunch of FreeBSD machines that run ctld and are used as iSCSI targets. One of them started to crash periodically after the load doubled. The "M" in the uname output stands for the patches that trasz@FreeBSD.org gave me (they fix some other issue and probably aren't related to this one, because the system ran just fine with them for several months until the load increased), and also for raising the target limit beyond 512.

I have a bunch of crashdumps from this system indicating that the crash happens inside the ctl_datamove() call.


This system also complains in dmesg pretty often with messages like these:

[...]
ctl_datamove: tag 0x293a4d on (23:34:0) aborted
ctl_datamove: tag 0x293a4e on (23:34:0) aborted
ctl_datamove: tag 0x1fe2d9 on (7:34:0) aborted
ctl_datamove: tag 0x293a50 on (23:34:0) aborted
ctl_datamove: tag 0x21e6cd on (15:34:0) aborted
ctl_datamove: tag 0x21e6ce on (15:34:0) aborted
ctl_datamove: tag 0xe453b on (10:34:0) aborted
ctl_datamove: tag 0x21e6cf on (15:34:0) aborted
ctl_datamove: tag 0x61c355 on (9:34:0) aborted
ctl_datamove: tag 0x10cf20 on (2:34:0) aborted
ctl_datamove: tag 0xe453d on (10:34:0) aborted
ctl_datamove: tag 0x174e97 on (28:34:0) aborted
[...]

I don't know whether this is related, but I decided to mention it.
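
To see whether the aborts cluster on particular nexuses, the messages above can be tallied per tuple. The following is only a minimal sketch in Python: the names `ABORT_RE`, `count_aborts` and `SAMPLE` are made up for illustration, and because the report does not say what the three numbers in parentheses mean, the tuple is treated as an opaque key.

```python
import re
from collections import Counter

# Matches lines such as: ctl_datamove: tag 0x293a4d on (23:34:0) aborted
ABORT_RE = re.compile(
    r"ctl_datamove: tag (0x[0-9a-fA-F]+) on \((\d+):(\d+):(\d+)\) aborted"
)


def count_aborts(lines):
    """Return a Counter mapping each (a, b, c) tuple to its number of aborted tags."""
    counts = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            counts[tuple(int(x) for x in m.group(2, 3, 4))] += 1
    return counts


# The twelve dmesg lines quoted in this report.
SAMPLE = """\
ctl_datamove: tag 0x293a4d on (23:34:0) aborted
ctl_datamove: tag 0x293a4e on (23:34:0) aborted
ctl_datamove: tag 0x1fe2d9 on (7:34:0) aborted
ctl_datamove: tag 0x293a50 on (23:34:0) aborted
ctl_datamove: tag 0x21e6cd on (15:34:0) aborted
ctl_datamove: tag 0x21e6ce on (15:34:0) aborted
ctl_datamove: tag 0xe453b on (10:34:0) aborted
ctl_datamove: tag 0x21e6cf on (15:34:0) aborted
ctl_datamove: tag 0x61c355 on (9:34:0) aborted
ctl_datamove: tag 0x10cf20 on (2:34:0) aborted
ctl_datamove: tag 0xe453d on (10:34:0) aborted
ctl_datamove: tag 0x174e97 on (28:34:0) aborted
"""

# Print the tuples with the most aborts first.
for key, n in count_aborts(SAMPLE.splitlines()).most_common():
    print(key, n)
```

On a live system the same function could be fed the full dmesg output instead of the embedded sample.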

Since these crashdumps are enormous (several gigabytes each), I've put them on a web server (not one hosted on the affected system).

They can be found here (autoindexed location): http://files2.enaza.ru/freebsd/

Comment 1 commit-hook 2018-03-15 17:37:13 UTC

A commit references this bug:

Author: trasz
Date: Thu Mar 15 17:36:14 UTC 2018
New revision: 331013
URL: https://svnweb.freebsd.org/changeset/base/331013

Log:
  Fix iSCSI target crash on session reinstation.

  The crash scenario goes like this: there's a thread waiting on "reinstate";
  because it doesn't update the timeout counter it gets terminated by the
  callout; at this point the maintenance thread starts the termination routine.
  The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
  the refcount, which allows the maintenance thread to free its resources.  At
  this point another thread receives a PDU.  Boom.

  PR:		222898, 219866
  Reported by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Tested by:	Eugene M. Zheganin <emz at norma.perm.ru>
  Reviewed by:	mav@ (earlier version)
  MFC after:	2 weeks
  Sponsored by:	playkey.net

Changes:
  head/sys/cam/ctl/ctl_frontend_iscsi.c
  head/sys/cam/ctl/ctl_frontend_iscsi.h
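
The ordering of events in the crash scenario from the commit log can be replayed as a single-threaded toy model. This is a hedged sketch in Python, not the kernel code: the `Session` class, its fields and `replay_crash_scenario` are hypothetical names invented for illustration, and the model only mirrors the sequence described in the log (timeout, termination start, handoff and refcount drop, free, PDU receive).

```python
class Session:
    """Toy stand-in for an iSCSI session; every name here is hypothetical."""

    def __init__(self):
        self.refcount = 1         # reference held by the thread waiting on "reinstate"
        self.terminating = False  # set once the timeout callout gives up on the session
        self.freed = False        # set once the maintenance thread releases resources

    def receive_pdu(self):
        # In this model, touching a freed session is reported as an error.
        if self.freed:
            raise RuntimeError("PDU received on a freed session")
        return "ok"


def replay_crash_scenario():
    """Replay the interleaving from the commit log, one step at a time."""
    events = []
    s = Session()

    # 1. The waiter does not update the timeout counter, so the callout fires.
    s.terminating = True
    events.append("callout: timeout expired, session marked terminating")

    # 2. The maintenance thread starts termination but still sees a reference.
    events.append("maintenance: termination started, refcount=%d" % s.refcount)

    # 3. The waiter finishes waiting, hands off the connection, drops its reference.
    s.refcount -= 1
    events.append("waiter: handoff done, refcount=%d" % s.refcount)

    # 4. With no references left, the maintenance thread frees the session.
    if s.terminating and s.refcount == 0:
        s.freed = True
        events.append("maintenance: session freed")

    # 5. Another thread receives a PDU on the handed-off connection.
    try:
        s.receive_pdu()
    except RuntimeError as e:
        events.append("receive: %s" % e)

    return s, events


session, trace = replay_crash_scenario()
for step in trace:
    print(step)
```

The model shows only why the interleaving is dangerous: the handoff in step 3 makes the connection live while termination from step 2 is already under way. It makes no claim about how the committed change closes that window.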

Comment 2 commit-hook 2018-04-16 17:24:49 UTC

A commit references this bug:

Author: trasz
Date: Mon Apr 16 17:24:33 UTC 2018
New revision: 332622
URL: https://svnweb.freebsd.org/changeset/base/332622

Log:
  MFC r331013:

  Fix iSCSI target crash on session reinstation.

  The crash scenario goes like this: there's a thread waiting on "reinstate";
  because it doesn't update the timeout counter it gets terminated by the
  callout; at this point the maintenance thread starts the termination routine.
  The first thread finishes waiting, proceeds to icl_conn_handoff(), and drops
  the refcount, which allows the maintenance thread to free its resources.  At
  this point another thread receives a PDU.  Boom.

  PR:		222898, 219866
  Sponsored by:	playkey.net

Changes:
_U  stable/11/
  stable/11/sys/cam/ctl/ctl_frontend_iscsi.c
  stable/11/sys/cam/ctl/ctl_frontend_iscsi.h