Created attachment 242225
Example ctl configuration file

If two separate processes do "service ctld restart", then they can race. The result is ctl ports that are inaccessible (clients can't connect), and the ports don't get torn down after ctld exits. Attempting to start ctld again fails to fix the stuck ports (though new ports can be added). The only remedy is to reboot.

Steps to reproduce
==================
1) Create about 32 zvols (I've also observed this bug with file-backed LUNs)
2) Configure /etc/ctl.conf as shown in the attached file
3) Run the following in two separate terminals:

for ((i=0; i<10000; i=$i+1)); do service ctld onerestart || break; done

After some time, usually < 1 second, one terminal will fail with an error like this:

ctld: LUN modification error: LUN 31 is not managed by the block backend
ctld: failed to modify lun "disk31", CTL lun 31
ctld: CTL_LUN_MAP ioctl failed: Device not configured
ctld: failed to apply configuration; exiting
/etc/rc.d/ctld: WARNING: failed to start ctld

Then kill the loop in the other terminal, ensure that no ctld process is running, and do "ctladm portlist". All 32 ports will be shown. Attempting to start ctld one more time will result in an error like this:

ctld: error returned from port creation request: target "iqn.2018-10.myhost:disk0" for portal group tag 257 already exists
ctld: failed to update port pg0-iqn.2018-10.myhost:disk0
I've discovered an undocumented command that allows one to recover from this situation without a reboot. First shut down ctld, then remove each iSCSI target port with the undocumented command, and then restart ctld. The command is:

ctladm port -d iscsi -r -p DONTCARE -O cfiscsi_portal_group_tag=TAG -O cfiscsi_target=TARGET

Here "DONTCARE" must be an integer, but its value does not otherwise matter. "TAG" can be found via "ctladm portlist -v" and is typically 257 or greater. "TARGET" can also be found via "ctladm portlist -v".

I'll update the man page to document this syntax and also add ATF tests for it.
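
For anyone who has to clean up many stuck ports, the removal step could be wrapped in a small shell helper. This is only a sketch: the function name remove_stuck_port and the DRY_RUN variable are made up for illustration, and TAG and TARGET still have to be copied by hand from "ctladm portlist -v", since nothing here parses that output.

```shell
#!/bin/sh
# Sketch: remove one stuck iSCSI port using the ctladm syntax quoted above.
# remove_stuck_port and DRY_RUN are hypothetical names, not part of ctladm.

remove_stuck_port() {
    tag=$1      # portal group tag, as shown by "ctladm portlist -v"
    target=$2   # target name, as shown by "ctladm portlist -v"

    # -p must be given an integer, but its value does not matter, so pass 0.
    set -- ctladm port -d iscsi -r -p 0 \
        -O "cfiscsi_portal_group_tag=$tag" \
        -O "cfiscsi_target=$target"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}
```

With ctld stopped, calling it once per stuck port (for example "remove_stuck_port 257 iqn.2018-10.myhost:disk0") and then starting ctld again should correspond to the manual procedure. Setting DRY_RUN=1 first makes it possible to review the commands before anything is removed.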
I've confirmed that the cause of the problem is that ctld opens its pidfile too late. It reads the current list of targets from the kernel, then reads the config file, then opens its pidfile, and then applies changes based on the differences between the kernel's state and the config file. But the kernel's state could have changed before the pidfile got opened, so the changes may be computed from a stale snapshot. I've hacked ctld to open the pidfile earlier and verified that this fixes the problem. However, doing it properly is hard, because the code for opening the config file is intermingled with the code for interacting with the kernel. The biggest problem is the conf_pports list, added in commit 057abcb00413010898f3046f7704444b8f537bab.
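
The ordering issue can be illustrated outside of ctld with a small shell sketch: take the exclusive lock first, and only then read state and apply changes. This is not ctld's code (ctld is written in C and its lock is the pidfile). The lock directory path and the function names acquire_lock, release_lock, and with_lock are all hypothetical, and an atomic mkdir stands in for the pidfile.

```shell
#!/bin/sh
# Sketch: serialize "read state, then apply changes" behind an exclusive
# lock that is taken BEFORE the state is read. All names are hypothetical.

LOCKDIR="${LOCKDIR:-/tmp/ctld-sketch.lock}"

# mkdir is atomic: only one caller can create the directory, so only one
# caller holds the lock at a time.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

# Run the given command while holding the lock.
# Returns 75 (EX_TEMPFAIL) without running anything if the lock is busy.
with_lock() {
    acquire_lock || return 75
    "$@"
    status=$?
    release_lock
    return "$status"
}
```

Because the lock is acquired before the wrapped command starts, nothing the command reads can be changed by a second caller going through the same wrapper. As a stopgap, running the restart as "with_lock service ctld onerestart" in both terminals might avoid the race, though I have not verified that, and it does nothing for callers that bypass the wrapper.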