I ran `sudo sesutil fault all off` on a 12.1-STABLE system with 18 SES expanders and lots of disks. The command hung. procstat shows the following kernel stacks:
49857 103730 sesutil - mi_switch+0xd4 sleepq_wait+0x2c _sleep+0x253 ses_set_elm_status+0x86 enc_ioctl+0x4f1 devfs_ioctl+0xb0 VOP_IOCTL_APV+0x7b vn_ioctl+0x16a devfs_ioctl_f+0x1e kern_ioctl+0x2b7 sys_ioctl+0xfa amd64_syscall+0x387 fast_syscall_common+0xf8
55 100353 enc_daemon8 - mi_switch+0xd4 sleepq_wait+0x2c _sx_xlock_hard+0x3ee ses_publish_cache+0x1d1 enc_daemon+0x37f fork_exit+0x7e fork_trampoline+0xe
It looks like sesutil acquired enc->enc_cache_lock in enc_ioctl at line 438 (line numbers refer to the 13.0-RC2 sources), then blocked in cam_periph_sleep(enc->periph, &req, PUSER, "encstat", 0) in ses_set_elm_status at line 2794 while still holding that lock. Meanwhile, enc_daemon is blocked trying to acquire enc->enc_cache_lock in ses_publish_cache at line 1971. But enc_daemon is the thread responsible for waking up sesutil, via the wakeups in either ses_fill_control_request or ses_process_control_request, so neither thread can make progress.
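To make the lock-out concrete, here is a minimal userland sketch, assuming a pthread mutex as a stand-in for enc->enc_cache_lock (an sx lock in the kernel). The function names are hypothetical and none of this is the driver code. It uses pthread_mutex_trylock so the demo reports the contention instead of hanging the way the real daemon does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for enc->enc_cache_lock (an sx lock in the kernel). */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models enc_daemon reaching ses_publish_cache: it needs cache_lock.
 * The real daemon blocks here; the model only probes the lock.
 */
static void *
daemon_try_publish(void *arg)
{
	int *rc = arg;

	*rc = pthread_mutex_trylock(&cache_lock);
	if (*rc == 0)
		pthread_mutex_unlock(&cache_lock);
	return (NULL);
}

/*
 * Models the sesutil thread: take cache_lock (as in enc_ioctl), then wait
 * for the daemon without dropping it (as in ses_set_elm_status).
 * Returns 1 if the daemon found the lock busy, 0 otherwise.
 */
int
daemon_is_locked_out(void)
{
	pthread_t daemon;
	int rc = 0, error;

	pthread_mutex_lock(&cache_lock);
	error = pthread_create(&daemon, NULL, daemon_try_publish, &rc);
	assert(error == 0);
	(void)error;
	/* Waiting on the daemon while holding the lock it needs. */
	pthread_join(daemon, NULL);
	pthread_mutex_unlock(&cache_lock);
	return (rc == EBUSY);
}
```

In the model, daemon_is_locked_out() returns 1: the daemon thread cannot get the lock for as long as the waiting thread holds it. In the kernel both sides block unconditionally, which is the hang seen above.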
I just hit it again on 12.2-RELEASE. I'm going to try to fix it, maybe in August.
I could not reproduce the problem on 14.0-CURRENT with a GENERIC kernel after about 2 hours of trying. However, with a GENERIC-NODEBUG kernel, it reproduced in two minutes.
Created attachment 226787
Drop enc_cache_lock before cam_periph_sleep
This patch fixes the deadlock by dropping enc_cache_lock before calling cam_periph_sleep. With this patch, I can run `sesutil fault all on; sesutil fault all off` in a tight loop for 3 hours, whereas before it would deadlock within a few minutes. However, it exposed another problem: about once an hour, a sesutil process hangs because of a missing wakeup. I haven't yet figured out why the wakeup goes missing, but I don't think this patch introduced that problem.
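The shape of the fix can be sketched in userland like this. It is a model, not the attached patch: struct enc_model, its fields, and both functions are hypothetical, pthread mutexes stand in for enc_cache_lock and for the lock that cam_periph_sleep sleeps on, and reacquiring the cache lock after the wakeup is an assumption of the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the enclosure state; not the kernel structure. */
struct enc_model {
	pthread_mutex_t	cache_lock;	/* stand-in for enc->enc_cache_lock */
	pthread_mutex_t	periph_lock;	/* stand-in for the sleep lock */
	pthread_cond_t	req_done_cv;	/* stand-in for the wakeup channel */
	bool		req_done;
	int		published;
};

/* Models enc_daemon: publish the cache, then wake the requester. */
static void *
model_daemon(void *arg)
{
	struct enc_model *m = arg;

	pthread_mutex_lock(&m->cache_lock);	/* blocks until it is dropped */
	m->published++;
	pthread_mutex_unlock(&m->cache_lock);

	pthread_mutex_lock(&m->periph_lock);
	m->req_done = true;
	pthread_cond_signal(&m->req_done_cv);
	pthread_mutex_unlock(&m->periph_lock);
	return (NULL);
}

/* Models the ioctl path with the fix applied. Returns 1 on completion. */
int
set_status_with_lock_dropped(void)
{
	struct enc_model m = { .req_done = false, .published = 0 };
	pthread_t daemon;
	int error, published;

	pthread_mutex_init(&m.cache_lock, NULL);
	pthread_mutex_init(&m.periph_lock, NULL);
	pthread_cond_init(&m.req_done_cv, NULL);

	pthread_mutex_lock(&m.cache_lock);	/* as in enc_ioctl */
	error = pthread_create(&daemon, NULL, model_daemon, &m);
	assert(error == 0);
	(void)error;

	/* The fix: drop the cache lock before going to sleep. */
	pthread_mutex_unlock(&m.cache_lock);

	/* Sleep until the daemon reports that the request is done. */
	pthread_mutex_lock(&m.periph_lock);
	while (!m.req_done)
		pthread_cond_wait(&m.req_done_cv, &m.periph_lock);
	pthread_mutex_unlock(&m.periph_lock);

	/* Assumption of this sketch: retake the lock after waking. */
	pthread_mutex_lock(&m.cache_lock);
	published = m.published;
	pthread_mutex_unlock(&m.cache_lock);

	pthread_join(daemon, NULL);
	pthread_cond_destroy(&m.req_done_cv);
	pthread_mutex_destroy(&m.periph_lock);
	pthread_mutex_destroy(&m.cache_lock);
	return (published == 1);
}
```

Because the waiter no longer holds the cache lock while it sleeps, the daemon thread can publish and then deliver the wakeup. The model checks req_done under periph_lock in a loop, so it cannot lose a wakeup. It says nothing about the missing wakeups seen with the real driver.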