The PASSTHROUGH ioctl in the nvme driver should return an error if the controller is in the reset state (for example, by checking the ctrlr->is_resetting flag); otherwise the ioctl might hang.
I'm seeing something similar on a box with 22x NVMe and 2x SSD, running either 12-STABLE or 13-CURRENT (as of today). After a certain amount of time (and workload), ioctls hang, rendering the box almost unresponsive to some operations.
ddb showed that there are two suspects:
The rest of the processes and threads are in sched_switch().
OCR is Online Controller Reset.