Was doing hot swap drive testing with a mps-enabled chassis and SAS drives and when I pulled a SAS drive xpt(4) spit out a slew of errors from the following lines of code (from .../sys/cam/cam_xpt.c): 766 /* 767 * It is unfortunate that this is even necessary, 768 * but there are many, many clueless users out ther e. 769 * If this is true, the user is looking for the 770 * passthrough driver, but doesn't have one in his 771 * kernel. 772 */ 773 if (base_periph_found == 1) { 774 printf("xptioctl: pass driver is not in the " 775 "kernel\n"); 776 printf("xptioctl: put \"device pass\" in " 777 "your kernel config file\n"); 778 } Even though the pass device was in the kernel. The check should probably be refined to check for pass devices. How-To-Repeat: 1. Take a Dell/Supermicro machine with an mps controller and at least 2 devices; setup something other than a RAID0 config with either UFS or ZFS (the issue was reproduced with ZFSv28 and the zfsd patches) so you have enough disks to toy with without accidentally taking down the box. 2. Install 9.0-RELEASE [with the zfsd patches if desired]. 3. Add some devd rules to query/manipulate the drive enclosure state via getencstat/setencstat (see .../share/examples/ses/setencstat).
Seeing this on 10.1-STABLE amd64 r274721. Every two seconds, these two lines are added to /var/log/messages. I don't know what's triggering it, but it blows out my logs in about six hours. Oh, I _have_ device pass in my kernel.
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"
I've seen this a few times now, in a case where a disk was swapped, but somehow did not get a pass(4) allocated. <ATA ST10000NM0016-1T SND0> at scbus1 target 8 lun 0 (pass12,da11) <ATA ST10000NM0016-1T SND0> at scbus1 target 9 lun 0 (da12) <ATA ST10000NM0016-1T SND0> at scbus1 target 10 lun 0 (pass14,da13)
the same problem on 13.0-RELEASE-p1 the pass device have appeared, but ada device is not. the only fix is to reboot the system.
# camcontrol devlist <GJ0250EAGSQ HPG2> at scbus2 target 0 lun 0 (ada0,pass0) <WDC WD1000FYPS-01ZKB0 02.01B01> at scbus3 target 0 lun 0 (pass1,ada1) <Hitachi HTS542525K9A300 BBFOC3MP> at scbus4 target 0 lun 0 (ada2,pass2) <WDC WD10EACS-00ZJB0 01.01B01> at scbus5 target 0 lun 0 (ada3,pass3) # ls /dev/pass* /dev/pass0 /dev/pass1 /dev/pass2 /dev/pass3 # ls /dev/ada? /dev/ada0 /dev/ada2 /dev/ada3 # grep kern.crit messages Jun 11 00:06:56 <kern.crit> proxy kernel: ada1 at ahcich1 bus 0 scbus3 target 0 lun 0 Jun 11 00:06:56 <kern.crit> proxy kernel: ada1: <WDC WD1000FYPS-01ZKB0 02.01B01> s/n WD-WCASJ1759184 detached Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff Jun 11 00:07:01 <kern.crit> proxy kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted Jun 11 00:07:05 <kern.crit> proxy kernel: Jun 11 00:07:17 <kern.crit> proxy kernel: xptioctl: pass driver is not in the kernel Jun 11 00:07:17 <kern.crit> proxy kernel: xptioctl: put "device pass" in your kernel config file
I think it is caused by running camcontrol devlist command just several seconds after device removal. I have one glitchy drive, that sometimes disappears and to return it to the system I run camcontrol reset/rescan. I have wrote a devd script that should do it automaticly and have faced this problem. maybe it happened because of low delay. I have tried to inrease delay in the script, but still no luck. It have happened to me several times and now I can easily reproduce this behavior.
So is there a way that I can recreate this? This sounds bizarre and no scripts that would allow me to recreate it are attached to this bug. Also, it would be ideal if this could be triggered using lsiutil disabling and re-enabling the phy. Absent that, I have no clue what's going on and making any progress on this will have to wait for camcorder so I can see the sequence of events. (bumped the version to 13.0-stable since we have recent sighting there)
Also, this may well be 2 bugs because it happens to both ada and da devices.
I think this problem caused by the quick device detach/attach. I can now easily reproduce it on my system with SATA disks, but maybe I should enable some kind of debug while reproducing it?
I have a fleet of thousands of systems that has a few hundred at any given time that have bouncing devices and I don't see this with them. So there must be something else needed to recreate it.
I am running a SuperMicro system with over 800 disks on 13.1-RELEASE-p1 and somehow I got here, I was just starting to investigate if we needed to replace a disk. da786 at mps2 bus 0 scbus10 target 216 lun 0 da786: <SEAGATE ST6000NM0105 K002> s/n ZAD0EDKD0000C715N6ND detached da804 at mps2 bus 0 scbus10 target 204 lun 0 da804: <SEAGATE ST6000NM0105 K002> Fixed Direct Access SPC-4 SCSI device da804: Serial Number ZAD0EDKD0000C715N6ND da804: 600.000MB/s transfers da804: Command Queueing enabled da804: 5723166MB (1465130646 4096 byte sectors) (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1337 Command timeout on target 210(0x00e0) 60000 set, 60.9778032 elapsed mps2: Sending abort to target 210 for SMID 1337 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1337 Aborting command 0xfffffe0355b0e498 mps2: Finished abort recovery for target 210 xptioctl: pass driver is not in the kernel xptioctl: put "device pass" in your kernel config file xptioctl: pass driver is not in the kernel xptioctl: put "device pass" in your kernel config file (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1633 Command timeout on target 210(0x00e0) 60000 set, 60.851374433 elapsed mps2: Sending abort to target 210 for SMID 1633 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1633 Aborting command 0xfffffe0355b27258 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 725 Command timeout on target 210(0x00e0) 60000 set, 60.598754866 elapsed mps2: Sending abort to target 210 for SMID 725 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 725 Aborting command 0xfffffe0355adae38 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1216 Command timeout on target 210(0x00e0) 60000 set, 60.616971935 elapsed mps2: Sending abort to target 210 for SMID 1216 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 1216 Aborting command 0xfffffe0355b04200 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 583 Command timeout on target 210(0x00e0) 60000 set, 60.75966146 elapsed mps2: Sending abort to target 210 for SMID 583 (probe202:mps2:0:210:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 length 0 SMID 583 Aborting command 0xfffffe0355acef68 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 1026 Command timeout on target 210(0x00e0) 60000 set, 60.256923432 elapsed mps2: Sending abort to target 210 for SMID 1026 (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 1026 Aborting command 0xfffffe0355af42b0 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe202:mps2:0:210:0): CAM status: Command timeout (probe202:mps2:0:210:0): Retrying command, 3 more tries remain xptioctl: pass driver is not in the kernel xptioctl: put "device pass" in your kernel config file xptioctl: pass driver is not in the kernel xptioctl: put "device pass" in your kernel config file (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 443 Command timeout on target 210(0x00e0) 60000 set, 60.611647969 elapsed mps2: Sending abort to target 210 for SMID 443 (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 443 Aborting command 0xfffffe0355ac3348 mps2: Finished abort recovery for target 210 (probe202:mps2:0:210:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe202:mps2:0:210:0): CAM status: Command timeout (probe202:mps2:0:210:0): Retrying command, 2 more tries remain