Bug 199495 - CAM status: command timeout (fix in HEAD, still in 9.3, 10.1) (relates to bug 195349)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Steven Hartland
URL: https://svnweb.freebsd.org/base?view=...
Keywords: easy, needs-qa, patch
Depends on:
Blocks:
 
Reported: 2015-04-17 04:13 UTC by Yudi
Modified: 2017-07-03 08:09 UTC

See Also:
koobs: mfc-stable10?
koobs: mfc-stable9?


Attachments
/var/run/dmesg.boot file from v10.1 (39.54 KB, text/plain)
2015-07-01 11:47 UTC, Yudi
/var/run/dmesg.boot file from v9.3 (42.54 KB, text/plain)
2015-07-01 19:28 UTC, Yudi
/var/run/dmesg.boot file from v11 (36.19 KB, text/plain)
2015-07-01 19:29 UTC, Yudi
/var/run/dmesg.boot file from v10.2 prerelease (40.08 KB, text/plain)
2015-07-02 19:15 UTC, Yudi
ahci quirk corrupt debug (1.79 KB, patch)
2015-07-03 10:52 UTC, Steven Hartland
/var/run/dmesg.boot file from v11+patch (40.80 KB, text/plain)
2015-07-04 15:39 UTC, Yudi
/var/run/dmesg.boot file from stable/10 rev 285310 (39.93 KB, text/plain)
2015-07-09 11:30 UTC, Yudi
/var/run/dmesg.boot file from v11 from rev285311 (42.24 KB, text/plain)
2015-07-09 20:29 UTC, Yudi
ata cleanup patch MFC r280451 (135.24 KB, patch)
2015-07-09 22:06 UTC, Steven Hartland
/var/run/dmesg.boot file from stable/10 rev 285365 (39.93 KB, text/plain)
2015-07-11 08:48 UTC, Yudi

Description Yudi 2015-04-17 04:13:53 UTC
I have an HP MicroServer N40L and recently upgraded it to 10.1, after which this bug showed up.
I tried hint.ahci.0.msi="1" in /boot/loader.conf, but it did not fully fix the issue. I only see the issue on the root pool (ada2p3 and ada3p3 are in a ZFS mirror), and mainly when I run zpool scrub.
Without hint.ahci.0.msi="1" there were quite a lot of these timeouts.

This issue is not present when I boot into 9.3 dataset on the same system.

There are 4 disks in the system ada0 - ada3. I only see the CAM status timeout on just ada2 and ada3.
ada0 and ada1 are on SATA 3Gbps link but ada2 and ada3 are on SATA 1.5Gbps. ada2 and ada3 use the internal ODD SATA port and the eSATA port respectively and they are in ZFS mirror config and have the root on them.

Apparently this issue can be fixed by flashing with a modified BIOS  and making the ODD sata port and eSATA port run at 3Gbps. I am not keen on using the BIOS hack.

If there is no quick fix for this, I can probably stay on 9.3 given it has the same EOL as 10.1.

Please let me know if any additional information is needed. 

Thank you.
Comment 1 stenio 2015-04-21 09:53:04 UTC
Hi,

I think I have the same problem: for some reason my hardware (Fabiatech FX5621) doesn't work with FreeBSD 10.1 while it works perfectly with version 8.3. I tried to boot from a USB drive, disabling from BIOS all IDE drives and using all hints commands I found with no luck! This is what I tried:

set hint.atapci.1.msi=0
set hint.atapci.0.msi=0
set hint.ata.1.mode="PIO4"
set hint.ata.0.mode="PIO4"
set hint.acpi.0.disabled=1
set kern.cam.ada.write_cache=0
boot

The boot loops with this error:

(aprobe0:ata1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata1:0:1:0): CAM status: Command timeout
(aprobe0:ata1:0:1:0): Retrying command
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
(aprobe0:ata1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata1:0:1:0): CAM status: Command timeout
(aprobe0:ata1:0:1:0): Error 5, Retries exhausted
(aprobe0:ata1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata1:0:1:0): CAM status: Command timeout
(aprobe0:ata1:0:1:0): Retrying command
run_interrupt_driven_hooks: still waiting after 120 seconds for xpt_config
(aprobe0:ata1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata1:0:1:0): CAM status: Command timeout
(aprobe0:ata1:0:1:0): Error 5, Retries exhausted

Please let me know if you have any suggestion.

Thanks,
Stenio
Comment 2 Yudi 2015-06-30 09:31:16 UTC
I want to correct my original post, which had some inaccurate info.
First, this bug is present in v9.3 as well as v10.1, but NOT in v11.

I have added the following to /boot/loader.conf, but it did not resolve the issue:
hint.atapci.0.msi="0" (also tried "1")
hint.atapci.1.msi="0"
hint.ahci.0.msi="1"

I even tried rebuilding the kernel but looks like it did not fix the issue. Well, this was my first time building a custom kernel, so I am not sure I got this right.

the steps I followed were:

I used the LINT config instead of creating my own,

# svn checkout svn-mirror/base/head     /usr/src
# cd /usr/src/sys/amd64/conf && make LINT
# cd /usr/src 
# make buildkernel KERNCONF=LINT 
# make installkernel KERNCONF=LINT

One of the replies I received on the mailing list suggested this is not the right way to rebuild the kernel. He said I was using v11 source files to build the kernel for v10.

I don't think that was accurate. Can someone please explain how I can apply the patch?

Bit of info on the system,
The Hitachi drives are in a 2-way mirror connected to 3Gbps SATA ports (AHCI mode), which I am not using until I fix this issue. The OS is installed on the Samsung drives (2-way mirror) connected to 1.5Gbps SATA ports running in IDE mode (I cannot change this to AHCI without installing a hacked version of the BIOS, which I don't want to do).

I would greatly appreciate any advice on how to fix this issue.
Below I added some output from the system that might help.

=============================
output of "camcontrol devlist"
===================================
<Hitachi HDS723030ALA640 MKAOAA10>  at scbus0 target 0 lun 0 (ada0,pass0)
<Hitachi HDS723030ALA640 MKAOAA10>  at scbus2 target 0 lun 0 (ada1,pass1)
<SAMSUNG HM080HI AB100-17>         at scbus4 target 0 lun 0 (ada2,pass2)
<SAMSUNG HM080HI AB100-17>         at scbus4 target 1 lun 0 (ada3,pass3)
======================================================
====================================
ERROR from /var/log/messages
=================================

Jun 28 21:22:47 10p1test kernel: (ada3:ata0:0:1:0): READ_DMA. ACB: c8 00 88 00 41 44 00 00 00 00 01 00
Jun 28 21:22:47 10p1test kernel: (ada3:ata0:0:1:0): CAM status: Command timeout
Jun 28 21:22:47 10p1test kernel: (ada3:ata0:0:1:0): Retrying command
Jun 28 21:23:21 10p1test kernel: (ada2:ata0:0:0:0): READ_DMA. ACB: c8 00 0d 30 c0 45 00 00 00 00 01 00
Jun 28 21:23:21 10p1test kernel: (ada2:ata0:0:0:0): CAM status: Command timeout
Jun 28 21:23:21 10p1test kernel: (ada2:ata0:0:0:0): Retrying command
Jun 28 21:40:33 10p1test kernel: (ada2:ata0:0:0:0): READ_DMA. ACB: c8 00 51 30 70 45 00 00 00 00 01 00
Jun 28 21:40:33 10p1test kernel: (ada2:ata0:0:0:0): CAM status: Command timeout
Jun 28 21:40:33 10p1test kernel: (ada2:ata0:0:0:0): Retrying command
========================================

output of  "dmesg | grep ahci"

    =============================
    ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe6ffc00-0xfe6fffff irq 19 at device 17.0 on pci0
    ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier supported
    ahcich0: <AHCI channel> at channel 0 on ahci0
    ahcich1: <AHCI channel> at channel 1 on ahci0
    ahcich2: <AHCI channel> at channel 2 on ahci0
    ahcich3: <AHCI channel> at channel 3 on ahci0
    ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
    ada1 at ahcich2 bus 0 scbus2 target 0 lun 0

======================================


output of "dmesg |grep ada"
============================
 random: selecting highest priority adaptor <Dummy>
random: selecting highest priority adaptor <Yarrow>
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <Hitachi HDS723030ALA640 MKAOAA10> ATA8-ACS SATA 3.x device
ada0: Serial Number MK0301YHKT8A2A
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada1: <Hitachi HDS723030ALA640 MKAOAA10> ATA8-ACS SATA 3.x device
ada1: Serial Number MK0301YHKV1JWD
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad8
ada2 at ata0 bus 0 scbus4 target 0 lun 0
ada2: <SAMSUNG HM080HI AB100-17> ATA-7 SATA 1.x device
ada2: Serial Number S0ZAJD0P700140
ada2: 150.000MB/s transfers (SATA, UDMA5, PIO 8192bytes)
ada2: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad0
ada3 at ata0 bus 0 scbus4 target 1 lun 0
ada3: <SAMSUNG HM080HI AB100-17> ATA-7 SATA 1.x device
ada3: Serial Number S0ZAJD0P700102
ada3: 150.000MB/s transfers (SATA, UDMA5, PIO 8192bytes)
ada3: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad1
===============================
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2015-07-01 11:05:42 UTC
Appears related to r278034, which may not have been MFC'd to stable/9 and stable/10.

CC'ing the committer (smh) of that change; this is an opportunity to resolve it prior to 10.2-RELEASE.
Comment 4 Yudi 2015-07-01 11:12:57 UTC
After speaking with users on the #freebsd IRC channel, I realized I made a couple of mistakes in rebuilding the kernel.
I was advised to track base/stable/10 rather than base/head and use GENERIC instead of LINT.

I rebuilt the kernel again as follows:


renamed /usr/src
Then created /usr/src
# svn checkout https://svn0.us-west.FreeBSD.org/base/stable/10    /usr/src
# cd /usr/src/sys/amd64/conf
# cp GENERIC MYKERNEL1
# cd /usr/src
# make buildkernel KERNCONF=MYKERNEL1
# make installkernel KERNCONF=MYKERNEL1

rebooted the system and the issue is still present.
Comment 5 Steven Hartland freebsd_committer freebsd_triage 2015-07-01 11:22:52 UTC
(In reply to Kubilay Kocak from comment #3)
It's not clear from this report what the controller in question is, but r278034 has already been merged to stable/10; stable/9 is too different for a direct merge.
Comment 6 Steven Hartland freebsd_committer freebsd_triage 2015-07-01 11:24:31 UTC
(In reply to Steven Hartland from comment #5)
Ok so saw it was the same controller, so should be fixed by that commit.
Comment 7 Steven Hartland freebsd_committer freebsd_triage 2015-07-01 11:25:18 UTC
(In reply to Yudi from comment #4)
Can you confirm the quirks list for your controller from a verbose boot, please?
Comment 8 Yudi 2015-07-01 11:47:27 UTC
Created attachment 158218 [details]
/var/run/dmesg.boot file from v10.1

attached /var/run/dmesg.boot file
Comment 9 Yudi 2015-07-01 11:48:37 UTC
(In reply to Steven Hartland from comment #7)
Steven, 
attached /var/run/dmesg.boot file
Let me know if anything else is needed
Comment 10 Steven Hartland freebsd_committer freebsd_triage 2015-07-01 13:23:53 UTC
As you're only seeing issues on ada2 and ada3 (the ports @1.5Gbps) I suspect the issue is indeed caused by a BIOS SATA config / compatibility issue.

If that's the case, applying the BIOS fix is likely the best course of action.
Comment 11 Steven Hartland freebsd_committer freebsd_triage 2015-07-01 14:03:30 UTC
It would be interesting to see the verbose boot from 9.3 to see if there are any significant differences on how it reports your hardware has been initialised.
Comment 12 Yudi 2015-07-01 19:28:23 UTC
Created attachment 158236 [details]
/var/run/dmesg.boot file from v9.3
Comment 13 Yudi 2015-07-01 19:29:37 UTC
Created attachment 158237 [details]
/var/run/dmesg.boot file from v11
Comment 14 Yudi 2015-07-01 19:38:29 UTC
(In reply to Steven Hartland from comment #11)
Please ignore the comment from my first post, this bug is present in v9.3 but NOT in v11. I really hope it can be backported to v10.1.
I am not comfortable using 3rd party BIOS hack ( people had success with it as it enables AHCI mode for the two ports that are running in IDE mode right now). 
If this bug can't be fixed in v10.1, I will most likely use v11.
 
I have attached /var/run/dmesg.boot files from v9.3 and v11, I found this on line 410 in v11 file - ahci0: quirks=0x1b5f0<ATI_PMP_BUG,1MSI,FORCE_PI> but not in the v10.1 or v9.3.
Comment 15 Steven Hartland freebsd_committer freebsd_triage 2015-07-02 16:12:07 UTC
Ok, so could you try stable/10 and provide a verbose boot log from that?

While reading down I thought that your 10.1 log was actually from stable/10, as that's what you'd said you were running just above.
Comment 16 Steven Hartland freebsd_committer freebsd_triage 2015-07-02 16:17:48 UTC
(In reply to Yudi from comment #14)
For reference, once we identify the issue it's highly unlikely to be fixed in 10.1, as that boat has sailed, so stable/10 or the upcoming 10.2 release would be your options.
Comment 17 Yudi 2015-07-02 19:15:43 UTC
Created attachment 158260 [details]
/var/run/dmesg.boot file from v10.2 prerelease
Comment 18 Yudi 2015-07-02 19:40:10 UTC
(In reply to Steven Hartland from comment #16)
This is not a production system, I am still testing it. If it can be fixed in 10.2 that will be great.

 (In reply to Steven Hartland from comment #15)

I have different versions installed to different ZFS datasets on the same system.
Based on your comment at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195349#c30  I rebuilt the kernel first by tracking base/head and LINT config (my comment #2). Users in the IRC channel suggested I track base/stable/10 instead and use GENERIC. So I rebuilt the kernel again tracking base/stable/10 and GENERIC (my comment 4). This was done on v10.1 install.
I followed the instructions at https://www.freebsd.org/doc/handbook/book.html#kernelconfig

Not sure why the uname -a output shows 10.2-PRERELEASE when I only rebuilt the kernel. 

Wouldn't that be the case when rebuilding world? (https://www.freebsd.org/doc/handbook/makeworld.html)

The last dmesg.boot file I attached is from the install where the kernel was rebuilt. 

It has a quirk on line 474 - ahci0: quirks=0x22000<ATI_PMP_BUG,1MSI> different from the one from v11.
Comment 19 Steven Hartland freebsd_committer freebsd_triage 2015-07-03 08:43:37 UTC
(In reply to Yudi from comment #18)
Just to confirm, you're still seeing the issue on stable/10?
Comment 20 Yudi 2015-07-03 10:05:08 UTC
(In reply to Steven Hartland from comment #19)
Yes, in 10.2-PRERELEASE.
Comment 21 Steven Hartland freebsd_committer freebsd_triage 2015-07-03 10:50:44 UTC
Something's not right there on HEAD, as the output from quirks looks corrupted.
ahci0: quirks=0x1b5f0<ATI_PMP_BUG,1MSI,FORCE_PI>

This is not a valid combination, as there is only one controller which is flagged with FORCE_PI and it's not the one you're using; in addition there are loads of flags in there which are never set (11011010111110000).
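
As a quick check on the bit pattern quoted above, a short Python sketch can list which bit positions are set in a quirks value. It only decodes positions and makes no assumption about which flag name the driver assigns to which bit; `set_bits` is a hypothetical helper name, not anything from the ahci driver.

```python
def set_bits(mask):
    """Return the positions (0 = least significant) of the bits set in mask."""
    return [pos for pos in range(mask.bit_length()) if mask & (1 << pos)]

head_quirks = 0x1b5f0      # value printed by the HEAD boot (comment 14)
stable_quirks = 0x22000    # value printed by the rebuilt stable/10 kernel (comment 18)

print(format(head_quirks, "b"))   # binary form of the HEAD value
print(set_bits(head_quirks))      # bit positions set in the HEAD value
print(set_bits(stable_quirks))    # bit positions set in the stable/10 value
```

The stable/10 value has two bits set, matching the two flag names printed next to it, while the HEAD value has ten bits set against only three printed names, which is consistent with the value being corrupted.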
Comment 22 Steven Hartland freebsd_committer freebsd_triage 2015-07-03 10:52:25 UTC
Created attachment 158282 [details]
ahci quirk corrupt debug

Could you test with head + this patch to see if it will narrow down where the corruption of quirks is occurring?
Comment 23 Yudi 2015-07-04 15:39:19 UTC
Created attachment 158338 [details]
/var/run/dmesg.boot file from v11+patch

as requested.
Rebuilt the kernel from revision 285130 after applying the patch.
Comment 24 Steven Hartland freebsd_committer freebsd_triage 2015-07-06 09:55:11 UTC
No idea how you got the previous output from 11. I've fixed the incorrect output of quirks including FORCE_PI in https://svnweb.freebsd.org/changeset/base/285200.

With regards to the main issue, there are no differences between stable/10 and head in the ahci device which would affect the HW you have.

Given this, could I ask you to re-test both HEAD and stable/10, as of today, to confirm we're still seeing the same behaviour?
Comment 25 Yudi 2015-07-09 11:30:56 UTC
Created attachment 158562 [details]
/var/run/dmesg.boot file from stable/10 rev 285310
Comment 26 Yudi 2015-07-09 11:33:13 UTC
(In reply to Steven Hartland from comment #24)
Sorry Steven, been very busy with work. 
rebuilt kernel from rev 285310, stable/10, the issue is still present (using zpool scrub, the issue pops up right away).
I will rebuild kernel from HEAD tomorrow and let you know.

Thanks!
Comment 27 Yudi 2015-07-09 11:36:26 UTC
tail /var/log/messages output from rev 285310 build:
=======================================================
Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): READ_DMA. ACB: c8 00 3a af 05 45 00 00 00 00 01 00
Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): CAM status: Command timeout
Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): Retrying command
Jul  9 21:33:35 test kernel: ata0: (ada2:ata0:0:0:0): READ_DMA. ACB: c8 00 f8 1f a7 43 00 00 00 00 01 00
Jul  9 21:33:35 test kernel: reset tp1 mask=03 ostat0=50 ostat1=50
Jul  9 21:33:35 test kernel: (ada2:ata0:0:0:0): CAM status: Command timeout
Jul  9 21:33:35 test kernel: (ada2:ata0:0:0:0): Retrying command
Jul  9 21:33:35 test kernel: ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
Jul  9 21:33:35 test kernel: ata0: stat1=0x50 err=0x01 lsb=0x00 msb=0x00
Jul  9 21:33:35 test kernel: ata0: reset tp2 stat0=50 stat1=50 devices=0x3
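
To compare builds, the timeouts in a log like the one above can be tallied per device. This is a minimal sketch, assuming the message format shown in this thread; `count_timeouts` and the regular expression are illustrative, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Captures the device name in lines such as
# "... kernel: (ada3:ata0:0:1:0): CAM status: Command timeout"
TIMEOUT_RE = re.compile(r"\((\w+):[^)]*\): CAM status: Command timeout")

def count_timeouts(lines):
    """Count 'Command timeout' messages per device name."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Lines copied from the rev 285310 log above.
sample = [
    "Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): READ_DMA. ACB: c8 00 3a af 05 45 00 00 00 00 01 00",
    "Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): CAM status: Command timeout",
    "Jul  9 21:26:31 test kernel: (ada3:ata0:0:1:0): Retrying command",
    "Jul  9 21:33:35 test kernel: (ada2:ata0:0:0:0): CAM status: Command timeout",
    "Jul  9 21:33:35 test kernel: (ada2:ata0:0:0:0): Retrying command",
]
counts = count_timeouts(sample)
print(dict(counts))
```

On a real system the same function could be fed the lines of /var/log/messages after a zpool scrub, once per kernel under test.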
Comment 28 Yudi 2015-07-09 20:29:35 UTC
Created attachment 158570 [details]
/var/run/dmesg.boot file from v11 from rev285311

rebuilt kernel from HEAD, revision 285311 and attached the dmesg.boot file.
Comment 29 Steven Hartland freebsd_committer freebsd_triage 2015-07-09 22:06:53 UTC
Created attachment 158572 [details]
ata cleanup patch MFC r280451

mav kindly pointed out that your second two drives, which are causing the problem, are attached to the legacy ata driver, not ahci.

There's one noticeable commit which affects ata for ati (your chipset) in head which isn't in stable/10.

This attachment is said commit merged to stable/10. Could you apply it on top of your stable/10 build and see if it eliminates the timeouts?

Oh, you can remove the debug patch from your head builds, as that seems to be resolved now.
Comment 30 Yudi 2015-07-11 08:47:45 UTC
(In reply to Steven Hartland from comment #29)
# cd /usr/src
# svn update /usr/src      # updated to rev 285365 (stable/10)
# patch < ata-cleanup.patch

I checked the output later and the below three files failed to patch.
  
sys/arm/mv/mv_sata.c
sys/dev/ata/chipsets/ata-adaptec.c
sys/dev/ata/chipsets/ata-ahci.c


I rebuilt the kernel but the issue is still present. 
/var/run/dmesg.boot file after rebuild, is attached.
Comment 31 Yudi 2015-07-11 08:48:17 UTC
Created attachment 158605 [details]
/var/run/dmesg.boot file from stable/10 rev 285365
Comment 32 Steven Hartland freebsd_committer freebsd_triage 2015-07-13 08:58:39 UTC
All three of the files that failed were removed by the patch, so that should have no impact apart from dangling files.

With ata the same, we're running out of options.

The next thing to try is a binary chop on head to try and determine the change that fixed the issue. It can be quite time consuming, but given the details from generic inspection I can't see anything more obvious to try.
Comment 33 Steven Hartland freebsd_committer freebsd_triage 2015-07-13 09:17:42 UTC
Just seen another report on freebsd-scsi@ which might be related.

Can you try setting kern.racct.enable="0" in /boot/loader.conf for stable/10 and see if that has any impact on the timeouts at all (don't need a dmesg)? Confirming it's set correctly with sysctl kern.racct.enable would be a good idea.
Comment 34 Yudi 2015-07-14 20:12:34 UTC
(In reply to Steven Hartland from comment #33)
tried kern.racct.enable="0" in /boot/loader.conf for stable/10, did not make a difference.

What's a binary chop?
Comment 35 Steven Hartland freebsd_committer freebsd_triage 2015-07-14 21:02:04 UTC
Identify the edges where the issue occurs and doesn't occur, then pick the midpoint and see whether the issue happens there; adjust the range accordingly and go for the next midpoint.

e.g. working = commit 1000, broken = 500
1. checkout, build and test commit 750
1.1. if works repeat with commit 625
1.2. if broken repeat with commit 875
...

Each time you test you halve the number of commits that may be the fix, until you have identified the fix / fixes.
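
The procedure above can be sketched in Python. This is only an illustration under stated assumptions: `find_first_fixed` is a hypothetical helper, `is_fixed` stands in for the manual "checkout, build, boot and run zpool scrub" step, and the fixing commit number 640 is made up for the demonstration.

```python
def find_first_fixed(broken, working, is_fixed):
    """Binary chop over revision numbers.

    broken   -- a revision known to show the bug
    working  -- a later revision known to be free of it
    is_fixed -- callable returning True if the given revision has no bug

    Returns the earliest fixed revision and the list of revisions tested,
    assuming the behaviour flips exactly once between broken and working.
    """
    tested = []
    while working - broken > 1:
        mid = (broken + working) // 2
        tested.append(mid)
        if is_fixed(mid):
            working = mid   # the fix is at or before mid
        else:
            broken = mid    # the fix is after mid
    return working, tested

# Same edges as the example above: broken = 500, working = 1000.
first_fixed, tested = find_first_fixed(500, 1000, lambda rev: rev >= 640)
print(first_fixed, tested)
```

With these edges the first two revisions tested are 750 and then 625, as in the example, and nine builds are enough to cover the 500 candidate commits.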
Comment 36 Yudi 2015-07-17 23:06:04 UTC
(In reply to Steven Hartland from comment #35)

If I have time to go down the binary chop method, which revisions do I start with?

Another option is to just update the BIOS and move on. If this bug is not affecting many then I guess updating the BIOS might save a lot of time.
Comment 37 Steven Hartland freebsd_committer freebsd_triage 2015-08-27 08:31:08 UTC
Did you manage to make any progress on this Yudi?
Comment 38 Yudi 2015-09-05 07:18:13 UTC
(In reply to Steven Hartland from comment #37)
Sorry haven't had much free time. 
I flashed the 3rd party BIOS and enabled AHCI on all the SATA ports. The bug disappeared right away.
looks like the bug is restricted to IDE ports. 

v11 definitely fixes the ATA/IDE bug but I could not run v11 on a production server.

Thank you very much for all your help and advice.
Let me know if there is anything I can help with, I would like to contribute back if possible.

cheers
Yudi
Comment 39 Steven Hartland freebsd_committer freebsd_triage 2015-09-05 21:06:12 UTC
If you can still test the failure it would be good to see if we can identify the kernel changes which fix the issue so we can ensure they get MFC'ed.

I detailed the binary chop method above, so it would be good if you could help identify the relevant fix.
Comment 40 Yudi 2015-09-12 09:09:59 UTC
(In reply to Steven Hartland from comment #39)

I can reproduce the bug when I set the two SATA ports to IDE mode in 10.1. 
Before I go ahead with the binary chop and narrow down the commit that fixed this in HEAD can you please confirm the below process is correct.

I am guessing because this is fixed in HEAD, I need to start with a copy of the kernel from HEAD, I know that r283160 from HEAD did not have the bug, so I am guessing I need to go back from this one until I find the bug.

checkout r270000 (is that reasonable) and see if the bug is present.
once I can find the bug, I go forward using the binary chop.

Should I do this on a v11 install and roll back the kernel or do it on 10.1 release and install the kernel from the HEAD?
Comment 41 Steven Hartland freebsd_committer freebsd_triage 2015-11-02 09:08:12 UTC
(In reply to Yudi from comment #40)
First off sorry Yudi, I missed the notification of your reply.

Yes, your approach is reasonable: start at a point which you think would have the bug and work through the space in HEAD (v11) until you identify the issue.

As it's kernel related, you only need to build / install the kernel.

I would advise you to create a cut down kernel config to test with as that will be much quicker to build than a full generic.

One thing that came up the other day was that another user had a timeout issue which turned out to be related to smartmontools. When they ran, it caused his drives to throw timeout errors, so in case you're running them I thought it would be worth mentioning.
Comment 42 sasamotikomi 2016-01-13 14:17:00 UTC
I can repeat this bug:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206200
Workaround for non-patched kernel:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202712#c3
Later I will test this patch from current.