Created attachment 232180 [details]
Kernel Breaks on trying to load Ubt0 - USB devices

May I humbly refer you to this recent post - https://forums.freebsd.org/threads/ubt0-ubt_bulk_read_callback-usb-err_stalled-13-1prerelease-stable-vs-14-0-current.84293/#post-558371?
This is not related to USB, but a bug in the ATA driver. Thank you for the report, I've added some more people to look at this.
s/ATA/AHCI/
I am not sure if this background information would be helpful - https://forums.freebsd.org/threads/graphics-driver-for-ryzen5600-gigabyte-b550.83816/.
Where did you get the FreeBSD-14 kernel from? It might be we need to ask you to run some commands from "kgdb" to dig out the exact location of the panic. --HPS
I suspect it has something to do with this change. There have been a couple of issues fixed since then:

commit 3394d4239b85b5577845d9e6de4e97b18d3dba58
Author: Edward Tomasz Napierala <trasz@FreeBSD.org>
Date:   Sat May 15 11:17:22 2021 +0100

    cam: allocate CCBs from UMA for SCSI and ATA IO

    This patch makes it possible for CAM to use small CCBs allocated from an periph-specific UMA zone instead of the usual, huge ones. The end result is that CCBs issued via da(4) take 544B (size of ccb_scsiio) instead of the usual 2kB (size of 'union ccb', ~1.5kB, rounded up by malloc(9)). For ATA it's 272B. We waste less memory, we avoid zeroing the unused 1kB, and it should be easier to allocate those CCBs in low memory conditions. It should also be possible to use uma_zone_reserve(9) to improve behaviour in low memory conditions even further.

    Note that this does not change the size, or the layout, of CCBs as such. CCBs get allocated in various different ways, in particular on the stack, and I don't want to redo all that. Instead, this provides an opt-in mechanism for the periph to declare "my start() callback is fine with receiving a CCB allocated from this UMA zone". In other words, most of the code works exactly as it used to; the change only happens to IOs issued by xpt_run_allockq(), which is - conveniently - pretty much all that matters for performance.

    The reason for doing it this way is that it's pretty small, localized change, and can be implemented gradually and iteratively: take a periph, make sure its start() callback only casts the CCBs it takes to a particular type of CCB, for example ccb_scsiio, and that it only casts CCBs returned by cam_periph_getccb() to that type, then add UMA zone for that size, and declare it safe to XPT.

    This is disabled by default. Set 'kern.cam.ada.enable_uma_ccbs=1' and 'kern.cam.da.enable_uma_ccbs=1' tunables to enable it. Testing is welcome; I will flip the default to enable in two weeks from now.

    Reviewed By:    imp
    Sponsored by:   NetApp, Inc.
    Sponsored by:   Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D28674
Try entering these commands at the loader prompt, before booting the kernel:

set kern.cam.ada.enable_uma_ccbs=0
boot

--HPS
Done but no difference for the 13.1PreRelease.
None of this was (or is going to be) MFC-Ed, so if it happens in 13, it must be something else.
The USB_ERR_STALLED errors started before 13.1PreRelease, while I was still on Stable. Until now I could run poudriere builds on Stable despite the error. Suddenly, a poudriere command broke the display and tonnes of the error were emitted on the terminal. Then I updated Stable, which became 13.1PreRelease, but the error did not go away. I tried every combination, including disabling Bluetooth and USB, but no luck. make installworld was problematic at "mt ree...../" but I got around it. I upgraded to Current on a new BE, and then the kernel crashed.

PS: In another related matter, there is a live ECC-mem server that automatically restarts on any attempt to run portmaster in one of its jails. The server runs 13.0-RELEASE and is regularly updated. Using pkgs is not an option. I must have mentioned it in the previous post on the Forums.
@Lamia, there seem to be two independent issues here: one for USB, one for AHCI. Please do not mix them. I doubt that the panic you see on 14 is reproducible on 13.1, but please correct me if I read you wrong.

@trasz I think I see the problem, and it may indeed be related to your change. In ahci_issue_recovery() I see this line:

ccb->ccb_h = ch->hold[i]->ccb_h; /* Reuse old header. */

which should also copy alloc_flags from the real periph CCB to the locally allocated one. When it comes time to free the CCB, it is probably getting freed to the wrong zone.
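To illustrate the suspicion above, here is a minimal user-space sketch of how a whole-header copy could misroute a free. It is not kernel code: struct model_hdr, struct model_ccb, MODEL_ALLOC_UMA and all model_* functions are hypothetical stand-ins, and the only names taken from this report are ccb_h, alloc_flags and target_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value; the real kernel constants may differ. */
#define MODEL_ALLOC_UMA 0x1u

/* Simplified stand-in for a CCB header (not the kernel's definition). */
struct model_hdr {
	uint32_t target_id;
	uint32_t alloc_flags;	/* which allocator owns the enclosing CCB */
};

struct model_ccb {
	struct model_hdr ccb_h;
};

/* Mimics a free routine that trusts alloc_flags to choose the destination. */
int
model_frees_to_uma(const struct model_ccb *ccb)
{
	assert(ccb != NULL);
	return ((ccb->ccb_h.alloc_flags & MODEL_ALLOC_UMA) != 0);
}

/* The suspected pattern: reuse the old header by copying all of it. */
void
model_reuse_whole_header(struct model_ccb *dst, const struct model_ccb *src)
{
	dst->ccb_h = src->ccb_h;
}

/* Returns 1 if a non-UMA CCB would be freed to the UMA zone after the copy. */
int
model_copy_misroutes_free(void)
{
	struct model_ccb held = { { 3, MODEL_ALLOC_UMA } };	/* from the periph's zone */
	struct model_ccb local = { { 0, 0 } };			/* allocated elsewhere */

	model_reuse_whole_header(&local, &held);
	return (model_frees_to_uma(&local));
}
```

In this model, model_copy_misroutes_free() reports that the locally allocated CCB now carries the UMA flag, so a free routine keyed on alloc_flags would hand it to the wrong allocator.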
You are correct. Panic on 14.0 is likely not reproducible on 13.1. The update on stable went fine though errors were still emitted. I suspect they are two issues too - AHCI & USB. I have had issues with the drives for the board.
Created attachment 232194 [details]
AHCI patch

I think this patch should fix the 14 crash inside AHCI by not overwriting the alloc_flags field in the CCB header.
Please try the attached patch. I only checked that it builds; I haven't really tested it.
Lamia: Do you need help building a kernel with this fix?
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=25375b1415f8a0b0290b56c00c31d20e218ffab9

commit 25375b1415f8a0b0290b56c00c31d20e218ffab9
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-03-05 01:49:05 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2022-03-05 01:55:23 +0000

    ahci/siis/mvs: Fix panics after 3394d4239b.

    Full CCB header overwrites made frees go into wrong zones, causing kernel panics. Instead of copying full header use xpt_setup_ccb(), since the only field I see used from all the header is target_id.

    PR:             262263

 sys/dev/ahci/ahci.c | 3 ++-
 sys/dev/mvs/mvs.c   | 3 ++-
 sys/dev/siis/siis.c | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)
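The idea behind the fix can be sketched in user space as follows. This is only a model under stated assumptions: model_setup_hdr() is a hypothetical helper standing in for the role the commit gives to xpt_setup_ccb(), and struct model_path and struct model_hdr are simplified inventions, not the kernel structures. The point is that the header is initialised from the path, and the field recording the owning allocator is left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CAM path. */
struct model_path {
	uint32_t target_id;
};

/* Simplified stand-in for a CCB header (not the kernel's definition). */
struct model_hdr {
	uint32_t target_id;
	uint32_t alloc_flags;	/* which allocator owns the enclosing CCB */
};

/*
 * Hypothetical setup helper: fill in only what the recovery code needs
 * (target_id, per the commit message) and leave alloc_flags untouched.
 */
void
model_setup_hdr(struct model_hdr *hdr, const struct model_path *path)
{
	assert(hdr != NULL && path != NULL);
	hdr->target_id = path->target_id;
}

/* Returns 1 if the local header gets the target but keeps its own owner. */
int
model_fix_keeps_owner(void)
{
	struct model_path held_path = { 3 };	/* path of the held request */
	struct model_hdr local = { 0, 0 };	/* locally allocated, non-UMA */

	model_setup_hdr(&local, &held_path);
	return (local.target_id == 3 && local.alloc_flags == 0);
}
```

Compared with copying the whole header, this keeps the allocator bookkeeping of the local CCB intact, so the later free goes back to where the CCB came from.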