Bug 262263 - ahci: Unaligned free to UMA zone (ada_ccb)
Summary: ahci: Unaligned free to UMA zone (ada_ccb)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Alexander Motin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-01 09:11 UTC by Lamia
Modified: 2022-03-05 01:58 UTC
CC List: 5 users

See Also:


Attachments
Kernel Breaks on trying to load Ubt0 - USB devices (451.81 KB, image/jpeg)
2022-03-01 09:11 UTC, Lamia
AHCI patch (512 bytes, patch)
2022-03-02 00:45 UTC, Alexander Motin

Description Lamia 2022-03-01 09:11:00 UTC
Created attachment 232180
Kernel Breaks on trying to load Ubt0 - USB devices

May I humbly refer you to this recent post: https://forums.freebsd.org/threads/ubt0-ubt_bulk_read_callback-usb-err_stalled-13-1prerelease-stable-vs-14-0-current.84293/#post-558371
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-01 09:25:01 UTC
This is not related to USB, but a bug in the ATA driver.

Thank you for the report, I've added some more people to look at this.
Comment 2 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-01 09:25:25 UTC
s/ATA/AHCI/
Comment 3 Lamia 2022-03-01 09:30:11 UTC
I am not sure if this background information would be helpful - https://forums.freebsd.org/threads/graphics-driver-for-ryzen5600-gigabyte-b550.83816/.
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-01 09:38:28 UTC
Where did you get the FreeBSD-14 kernel from?

It might be we need to ask you to run some commands from "kgdb" to dig out the exact location of the panic.

--HPS
Comment 5 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-01 09:40:46 UTC
I suspect it has something to do with this change. A couple of issues have been fixed since then:

commit 3394d4239b85b5577845d9e6de4e97b18d3dba58
Author: Edward Tomasz Napierala <trasz@FreeBSD.org>
Date:   Sat May 15 11:17:22 2021 +0100

    cam: allocate CCBs from UMA for SCSI and ATA IO
    
    This patch makes it possible for CAM to use small CCBs allocated
    from a periph-specific UMA zone instead of the usual, huge ones.
    The end result is that CCBs issued via da(4) take 544B (size of
    ccb_scsiio) instead of the usual 2kB (size of 'union ccb', ~1.5kB,
    rounded up by malloc(9)).  For ATA it's 272B.  We waste less
    memory, we avoid zeroing the unused 1kB, and it should be easier
    to allocate those CCBs in low memory conditions.  It should also
    be possible to use uma_zone_reserve(9) to improve behaviour
    in low memory conditions even further.
    
    Note that this does not change the size, or the layout, of CCBs
    as such.  CCBs get allocated in various different ways, in particular
    on the stack, and I don't want to redo all that.  Instead, this
    provides an opt-in mechanism for the periph to declare "my start()
    callback is fine with receiving a CCB allocated from this UMA zone".
    In other words, most of the code works exactly as it used to; the
    change only happens to IOs issued by xpt_run_allockq(), which
    is - conveniently - pretty much all that matters for performance.
    
    The reason for doing it this way is that it's a pretty small, localized
    change, and can be implemented gradually and iteratively: take a
    periph, make sure its start() callback only casts the CCBs it takes
    to a particular type of CCB, for example ccb_scsiio, and that it only
    casts CCBs returned by cam_periph_getccb() to that type, then add UMA
    zone for that size, and declare it safe to XPT.
    
    This is disabled by default.  Set 'kern.cam.ada.enable_uma_ccbs=1'
    and 'kern.cam.da.enable_uma_ccbs=1' tunables to enable it.  Testing
    is welcome; I will flip the default to enable in two weeks from now.
    
    Reviewed By:    imp
    Sponsored by:   NetApp, Inc.
    Sponsored by:   Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D28674
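To illustrate the opt-in mechanism the commit message describes, here is a minimal user-space C model (not kernel code). It assumes, as later comments in this report indicate, that the CCB header records how the CCB was allocated in an alloc_flags field, and that the free path trusts that field. The struct names, the MODEL_CCB_FROM_ZONE flag and the helper functions are hypothetical; real allocations come from UMA or malloc(9), which are modeled here as plain counters around calloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical marker; the real CAM flag name may differ. */
#define MODEL_CCB_FROM_ZONE 0x1u

struct model_ccb_hdr {
	uint32_t alloc_flags;		/* records which allocator owns the CCB */
};

struct model_ccb {
	struct model_ccb_hdr ccb_h;
	/* Request-specific body (ccb_ataio, ccb_scsiio, ...) would follow. */
};

/* zone_item_size == 0 means the periph did not opt in to a UMA zone. */
struct model_periph {
	const char *name;
	size_t zone_item_size;		/* e.g. 272 for ATA, 544 for SCSI */
	int zone_live;			/* CCBs currently allocated from the zone */
};

static int generic_live;		/* CCBs from the generic allocator */

/* Models the allocation done before calling a periph's start() callback. */
static struct model_ccb *
model_alloc_for_start(struct model_periph *p)
{
	struct model_ccb *ccb = calloc(1, sizeof(*ccb));

	if (ccb == NULL)
		return NULL;
	if (p->zone_item_size != 0) {
		ccb->ccb_h.alloc_flags = MODEL_CCB_FROM_ZONE;
		p->zone_live++;
	} else {
		generic_live++;
	}
	return ccb;
}

/* The free path picks the destination from alloc_flags alone. */
static void
model_free(struct model_periph *p, struct model_ccb *ccb)
{
	if (ccb->ccb_h.alloc_flags & MODEL_CCB_FROM_ZONE)
		p->zone_live--;
	else
		generic_live--;
	assert(p->zone_live >= 0 && generic_live >= 0);
	free(ccb);
}

static void
model_demo(void)
{
	struct model_periph ada = { "ada", 272, 0 };	/* opted in */
	struct model_periph pass = { "pass", 0, 0 };	/* not opted in */
	struct model_ccb *a = model_alloc_for_start(&ada);
	struct model_ccb *b = model_alloc_for_start(&pass);

	assert(a != NULL && b != NULL);
	assert(ada.zone_live == 1 && generic_live == 1);
	model_free(&ada, a);
	model_free(&pass, b);
	assert(ada.zone_live == 0 && generic_live == 0);
}
```

The key design point the model captures is that correctness depends entirely on alloc_flags staying accurate for the lifetime of each CCB.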
Comment 6 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-01 09:42:48 UTC
Try entering these commands at the loader prompt before booting the kernel:

set kern.cam.ada.enable_uma_ccbs=0
boot

--HPS
Comment 7 Lamia 2022-03-01 10:34:05 UTC
Done, but it made no difference on 13.1-PRERELEASE.
Comment 8 Edward Tomasz Napierala freebsd_committer freebsd_triage 2022-03-01 13:56:39 UTC
None of this was (or is going to be) MFC-Ed, so if it happens in 13, it must be something else.
Comment 9 Lamia 2022-03-01 15:04:49 UTC
The USB_ERR_STALLED error started before 13.1-PRERELEASE, back on stable. Until now I could run poudriere builds on stable despite the error.

Then, suddenly, a poudriere command broke the display and tons of these errors were emitted on the terminal. I then updated stable, which became 13.1-PRERELEASE, but the error would not go away. I tried all possible combinations, including disabling Bluetooth and USB, but no luck. make installworld was problematic at "mt ree...../", but I got around it.

I then upgraded to CURRENT in a new boot environment (BE), and the kernel crashed.

PS: On another, related matter, there is a live server with ECC memory that automatically restarts when I attempt to run portmaster in one of its jails. The server runs 13.0-RELEASE and is regularly updated. Using packages is not an option. I must have mentioned it in my previous post on the Forums.
Comment 10 Alexander Motin freebsd_committer freebsd_triage 2022-03-01 15:06:49 UTC
@Lamia, there seem to be two independent issues here: one for USB, one for AHCI.  Please do not mix them.  I doubt that the panic you see on 14 is reproducible on 13.1, but please correct me if I read you wrong.

@trasz I think I see the problem, and it may indeed be related to your change.  In ahci_issue_recovery() I see this line:
    ccb->ccb_h = ch->hold[i]->ccb_h;        /* Reuse old header. */

That assignment presumably also copies alloc_flags from the real periph CCB to the locally allocated one.  When it comes time to free the CCB, it is probably getting freed to the wrong zone.
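To make this reasoning concrete, here is a small user-space C model (not kernel code) of what a whole-header struct assignment does to the allocation bookkeeping. The names model_ccb_hdr, MODEL_CCB_FROM_ZONE, model_free_target() and model_reuse_old_header() are hypothetical; only the idea that the header carries alloc_flags and that the free path relies on it comes from this report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for "allocated from a periph UMA zone". */
#define MODEL_CCB_FROM_ZONE 0x1u

struct model_ccb_hdr {
	uint32_t alloc_flags;	/* tells the free path which allocator owns it */
	int target_id;
};

struct model_ccb {
	struct model_ccb_hdr ccb_h;
};

enum model_pool { POOL_GENERIC, POOL_PERIPH_ZONE };

/* The free path decides where to return the CCB from alloc_flags alone. */
static enum model_pool
model_free_target(const struct model_ccb *ccb)
{
	return ((ccb->ccb_h.alloc_flags & MODEL_CCB_FROM_ZONE) != 0 ?
	    POOL_PERIPH_ZONE : POOL_GENERIC);
}

/* Mirrors "ccb->ccb_h = ch->hold[i]->ccb_h;": a whole-struct assignment. */
static void
model_reuse_old_header(struct model_ccb *local, const struct model_ccb *held)
{
	local->ccb_h = held->ccb_h;	/* alloc_flags is copied with the rest */
}

static void
model_demo(void)
{
	struct model_ccb held = { { MODEL_CCB_FROM_ZONE, 3 } };	/* zone CCB */
	struct model_ccb local = { { 0, 0 } };	/* locally allocated CCB */

	assert(model_free_target(&local) == POOL_GENERIC);
	model_reuse_old_header(&local, &held);
	/* The local CCB now claims to belong to the periph's zone. */
	assert(model_free_target(&local) == POOL_PERIPH_ZONE);
}
```

In the kernel, the equivalent mismatch means a malloc'ed CCB is handed to a UMA zone (or the reverse) on free, which matches the "Unaligned free" panic in this bug's title.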
Comment 11 Lamia 2022-03-01 15:32:18 UTC
You are correct. The panic on 14.0 is likely not reproducible on 13.1. The update on stable went fine, though errors were still emitted. I also suspect there are two issues: AHCI and USB. I have had issues with the drivers for this board.
Comment 12 Alexander Motin freebsd_committer freebsd_triage 2022-03-02 00:45:49 UTC
Created attachment 232194
AHCI patch

I think this patch should fix the crash inside AHCI on 14 by not overwriting the alloc_flags field in the CCB header.
Comment 13 Alexander Motin freebsd_committer freebsd_triage 2022-03-02 00:47:14 UTC
Please try the patch attached.  I only checked that it builds, but haven't really tested.
Comment 14 Hans Petter Selasky freebsd_committer freebsd_triage 2022-03-02 12:49:33 UTC
Lamia: Do you need help building a kernel with this fix?
Comment 15 commit-hook freebsd_committer freebsd_triage 2022-03-05 01:56:26 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=25375b1415f8a0b0290b56c00c31d20e218ffab9

commit 25375b1415f8a0b0290b56c00c31d20e218ffab9
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2022-03-05 01:49:05 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2022-03-05 01:55:23 +0000

    ahci/siis/mvs: Fix panics after 3394d4239b.

    Full CCB header overwrites made frees go into wrong zones, causing
    kernel panics.  Instead of copying full header use xpt_setup_ccb(),
    since the only field I see used from all the header is target_id.

    PR:     262263

 sys/dev/ahci/ahci.c | 3 ++-
 sys/dev/mvs/mvs.c   | 3 ++-
 sys/dev/siis/siis.c | 3 ++-
 3 files changed, 6 insertions(+), 3 deletions(-)
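The approach in this commit can be modeled the same way: instead of copying the whole header, initialize the local CCB's header and copy only target_id from the held CCB. This is a user-space sketch, not the committed diff. model_setup_ccb() is a hypothetical stand-in for xpt_setup_ccb(), and the sketch assumes, as the commit message implies, that header setup leaves the CCB's own alloc_flags untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for "allocated from a periph UMA zone". */
#define MODEL_CCB_FROM_ZONE 0x1u

struct model_ccb_hdr {
	uint32_t alloc_flags;	/* owned by the allocator; must survive reuse */
	int target_id;
	int target_lun;
	int priority;
};

struct model_ccb {
	struct model_ccb_hdr ccb_h;
};

/*
 * Hypothetical stand-in for xpt_setup_ccb(): (re)initializes routing and
 * scheduling fields, but never touches alloc_flags.
 */
static void
model_setup_ccb(struct model_ccb_hdr *hdr, int priority)
{
	hdr->target_id = 0;
	hdr->target_lun = 0;
	hdr->priority = priority;
}

/* Build the recovery CCB header the way the commit message describes. */
static void
model_prepare_recovery_ccb(struct model_ccb *local,
    const struct model_ccb *held, int priority)
{
	model_setup_ccb(&local->ccb_h, priority);
	/* target_id is the only field taken from the held CCB's header. */
	local->ccb_h.target_id = held->ccb_h.target_id;
}

static void
model_demo(void)
{
	struct model_ccb held = { { MODEL_CCB_FROM_ZONE, 5, 0, 1 } };
	struct model_ccb local = { { 0, 0, 0, 0 } };	/* locally allocated */

	model_prepare_recovery_ccb(&local, &held, 1);
	assert(local.ccb_h.target_id == 5);	/* routing info carried over */
	assert(local.ccb_h.alloc_flags == 0);	/* still freed as a generic CCB */
}
```

Copying a single named field is more robust than a whole-struct copy: any bookkeeping added to the header later cannot be clobbered by accident.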