Bug 259541 - [smartpqi] panic: Segment size is not aligned, in a call to bus_dmamap_load_ccb() from smartpqi_cam_action.
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Warner Losh
Reported: 2021-10-30 02:33 UTC by Ka Ho Ng
Modified: 2024-02-19 05:42 UTC

Flags: imp: mfc-stable13-


Description Ka Ho Ng freebsd_committer freebsd_triage 2021-10-30 02:33:55 UTC
The following is a stack trace and ddb(4) show pcpu output:
----
db> show pcpu
cpuid        = 151
dynamic pcpu = 0xfffffe020877ca40
curthread    = 0xfffffe019120a1e0: pid 571 tid 106262 critnest 1 "fsck_msdosfs"
curpcb       = 0xfffffe019120a6f0
fpcurthread  = 0xfffffe019120a1e0: pid 571 "fsck_msdosfs"
idlethread   = 0xfffffe01a0eb9900: tid 100154 "idle: cpu151"
self         = 0xffffffff82ea7000
curpmap      = 0xfffffe019120f518
tssp         = 0xffffffff82ea7384
rsp0         = 0xfffffe03dda97000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff82ea7404
ldt          = 0xffffffff82ea7444
tss          = 0xffffffff82ea7434
curvnet      = 0
spin locks held:
db> bt
Tracing pid 571 tid 106262 td 0xfffffe019120a1e0
kdb_enter() at kdb_enter+0x37/frame 0xfffffe03dda966d0
vpanic() at vpanic+0x1b8/frame 0xfffffe03dda96730
panic() at panic+0x43/frame 0xfffffe03dda96790
bounce_bus_dmamap_load_ma() at bounce_bus_dmamap_load_ma+0x3a9/frame 0xfffffe03dda96810
_bus_dmamap_load_bio() at _bus_dmamap_load_bio+0x113/frame 0xfffffe03dda96870
bus_dmamap_load_ccb() at bus_dmamap_load_ccb+0x92/frame 0xfffffe03dda968d0
smartpqi_cam_action() at smartpqi_cam_action+0xdb9/frame 0xfffffe03dda96940
xpt_run_devq() at xpt_run_devq+0x2f9/frame 0xfffffe03dda969a0
xpt_action_default() at xpt_action_default+0x471/frame 0xfffffe03dda969f0
dastart() at dastart+0x336/frame 0xfffffe03dda96a40
xpt_run_allocq() at xpt_run_allocq+0xb3/frame 0xfffffe03dda96a90
dastrategy() at dastrategy+0x6f/frame 0xfffffe03dda96ac0
g_disk_start() at g_disk_start+0x31c/frame 0xfffffe03dda96b30
g_io_request() at g_io_request+0x2d7/frame 0xfffffe03dda96b60
g_part_start() at g_part_start+0x289/frame 0xfffffe03dda96be0
g_io_request() at g_io_request+0x2d7/frame 0xfffffe03dda96c10
g_dev_strategy() at g_dev_strategy+0x155/frame 0xfffffe03dda96c40
physio() at physio+0x49b/frame 0xfffffe03dda96ce0
devfs_read_f() at devfs_read_f+0xe5/frame 0xfffffe03dda96d40
dofileread() at dofileread+0x81/frame 0xfffffe03dda96d90
sys_read() at sys_read+0xc0/frame 0xfffffe03dda96e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe03dda96f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03dda96f30
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x8011c4cca, rsp = 0x7fffffffdac8, rbp = 0x7fffffffdb00 ---
db>
----

I git-bisected and found the offending commits to be 9fac68fc3853b696c8479bb3a8181d62cb9f59c9 in main and 1569aab1cb38a38fb619f343ed1e47d4b4070ffe in stable/13. The following change in those commits looks suspicious to me:
----
@@ -313,27 +347,31 @@ smartpqi_attach(device_t dev)
...
-
-        /*
-         * Create DMA tag for mapping buffers into controller-addressable space.
-         */
-        if (bus_dma_tag_create(softs->os_specific.pqi_parent_dmat,/* parent */
-				1, 0,			/* algnmnt, boundary */
+    /*
+     * Create DMA tag for mapping buffers into controller-addressable space.
+     */
+    if (bus_dma_tag_create(softs->os_specific.pqi_parent_dmat,/* parent */
+				PAGE_SIZE, 0,		/* algnmnt, boundary */
 				BUS_SPACE_MAXADDR_32BIT,/* lowaddr */
 				BUS_SPACE_MAXADDR,	/* highaddr */
 				NULL, NULL,		/* filter, filterarg */
-				softs->pqi_cap.max_sg_elem*PAGE_SIZE,/*maxsize*/
+				(bus_size_t)softs->pqi_cap.max_sg_elem*PAGE_SIZE,/* maxsize */
 				softs->pqi_cap.max_sg_elem,	/* nsegments */
 				BUS_SPACE_MAXSIZE_32BIT,	/* maxsegsize */
 				BUS_DMA_ALLOCNOW,		/* flags */
----

Reverting the change to the algnmnt parameter (PAGE_SIZE back to 1) works around the issue.
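
For illustration only, here is a minimal user-space sketch of the kind of check the panic message points at: a segment size is "aligned" when it is a multiple of the tag's alignment. This is not the kernel's code. The names sketch_seg_is_aligned, sketch_roundup2 and SKETCH_PAGE_SIZE are hypothetical, the 4096-byte page size is an assumption for amd64, and the 512-byte request size mentioned below is a made-up example, since the report does not state the actual transfer length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size (4 KiB, as on amd64); hypothetical name. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: true when size is a multiple of alignment.
 * alignment must be a power of two (1, 2, 4, ..., 4096, ...).
 */
static bool
sketch_seg_is_aligned(size_t size, size_t alignment)
{
	return ((size & (alignment - 1)) == 0);
}

/*
 * Hypothetical helper: round size up to the next multiple of a
 * power-of-two alignment.
 */
static size_t
sketch_roundup2(size_t size, size_t alignment)
{
	return ((size + alignment - 1) & ~(alignment - 1));
}
```

With an alignment of 1, every size passes sketch_seg_is_aligned(). With an alignment of SKETCH_PAGE_SIZE, a 512-byte request does not pass unless it is first rounded up to 4096 bytes, which would be consistent with the panic disappearing once algnmnt is set back to 1.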
Comment 1 Ed Maste freebsd_committer freebsd_triage 2021-11-03 19:43:33 UTC
See also https://reviews.freebsd.org/D30182
Comment 2 Michael Dexter 2023-02-22 22:04:42 UTC
I am experiencing the same issue with the FreeBSD-14.0-CURRENT-amd64-20230216-2894c8c96b9b-260969 installer images (disc1.iso and memstick.img). The panic occurs when the user confirms the ZFS partitioning/formatting.

curthread    = 0xfffffe016cc081e0: pid 2494 tid 100630 critnest 1 "fsck_msdosfs"
curpcb       = 0xfffffe016cc89700
fpcurthread  = 0xfffffe016cc891e0: pid 2494 "fsck_msdosfs"
idlethread   = 0xfffffe0164d5ce40: tid 100042 "idle: cpu39"

Hardware: HPE ProLiant DL325 G10 Server 1x EPYC 7402P 64GB P408i P16696-B21

Booting to a pre-configured drive also causes a panic.

What additional information would help?
Comment 3 John Hall 2023-08-28 17:28:16 UTC
Review: https://reviews.freebsd.org/D41619
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-10-19 03:25:34 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f07b267d8cc87e88be3c78aa69504b5ebc6571ee

commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee
Author:     John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:10:58 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 03:12:27 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk
    reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA
    descriptor is incorrect, causing the BDMA engine in the controller to
    assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from
    PAGE_SIZE (4k) to 1, which allows the controller to manage its own
    address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on
    Excalibur. At that time a 4k alignment was necessary. This no longer
    seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision:  https://reviews.freebsd.org/D41619

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-10-19 21:24:11 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=01619a8fafcfd99d1811b2c14a92bac1a48c6d31

commit 01619a8fafcfd99d1811b2c14a92bac1a48c6d31
Author:     John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:25:32 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 21:21:11 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk
    reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA
    descriptor is incorrect, causing the BDMA engine in the controller to
    assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from
    PAGE_SIZE (4k) to 1, which allows the controller to manage its own
    address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on
    Excalibur. At that time a 4k alignment was necessary. This no longer
    seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision:  https://reviews.freebsd.org/D41619

    (cherry picked from commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee)

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-10-19 21:41:18 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0efd0d6fa7e0d713ea7e4e9a3e3ac7858475f707

commit 0efd0d6fa7e0d713ea7e4e9a3e3ac7858475f707
Author:     John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:25:32 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 21:37:33 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk
    reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA
    descriptor is incorrect, causing the BDMA engine in the controller to
    assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from
    PAGE_SIZE (4k) to 1, which allows the controller to manage its own
    address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on
    Excalibur. At that time a 4k alignment was necessary. This no longer
    seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision:  https://reviews.freebsd.org/D41619

    (cherry picked from commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee)
    (cherry picked from commit 01619a8fafcfd99d1811b2c14a92bac1a48c6d31)
    Approved-by: re (gjb)

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 7 Mark Linimon freebsd_committer freebsd_triage 2023-12-27 12:42:21 UTC
^Triage: assign to committer that resolved.  Set flag for possible MFC to 13.
Comment 8 Warner Losh freebsd_committer freebsd_triage 2024-02-19 05:42:02 UTC
Merged to 14, can't merge to 13...