The following is a stack trace and ddb(4) show pcpu output:

----
db> show pcpu
cpuid = 151
dynamic pcpu = 0xfffffe020877ca40
curthread = 0xfffffe019120a1e0: pid 571 tid 106262 critnest 1 "fsck_msdosfs"
curpcb = 0xfffffe019120a6f0
fpcurthread = 0xfffffe019120a1e0: pid 571 "fsck_msdosfs"
idlethread = 0xfffffe01a0eb9900: tid 100154 "idle: cpu151"
self = 0xffffffff82ea7000
curpmap = 0xfffffe019120f518
tssp = 0xffffffff82ea7384
rsp0 = 0xfffffe03dda97000
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff82ea7404
ldt = 0xffffffff82ea7444
tss = 0xffffffff82ea7434
curvnet = 0
spin locks held:
db> bt
Tracing pid 571 tid 106262 td 0xfffffe019120a1e0
kdb_enter() at kdb_enter+0x37/frame 0xfffffe03dda966d0
vpanic() at vpanic+0x1b8/frame 0xfffffe03dda96730
panic() at panic+0x43/frame 0xfffffe03dda96790
bounce_bus_dmamap_load_ma() at bounce_bus_dmamap_load_ma+0x3a9/frame 0xfffffe03dda96810
_bus_dmamap_load_bio() at _bus_dmamap_load_bio+0x113/frame 0xfffffe03dda96870
bus_dmamap_load_ccb() at bus_dmamap_load_ccb+0x92/frame 0xfffffe03dda968d0
smartpqi_cam_action() at smartpqi_cam_action+0xdb9/frame 0xfffffe03dda96940
xpt_run_devq() at xpt_run_devq+0x2f9/frame 0xfffffe03dda969a0
xpt_action_default() at xpt_action_default+0x471/frame 0xfffffe03dda969f0
dastart() at dastart+0x336/frame 0xfffffe03dda96a40
xpt_run_allocq() at xpt_run_allocq+0xb3/frame 0xfffffe03dda96a90
dastrategy() at dastrategy+0x6f/frame 0xfffffe03dda96ac0
g_disk_start() at g_disk_start+0x31c/frame 0xfffffe03dda96b30
g_io_request() at g_io_request+0x2d7/frame 0xfffffe03dda96b60
g_part_start() at g_part_start+0x289/frame 0xfffffe03dda96be0
g_io_request() at g_io_request+0x2d7/frame 0xfffffe03dda96c10
g_dev_strategy() at g_dev_strategy+0x155/frame 0xfffffe03dda96c40
physio() at physio+0x49b/frame 0xfffffe03dda96ce0
devfs_read_f() at devfs_read_f+0xe5/frame 0xfffffe03dda96d40
dofileread() at dofileread+0x81/frame 0xfffffe03dda96d90
sys_read() at sys_read+0xc0/frame 0xfffffe03dda96e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe03dda96f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03dda96f30
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x8011c4cca, rsp = 0x7fffffffdac8, rbp = 0x7fffffffdb00 ---
db>
----

I git-bisected and found the offending commits to be 9fac68fc3853b696c8479bb3a8181d62cb9f59c9 in main and 1569aab1cb38a38fb619f343ed1e47d4b4070ffe in stable/13.

The following changes in those commits look suspicious to me:

----
@@ -313,27 +347,31 @@ smartpqi_attach(device_t dev)
...
-
-       /*
-        * Create DMA tag for mapping buffers into controller-addressable space.
-        */
-       if (bus_dma_tag_create(softs->os_specific.pqi_parent_dmat,/* parent */
-                               1, 0,                   /* algnmnt, boundary */
+       /*
+        * Create DMA tag for mapping buffers into controller-addressable space.
+        */
+       if (bus_dma_tag_create(softs->os_specific.pqi_parent_dmat,/* parent */
+                               PAGE_SIZE, 0,           /* algnmnt, boundary */
                                BUS_SPACE_MAXADDR_32BIT,/* lowaddr */
                                BUS_SPACE_MAXADDR,      /* highaddr */
                                NULL, NULL,             /* filter, filterarg */
-                               softs->pqi_cap.max_sg_elem*PAGE_SIZE,/*maxsize*/
+                               (bus_size_t)softs->pqi_cap.max_sg_elem*PAGE_SIZE,/* maxsize */
                                softs->pqi_cap.max_sg_elem,     /* nsegments */
                                BUS_SPACE_MAXSIZE_32BIT,        /* maxsegsize */
                                BUS_DMA_ALLOCNOW,               /* flags */
----

Reverting the change to the algnmnt parameter worked around the issue.
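To illustrate why the algnmnt argument matters, here is a minimal user-space sketch in C of an alignment check of the kind a DMA layer might apply to a buffer address. It is an illustration only, not the kernel's busdma code: the helper names, the 4 KiB page size constant, and the simplified "above lowaddr or misaligned" rule are all assumptions made for the example. The idea is that a buffer acceptable under an alignment of 1 can be treated as needing a bounce page once the tag demands PAGE_SIZE alignment, which would be consistent with the bounce path seen in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages, as on amd64 */

/*
 * Hypothetical helper, not a kernel function: returns true when `addr`
 * does not satisfy a power-of-two `alignment` constraint.
 */
static bool
sketch_misaligned(uint64_t addr, uint64_t alignment)
{
	/* The mask trick below is only valid for powers of two. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((addr & (alignment - 1)) != 0);
}

/*
 * Hypothetical helper: in this simplified model an address needs a
 * bounce page if it lies above the tag's lowaddr or if it violates
 * the tag's alignment.
 */
static bool
sketch_needs_bounce(uint64_t addr, uint64_t lowaddr, uint64_t alignment)
{
	return (addr > lowaddr || sketch_misaligned(addr, alignment));
}
```

In this model a buffer at 0x1200 below the 32-bit limit passes with an alignment of 1 but fails with an alignment of SKETCH_PAGE_SIZE, because its low 12 bits are not zero. A user buffer handed to read(2) on a raw disk device is not guaranteed to be page-aligned, so under this reading the stricter tag would affect ordinary raw reads such as the one issued by fsck_msdosfs.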
See also https://reviews.freebsd.org/D30182
I am experiencing this same issue with FreeBSD-14.0-CURRENT-amd64-20230216-2894c8c96b9b-260969-disc1.iso|memstick.img when the user confirms the ZFS partitioning/formatting.

curthread = 0xfffffe016cc081e0: pid 2494 tid 100630 critnest 1 "fsck_msdosfs"
curpcb = 0xfffffe016cc89700
fpcurthread = 0xfffffe016cc891e0: pid 2494 "fsck_msdosfs"
idlethread = 0xfffffe0164d5ce40: tid 100042 "idle: cpu39"

Hardware: HPE ProLiant DL325 G10 Server 1x EPYC 7402P 64GB P408i P16696-B21

Booting to a pre-configured drive also causes a panic.

What additional information would help?
review D41619
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f07b267d8cc87e88be3c78aa69504b5ebc6571ee

commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee
Author: John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:10:58 +0000
Commit: Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 03:12:27 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA descriptor is incorrect, causing the BDMA engine in the controller to assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from PAGE_SIZE (4k) to 1, which allows the controller to manage it's own address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on Excalibur. At that time a 4k alignment was necessary. This no longer seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision: https://reviews.freebsd.org/D41619

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=01619a8fafcfd99d1811b2c14a92bac1a48c6d31

commit 01619a8fafcfd99d1811b2c14a92bac1a48c6d31
Author: John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:25:32 +0000
Commit: Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 21:21:11 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA descriptor is incorrect, causing the BDMA engine in the controller to assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from PAGE_SIZE (4k) to 1, which allows the controller to manage it's own address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on Excalibur. At that time a 4k alignment was necessary. This no longer seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision: https://reviews.freebsd.org/D41619

    (cherry picked from commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee)

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0efd0d6fa7e0d713ea7e4e9a3e3ac7858475f707

commit 0efd0d6fa7e0d713ea7e4e9a3e3ac7858475f707
Author: John Hall <john.hall@microchip.com>
AuthorDate: 2023-10-19 03:25:32 +0000
Commit: Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-10-19 21:37:33 +0000

    smartpqi: Change alignment for dma tags

    Problem: Under certain I/O conditions, a program doing large block disk reads can cause a controller to crash.

    Root Cause: The SCSI read request and destination address in the BDMA descriptor is incorrect, causing the BDMA engine in the controller to assert.

    Fix: Change the alignment for creating bus_dma_tags in the driver from PAGE_SIZE (4k) to 1, which allows the controller to manage it's own address range for BDMA transactions.

    Risk: Medium

    Exposure: This reverts a change first made to support NVMe drives on Excalibur. At that time a 4k alignment was necessary. This no longer seems to be the case.

    PR: 259541
    Reported by: Ka Ho Ng <khng@freebsd.org>
    Reviewed by: imp
    Differential Revision: https://reviews.freebsd.org/D41619

    (cherry picked from commit f07b267d8cc87e88be3c78aa69504b5ebc6571ee)
    (cherry picked from commit 01619a8fafcfd99d1811b2c14a92bac1a48c6d31)

    Approved-by: re (gjb)

 sys/dev/smartpqi/smartpqi_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
^Triage: assign to committer that resolved. Set flag for possible MFC to 13.
Merged to 14, can't merge to 13...