Summary: | mpr: panic in mpr_complete_command during zpool import | ||
---|---|---|---|
Product: | Base System | Reporter: | Dan Kotowski <dan.kotowski> |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Open --- | ||
Severity: | Affects Only Me | CC: | asomers, grahamperrin, imp, marklmi26-fbsd |
Priority: | --- | Keywords: | crash |
Version: | CURRENT | ||
Hardware: | arm64 | ||
OS: | Any |
Description
Dan Kotowski
2023-03-10 16:07:20 UTC
```
# mprutil show all
Adapter:
mpr0 Adapter:
       Board Name: SAS9311-8i
   Board Assembly: H3-25461-02H
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 8.37.00.00
Firmware Revision: 16.00.10.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 81 C

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     3.0    12     SAS Initiator
1       0002        000a       N         6.0     3.0    12     SAS Initiator
2       0003        000b       N         6.0     3.0    12     SAS Initiator
3       0004        000c       N         6.0     3.0    12     SAS Initiator
4                              N                 3.0    12     SAS Initiator
5                              N                 3.0    12     SAS Initiator
6       0005        000d       N         6.0     3.0    12     SAS Initiator
7                              N                 3.0    12     SAS Initiator

Devices:
B____T    SAS Address      Handle  Parent    Device        Speed Enc  Slot  Wdt
00   03   4433221100000000 0009    0001      SATA Target   6.0   0001 03    1
00   02   4433221101000000 000a    0002      SATA Target   6.0   0001 02    1
00   00   4433221102000000 000b    0003      SATA Target   6.0   0001 00    1
00   01   4433221103000000 000c    0004      SATA Target   6.0   0001 01    1
00   04   4433221106000000 000d    0005      SATA Target   6.0   0001 04    1

Enclosures:
Slots      Logical ID     SEPHandle  EncHandle    Type
  08    500605b00993f2f0               0001     Direct Attached SGPIO

Expanders:
NumPhys   SAS Address     DevHandle   Parent  EncHandle  SAS Level

# zpool import
   pool: tank
     id: 4890533244228042504
  state: ONLINE
 status: Some supported features are not enabled on the pool.
         (Note that they may be intentionally disabled if the
         'compatibility' property is set.)
 action: The pool can be imported using its name or numeric identifier, though
         some features will not be available without an explicit 'zpool upgrade'.
 config:

         tank                            ONLINE
           raidz1-0                      ONLINE
             diskid/DISK-S6EPNE0T600261  ONLINE
             diskid/DISK-S6EPNE0T600275  ONLINE
             diskid/DISK-S6EPNE0TA03058  ONLINE
             diskid/DISK-S6EPNE0TA03099  ONLINE
             diskid/DISK-S6EPNE0TA03107  ONLINE

# zpool import tank
panic: command not inqueue, state = 0
cpuid = 12
time = 946694892
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
mpr_complete_command() at mpr_complete_command+0x12c
mpr_intr_locked() at mpr_intr_locked+0x7c
mpr_intr_msi() at mpr_intr_msi+0x58
ithread_loop() at ithread_loop+0x2a0
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100154 ]
Stopped at      kdb_enter+0x44: undefined       f900027f
db> reset
Uptime: 16m2s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:0:ffffffff): handle 11
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:1:ffffffff): handle 12
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:2:ffffffff): handle 10
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:3:ffffffff): handle 9
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:4:ffffffff): handle 13
mpr0: Incrementing SSU count
panic: command not inqueue, state = 0
cpuid = 12
time = 946694892
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
mpr_complete_command() at mpr_complete_command+0x12c
mpr_intr_locked() at mpr_intr_locked+0x7c
xpt_sim_poll() at xpt_sim_poll+0x54
mprsas_ir_shutdown() at mprsas_ir_shutdown+0x458
kern_reboot() at kern_reboot+0x6c4
db_reset() at db_reset+0xd0
db_command() at db_command+0x2d8
db_command_loop() at db_command_loop+0x54
db_trap() at db_trap+0xf8
kdb_trap() at kdb_trap+0x28c
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0
(null)() at 0
KDB: enter: panic
[ thread pid 12 tid 100154 ]
Stopped at      kdb_enter+0x44: undefined       ff900027f900027f
db> reboot
Uptime: 16m2s
mpr0: Sending StopUnit: path (xpt0:mpr0:0:0:ffffffff): handle 11
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:1:ffffffff): handle 12
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:2:ffffffff): handle 10
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:3:ffffffff): handle 9
mpr0: Incrementing SSU count
mpr0: Sending StopUnit: path (xpt0:mpr0:0:4:ffffffff): handle 13
mpr0: Incrementing SSU count
mpr0: Decrementing SSU count.
mpr0: Decrementing SSU count.
mpr0: Decrementing SSU count.
mpr0: Decrementing SSU count.
mpr0: Completing stop unit for (xpt0:mpr0:0:3:ffffffff):
mpr0: Completing stop unit for (xpt0:mpr0:0:1:ffffffff):
mpr0: Completing stop unit for (xpt0:mpr0:0:4:ffffffff):
mpr0: Completing stop unit for (xpt0:mpr0:0:2:ffffffff):
mpr0: Decrementing SSU count.
mpr0: Completing

# uname -mv
FreeBSD 14.0-CURRENT main-n261327-c237c10a2346 GENERIC arm64

# mprutil show adapters
Device Name      Chip Name        Board Name        Firmware
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
/dev/mpr0        LSISAS3008       SAS9311-8i        10000a00

# mprutil show adapter
mpr0 Adapter:
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
       Board Name: SAS9311-8i
   Board Assembly: H3-25461-02H
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 8.37.00.00
Firmware Revision: 16.00.10.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 79 C

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         6.0     3.0    12     SAS Initiator
1       0002        000a       N         6.0     3.0    12     SAS Initiator
2       0003        000b       N         6.0     3.0    12     SAS Initiator
3       0004        000c       N         6.0     3.0    12     SAS Initiator
4                              N                 3.0    12     SAS Initiator
5                              N                 3.0    12     SAS Initiator
6       0005        000d       N         6.0     3.0    12     SAS Initiator
7                              N                 3.0    12     SAS Initiator

# mprutil show iocfacts
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
MsgLength: 17
Function: 0x3
HeaderVersion: 50,00
IOCNumber: 0
MsgFlags: 0x0
VP_ID: 0
VF_ID: 0
IOCExceptions: 0
IOCStatus: 0
IOCLogInfo: 0x0
MaxChainDepth: 128
WhoInit: 0x4
NumberOfPorts: 1
MaxMSIxVectors: 96
RequestCredit: 9856
ProductID: 0x2221
IOCCapabilities: 0x7a85c <ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
FWVersion: 16.00.10.00
IOCRequestFrameSize: 32
MaxChainSegmentSize: 8
MaxInitiators: 32
MaxTargets: 1024
MaxSasExpanders: 42
MaxEnclosures: 43
ProtocolFlags: 0x3 <ScsiTarget,ScsiInitiator>
HighPriorityCredit: 104
MaxRepDescPostQDepth: 65504
ReplyFrameSize: 32
MaxVolumes: 0
MaxDevHandle: 1106
MaxPersistentEntries: 128
MinDevHandle: 9
CurrentHostPageSize: 0

# pciconf -l -BbcevV mpr0
mpr0@pci4:1:0:0:  class=0x010700 rev=0x02 hdr=0x00 vendor=0x1000 device=0x0097 subvendor=0x1000 subdevice=0x30e0
    vendor     = 'Broadcom / LSI'
    device     = 'SAS3008 PCI-Express Fusion-MPT SAS-3'
    class      = mass storage
    subclass   = SAS
    bar   [10] = type I/O Port, range 32, base 0, size 256, disabled
    bar   [14] = type Memory, range 64, base 0x40040000, size 65536, enabled
    bar   [1c] = type Memory, range 64, base 0x40000000, size 262144, enabled
    cap 01[50] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 10[68] = PCI-Express 2 endpoint max data 128(4096) FLR RO NS
                 max read 512
                 link x8(x8) speed 8.0(8.0)
    cap 05[a8] = MSI supports 1 message, 64 bit, vector masks
    cap 11[c0] = MSI-X supports 96 messages, enabled
                 Table in map 0x14[0xe000], PBA in map 0x14[0xf000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0019[1e0] = PCIe Sec 1 lane errors 0
    ecap 0004[1c0] = Power Budgeting 1
    ecap 0016[190] = DPA 1
    ecap 000e[148] = ARI 1
```

This seems to exist for mps as well:

```
# zpool import tank
panic: command not inqueue, state = 0
cpuid = 12
time = 946689084
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
mps_complete_command() at mps_complete_command+0x130
mps_intr_locked() at mps_intr_locked+0xc4
mps_intr_msi() at mps_intr_msi+0x58
ithread_loop() at ithread_loop+0x2a0
fork_exit() at fork_exit+0x74
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 12 tid 100154 ]
Stopped at      kdb_enter+0x44: undefined       f900027f
```

I have since brought the 9300-8i controller up to firmware 16.00.12.00, but the problem persists.
https://www.truenas.com/community/resources/lsi-9300-xx-firmware-update.145/

Interestingly, I can reproduce the issue with the 9300-8i on an amd64 system also running CURRENT, but the SAS9211-8i on 20.00.04.00 fails only on arm64 (it works on amd64).

The following message is logged to messages on almost every command:

```
kernel: [1135] mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
```

(In reply to Dan Kotowski from comment #8)
You can 100% ignore that message. It's saying there's more information that could be copied to userland based on the iocfacts, so userland should have given it a bigger buffer. However, nothing in userland that we have, or that I'm aware of, uses those extra 4 bytes to make decisions or report anything. The message is eliminated in future versions of FreeBSD.

(In reply to Warner Losh from comment #9)
> The message is eliminated in future versions of FreeBSD.

Actually, not yet. It's still waiting for review: https://reviews.freebsd.org/D38739, if you're interested.

Is it possible this is related to a PCI bus coherency issue? The platform is a SolidRun Honeycomb LX2K, which is built around the Cortex-A72 but has a full PCIe bus. However, testing on Linux found an issue with PCI coherency.

Test: https://gist.github.com/jnettlet/80f8d09d01c0dc0ffc0122f36ed78de6
glibc patch: https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436

I checked our arm64 memcpy.S and, sure enough, we use the same store ordering that Linux used before Jon patched it out.
```
diff --git a/sys/arm64/arm64/memcpy.S b/sys/arm64/arm64/memcpy.S
index d5fbfa64e0fa..d65910a0a0c8 100644
--- a/sys/arm64/arm64/memcpy.S
+++ b/sys/arm64/arm64/memcpy.S
@@ -132,12 +132,12 @@ L(copy128):
 	stp	G_l, G_h, [dstend, -64]
 	stp	H_l, H_h, [dstend, -48]
 L(copy96):
+	stp	C_l, C_h, [dstend, -32]
+	stp	D_l, D_h, [dstend, -16]
 	stp	A_l, A_h, [dstin]
 	stp	B_l, B_h, [dstin, 16]
 	stp	E_l, E_h, [dstin, 32]
 	stp	F_l, F_h, [dstin, 48]
-	stp	C_l, C_h, [dstend, -32]
-	stp	D_l, D_h, [dstend, -16]
 	ret
 
 	.p2align 4
@@ -232,10 +232,10 @@ L(copy64_from_start):
 	stp	C_l, C_h, [dstend, -48]
 	ldp	C_l, C_h, [src]
 	stp	D_l, D_h, [dstend, -64]
-	stp	G_l, G_h, [dstin, 48]
-	stp	A_l, A_h, [dstin, 32]
-	stp	B_l, B_h, [dstin, 16]
 	stp	C_l, C_h, [dstin]
+	stp	B_l, B_h, [dstin, 16]
+	stp	A_l, A_h, [dstin, 32]
+	stp	G_l, G_h, [dstin, 48]
 	ret
 EEND(memmove)
 END(memcpy)
```

(In reply to Dan Kotowski from comment #12)
Are you able to test whether the patch sidesteps the issue for you?

Tested; no, it does not sidestep it :(

This driver works well on amd64. If we're failing on arm64, that likely means we've missed some busdma operation needed to ensure things are coherent in that environment. That's likely why the controller is finding completed items that aren't in the queue. Or there's some race between inserting the command into the queue, setting its state, and reading it back out of the pending list when we're trying to complete it. Or we're seeing 'stale' data for the completion records, though I suspect that's a bit less likely. I don't have an arm64 machine I can put mpr or mps cards in. The fact that you see it on both likely means the same systemic error was made in both places (mpr is a copy of mps that's been augmented for mpr's new features).

Is it possible that it has to do with the msleep timeouts? I noticed that after almost every panic, passing "reset" to kdb will NOT reset the system; instead mpr0 dumps a bit more to the console and then drops me at another kdb prompt.
E.g. in comment #2 we can see mpr0 still emitting more to the console after both a "db> reset" and a subsequent "db> reboot". Even more telling: when I enable full debugging (0x07ff), the zpool import actually works! I am also able to interact with non-ZFS drives a little. All of this points, to me, to bad timeouts somewhere.

A suggestion from Jon Nettleton at SolidRun is to disable PCI ASPM; this seems to be a known issue elsewhere?

As it turns out, my drives do not support NCQ TRIM. This was fixed for ada by review D43961, but not for da.

From base b7dce5b "scsi_da: add 4K quirks for Samsung SSD 860 and 870":

```
diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c
index d578e4ccb712..9b3d706d6168 100644
--- a/sys/cam/scsi/scsi_da.c
+++ b/sys/cam/scsi/scsi_da.c
@@ -1397,6 +1397,22 @@ static struct da_quirk_entry da_quirk_table[] =
 	},
 	{
 		/*
+		 * Samsung 860 SSDs
+		 * 4k optimised & trim only works in 4k requests + 4k aligned
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Samsung SSD 860*", "*" },
+		/*quirks*/DA_Q_4K
+	},
+	{
+		/*
+		 * Samsung 870 SSDs
+		 * 4k optimised & trim only works in 4k requests + 4k aligned
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Samsung SSD 870*", "*" },
+		/*quirks*/DA_Q_4K
+	},
+	{
+		/*
 		 * Samsung 843T Series SSDs (MZ7WD*)
 		 * Samsung PM851 Series SSDs (MZ7TE*)
 		 * Samsung PM853T Series SSDs (MZ7GE*)
```

From base c01af41 "ata_da: add quirk to disable NCQ TRIM for Samsung 860/870 SSDs":

```
diff --git a/sys/cam/ata/ata_da.c b/sys/cam/ata/ata_da.c
index f5d3aeca9329..d4a591943307 100644
--- a/sys/cam/ata/ata_da.c
+++ b/sys/cam/ata/ata_da.c
@@ -729,6 +729,22 @@ static struct ada_quirk_entry ada_quirk_table[] =
 	},
 	{
 		/*
+		 * Samsung 860 SSDs
+		 * 4k optimised, NCQ TRIM broken (normal TRIM fine)
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "Samsung SSD 860*", "*" },
+		/*quirks*/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN
+	},
+	{
+		/*
+		 * Samsung 870 SSDs
+		 * 4k optimised, NCQ TRIM broken (normal TRIM fine)
+		 */
+		{ T_DIRECT, SIP_MEDIA_FIXED, "*", "Samsung SSD 870*", "*" },
+		/*quirks*/ADA_Q_4K | ADA_Q_NCQ_TRIM_BROKEN
+	},
+	{
+		/*
 		 * Samsung SM863 Series SSDs (MZ7KM*)
 		 * 4k optimised, NCQ believed to be working
 		 */
```

At least some of my affected drives are in the Samsung SSD 860 family. I do not see anything like NCQ_TRIM_BROKEN in scsi_da.c, and we would probably need to implement it there.

There are a few problems with your NCQ TRIM theory:
(1) scsi_da doesn't implement NCQ TRIM at all. Until mpi3mr was imported, nothing in the tree apart from ahci (which uses ata_da) could create the necessary ATA command with the extra registers;
(2) mpr can't possibly send NCQ TRIMs even if da were generating them; and
(3) import is a read-intensive operation, so it is not doing TRIMs and will do minimal writes.

As an aside: we likely should just assume the 4k quirk always. That would eliminate 90% of the quirks we have if we also stopped issuing READ6/WRITE6 commands entirely. Those are a compat hack for SASI, and READ10 was in SCSI-1, though it didn't work universally. SCSI-1 drives are not relevant today, certainly not the ultra-low-capacity ones that were quirky at the time. But that's a different issue.

It's not the main issue here. I fixed a lot of 'state machine' bugs, which is what this panic is, and Scott Long fixed even more before I did. Those changes should have been pushed upstream several years before the uname date in this bug report. Since this is on an ARM server, there may be something subtle there, due to arm's weaker memory model compared with amd64, that's causing this. My testing of mpr on aarch64 has been light, since we don't use it at $WORK and the aarch64 chassis I have don't have slots for hard drives. So I've only done bench testing to check that I can see the disk and do some I/O, not much beyond that. And lately it's not feasible to redo that bench testing because of how much junk I now have on my bench.

Out of curiosity: is this a zpool import from a pool that was created on another system?
Or was it working fine, and this started happening after some upgrade? mpr and mps share a common history, including the state tracking code, so it's not super surprising that this is being hit on both.

> Since this is on an ARM server, there may be something subtle there due to arm's weaker memory model than amd64 that's causing this.

I don't know whether it's related, but PCIe GPUs under Linux DRM can show weird artifacting and tearing as a result of some sort of memory issue. The fix from the firmware developer was to reorder operations in glibc memcpy:

> Subject: [PATCH] Aarch64: Make memcpy more compatible with device memory
>
> For normal non-cacheable memory ACE supports 4x128 bit r/w WRAP
> transfers or 1x128 bit r/w INCR transfers. By re-ordering the
> stp's in memcpy / memmove we can accomodate this better without
> impacting the existing code.
>
> This fixes an issue seen on multiple Cortex-A72 SOCs when writing
> directly to a PCIe memmapped frame-buffer, which resulted in
> corruption.

https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436

Test for framebuffer memcpy bugs: https://gist.github.com/jnettlet/80f8d09d01c0dc0ffc0122f36ed78de6

Unfortunately, I don't know how to build an equivalent test utility for FreeBSD, but I do wonder whether the coherency issue on the bus is affecting my case as well.

Another user in the vendor's Discord channel recently said they've been having issues using 2x NVMe drives on a SuperMicro bifurcated PCIe-to-NVMe adapter. I've since been able to reproduce the issues using a Linux mdadm mirror as well. I have not seen panics when testing single drives, only with pools of 2 or more. Perhaps it only shows up when multiple commands are issued in parallel?

> is this a zpool import from a pool that was created on another system?

I have been able to reproduce it with zpools from known-working systems, and also when creating new ones.
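Warner's race hypothesis, with a command's state and its queue membership observed in a different order by the completion path, describes the kind of bug that amd64's stronger ordering can hide and arm64 can expose. Below is a minimal userland sketch of the usual publish/complete pairing, assuming C11 `<stdatomic.h>`. The submitter fills in the command and then publishes its state with a release store. The completion path reads the state with an acquire load before trusting any other field. The `struct command`, the `CM_STATE_*` values and the `cmd_*` functions are hypothetical and only loosely modeled on the panic string. They are not the mpr driver's code, and a kernel driver would use atomic(9) and bus_dmamap_sync(9) rather than C11 atomics. Treating "state = 0" as "free" is also an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical states; 0 is assumed to mean "free", as in "state = 0". */
enum cm_state { CM_STATE_FREE = 0, CM_STATE_INQUEUE = 1 };

/* Hypothetical command: request data plus an atomically published state. */
struct command {
	uint32_t    smid;     /* tag the controller echoes back on completion */
	uint32_t    payload;  /* stands in for the request frame contents */
	_Atomic int state;    /* one of enum cm_state */
};

void
cmd_init(struct command *cm)
{
	assert(cm != NULL);
	cm->smid = 0;
	cm->payload = 0;
	atomic_init(&cm->state, CM_STATE_FREE);
}

/*
 * Submit path: fill in every field first, then publish with a release
 * store, so a reader that observes INQUEUE also observes the fields.
 */
void
cmd_submit(struct command *cm, uint32_t smid, uint32_t payload)
{
	assert(cm != NULL);
	cm->smid = smid;
	cm->payload = payload;
	atomic_store_explicit(&cm->state, CM_STATE_INQUEUE, memory_order_release);
}

/*
 * Completion path: the acquire load pairs with the release store above.
 * Returns false where a driver would panic, i.e. the completion names a
 * command that was never published or has already been completed.
 */
bool
cmd_complete(struct command *cm, uint32_t *payload_out)
{
	assert(cm != NULL && payload_out != NULL);
	int st = atomic_load_explicit(&cm->state, memory_order_acquire);

	if (st != CM_STATE_INQUEUE) {
		fprintf(stderr, "command not inqueue, state = %d\n", st);
		return false;
	}
	*payload_out = cm->payload;  /* ordered after the acquire load */
	/* Release: the payload read above cannot move past the "free" mark. */
	atomic_store_explicit(&cm->state, CM_STATE_FREE, memory_order_release);
	return true;
}
```

In a single thread the orderings are trivially satisfied; the point of the sketch is the pairing itself. At the hardware level, amd64 does not reorder stores with other stores, so code that omits the release/acquire pair often still works there. arm64, by contrast, may make the state visible before the payload. Whether this actually happens in mpr is Warner's hypothesis and is not established by this report.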
An interesting update from SolidRun:
https://developer.arm.com/documentation/ddi0517/f/functional-description/constraints-and-limitations-of-use/axi3-and-axi4-support

> The MMU-500 supports the AXI3 and AXI4 protocols when the sysbardisable_<tbuname> input signal is tied HIGH. In such cases, the following AXI3 features are not supported:
>
> Write data interleaving
>
> Write data and write address ordering must be the same, otherwise data corruption can occur.

NXP pulls this signal high in their PBI code. Could enabling write interleaving be made a sysctl?

Another recent note from SolidRun's chief systems architect:
> The way that the Cortex-A72 cores do gathering is undefined behaviour in the PCIe spec, and since AXI / ACE / CHI don't know about the gathering memory property (it is part of the Arm core design), it causes inconsistencies.
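The memcpy.S diff and the glibc patch discussed earlier change only the order in which 16-byte `stp` stores are issued. For example, the patched `L(copy64_from_start)` path writes the first 64 bytes at `dstin` at offsets 0, 16, 32, 48 instead of 48, 32, 16, 0. Below is a rough userland model of that idea in C. It records the order of the 16-byte chunk stores and checks the copied bytes. It does not touch device memory and proves nothing about the bus. The function and struct names are hypothetical. Also, a C compiler may reorder plain stores to normal memory, so a real fix for device or non-cacheable mappings has to control ordering in assembly, as both patches do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK   16               /* one stp of two 64-bit registers */
#define BLOCK   64               /* 4 x 128 bits, the WRAP size in the glibc patch */
#define NCHUNKS (BLOCK / CHUNK)

/* Records the destination offset of each 16-byte store, in issue order. */
struct store_log {
	size_t offsets[NCHUNKS];
	size_t n;
};

static void
store16(uint8_t *dst, const uint8_t *src, size_t off, struct store_log *log)
{
	memcpy(dst + off, src + off, CHUNK);
	assert(log->n < NCHUNKS);
	log->offsets[log->n++] = off;
}

/* Order before the patch: 48, 32, 16, then 0. */
void
copy64_descending(uint8_t *dst, const uint8_t *src, struct store_log *log)
{
	for (size_t i = NCHUNKS; i-- > 0;)
		store16(dst, src, i * CHUNK, log);
}

/* Order after the patch: 0, 16, 32, 48 (ascending within the block). */
void
copy64_ascending(uint8_t *dst, const uint8_t *src, struct store_log *log)
{
	for (size_t i = 0; i < NCHUNKS; i++)
		store16(dst, src, i * CHUNK, log);
}

/* Returns 1 if the logged stores went out in strictly ascending order. */
int
log_is_ascending(const struct store_log *log)
{
	for (size_t i = 1; i < log->n; i++)
		if (log->offsets[i] <= log->offsets[i - 1])
			return 0;
	return 1;
}
```

Both copies produce identical bytes; only the issue order differs. As noted above, the reporter tested the memcpy.S reordering and it did not sidestep this panic.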