Bug 250629 - lockups on Dell R740xd with 12-STABLE
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-26 11:22 UTC by Juraj Lutter
Modified: 2020-11-15 18:30 UTC
CC List: 0 users

See Also:


Attachments
nvme (421.79 KB, image/png)
2020-11-15 18:29 UTC, Juraj Lutter
mrsas ocr thread (290.91 KB, image/png)
2020-11-15 18:30 UTC, Juraj Lutter

Description Juraj Lutter 2020-10-26 11:22:41 UTC
Hi,

For some time now, I have been experiencing lockups (similar to PR #236989) on a Dell R740xd.

The lockups occur during disk I/O and leave the machine mostly unresponsive to commands.

I also encountered "Missing interrupt" message(s). Sometimes the box has to be rebooted via DRAC.

Some data I have been able to gather:

Oct 26 11:13:49 bnts-nvs-n1 kernel: mrsas0: MSI-x interrupts setup success
Oct 26 11:13:49 bnts-nvs-n1 kernel: bnxt0: Using MSI-X interrupts with 2 vectors
Oct 26 11:13:49 bnts-nvs-n1 kernel: bnxt1: Using MSI-X interrupts with 2 vectors
Oct 26 11:13:49 bnts-nvs-n1 kernel: bnxt2: Using MSI-X interrupts with 2 vectors
Oct 26 11:13:49 bnts-nvs-n1 kernel: bnxt3: Using MSI-X interrupts with 2 vectors


dmesg excerpts:
AVAGO MegaRAID SAS FreeBSD mrsas driver version: 07.709.04.00-fbsd
mrsas0: <AVAGO Invader SAS Controller> port 0x4000-0x40ff mem 0x9db00000-0x9db0ffff,0x9da00000-0x9dafffff irq 32 at device 0.0 numa-domain 0 on pci4
mrsas0: FW now in Ready state
mrsas0: Using MSI-X with 32 number of vectors
mrsas0: FW supports <96> MSIX vector,Online CPU 32 Current MSIX <32>
mrsas0: max sge: 0x46, max chain frame size: 0x400, max fw cmd: 0x39f
mrsas0: Issuing IOC INIT command to FW.
mrsas0: IOC INIT response received from FW.
mrsas0: System PD created target ID: 0x0
mrsas0: System PD created target ID: 0x1
mrsas0: FW supports: UnevenSpanSupport=1

mrsas0: max_fw_cmds: 927  max_scsi_cmds: 911
mrsas0: MSI-x interrupts setup success
mrsas0: mrsas_ocr_thread

bge0: <Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x5720000> mem 0x9d830000-0x9d83ffff,0x9d840000-0x9d84ffff,0x9d850000-0x9d85ffff irq 34 at device 0.0 numa-domain 0 on pci5
bge0: APE FW version: NCSI v1.5.14.0
bge0: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus0: <MII bus> numa-domain 0 on bge0
brgphy0: <BCM5720C 1000BASE-T media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Using defaults for TSO: 65518/35/2048

pcib13: <ACPI PCI-PCI bridge> mem 0xb9400000-0xb943ffff irq 48 at device 0.0 numa-domain 0 on pci11
pci13: <ACPI PCI bus> numa-domain 0 on pcib14
nvme0: <Samsung PM1725a> mem 0xb9300000-0xb9303fff irq 48 at device 0.0 numa-domain 0 on pci13

bnxt0: <Broadcom BCM57412 NetXtreme-E 10Gb Ethernet> mem 0xd3a10000-0xd3a1ffff,0xd3900000-0xd39fffff,0xd3a22000-0xd3a23fff irq 80 at device 0.0 numa-domain 1 on pci27
bnxt0: Using 256 TX descriptors and 256 RX descriptors
bnxt0: Using 1 RX queues 1 TX queues
bnxt0: Using MSI-X interrupts with 2 vectors
bnxt0: Ethernet address: bc:97:e1:7d:d4:70
bnxt0: netmap queues/slots: TX 1/256, RX 1/256

est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1ba400001e00
device_attach: est31 attach returned 6

nda0 at nvme0 bus 0 scbus19 target 0 lun 1
nda0: <Dell Express Flash PM1725b 1.6TB SFF 1.1.0 S5CUNA0N201038>
nda0: Serial Number S5CUNA0N201038
nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
nda0: 1526185MB (3125627568 512 byte sectors)

da0 at mrsas0 bus 1 scbus17 target 0 lun 0
da0: <ATA SSDSC2KG240G8R DL67> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number BTYG01730DP5240AGN
da0: 150.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)

FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz (2494.22-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x50657  Family=0x6  Model=0x55  Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffefbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0xd39ffffb<FSGSBASE,TSCADJ,BMI1,HLE,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,NFPUSG,MPX,PQE,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PROCTRACE,AVX512CD,AVX512BW,AVX512VL>
  Structured Extended Features2=0x808<PKU,AVX512VNNI>
  Structured Extended Features3=0xbc000400<MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xab<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,TSX_CTRL>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory  = 274869518336 (262136 MB)
avail memory = 267481690112 (255090 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <DELL   PE_SC3  >
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 2 package(s) x 8 core(s) x 2 hardware threads
random: unblocking device.
ioapic6: MADT APIC ID 16 != hw id 0
ioapic7: MADT APIC ID 17 != hw id 1
ioapic8: MADT APIC ID 18 != hw id 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-31 on motherboard
ioapic2 <Version 2.0> irqs 32-39 on motherboard
ioapic3 <Version 2.0> irqs 40-47 on motherboard
ioapic4 <Version 2.0> irqs 48-55 on motherboard
ioapic5 <Version 2.0> irqs 72-79 on motherboard
ioapic6 <Version 2.0> irqs 80-87 on motherboard
ioapic7 <Version 2.0> irqs 88-95 on motherboard
ioapic8 <Version 2.0> irqs 96-103 on motherboard
Launching APs: 1 29 25 27 23 3 9 14 24 8 2 17 5 30 13 21 4 15 31 18 10 26 16 11 7 6 12 22 20 28 19


root@bnts-nvs-n1:~ # vmstat -z
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               248,      0,     263,       7,     263,   0,   0
UMA Zones:             4560,      0,     284,       0,     284,   0,   0
UMA Slabs:               80,      0,  312803,      47,  314845,   0,   0
UMA Hash:               256,      0,      65,      55,     112,   0,   0
4 Bucket:                32,      0,    5820,    5430,   24657,   0,   0
6 Bucket:                48,      0,    1787,    7094,   13130,   0,   0
8 Bucket:                64,      0,     399,    8467,    7981,  29,   0
12 Bucket:               96,      0,     905,    4958,   13112,   0,   0
16 Bucket:              128,      0,     803,    5273,   13301,   1,   0
32 Bucket:              256,      0,    3364,    3776,   38441,  62,   0
64 Bucket:              512,      0,    2349,    1363,   12463,9041,   0
128 Bucket:            1024,      0,     749,    1419,    9226,  54,   0
256 Bucket:            2048,      0,    1680,    5734,    7762,6856,   0
vmem:                  1856,      0,       5,       1,       5,   0,   0
vmem btag:               56,      0,   50303,    3444,   51016, 379,   0
VM OBJECT:              256,      0,   52100,    1825,  136197,   0,   0
RADIX NODE:             144,      0,   46008,    1890,  237871,   0,   0
MAP:                    240,      0,       3,      61,       3,   0,   0
KMAP ENTRY:             120,      0,      30,     861,      38,   0,   0
MAP ENTRY:              120,      0,    1905,    4860,  309691,   0,   0
VMSPACE:               2560,      0,      45,     126,    4223,   0,   0
fakepg:                 104,      0,       0,     380,       3,   0,   0
64 pcpu:                  8,      0,    3814,    3098,    3814,   0,   0
mt_stats_zone:           64,      0,     379,     645,     379,   0,   0
mt_zone:                 24,      0,     379,     790,     379,   0,   0
16:                      16,      0,   17001,    5338, 3039295,   0,   0
32:                      32,      0,   79514,    4611, 1649034,   0,   0
64:                      64,      0,  129486,  116282, 1413017,   0,   0
128:                    128,      0,  326585,   13423, 4152063,   0,   0
256:                    256,      0,   14328,    1377,  324758,   0,   0
512:                    512,      0,   53403,   31469,  799599,   0,   0
1024:                  1024,      0,   27572,    6652,   83218,   0,   0
2048:                  2048,      0,    2977,     345,  217240,   0,   0
4096:                  4096,      0,   11018,      66,  455358,   0,   0
8192:                  8192,      0,      92,      24,     959,   0,   0
16384:                16384,      0,      44,     460,    2217,   0,   0
32768:                32768,      0,      72,      64,    1480,   0,   0
65536:                65536,      0,      18,      27,    1253,   0,   0
SLEEPQUEUE:              80,      0,    2641,    2877,    2641,   0,   0
kenv:                   258,      0,       0,     930,   34195,   0,   0
Files:                   80,      0,     150,    3850,   94933,   0,   0
filedesc0:             1104,      0,     102,     339,    4279,   0,   0
rangeset pctrie nodes:    144,      0,       0,       0,       0,   0,   0
TURNSTILE:              136,      0,    2641,    1219,    2641,   0,   0
rl_entry:                40,      0,     299,    7801,     299,   0,   0
umtx pi:                 96,      0,       0,       0,       0,   0,   0
umtx_shm:                88,      0,       0,       0,       0,   0,   0
MAC labels:              40,      0,       0,       0,       0,   0,   0
PROC:                  1328,      0,     101,     286,    4278,   0,   0
THREAD:                1840,      0,    2600,      40,    3534,   0,   0
cpuset:                 104,      0,     427,    3603,     427,   0,   0
domainset:               40,      0,       0,       0,       0,   0,   0
pkru ranges:             24,      0,       0,       0,       0,   0,   0
audit_record:          1280,      0,       0,       0,       0,   0,   0
mbuf_packet:            256, 104517225,    1024,    5807,    8851,   0,   0
mbuf:                   256, 104517225,    1556,    7813,  121090,   0,   0
mbuf_cluster:          2048, 16330814,    8365,      41,    9703,   0,   0
mbuf_jumbo_page:       4096, 8165407,       0,       5,      64,   0,   0
mbuf_jumbo_9k:         9216, 2419379,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384, 1360901,       0,       0,       0,   0,   0
epoch_record pcpu:      256,      0,       4,      60,       4,   0,   0
NetGraph items:          72,   4123,       0,       0,       0,   0,   0
NetGraph data items:     72,   4123,       0,       0,       0,   0,   0
FPU_save_area:         2696,      0,       0,       0,       0,   0,   0
DMAR_MAP_ENTRY:         120,      0,       0,       0,       0,   0,   0
ttyinq:                 160,      0,     225,     650,     525,   0,   0
ttyoutq:                256,      0,     119,     736,     279,   0,   0
g_bio:                  376,      0,       0,    2940,  171672,   0,   0
nvme_request:           128,      0,     176,    3575,    6016,   0,   0
cryptop:                128,      0,       0,       0,       0,   0,   0
cryptodesc:             120,      0,       0,       0,       0,   0,   0
crypto_session:          24,      0,       0,       0,       0,   0,   0
CTL IO:                 672,      0,       0,       0,       0,   0,   0
beio:                   376,      0,       0,       0,       0,   0,   0
ctlblock:            131072,      0,       0,       0,       0,   0,   0
taskq_zone:             192,      0,       1,    1919,    1723,   0,   0
VNODE:                  480,      0,   50021,     499,   56381,   0,   0
VNODEPOLL:              120,      0,       0,       0,       0,   0,   0
BUF TRIE:               144,      0,       0,  105948,       0,   0,   0
NAMEI:                 1024,      0,       1,     415,  351178,   0,   0
rentr:                   24,      0,       0,     334,       1,   0,   0
S VFS Cache:            108,      0,   59848,    2277,  102061,   0,   0
STS VFS Cache:          148,      0,       0,       0,       0,   0,   0
L VFS Cache:            328,      0,     299,     349,     414,   0,   0
LTS VFS Cache:          368,      0,       0,       0,       0,   0,   0
DIRHASH:               1024,      0,       0,       0,       0,   0,   0
NCLNODE:                592,      0,       0,       0,       0,   0,   0
pipe:                   760,      0,       8,     412,    1691,   0,   0
Mountpoints:           2744,      0,      16,      37,      16,   0,   0
procdesc:               136,      0,       0,       0,       0,   0,   0
AIO:                    208,      0,       0,       0,       0,   0,   0
AIOP:                    32,      0,       0,       0,       0,   0,   0
AIOCB:                  752,      0,       0,       0,       0,   0,   0
AIOLIO:                 280,      0,       0,       0,       0,   0,   0
zfs_btree_leaf_cache:   4096,      0,     118,    6492,   14022,   0,   0
ddt_cache:            24840,      0,      26,       1,      39,   0,   0
ddt_entry_cache:        392,      0,       0,       0,       0,   0,   0
zio_cache:             1232,      0,      74,    9958, 1089142,   0,   0
zio_link_cache:          48,      0,       1,   16765,  368925,   0,   0
zio_buf_512:            512,      0,   14842,     878,  175956,   0,   0
zio_data_buf_512:       512,      0,     476,   99780,  143346,   0,   0
zio_buf_1024:          1024,      0,     759,     297,    1129,   0,   0
zio_data_buf_1024:     1024,      0,     374,     290,     466,   0,   0
zio_buf_1536:          1536,      0,     191,     215,     360,   0,   0
zio_data_buf_1536:     1536,      0,     294,     184,     345,   0,   0
zio_buf_2048:          2048,      0,      73,     113,     717,   0,   0
zio_data_buf_2048:     2048,      0,     667,      93,     772,   0,   0
zio_buf_2560:          2560,      0,      35,      70,     341,   0,   0
zio_data_buf_2560:     2560,      0,     726,      72,     813,   0,   0
zio_buf_3072:          3072,      0,      28,      74,     285,   0,   0
zio_data_buf_3072:     3072,      0,     598,      47,     672,   0,   0
zio_buf_3584:          3584,      0,      30,      24,     240,   0,   0
zio_data_buf_3584:     3584,      0,     513,       6,     557,   0,   0
zio_buf_4096:          4096,      0,   14509,     204,   38743,   0,   0
zio_data_buf_4096:     4096,      0,    1889,     242,    7024,   0,   0
zio_buf_5120:          5120,      0,       1,     224,     286,   0,   0
zio_data_buf_5120:     5120,      0,     170,     261,     455,   0,   0
zio_buf_6144:          6144,      0,       2,     188,     212,   0,   0
zio_data_buf_6144:     6144,      0,     119,     202,     356,   0,   0
zio_buf_7168:          7168,      0,       1,     141,     208,   0,   0
zio_data_buf_7168:     7168,      0,     111,     163,     317,   0,   0
zio_buf_8192:          8192,      0,       2,     148,    3379,   0,   0
zio_data_buf_8192:     8192,      0,     109,     173,     454,   0,   0
zio_buf_10240:        10240,      0,       4,     223,     279,   0,   0
zio_data_buf_10240:   10240,      0,     136,     264,     479,   0,   0
zio_buf_12288:        12288,      0,       2,     180,    1583,   0,   0
zio_data_buf_12288:   12288,      0,     112,     212,     417,   0,   0
zio_buf_14336:        14336,      0,       0,     131,     169,   0,   0
zio_data_buf_14336:   14336,      0,      86,     148,     286,   0,   0
zio_buf_16384:        16384,      0,   10576,     709,   79591,   0,   0
zio_data_buf_16384:   16384,      0,      87,     189,     364,   0,   0
zio_buf_20480:        20480,      0,       2,     181,     941,   0,   0
zio_data_buf_20480:   20480,      0,     104,     204,     392,   0,   0
zio_buf_24576:        24576,      0,       1,     184,     741,   0,   0
zio_data_buf_24576:   24576,      0,      75,     184,     361,   0,   0
zio_buf_28672:        28672,      0,       2,     108,     658,   0,   0
zio_data_buf_28672:   28672,      0,      51,     118,     228,   0,   0
zio_buf_32768:        32768,      0,       1,      92,     534,   0,   0
zio_data_buf_32768:   32768,      0,      32,      94,     177,   0,   0
zio_buf_40960:        40960,      0,       1,     155,    1207,   0,   0
zio_data_buf_40960:   40960,      0,      81,     175,     379,   0,   0
zio_buf_49152:        49152,      0,       1,     108,    1779,   0,   0
zio_data_buf_49152:   49152,      0,      46,     110,     232,   0,   0
zio_buf_57344:        57344,      0,       0,     130,    2938,   0,   0
zio_data_buf_57344:   57344,      0,      28,      79,     157,   0,   0
zio_buf_65536:        65536,      0,       0,     111,    2045,   0,   0
zio_data_buf_65536:   65536,      0,      18,      86,     763,   0,   0
zio_buf_81920:        81920,      0,       0,     103,    2180,   0,   0
zio_data_buf_81920:   81920,      0,      33,     113,     218,   0,   0
zio_buf_98304:        98304,      0,       0,     114,    3141,   0,   0
zio_data_buf_98304:   98304,      0,      28,      70,     152,   0,   0
zio_buf_114688:      114688,      0,       0,      77,    1704,   0,   0
zio_data_buf_114688: 114688,      0,      26,      63,     129,   0,   0
zio_buf_131072:      131072,      0,     457,    3184,    8388,   0,   0
zio_data_buf_131072: 131072,      0,    1449,    3314,   15030,   0,   0
zio_buf_163840:      163840,      0,       0,       0,       0,   0,   0
zio_data_buf_163840: 163840,      0,       0,       0,       0,   0,   0
zio_buf_196608:      196608,      0,       0,       0,       0,   0,   0
zio_data_buf_196608: 196608,      0,       0,       0,       0,   0,   0
zio_buf_229376:      229376,      0,       0,       0,       0,   0,   0
zio_data_buf_229376: 229376,      0,       0,       0,       0,   0,   0
zio_buf_262144:      262144,      0,       0,       0,       0,   0,   0
zio_data_buf_262144: 262144,      0,       0,       0,       0,   0,   0
zio_buf_327680:      327680,      0,       0,       0,       0,   0,   0
zio_data_buf_327680: 327680,      0,       0,       0,       0,   0,   0
zio_buf_393216:      393216,      0,       0,       0,       0,   0,   0
zio_data_buf_393216: 393216,      0,       0,       0,       0,   0,   0
zio_buf_458752:      458752,      0,       0,       0,       0,   0,   0
zio_data_buf_458752: 458752,      0,       0,       0,       0,   0,   0
zio_buf_524288:      524288,      0,       0,       0,       0,   0,   0
zio_data_buf_524288: 524288,      0,       0,       0,       0,   0,   0
zio_buf_655360:      655360,      0,       0,       0,       0,   0,   0
zio_data_buf_655360: 655360,      0,       0,       0,       0,   0,   0
zio_buf_786432:      786432,      0,       0,       0,       0,   0,   0
zio_data_buf_786432: 786432,      0,       0,       0,       0,   0,   0
zio_buf_917504:      917504,      0,       0,       0,       0,   0,   0
zio_data_buf_917504: 917504,      0,       0,       0,       0,   0,   0
zio_buf_1048576:     1048576,      0,       0,       0,       0,   0,   0
zio_data_buf_1048576: 1048576,      0,       0,       0,       0,   0,   0
zio_buf_1310720:     1310720,      0,       0,       0,       0,   0,   0
zio_data_buf_1310720: 1310720,      0,       0,       0,       0,   0,   0
zio_buf_1572864:     1572864,      0,       0,       0,       0,   0,   0
zio_data_buf_1572864: 1572864,      0,       0,       0,       0,   0,   0
zio_buf_1835008:     1835008,      0,       0,       0,       0,   0,   0
zio_data_buf_1835008: 1835008,      0,       0,       0,       0,   0,   0
zio_buf_2097152:     2097152,      0,       0,       0,       0,   0,   0
zio_data_buf_2097152: 2097152,      0,       0,       0,       0,   0,   0
zio_buf_2621440:     2621440,      0,       0,       0,       0,   0,   0
zio_data_buf_2621440: 2621440,      0,       0,       0,       0,   0,   0
zio_buf_3145728:     3145728,      0,       0,       0,       0,   0,   0
zio_data_buf_3145728: 3145728,      0,       0,       0,       0,   0,   0
zio_buf_3670016:     3670016,      0,       0,       0,       0,   0,   0
zio_data_buf_3670016: 3670016,      0,       0,       0,       0,   0,   0
zio_buf_4194304:     4194304,      0,       0,       0,       0,   0,   0
zio_data_buf_4194304: 4194304,      0,       0,       0,       0,   0,   0
zio_buf_5242880:     5242880,      0,       0,       0,       0,   0,   0
zio_data_buf_5242880: 5242880,      0,       0,       0,       0,   0,   0
zio_buf_6291456:     6291456,      0,       0,       0,       0,   0,   0
zio_data_buf_6291456: 6291456,      0,       0,       0,       0,   0,   0
zio_buf_7340032:     7340032,      0,       0,       0,       0,   0,   0
zio_data_buf_7340032: 7340032,      0,       0,       0,       0,   0,   0
zio_buf_8388608:     8388608,      0,       0,       0,       0,   0,   0
zio_data_buf_8388608: 8388608,      0,       0,       0,       0,   0,   0
zio_buf_10485760:    10485760,      0,       0,       0,       0,   0,   0
zio_data_buf_10485760: 10485760,      0,       0,       0,       0,   0,   0
zio_buf_12582912:    12582912,      0,       0,       0,       0,   0,   0
zio_data_buf_12582912: 12582912,      0,       0,       0,       0,   0,   0
zio_buf_14680064:    14680064,      0,       0,       0,       0,   0,   0
zio_data_buf_14680064: 14680064,      0,       0,       0,       0,   0,   0
zio_buf_16777216:    16777216,      0,       0,       0,       0,   0,   0
zio_data_buf_16777216: 16777216,      0,       0,       0,       0,   0,   0
lz4_cache:            16384,      0,       0,      60,   11178,   0,   0
abd_chunk:             4096,      0,  135271,   61046,  417479,   0,   0
sa_cache:               288,      0,   49787,     445,   56151,   0,   0
dnode_t:                816,      0,   56990,     810,   59265,   0,   0
arc_buf_hdr_t_full:     248,      0,   46633,    1751,  135232,   0,   0
arc_buf_hdr_t_full_crypt:    312,      0,       0,       0,       0,   0,   0
arc_buf_hdr_t_l2only:     96,      0,       0,       0,       0,   0,   0
arc_buf_t:               64,      0,   15053,   20163,  201421,   0,   0
dmu_buf_impl_t:         296,      0,   66021,   15788,  190748,   0,   0
zil_lwb_cache:          360,      0,       3,     206,      30,   0,   0
zil_zcw_cache:           80,      0,       1,    1149,      25,   0,   0
sio_cache_0:            136,      0,       0,       0,       0,   0,   0
sio_cache_1:            152,      0,       0,       0,       0,   0,   0
sio_cache_2:            168,      0,       0,       0,       0,   0,   0
zfs_znode_cache:        472,      0,   49787,     533,   56145,   0,   0
ksiginfo:               112,      0,     312,    4203,    3990,   0,   0
itimer:                 352,      0,       0,       0,       0,   0,   0
KNOTE:                  160,      0,       0,     450,     123,   0,   0
socket:                 872, 8375672,      59,     373,    9833,   0,   0
IPsec SA lft_c:          16,      0,       0,       0,       0,   0,   0
unpcb:                  256, 8375685,      22,    1943,    9328,   0,   0
ipq:                     56,  51262,       0,       0,       0,   0,   0
udp_inpcb:              488, 8375672,      24,     880,     473,   0,   0
udpcb:                   32, 8375750,      24,    7851,     473,   0,   0
tcp_inpcb:              488, 8375672,      12,     428,      24,   0,   0
tcpcb:                  984, 8375672,      12,     184,      24,   0,   0
tcptw:                   88,  27810,       0,       0,       0,   0,   0
syncache:               168,  15364,       0,      69,       1,   0,   0
hostcache:               96,      0,       0,       0,       0,   0,   0
sackhole:                32,      0,       0,     375,       2,   0,   0
tfo:                      4,      0,       0,       0,       0,   0,   0
tfo_ccache_entries:      80,      0,       0,       0,       0,   0,   0
tcpreass:                48, 1020734,       0,       0,       0,   0,   0
tcp_log:                400, 5000000,       0,       0,       0,   0,   0
tcp_log_bucket:         144,      0,       0,       0,       0,   0,   0
tcp_log_node:           120,      0,       0,       0,       0,   0,   0
sctp_ep:               1280, 8375673,       0,       0,       0,   0,   0
sctp_asoc:             2288,  40000,       0,       0,       0,   0,   0
sctp_laddr:              48,  80012,       0,    1826,       7,   0,   0
sctp_raddr:             736,  80000,       0,       0,       0,   0,   0
sctp_chunk:             152, 400010,       0,       0,       0,   0,   0
sctp_readq:             152, 400010,       0,       0,       0,   0,   0
sctp_stream_msg_out:    112, 400015,       0,       0,       0,   0,   0
sctp_asconf:             40, 400000,       0,       0,       0,   0,   0
sctp_asconf_ack:         48, 400060,       0,       0,       0,   0,   0
udplite_inpcb:          488, 8375672,       0,       0,       0,   0,   0
ripcb:                  488, 8375672,       0,       0,       0,   0,   0
rtentry:                208,      0,      19,     608,      21,   0,   0
tcp_rack_map:            64,      0,       0,       0,       0,   0,   0
tcp_rack_pcb:           384,      0,       0,       0,       0,   0,   0
bridge_rtnode:           64,      0,       0,       0,       0,   0,   0
selfd:                   64,      0,      96,    6538,  178908,   0,   0
swpctrie:               144, 32661657,       0,       0,       0,   0,   0
swblk:                  136, 32661656,       0,       0,       0,   0,   0
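
In a listing this long the FAIL column is easy to overlook. Above, the zones with a non-zero FAIL count appear to be the bucket zones (8, 16, 32, 64, 128 and 256 Bucket) and vmem btag. Whether these allocation failures are related to the lockups is not established. The following is a minimal sketch, assuming the `vmstat -z` output has been saved as text, that lists such zones. The function name `zones_with_failures` and the `SAMPLE` excerpt are illustrative only and not part of any FreeBSD tool.

```python
# Sketch: list UMA zones whose FAIL column is non-zero in `vmstat -z` output.
# Assumes the 7-column layout shown above: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP.

SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

UMA Kegs:               248,      0,     263,       7,     263,   0,   0
64 Bucket:              512,      0,    2349,    1363,   12463,9041,   0
256 Bucket:            2048,      0,    1680,    5734,    7762,6856,   0
vmem btag:               56,      0,   50303,    3444,   51016, 379,   0
"""


def zones_with_failures(vmstat_z_text):
    """Return (zone, fail_count) pairs, largest failure count first."""
    failures = []
    for line in vmstat_z_text.splitlines():
        # Zone names may contain spaces, so split at the last colon.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line
        fields = [field.strip() for field in rest.split(",")]
        if len(fields) != 7:
            continue  # not a zone row in the expected layout
        try:
            fail = int(fields[5])  # FAIL is the sixth numeric column
        except ValueError:
            continue
        if fail > 0:
            failures.append((name.strip(), fail))
    failures.sort(key=lambda item: item[1], reverse=True)
    return failures


if __name__ == "__main__":
    for zone, fail in zones_with_failures(SAMPLE):
        print(f"{zone}: {fail}")
```

Running the same function on successive snapshots would show whether the FAIL counters keep growing while the machine is locking up.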

/boot/loader.conf:
hw.nvme.use_nvd=0
kern.cam.nda.enable_biospeedup=1


Is there anything that might help? This box is not yet in production so I can do patching and reboots at will.

Thanks.
Comment 1 Juraj Lutter 2020-10-26 12:39:44 UTC
While doing any work on this box that involves interrupts:

bge2: watchdog timeout -- resetting
bge2: link state changed to DOWN
bge2: link state changed to UP
bge2: watchdog timeout -- resetting
bge2: link state changed to DOWN
bge2: link state changed to UP
bnxt3: Interface stopped DISTRIBUTING, possible flapping

I can also try installing a recent build of a CURRENT image (built using release(7)).
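
To see which interfaces are affected and how often, the watchdog messages can be counted per device. This is a minimal sketch, assuming the kernel messages have been saved as text and follow the "watchdog timeout" wording shown above. The function name `count_watchdog_timeouts` and the `SAMPLE` excerpt are illustrative only.

```python
# Sketch: count "watchdog timeout" kernel messages per network interface.
import re
from collections import Counter

# Matches a device name such as "bge2" directly before ": watchdog timeout".
WATCHDOG_RE = re.compile(r"([a-z]+\d+): watchdog timeout")

SAMPLE = """\
bge2: watchdog timeout -- resetting
bge2: link state changed to DOWN
bge2: link state changed to UP
bge2: watchdog timeout -- resetting
bge2: link state changed to DOWN
bge2: link state changed to UP
bnxt3: Interface stopped DISTRIBUTING, possible flapping
"""


def count_watchdog_timeouts(log_text):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, count in count_watchdog_timeouts(SAMPLE).most_common():
        print(f"{device}: {count}")
```

The pattern also matches lines carrying a syslog prefix (timestamp, host name, "kernel:"), because it searches anywhere in the line.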
Comment 2 Juraj Lutter 2020-10-26 12:41:08 UTC
For example, a locked-up svnlite process looks like this:

 4019 root         24    0    30M    20M tx->tx  31   0:07   0.00% svnlite
Comment 3 Juraj Lutter 2020-11-02 16:26:07 UTC
This lockup is *very likely* caused by a faulty SSD attached to the LSI card (mrsas driver).

After removing that drive, the system no longer locks up.

How could error reporting and handling be improved?
Comment 4 Juraj Lutter 2020-11-12 15:57:13 UTC
After the SSD was replaced, `zpool replace` caused the lockup again.

Couldn't this be a problem with interrupts? With hw.pci.enable_msi="0" it behaves (subjectively) better.
Comment 5 Juraj Lutter 2020-11-15 18:28:59 UTC
This is very similar to PR 200459 (see attachments).
Comment 6 Juraj Lutter 2020-11-15 18:29:56 UTC
Created attachment 219708
nvme
Comment 7 Juraj Lutter 2020-11-15 18:30:32 UTC
Created attachment 219709
mrsas ocr thread