Bug 275306 - 14.0-RELEASE: ossl(4) causes data corruption on encrypted ZFS filesystems/volumes
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks: 14.0-erratas
Reported: 2023-11-24 13:45 UTC by Lexi Winter
Modified: 2023-12-05 18:43 UTC
CC List: 6 users

See Also:


Attachments
my loader.conf (958 bytes, text/plain)
2023-11-27 22:46 UTC, Daniel Austin

Description Lexi Winter freebsd_triage 2023-11-24 13:45:39 UTC
everything was working fine on 13.2.

after upgrading to 14.0, starting a jail causes a kernel panic. see dmesg/console output below.

this is reproducible: setting jail_enable="NO" allows the system to start, but trying to build packages with poudriere, or just starting a configured jail, causes a panic.

---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-RELEASE #15 releng/14.0-n265380-f9716eee8ab4: Fri Nov 24 11:33:16 GMT 2023
    root@hemlock.eden.le-fay.org:/usr/obj/usr/src/amd64.amd64/sys/HEMLOCK amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(efifb): resolution 1280x1024
CPU: AMD Ryzen 7 2700X Eight-Core Processor          (3700.26-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f82  Family=0x17  Model=0x8  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
  Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr,IBPB>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16555761664 (15788 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
Security policy loaded: MAC/ntpd (mac_ntpd)
ioapic0 <Version 2.1> irqs 0-23
ioapic1 <Version 2.1> irqs 24-55
Launching APs: 6 5 1 7 4 2 3
TCP_ratelimit: Is now initialized
TCP Hpts created 8 swi interrupt threads and bound 0 to cpus
random: entropy device external interface
kbd0 at kbdmux0
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
smbios0: <System Management BIOS> at iomem 0xbde25000-0xbde2501e
smbios0: Version: 3.3, BCD Revision: 3.3
ossl0: <OpenSSL crypto>
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
acpi0: <ALASKA A M I >
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_wmi0: <ACPI-WMI mapping> on acpi0
acpi_wmi0: cannot find EC device
acpi_wmi0: Embedded MOF found
ACPI: \134AOD.WQBA: 1 arguments were passed to a non-method ACPI object (Buffer) (20221020/nsarguments-361)
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
amdsmn0: <AMD Family 17h System Management Network> on hostb0
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
pcib1: <ACPI PCI-PCI bridge> at device 1.1 on pci0
pci1: <ACPI PCI bus> on pcib1
mps0: <Avago Technologies (LSI) SAS2008> port 0xf000-0xf0ff mem 0xfbbc0000-0xfbbc3fff,0xfbb80000-0xfbbbffff irq 24 at device 0.0 on pci1
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
pcib2: <ACPI PCI-PCI bridge> at device 1.3 on pci0
pci2: <ACPI PCI bus> on pcib2
xhci0: <AMD 400 Series USB 3.1 controller> mem 0xfb4a0000-0xfb4a7fff irq 32 at device 0.0 on pci2
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ahci0: <AHCI SATA controller> mem 0xfb480000-0xfb49ffff irq 33 at device 0.1 on pci2
ahci0: AHCI v1.31 with 8 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
pcib3: <ACPI PCI-PCI bridge> irq 34 at device 0.2 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 32 at device 0.0 on pci3
pci4: <ACPI PCI bus> on pcib4
vgapci0: <VGA-compatible display> port 0xe000-0xe07f mem 0xfa000000-0xfaffffff,0xd0000000-0xdfffffff,0xf8000000-0xf9ffffff irq 32 at device 0.0 on pci4
vgapci0: Boot video device
pcib5: <ACPI PCI-PCI bridge> irq 33 at device 1.0 on pci3
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 32 at device 4.0 on pci3
pci6: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 33 at device 5.0 on pci3
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> irq 34 at device 6.0 on pci3
pci8: <ACPI PCI bus> on pcib8
ahci1: <ASMedia ASM1062 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0xfb300000-0xfb3001ff irq 34 at device 0.0 on pci8
ahci1: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
ahci1: quirks=0xc00000<NOCCS,NOAUX>
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
pcib9: <ACPI PCI-PCI bridge> irq 35 at device 7.0 on pci3
pci9: <ACPI PCI bus> on pcib9
pci9: <network, ethernet> at device 0.0 (no driver attached)
pcib10: <ACPI PCI-PCI bridge> at device 3.1 on pci0
pci10: <ACPI PCI bus> on pcib10
ix0: <Intel(R) X540-AT2> mem 0xe0200000-0xe03fffff,0xe0404000-0xe0407fff irq 55 at device 0.0 on pci10
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
ix0: Using MSI-X interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: b4:96:91:45:a2:14
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: eTrack 0x800005f9 PHY FW V272
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix1: <Intel(R) X540-AT2> mem 0xe0000000-0xe01fffff,0xe0400000-0xe0403fff irq 54 at device 0.1 on pci10
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 8 RX queues 8 TX queues
ix1: Using MSI-X interrupts with 9 vectors
ix1: allocated for 8 queues
ix1: allocated for 8 rx queues
ix1: Ethernet address: b4:96:91:45:a2:16
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: eTrack 0x800005f9 PHY FW V272
ix1: netmap queues/slots: TX 8/2048, RX 8/2048
pcib11: <ACPI PCI-PCI bridge> at device 7.1 on pci0
pci11: <ACPI PCI bus> on pcib11
pci11: <encrypt/decrypt> at device 0.2 (no driver attached)
xhci1: <XHCI (generic) USB 3.0 controller> mem 0xfb600000-0xfb6fffff irq 37 at device 0.3 on pci11
xhci1: 64 bytes context size, 64-bit DMA
usbus1 on xhci1
usbus1: 5.0Gbps Super Speed USB v3.0
pcib12: <ACPI PCI-PCI bridge> at device 8.1 on pci0
pci12: <ACPI PCI bus> on pcib12
ahci2: <AMD KERNCZ AHCI SATA controller> mem 0xfb908000-0xfb908fff irq 42 at device 0.2 on pci12
ahci2: AHCI v1.31 with 1 6Gbps ports, Port Multiplier supported with FBS
ahcich10: <AHCI channel> at channel 0 on ahci2
pci12: <multimedia, HDA> at device 0.3 (no driver attached)
intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
smbus0: <System Management Bus> on intsmb0
smb0: <SMBus generic I/O> on smbus0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
acpi_button0: <Power Button> on acpi0
ns8250: UART FCR is broken
ns8250: UART FCR is broken
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounter "TSC-low" frequency 1849999267 Hz quality 1000
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
WARNING: Adding ifaddrs to all fibs has been turned off by default. Consider tuning net.add_addr_allfibs if needed
ugen0.1: <AMD XHCI root HUB> at usbus0
ugen1.1: <AMD XHCI root HUB> at usbus1
uhub0 on usbus0
uhub0: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub1 on usbus1
uhub1: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
hwpmc: SOFT/16/64/0x67<INT,USR,SYS,REA,WRI> TSC/1/64/0x20<REA> K8/16/48/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
Trying to mount root from zfs:zroot/ROOT/default []...
uhub1: 8 ports with 8 removable, self powered
uhub0: 22 ports with 22 removable, self powered
Root mount waiting for: CAM usbus0
ugen0.2: <vendor 0x04d9 daskeyboard> at usbus0
ukbd0 on uhub0
ukbd0: <vendor 0x04d9 daskeyboard, class 0/0, rev 1.10/3.90, addr 1> on usbus0
kbd1 at ukbd0
ums0 on uhub0
ums0: <vendor 0x04d9 daskeyboard, class 0/0, rev 1.10/3.90, addr 1> on usbus0
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
ada0 at ahcich5 bus 0 scbus4 target 0 lun 0
ada0: <KINGSTON SUV400S37240G 0C3FD6SD> ACS-4 ATA SATA 3.x device
ada0: Serial Number 50026B776601F3DC
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 228936MB (468862128 512 byte sectors)
ada1 at ahcich9 bus 0 scbus6 target 0 lun 0
ada1: <CT250MX500SSD1 M3CR045> ACS-3 ATA SATA 3.x device
ada1: Serial Number 2240E67118A8
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada1: Command Queueing enabled
ada1: 238475MB (488397168 512 byte sectors)
cd0 at ahcich0 bus 0 scbus1 target 0 lun 0
cd0: <TSSTcorp CDDVDW TS-H653B SI01> Removable CD-ROM SCSI device
cd0: Serial Number � �;��x�o�Q �\^A�\^P
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
da0 at mps0 bus 0 scbus0 target 4 lun 0
da0: <SEAGATE STCRSEI1CLAR8000 CA01> Fixed Direct Access SPC-5 SCSI device
da0: Serial Number WSD6Z597
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 7630885MB (15628053168 512 byte sectors)
da3 at mps0 bus 0 scbus0 target 9 lun 0
da3: <SEAGATE STCRSEI1CLAR8000 CA01> Fixed Direct Access SPC-5 SCSI device
da3: Serial Number WSD4LE4D
da3: 600.000MB/s transfers
da3: Command Queueing enabled
da3: 7630885MB (15628053168 512 byte sectors)
da1 at mps0 bus 0 scbus0 target 7 lun 0
da1: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SPC-4 SCSI device
da1: Serial Number WD-WCAZAD107331
da1: 600.000MB/s transfers
da1: Command Queueing enabled
da1: 1907729MB (3907029168 512 byte sectors)
da1: quirks=0x8<4K>
da5 at mps0 bus 0 scbus0 target 11 lun 0
da5: <SEAGATE STCRSEI1CLAR8000 CA01> Fixed Direct Access SPC-5 SCSI device
da5: Serial Number WSD6TJWQ
da5: 600.000MB/s transfers
da5: Command Queueing enabled
da5: 7630885MB (15628053168 512 byte sectors)
da4 at mps0 bus 0 scbus0 target 10 lun 0
da4: <SEAGATE STCRSEI1CLAR8000 CA01> Fixed Direct Access SPC-5 SCSI device
da4: Serial Number WSD6Z6JB
da4: 600.000MB/s transfers
da4: Command Queueing enabled
da4: 7630885MB (15628053168 512 byte sectors)
da2 at mps0 bus 0 scbus0 target 8 lun 0
da2: <ATA TOSHIBA DT01ACA2 ABB0> Fixed Direct Access SPC-4 SCSI device
da2: Serial Number 563J53DAS
da2: 600.000MB/s transfers
da2: Command Queueing enabled
da2: 1907729MB (3907029168 512 byte sectors)
GEOM: da1: the primary GPT table is corrupt or invalid.
GEOM: da1: using the secondary instead -- recovery strongly advised.
GEOM: da2: the primary GPT table is corrupt or invalid.
GEOM: da2: using the secondary instead -- recovery strongly advised.
/etc/rc: WARNING: Kernel dumps will be written to the swap partition without encryption.
Setting hostuuid: 06006b9c-72ba-0000-0000-000000000000.
Setting hostid: 0x3f003717.
GEOM: da1: the primary GPT table is corrupt or invalid.
GEOM: da1: using the secondary instead -- recovery strongly advised.
GEOM: da2: the primary GPT table is corrupt or invalid.
GEOM: da2: using the secondary instead -- recovery strongly advised.
GEOM_ELI: Device ada0p2.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: accelerated software
Loading key for data/ccache from file:///etc/zfs/data.key..
Loading key for data/iscsi from file:///etc/zfs/data.key..
Loading key for data/jail from file:///etc/zfs/data.key..
Loading key for data/jellyfin from file:///etc/zfs/data.key..
Loading key for data/packages from file:///etc/zfs/data.key..
Loading key for data/poudriere from file:///etc/zfs/data.key..
Loading key for data/public/Books from file:///etc/zfs/data.key..
Loading key for data/public/CalibreLibrary from file:///etc/zfs/data.key..
Loading key for data/public/Comics from file:///etc/zfs/data.key..
Loading key for data/public/Films from file:///etc/zfs/data.key..
Loading key for data/public/Miscellaneous from file:///etc/zfs/data.key..
Loading key for data/public/Music from file:///etc/zfs/data.key..
Loading key for data/public/Software from file:///etc/zfs/data.key..
Loading key for data/public/TV from file:///etc/zfs/data.key..
Loading key for data/public/Torrents_new from file:///etc/zfs/data.key..
Loading key for zroot/jail from file:///etc/zfs/data.key..
Starting file system checks:
/dev/gpt/efiboot0: 6 files, 258 MiB free (16519 clusters)
FIXED
/dev/gpt/efiboot0: MARKING FILE SYSTEM CLEAN
Mounting local filesystems:.
Key already loaded for data/ccache.
Key already loaded for data/iscsi.
Key already loaded for data/jail.
Key already loaded for data/jellyfin.
Key already loaded for data/packages.
Key already loaded for data/poudriere.
Key already loaded for data/public/Books.
Key already loaded for data/public/CalibreLibrary.
Key already loaded for data/public/Comics.
Key already loaded for data/public/Films.
Key already loaded for data/public/Miscellaneous.
Key already loaded for data/public/Music.
Key already loaded for data/public/Software.
Key already loaded for data/public/TV.
Key already loaded for data/public/Torrents_new.
Key already loaded for zroot/jail.
Updating CPU Microcode...
CPU: AMD Ryzen 7 2700X Eight-Core Processor          (3700.00-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f82  Family=0x17  Model=0x8  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
  Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr,IBPB>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
Done.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg /usr/local/lib/perl5/5.36/mach/CORE /usr/local/lib/samba4
32-bit compatibility ldconfig path:
Setting hostname: hemlock.eden.le-fay.org.
Setting up harvesting: PURE_RDRAND,[CALLOUT],[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: .
bridge0: Ethernet address: 58:9c:fc:10:95:57
epair0a: Ethernet address: 02:58:0b:2e:3b:0a
epair0b: Ethernet address: 02:58:0b:2e:3b:0b
epair0a: link state changed to UP
epair0b: link state changed to UP
epair0a
epair1a: Ethernet address: 02:50:6c:db:6f:0a
epair1b: Ethernet address: 02:50:6c:db:6f:0b
epair1a: link state changed to UP
epair1b: link state changed to UP
epair1a
Created clone interfaces: bridge0 epair0a epair0b epair1a epair1b.
lo0: link state changed to UP
ix1: link state changed to UP
ix1: link state changed to DOWN
bridge0: link state changed to UP
ix1: promiscuous mode enabled
epair0a: promiscuous mode enabled
epair1a: promiscuous mode enabled
Starting Network: lo0 ix0 ix1 bridge0 epair0a epair0b epair1a epair1b.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether b4:96:91:45:a2:14
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4a538b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
	ether b4:96:91:45:a2:16
	inet6 fe80::b696:91ff:fe45:a216%ix1 prefixlen 64 scopeid 0x2
	media: Ethernet autoselect
	status: no carrier
	nd6 options=1<PERFORMNUD>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 58:9c:fc:10:95:57
	inet 10.1.6.13 netmask 0xffffff00 broadcast 10.1.6.255
	inet6 fe80::5a9c:fcff:fe10:9557%bridge0 prefixlen 64 scopeid 0x6
	inet6 2001:8b0:aab5:106::12 prefixlen 64
	id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
	member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 9 priority 128 path cost 2000
	member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 7 priority 128 path cost 2000
	member: ix1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
	        ifmaxaddr 0 port 2 priority 128 path cost 20000
	groups: bridge
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:58:0b:2e:3b:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair0b: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:58:0b:2e:3b:0b
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:50:6c:db:6f:0a
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
epair1b: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:50:6c:db:6f:0b
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Starting devd.
Starting Network: ix0.
ix0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether b4:96:91:45:a2:14
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
syscons does not exist in /etc/rc.d or the local startup
directories (/usr/local/etc/rc.d), or is not executable
moused does not exist in /etc/rc.d or the local startup
directories (/usr/local/etc/rc.d), or is not executable
Starting Network: epair0b.
epair0b: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:58:0b:2e:3b:0b
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Starting Network: epair1b.
epair1b: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 02:50:6c:db:6f:0b
	groups: epair
	media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
Starting pflog.
pflog0: promiscuous mode enabled
2023-11-24T13:15:27.262943+00:00 hemlock.eden.le-fay.org pflogd 62468 - - [priv]: msg PRIV_OPEN_LOG received
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add host 127.0.0.1: gateway lo0 fib 1: route already in table
add host 127.0.0.1: gateway lo0 fib 2: route already in table
add net default: gateway 10.1.6.1 fib 0
add host ::1: gateway lo0 fib 1,2
add host ::1: gateway lo0 fib 0: route already in table
add net fe80::: gateway ::1 fib 0,1,2
add net ff02::: gateway ::1 fib 0,1,2
add net ::ffff:0.0.0.0: gateway ::1 fib 0,1,2
add net ::0.0.0.0: gateway ::1 fib 0,1,2
add net default: gateway 2001:8B0:AAB5:106::1 fib 0
Enabling pfpfctl: DIOCADDRULENV: File exists
/etc/rc: WARNING: Unable to load /etc/pf.conf.
.
Starting nfsuserd.
Starting kdc.
Starting ctld.
Starting kadmind.
Starting gssd.
Creating and/or trimming log files.
Clearing /tmp.
Updating motd:.
Updating /var/run/os-release done.
Starting syslogd.
No core dumps found.
Mounting late filesystems:.
Starting ntpd.
Starting powerd.
Starting dbus.
Starting avahi-daemon.
NFSv4 only server
Starting mountd.
Starting nfsd.
Performing sanity check on Samba configuration: OK
Starting smbd.
ix1: link state changed to UP
Nov 24 13:15:28 hemlock smbd[46314]: [2023/11/24 13:15:28.268818,  0] ../../source3/smbd/server.c:1741(main)
Nov 24 13:15:28 hemlock smbd[46314]:   smbd version 4.16.11 started.
Nov 24 13:15:28 hemlock smbd[46314]:   Copyright Andrew Tridgell and the Samba Team 1992-2022
Starting pushgateway.
Starting rsyncd.
Starting inetd.
Performing sanity check on sshd configuration.
Starting sshd.
Starting cron.
Performing sanity check on nginx configuration:
nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
Starting nginx.
Starting node_exporter.
postfix/postfix-script: starting the Postfix mail system
Starting smartd.
Starting background file system checks in 60 seconds.
Starting jails:
tcp_vnet_init: WARNING: unable to initialise TCP stats
lo0: link state changed to UP
panic: VERIFY0(0 == spa_do_crypt_abd(B_TRUE, spa, &zio->io_bookmark, BP_GET_TYPE(bp), BP_GET_DEDUP(bp), BP_SHOULD_BYTESWAP(bp), salt, iv, mac, psize, zio->io_abd, eabd, &no_crypt)) failed (0 == 5)

cpuid = 0
time = 1700831735
KDB: stack backtrace:
#0 0xffffffff8088054d at kdb_backtrace+0x5d
#1 0xffffffff8083bdc1 at vpanic+0x131
#2 0xffffffff80393a4a at spl_panic+0x3a
#3 0xffffffff80506cad at zio_encrypt+0x5ad
#4 0xffffffff80504089 at zio_execute+0x39
#5 0xffffffff808a1431 at taskqueue_run_locked+0x161
#6 0xffffffff808a2372 at taskqueue_thread_loop+0xb2
#7 0xffffffff807fd6b3 at fork_exit+0x73
#8 0xffffffff80b64f6e at fork_trampoline+0xe
Uptime: 41s
Dumping 914 out of 16283 MB:..2%..11%..21%..32%..41%..51%..62%..72%..81%..91%
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2023-11-24 15:42:30 UTC
It looks like ossl is compiled into your custom kernel.  Does the problem reproduce if it's not there?
Comment 2 Lexi Winter freebsd_triage 2023-11-24 16:45:58 UTC
removing "ossl" seems to have fixed the panic.

however, it seems like this has caused permanent damage to several ZFS filesystems / volumes:

root@hemlock:~ # zpool status -v
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 8 days 05:00:11 with 0 errors on Thu Nov 16 09:41:34 2023
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	  mirror-2  ONLINE       0     0     0
	    da0     ONLINE       0     0     0
	    da3     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data/iscsi/thyme:<0x1>
        data/iscsi/willow0:<0x1>
        /usr/local/poudriere/data/.m/

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:02:24 with 0 errors on Wed Nov  8 04:43:50 2023
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0
	    ada1p3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zroot/jail/amber:<0x0>

is this expected?  is there any way to recover the data?
Comment 3 Lexi Winter freebsd_triage 2023-11-24 16:52:13 UTC
quick update: "zpool scrub zroot" seems to have fixed the error on the zroot pool.  i'm running a scrub on the data pool to see if it fixes the errors there as well, but that will likely take a couple of days to finish.  (this is with ossl removed from the kernel config.)
Comment 4 Daniel Austin 2023-11-26 03:26:21 UTC
If it helps, I also hit this issue but...

* I'm not running jails.
* I did have ossl.ko loaded, which i've now removed.
* I also have corruption on one of my encrypted filesystems now

So I think it's a more generic ZFS issue rather than a jail issue - probably just triggered easily with jails due to the extra IO.
Comment 5 Lexi Winter freebsd_triage 2023-11-26 09:36:35 UTC
i updated the title to better reflect the issue, since this doesn't seem to be directly related to jails; jails are just the first thing on this system to do I/O on an encrypted filesystem after boot.

i assume this means the assignee should be updated as well, but i'll let someone else do that since i'm not too familiar with the processes on Bugzilla.
Comment 6 John Baldwin freebsd_committer freebsd_triage 2023-11-27 18:16:57 UTC
Hmm, it's not clear that ossl(4) is generating incorrect output.  The panic is that spa_do_crypt_abd() failed with EIO (5) which means that the request failed with an error for some reason.  It is an encryption request, so the error would have to be some sort of invalid parameter or unsupported length or the like.
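The following toy C sketch shows how to read the panic string. `toy_verify0()` is a hypothetical helper, not the OpenZFS `VERIFY0` macro. It mimics only the `(0 == N)` suffix, where N is the non-zero value the checked call returned. On FreeBSD `EIO` is 5, so `(0 == 5)` means `spa_do_crypt_abd()` returned `EIO`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a VERIFY0-style check: it succeeds only when
 * the checked value is 0. Instead of panicking, it writes the same
 * "(0 == N)" suffix seen in the panic message into msg.
 */
static int toy_verify0(long long rv, char *msg, size_t msglen)
{
    if (msglen == 0)
        return rv == 0;
    if (rv == 0) {
        msg[0] = '\0';
        return 1;               /* check passed */
    }
    snprintf(msg, msglen, "failed (0 == %lld)", rv);
    return 0;                   /* check failed; msg holds the value seen */
}
```

For example, `toy_verify0(EIO, buf, sizeof(buf))` returns 0 and leaves `failed (0 == 5)` in `buf`.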

Can you say which algorithm the pool is using (AES-GCM or something else)?  AES-GCM had some changes in 14.0 (14.0 added AES-GCM support to ossl(4)).
Comment 7 Daniel Austin 2023-11-27 18:48:43 UTC
(In reply to John Baldwin from comment #6)
In my case, yes my filesystems are using aes-256-gcm encryption
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2023-11-27 19:13:44 UTC
Hmm, I suspect that the problem is that ZFS expects to be able to dispatch multiple operations on a session in parallel, but the ossl AES-GCM implementation maintains some context in the session structure (the session structure is 624 bytes, so a bit large for the stack), and neither ZFS nor opencrypto nor ossl provides serialization.
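Here is a minimal, single-threaded C sketch of the suspected failure mode, using a toy XOR "cipher". `toy_session`, `req_init()` and `req_process()` are hypothetical names, not ossl(4) code. The mutable per-request state (a counter) lives in the shared session. If a second request starts on the same session between the two phases of the first, the first request continues with the wrong state. Real threads would interleave nondeterministically, so the sketch hard-codes one bad interleaving to make the effect reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins; NOT the real ossl(4) structures or cipher. */
struct toy_ctx {
    uint32_t counter;            /* mutable, per-request state */
};

struct toy_session {
    uint8_t key;                 /* immutable "key schedule" */
    struct toy_ctx ctx;          /* BUG: mutable state shared by all requests */
};

/* Toy keystream byte derived from the key and the current counter. */
static uint8_t keystream(uint8_t key, uint32_t counter)
{
    return (uint8_t)(key ^ (uint8_t)(counter * 31u));
}

/* Phase 1 of a request: load the counter from the IV into the session. */
static void req_init(struct toy_session *s, uint32_t iv)
{
    s->ctx.counter = iv;
}

/* Phase 2: XOR the keystream into buf, advancing the shared counter. */
static void req_process(struct toy_session *s, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= keystream(s->key, s->ctx.counter++);
}

/*
 * Encrypt request A once with nothing else in flight, then again with
 * request B's init landing between A's two phases. Returns 1 if A's
 * output is the same both times.
 */
static int interleaved_matches_serial(void)
{
    struct toy_session s = { .key = 0x5a };
    uint8_t expect[4] = { 'd', 'a', 't', 'a' };
    uint8_t got[4] = { 'd', 'a', 't', 'a' };
    uint8_t other[4] = { 'm', 'o', 'r', 'e' };

    req_init(&s, 0);                        /* A alone */
    req_process(&s, expect, sizeof(expect));

    req_init(&s, 0);                        /* A starts */
    req_init(&s, 100);                      /* B starts on the same session */
    req_process(&s, got, sizeof(got));      /* A resumes with B's counter */
    req_process(&s, other, sizeof(other));  /* B resumes with an advanced counter */

    return memcmp(expect, got, sizeof(got)) == 0;
}
```

Here `interleaved_matches_serial()` returns 0. Request A's ciphertext depends on what another request did to the shared session, which is the kind of incorrect output described above.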
Comment 9 Lexi Winter freebsd_triage 2023-11-27 19:56:51 UTC
in my case all the pool's encrypted datasets were using aes-256-gcm.
Comment 10 John Baldwin freebsd_committer freebsd_triage 2023-11-27 22:14:11 UTC
BTW, the ZFS encryption code claims that it has to use the same buffer for input and output, and does a memcpy as a result.  That isn't true in 13.0 and later.  Someone (tm) should fix ZFS to use CSP_F_SEPARATE_OUTPUT on 13.0 and later.  It looks like it could also benefit from using CSP_F_SEPARATE_AAD.
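The difference can be illustrated with a toy C sketch. This is not OpenZFS or OCF code; `toy_encrypt_copy_then_inplace()` and `toy_encrypt_separate()` are hypothetical names and the XOR "cipher" is a placeholder.

- When the caller assumes the driver works in place, it first copies the plaintext into the output buffer and then encrypts there.
- When the driver accepts a separate output buffer (what `CSP_F_SEPARATE_OUTPUT` advertises), it can read the source and write the destination directly, so the copy is unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy keystream byte for position i; a placeholder, not a real cipher. */
static uint8_t ks(uint8_t key, size_t i)
{
    return (uint8_t)(key + 7u * (unsigned)i);
}

/* In-place style: stage the plaintext in dst, then transform dst in place. */
static void toy_encrypt_copy_then_inplace(uint8_t key, const uint8_t *src,
                                          uint8_t *dst, size_t len)
{
    memcpy(dst, src, len);                  /* the extra copy */
    for (size_t i = 0; i < len; i++)
        dst[i] ^= ks(key, i);
}

/* Separate-output style: read src, write dst, no staging copy. */
static void toy_encrypt_separate(uint8_t key, const uint8_t *src,
                                 uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] ^ ks(key, i);
}
```

Both functions produce identical ciphertext and leave `src` untouched. The only difference is the `memcpy` pass over the data.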
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2023-11-27 22:30:21 UTC
(In reply to Daniel Austin from comment #4)
Are/were you using a custom kernel with ossl compiled in?  Or were you simply loading ossl.ko via loader.conf?  When I try the latter, aesni(4) probes first, in which case the bug is not triggered.
Comment 12 Daniel Austin 2023-11-27 22:46:54 UTC
Created attachment 246620
my loader.conf

I was/am running a generic kernel with ossl.ko loaded via loader.conf.
I've attached my full loader.conf in case anything jumps out at you.

Quite a few things in my loader.conf are no longer needed and are leftover from when the machine was used as a PPPoE gateway for my broadband.  I've just never got around to removing them.

The panic seemed to trigger after mounting the encrypted filesystems and when they had some load applied.  I use them as rsync backup locations so they get a hit of read then write IO as rsync works out what to copy.

If it is of any help, the hardware is a HP Microserver Gen10 plus with a Xeon E-2224 CPU, 32GB ram, 4 x 4TB SATA drives in a raidz2 configuration and i'm using a LACP bundle across 2 igb ports (all default lacp settings).
I'm also using bridge to bridge a layer2 openvpn instance to the bundle.

When i originally had the issue, i had 1 error reported in my zpool (even after a scrub).  Now I have 3 errors reported across 2 filesystems.
Comment 13 Daniel Austin 2023-11-27 22:49:31 UTC
(In reply to Daniel Austin from comment #12)
Oops I just realised i'm not being clear... I'm currently NOT loading ossl.ko via loader.conf which stopped the panics (uptime 1d19h30m)
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2023-11-27 22:54:34 UTC
(In reply to Daniel Austin from comment #12)
Ok, thank you.  It seems that the probe order is somewhat arbitrary: if you load ossl.ko from loader.conf, you may or may not end up using ossl(4) once the system boots up.  GENERIC kernels have aesni(4) as well, and the kernel will use whichever happens to have been probed first.
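Below is a toy C model of the behaviour described above. It assumes, as the comment states, that the first probed driver able to handle the session wins. `toy_driver` and `toy_select()` are hypothetical, and the real OCF driver-selection code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver descriptor: a name plus one capability bit. */
struct toy_driver {
    const char *name;
    int supports_aes_gcm;
};

/* Return the first driver, in probe order, that can handle AES-GCM. */
static const char *toy_select(const struct toy_driver *drv, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (drv[i].supports_aes_gcm)
            return drv[i].name;
    return NULL;                /* no capable driver */
}
```

With both aesni(4) and ossl(4) claiming AES-GCM, the same configuration picks a different driver depending only on which one was probed first. That matches how loading ossl.ko may or may not expose the bug.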

I was able to reproduce the panic and have a patch which fixes the problem in my testing.  We'll have it released with some other 14.0 errata later this week.

I do not have a solution for the data errors, I apologize.  Anything that was written to an aes-gcm encrypted dataset using ossl on 14.0 cannot be trusted.  (Prior to 14.0, having ossl.ko loaded didn't matter since it didn't implement any ciphers used by OpenZFS.)
Comment 15 Daniel Austin 2023-11-27 22:57:20 UTC
(In reply to Mark Johnston from comment #14)
Thanks for looking into it.
No problem regarding the corruption, it's nothing I can't replace :-)
Comment 16 commit-hook freebsd_committer freebsd_triage 2023-11-29 17:58:53 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5c0dac0b7a012f326edab06ad85aee5ad68ff120

commit 5c0dac0b7a012f326edab06ad85aee5ad68ff120
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-11-29 17:51:55 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-11-29 17:55:51 +0000

    ossl: Keep mutable AES-GCM state on the stack

    ossl(4)'s AES-GCM implementation keeps mutable state in the session
    structure, together with the key schedule.  This was done for
    convenience, as both are initialized together.  However, some OCF
    consumers, particularly ZFS, assume that requests may be dispatched to
    the same session in parallel.  Without serialization, this results in
    incorrect output.

    Fix the problem by explicitly copying per-session state onto the stack
    at the beginning of each operation.

    PR:             275306
    Reviewed by:    jhb
    Fixes:          9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D42783

 sys/crypto/openssl/ossl_aes.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)
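The pattern described in the commit message can be sketched in userland C. This is a hypothetical toy, not the ossl(4) code: `struct toy_session`, `toy_encrypt()`, `toy_keystream()` and `parallel_matches_serial()` are invented names, and the "cipher" is a simple mixing function, not AES-GCM. It shows the fixed layout: read-only key material stays in the shared session, while the mutable counter/tag state is copied onto the stack at the start of each operation, so parallel requests against one session cannot interfere with each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS    64
#define MAXTHREADS 16

/* Hypothetical stand-in for a session: an immutable "key schedule" plus a
 * template of the mutable per-operation state. Not the real ossl(4) layout. */
struct toy_session {
	uint32_t key;           /* set once at session creation, then read-only */
	struct toy_state {
		uint32_t counter;   /* mutable: advanced once per block */
		uint32_t tag;       /* mutable: accumulated over the message */
	} init;                 /* initial state; never written by operations */
};

/* Toy keystream: NOT a real cipher, only makes output depend on the counter. */
static uint32_t toy_keystream(uint32_t key, uint32_t counter)
{
	uint32_t x = key ^ (counter * 0x9e3779b9u);
	x ^= x >> 16;
	x *= 0x85ebca6bu;
	x ^= x >> 13;
	return x;
}

/*
 * Buggy pattern (what the commit message describes): mutable state lives in
 * the shared session, so concurrent callers all advance the same counter:
 *
 *     out[i] = in[i] ^ toy_keystream(s->key, s->state.counter++);
 *
 * Fixed pattern below: copy the state onto the stack per operation.
 */
static uint32_t toy_encrypt(const struct toy_session *s,
    const uint32_t *in, uint32_t *out, size_t n)
{
	assert(s != NULL && in != NULL && out != NULL);
	struct toy_state st = s->init;  /* private copy for this request */

	for (size_t i = 0; i < n; i++) {
		out[i] = in[i] ^ toy_keystream(s->key, st.counter++);
		st.tag = st.tag * 31u + out[i];
	}
	return st.tag;
}

struct job {
	const struct toy_session *s;
	const uint32_t *in;
	uint32_t out[NBLOCKS];
	uint32_t tag;
};

static void *worker(void *arg)
{
	struct job *j = arg;

	j->tag = toy_encrypt(j->s, j->in, j->out, NBLOCKS);
	return NULL;
}

/* Dispatch the same request to one session from several threads at once and
 * compare each result with a serial reference run.
 * Returns 1 if all agree, 0 on any mismatch, -1 on bad input or thread error. */
static int parallel_matches_serial(const struct toy_session *s, int nthreads)
{
	uint32_t in[NBLOCKS], ref[NBLOCKS];
	struct job jobs[MAXTHREADS];
	pthread_t tids[MAXTHREADS];
	int started = 0, ok = 1;

	if (nthreads < 1 || nthreads > MAXTHREADS)
		return -1;
	for (int i = 0; i < NBLOCKS; i++)
		in[i] = (uint32_t)i;
	uint32_t ref_tag = toy_encrypt(s, in, ref, NBLOCKS);

	for (; started < nthreads; started++) {
		jobs[started].s = s;
		jobs[started].in = in;
		if (pthread_create(&tids[started], NULL, worker, &jobs[started]) != 0)
			break;
	}
	for (int t = 0; t < started; t++) {
		pthread_join(tids[t], NULL);
		if (jobs[t].tag != ref_tag ||
		    memcmp(jobs[t].out, ref, sizeof(ref)) != 0)
			ok = 0;
	}
	return started == nthreads ? ok : -1;
}
```

With the stack copy, `parallel_matches_serial()` should always report agreement and the session's `init` template is left untouched. With the commented-out shared-state variant, threads would race on one counter and could produce mismatched output and tags, which is the kind of corruption reported in this bug.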
Comment 17 Lexi Winter freebsd_triage 2023-12-01 18:18:07 UTC
thanks for the fix.  no worries about the lost data; fortunately the issue only hit datasets that i could easily restore from backup.

out of interest, can you say if "zpool checkpoint" would have averted the data corruption by allowing a rollback to a known good state (with any newer writes lost, of course)?
Comment 18 commit-hook freebsd_committer freebsd_triage 2023-12-02 19:29:25 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=84ef0a84ecaa4f5d9bcfed3ce10c288953491e7e

commit 84ef0a84ecaa4f5d9bcfed3ce10c288953491e7e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-11-29 17:51:55 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-12-02 19:25:42 +0000

    ossl: Keep mutable AES-GCM state on the stack

    ossl(4)'s AES-GCM implementation keeps mutable state in the session
    structure, together with the key schedule.  This was done for
    convenience, as both are initialized together.  However, some OCF
    consumers, particularly ZFS, assume that requests may be dispatched to
    the same session in parallel.  Without serialization, this results in
    incorrect output.

    Fix the problem by explicitly copying per-session state onto the stack
    at the beginning of each operation.

    PR:             275306
    Reviewed by:    jhb
    Fixes:          9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D42783

    (cherry picked from commit 5c0dac0b7a012f326edab06ad85aee5ad68ff120)

 sys/crypto/openssl/ossl_aes.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)
Comment 19 commit-hook freebsd_committer freebsd_triage 2023-12-05 18:28:42 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9fd62386ad6e6f5c5298cda66c5c1894373e4379

commit 9fd62386ad6e6f5c5298cda66c5c1894373e4379
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-11-29 17:51:55 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-12-04 14:02:05 +0000

    ossl: Keep mutable AES-GCM state on the stack

    ossl(4)'s AES-GCM implementation keeps mutable state in the session
    structure, together with the key schedule.  This was done for
    convenience, as both are initialized together.  However, some OCF
    consumers, particularly ZFS, assume that requests may be dispatched to
    the same session in parallel.  Without serialization, this results in
    incorrect output.

    Fix the problem by explicitly copying per-session state onto the stack
    at the beginning of each operation.

    PR:             275306
    Reviewed by:    jhb
    Fixes:          9a3444d91c70 ("ossl: Add a VAES-based AES-GCM implementation for amd64")
    MFC after:      3 days
    Differential Revision:  https://reviews.freebsd.org/D42783
    Approved by:    so
    Security:       FreeBSD-EN-23:17.ossl

    (cherry picked from commit 5c0dac0b7a012f326edab06ad85aee5ad68ff120)
    (cherry picked from commit 84ef0a84ecaa4f5d9bcfed3ce10c288953491e7e)

 sys/crypto/openssl/ossl_aes.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)
Comment 20 Mark Johnston freebsd_committer freebsd_triage 2023-12-05 18:43:37 UTC
Fixed in 14.0-RELEASE-p2.

(In reply to Lexi from comment #17)
I'm afraid I can't say for sure one way or another.  I would expect it to work since ZFS encryption is per-dataset, but I wouldn't be confident in that without testing.