275731 – panic in zfs: VERIFY(buf[i] == 0) failed

Summary: panic in zfs: VERIFY(buf[i] == 0) failed

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	arm64 Any

Importance:	--- Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash

Depends on:
Blocks:

Reported:	2023-12-12 20:03 UTC by John F. Carr
Modified:	2023-12-14 14:05 UTC
CC List:	1 user

See Also:

Description John F. Carr 2023-12-12 20:03:04 UTC

About 20 minutes into a poudriere run, my 4-core arm64 system panicked with:

VERIFY(buf[i] == 0) failed

See sys/contrib/openzfs/module/zfs/dbuf.c line 1192.
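For context, a check of the form VERIFY(buf[i] == 0) walks a buffer one word at a time and requires every word to be zero. The standalone user-space sketch below illustrates that kind of scan, assuming 64-bit words. It is not the code from dbuf.c, and the names first_nonzero_word and NO_NONZERO_WORD are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: returned when every word in the buffer is zero. */
#define NO_NONZERO_WORD ((size_t)-1)

/*
 * Scan a buffer as 64-bit words, the way a check of the form
 * VERIFY(buf[i] == 0) would, but report the index of the first
 * nonzero word instead of panicking.
 */
static size_t
first_nonzero_word(const uint64_t *buf, size_t size_bytes)
{
	/* Assumption: the size is a whole number of 64-bit words. */
	assert(size_bytes % sizeof(uint64_t) == 0);

	for (size_t i = 0; i < size_bytes / sizeof(uint64_t); i++) {
		if (buf[i] != 0)
			return (i);
	}
	return (NO_NONZERO_WORD);
}
```

Returning the index rather than aborting makes it possible to see which word was nonzero, which the panic message alone does not show.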

I am running CURRENT including the change from llvm16 to llvm17 (c711af7727824da79d87f375f3d6829feec3799a), but nothing newer.

On the console I saw the text below.  I have a crash dump.
The temporary zfs filesystems created by poudriere still exist after reboot.

cpuid = 2
time = 1702333198
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
spl_panic() at spl_panic+0x44
dbuf_verify() at dbuf_verify+0x820
dbuf_rele_and_unlock() at dbuf_rele_and_unlock+0x38
arc_read_done() at arc_read_done+0x5d4
zio_done() at zio_done+0xd08
zio_nowait() at zio_nowait+0xe8
arc_read() at arc_read+0x157c
dbuf_read() at dbuf_read+0xbd0
dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0xc4
dmu_tx_count_write() at dmu_tx_count_write+0x198
dmu_tx_hold_write_by_dnode() at dmu_tx_hold_write_by_dnode+0x6c
zfs_write() at zfs_write+0x440
zfs_freebsd_write() at zfs_freebsd_write+0x3c
VOP_WRITE_APV() at VOP_WRITE_APV+0xac
vn_write() at vn_write+0x304
vn_io_fault_doio() at vn_io_fault_doio+0x50
vn_io_fault1() at vn_io_fault1+0x144
vn_io_fault() at vn_io_fault+0x194
dofilewrite() at dofilewrite+0x80
kern_writev() at kern_writev+0x54
sys_write() at sys_write+0x88
do_el0_sync() at do_el0_sync+0x58c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000

Comment 1 John F. Carr 2023-12-14 14:05:03 UTC

The dbuf that failed verification prints as:

*db =
  {db = {
      db_object = 144497, db_offset = 0, db_size = 512, 
      db_data = 0xffffa00184020a00},
   db_objset = 0xffffa00004ac3000, 
   db_dnode_handle = 0xffffa0019f8cd618, db_parent = 0xffffa0015a5557f8, 
   db_hash_next = 0x0,
   db_link = {avl_child = {0x0, 0x0}, avl_pcb = 1}, db_blkid = 0, 
   db_blkptr = 0xffff0000f9cab240, db_level = 0 '\000',
   db_rwlock = {
     lock_object = {
       lo_name = 0xffff000000c16eaf "db->db_rwlock",
       lo_flags = 577830912, lo_data = 0, 
       lo_witness = 0x0}, sx_lock = 1},
   db_buf = 0xffffa000bb982480,
   db_mtx = {
     lock_object = {
       lo_name = 0xffff000000bf395d "db->db_mtx", lo_flags = 577830912, 
       lo_data = 0, lo_witness = 0x0},
     sx_lock = 18446462602811342848}, 
   db_state = DB_CACHED,
   db_holds = {
     rc_count = 2, rc_mtx = {
       lock_object = {
         lo_name = 0xffff000000c8cca3 "rc->rc_mtx",
         lo_flags = 577830912, lo_data = 0, lo_witness = 0x0},
       sx_lock = 1},
     rc_tree = {
       avl_root = 0x0, 
       avl_compar = 0xffff000000162ee0 <zfs_refcount_compare>,
       avl_offset = 0, 
       avl_numnodes = 0},
     rc_removed = {
       list_size = 48, list_offset = 0,
       list_head = {
         list_next = 0xffffa0003d1c3c40, list_prev = 0xffffa0003d1c3c40}
     },
     rc_removed_count = 0, rc_tracked = 0},
   db_changed = {
     cv_description = 0xffff000000c0b27b "db->db_changed", cv_waiters = 0},
   db_data_pending = 0x0,
   db_dirty_records = {
     list_size = 408, list_offset = 40, 
     list_head = {
       list_next = 0xffffa0003d1c3c80, list_prev = 0xffffa0003d1c3c80}},
   db_cache_link = {list_next = 0x0, list_prev = 0x0},
   db_caching_status = DB_NO_CACHE, 
   db_hash = 3780604374730761540, db_user = 0x0,
   db_user_immediate_evict = 0 '\000', 
   db_freed_in_flight = 0 '\000', db_pending_evict = 0 '\000',
   db_dirtycnt = 0 '\000', 
   db_partial_read = 1 '\001'}

The db_data field points to something kgdb can't read from the core dump ("Cannot access memory at address 0xffffa00184020a00").
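The sx_lock values in the dump are printed in decimal, which hides their structure. The sketch below splits such a word into low flag bits and the remaining owner bits. The mask 0x1f, the meaning of bit 0, and all names here are assumptions made for illustration, loosely modeled on how sx(9) lock words appear to be laid out. Check them against sys/sx.h before relying on them.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: low bits are flags, the rest identifies the owner. */
#define LOCK_FLAG_MASK		UINT64_C(0x1f)	/* hypothetical mask */
#define LOCK_FLAG_SHARED	UINT64_C(0x01)	/* hypothetical: shared mode */

/* Return only the (assumed) flag bits of a lock word. */
static uint64_t
lock_flags(uint64_t word)
{
	return (word & LOCK_FLAG_MASK);
}

/* Return the lock word with the (assumed) flag bits cleared. */
static uint64_t
lock_owner(uint64_t word)
{
	return (word & ~LOCK_FLAG_MASK);
}

/* Print a lock word in hex together with its two parts. */
static void
print_lock_word(const char *name, uint64_t word)
{
	printf("%s: raw=0x%016" PRIx64 " flags=0x%" PRIx64
	    " owner=0x%016" PRIx64 "\n",
	    name, word, lock_flags(word), lock_owner(word));
}
```

Calling print_lock_word("db_mtx", 18446462602811342848ULL) shows the raw value as 0xffff0000f3190000 with no flag bits set. Under the assumptions above, that would be consistent with db_mtx being held exclusively at the time of the panic, while the value 1 on db_rwlock and rc_mtx would mean those locks were not held.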