Bug 282134 - [ext2fs] watchdogd fired (with one hour timeout)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Doug Moore
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-10-16 07:47 UTC by Peter Holm
Modified: 2024-10-26 12:57 UTC

See Also:


Attachments
Handle a range from negative to positive (529 bytes, patch)
2024-10-18 08:52 UTC, Doug Moore
diagnostic patch (961 bytes, patch)
2024-10-18 16:56 UTC, Doug Moore
deal with negative values; sort the list (2.45 KB, patch)
2024-10-19 07:47 UTC, Doug Moore

Description Peter Holm freebsd_committer freebsd_triage 2024-10-16 07:47:17 UTC
20241016 00:36:10 all (763/970): ext3fs.sh
Oct 16 00:36:58 mercat1 kernel: pid 58404 (swap), jid 0, uid 2007, was killed: failed to reclaim memory
Oct 16 00:37:03 mercat1 kernel: pid 58389 (swap), jid 0, uid 2007, was killed: failed to reclaim memory
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01d903d4e0
hardclock() at hardclock+0x103/frame 0xfffffe01d903d520
handleevents() at handleevents+0xaf/frame 0xfffffe01d903d560
timercb() at timercb+0x18e/frame 0xfffffe01d903d5b0
lapic_handle_timer() at lapic_handle_timer+0xab/frame 0xfffffe01d903d5d0
Xtimerint() at Xtimerint+0xb1/frame 0xfffffe01d903d5d0
--- interrupt, rip = 0xffffffff829c59b2, rsp = 0xfffffe01d903d6a0, rbp = 0xfffffe01d903d710 ---
ext2_htree_split_dirblock() at ext2_htree_split_dirblock+0xb2/frame 0xfffffe01d903d710
ext2_htree_add_entry() at ext2_htree_add_entry+0x233/frame 0xfffffe01d903d890
ext2_direnter() at ext2_direnter+0xac/frame 0xfffffe01d903da50
ext2_makeinode() at ext2_makeinode+0x128/frame 0xfffffe01d903daa0
ext2_create() at ext2_create+0x2c/frame 0xfffffe01d903dac0
VOP_CREATE_APV() at VOP_CREATE_APV+0x5f/frame 0xfffffe01d903dae0
vn_open_cred() at vn_open_cred+0x3f9/frame 0xfffffe01d903dc60
openatfp() at openatfp+0x287/frame 0xfffffe01d903ddb0
sys_openat() at sys_openat+0x3d/frame 0xfffffe01d903dde0
filemon_wrapper_openat() at filemon_wrapper_openat+0x12/frame 0xfffffe01d903de00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe01d903df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01d903df30
--- syscall (499, FreeBSD ELF64, openat), rip = 0xcb2e6d966fa, rsp = 0xcb2e5a793d8, rbp = 0xcb2e5a79490 ---
KDB: enter: watchdog timeout

https://people.freebsd.org/~pho/stress/log/log0554.txt

Seems easy to reproduce:
cd /usr/src/tools/test/stress2/misc
./all.sh ext3fs.sh
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2024-10-16 16:09:57 UTC
Presumably the problem is a directory entry with ep->e2d_reclen == 0, so the first loop in ext2_htree_split_dirblock() never terminates.
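To make the hypothesis above concrete, here is a minimal sketch of why a zero record length stalls a walk over a directory block. It is not the ext2fs code: `struct toy_dirent` and `toy_count_entries()` are hypothetical names, and the layout is simplified. The point is only that a loop advancing by `reclen` makes no progress when `reclen` is 0, so a defensive walk has to reject such an entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry header; NOT the real ext2fs structure. */
struct toy_dirent {
	uint32_t ino;
	uint16_t reclen;	/* distance in bytes to the next entry */
	uint8_t  namlen;
	uint8_t  type;
};

/*
 * Count the entries in a block by stepping reclen bytes at a time.
 * Returns -1 when an entry's reclen cannot advance the walk or would
 * run past the end of the block.
 */
static int
toy_count_entries(const uint8_t *block, size_t blksize)
{
	size_t off = 0;
	int n = 0;

	while (off + sizeof(struct toy_dirent) <= blksize) {
		struct toy_dirent de;

		memcpy(&de, block + off, sizeof(de));
		/* Without this check, reclen == 0 would leave off unchanged forever. */
		if (de.reclen < sizeof(de) || de.reclen > blksize - off)
			return (-1);
		off += de.reclen;
		n++;
	}
	return (n);
}
```

Whether the real loop lacks such a check, or whether the zero length comes from stale buffer contents, is not established at this point in the thread.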
Comment 2 Peter Holm freebsd_committer freebsd_triage 2024-10-18 05:41:33 UTC
A commit search for when the ext2fs problems (log0554.txt and log0555.txt) were introduced shows:

10/14 Kevin Bowling   (2,9K) git: 7763b194d8de - main - igc: txrx function prototype cleanup    OK
10/14 Doug Moore      (3,2K) git: 2c8caa4b3925 - main - vfs_subr: optimize inval_buf_range      FAIL
Comment 3 Doug Moore 2024-10-18 08:52:55 UTC
Created attachment 254324
Handle a range from negative to positive

A first guess is that somebody is asking to invalidate a range from a negative lower bound to a positive upper bound.  I added a fix for that case, and the ext3fs.sh test seems fine for me.
Comment 4 Peter Holm freebsd_committer freebsd_triage 2024-10-18 11:40:48 UTC
(In reply to Doug Moore from comment #3)
The patch did not seem to make any difference to what I see:
https://people.freebsd.org/~pho/stress/log/log0556.txt
Comment 5 Doug Moore 2024-10-18 16:56:52 UTC
Created attachment 254341
diagnostic patch

Another patch, intended to diagnose the problem and not likely to fix it.  I can't reproduce the problem because I can't install ext2.  I'll examine the results after this new assertion fails.
Comment 6 Peter Holm freebsd_committer freebsd_triage 2024-10-18 17:45:26 UTC
(In reply to Doug Moore from comment #5)
You need to install the e2fsprogs package to run the ext2 tests.
Comment 7 Peter Holm freebsd_committer freebsd_triage 2024-10-19 05:24:42 UTC
I have not been able to trigger the assertion in your latest diagnostic patch.  I still see the same issues as before.
Comment 8 Doug Moore 2024-10-19 07:47:33 UTC
Created attachment 254350
deal with negative values; sort the list

I've had some success with this patch.
Comment 9 Peter Holm freebsd_committer freebsd_triage 2024-10-19 11:23:27 UTC
(In reply to Doug Moore from comment #8)
Yes, I no longer see any issues with the ext2fs tests.
Comment 10 Doug Moore 2024-10-19 17:10:40 UTC
https://reviews.freebsd.org/D47200 is posted for review to address this bug.
Comment 11 commit-hook freebsd_committer freebsd_triage 2024-10-22 21:59:04 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e2414d91d33f31d6f2c9f49eef7a1553b5798c9e

commit e2414d91d33f31d6f2c9f49eef7a1553b5798c9e
Author:     Doug Moore <dougm@FreeBSD.org>
AuthorDate: 2024-10-22 21:54:34 +0000
Commit:     Doug Moore <dougm@FreeBSD.org>
CommitDate: 2024-10-22 21:54:34 +0000

    vfs_subr: maintain sorted tailq

    Pctries are based on unsigned index values. Type daddr_t is
    signed. Using daddr_t as an index type for a pctrie works, except that
    the pctrie considers negative values greater than nonnegative
    ones. Building a sorted tailq of bufs, based on pctrie results, sorts
    negative daddr_ts larger than nonnegative ones, and makes code that
    depends on the tailq being actually sorted broken.

    Write wrappers for the functions that do pctrie operations that depend
    on index ordering that fix the order problem, and use them in place of
    direct pctrie operations.

    PR:             282134
    Reported by:    pho
    Reviewed by:    kib, markj
    Tested by:      pho
    Fixes: 2c8caa4b3925aa7335 vfs_subr: optimize inval_buf_range
    Differential Revision:  https://reviews.freebsd.org/D47200

 sys/kern/vfs_subr.c | 56 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 12 deletions(-)
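
The ordering problem described in the commit message can be reproduced in isolation. The sketch below is only an illustration, assuming a 64-bit signed block number: `cmp_as_unsigned()` compares two values the way an unsigned-keyed structure would, and `signed_to_ordered_key()` is a hypothetical sign-bit flip that restores signed order. The flip is a swapped-in technique for demonstration; the commit itself fixes the order with wrappers around the pctrie operations in sys/kern/vfs_subr.c, and none of these function names appear there.

```c
#include <stdint.h>

/* Compare two signed block numbers as an unsigned-keyed structure sees them. */
static int
cmp_as_unsigned(int64_t a, int64_t b)
{
	uint64_t ua = (uint64_t)a, ub = (uint64_t)b;

	return ((ua > ub) - (ua < ub));
}

/*
 * Hypothetical key mapping (not what the commit does): flipping the sign
 * bit makes unsigned comparison agree with signed comparison.
 */
static uint64_t
signed_to_ordered_key(int64_t v)
{
	return ((uint64_t)v ^ (UINT64_C(1) << 63));
}

/* Compare two signed block numbers through the order-preserving keys. */
static int
cmp_as_ordered_key(int64_t a, int64_t b)
{
	uint64_t ka = signed_to_ordered_key(a);
	uint64_t kb = signed_to_ordered_key(b);

	return ((ka > kb) - (ka < kb));
}
```

With these helpers, `cmp_as_unsigned(-1, 0)` is positive (the negative value sorts after zero), while `cmp_as_ordered_key(-1, 0)` is negative, which is the order that code walking a sorted list of bufs expects.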