Bug 218337 - panic: Journal overflow with g_journal_switcher waiting on wswbuf0
Summary: panic: Journal overflow with g_journal_switcher waiting on wswbuf0
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-geom (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-03 16:37 UTC by longwitz
Modified: 2018-02-12 09:53 UTC
2 users

See Also:


Attachments

Description longwitz 2017-04-03 16:37:05 UTC
I am running several servers with FreeBSD 10.3-STABLE #3 r307955 using gjournal/gmirror:

 --> gjournal status
                    Name  Status  Components
 mirror/gmbkp4p7.journal     N/A  mirror/gmbkp4p3
                                  mirror/gmbkp4p7
 mirror/gmbkp4p8.journal     N/A  mirror/gmbkp4p4
                                  mirror/gmbkp4p8
 mirror/gmbkp4p9.journal     N/A  mirror/gmbkp4p5
                                  mirror/gmbkp4p9
 mirror/gmbkp4p10.journal     N/A  mirror/gmbkp4p6
                                   mirror/gmbkp4p10
 == root@dssbkp4 (pts/2) -> gmirror status
         Name    Status  Components
 mirror/gmbkp4  COMPLETE  da10 (ACTIVE)
                          da11 (ACTIVE)

I do not use gjournal on other disks because I am aware of the trouble described in PR kern/164252. This configuration has run correctly for many years;
all journal partitions have a size of 2 GB.

Now I have got a panic on a test machine with 48 GB of RAM:

panic: Journal overflow (id = 1510160743 joffset=1950095360 active=863204864 inactive=1948974592)
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0c4ea36410
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0c4ea364c0
vpanic() at vpanic+0x126/frame 0xfffffe0c4ea36500
panic() at panic+0x43/frame 0xfffffe0c4ea36560
g_journal_flush() at g_journal_flush+0x8f9/frame 0xfffffe0c4ea36840
g_journal_flush_send() at g_journal_flush_send+0xd1/frame 0xfffffe0c4ea36880
g_journal_worker() at g_journal_worker+0x170a/frame 0xfffffe0c4ea36bb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe0c4ea36bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0c4ea36bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 27 tid 100182 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db:0:kdb.enter.panic> watchdog
No argument provided, disabling watchdog
db:0:kdb.enter.panic>  run ddbinfo
db:1:ddbinfo> capture on
db:1:on>  run lockinfo
db:2:lockinfo> show lock Giant
 class: sleep mutex
 name: Giant
 flags: {DEF, RECURSE}
 state: {UNOWNED}
db:2:Giant>  show lockedvnods
Locked vnodes

0xfffff80b7e1acb10: tag ufs, type VREG
    usecount 2, writecount 1, refcount 3781 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfffff8085348d600 ref 0 pages 30224 cleanbuf 548 dirtybuf 3230
    lock type ufs: EXCL by thread 0xfffff800117c1960 (pid 7, g_journal switcher, tid 100169)
        ino 18, on dev mirror/gmbkp4p8.journal
db:2:lockedvnods>  show lockchain
thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4
db:2:lockchain>  show sleepchain
thread 100182 (pid 27, g_journal mirror/gm) running on CPU 4
db:1:sleepchain>  show pcpu
cpuid        = 4
dynamic pcpu = 0xfffffe0c9c5dfe00
curthread    = 0xfffff80011e014b0: pid 27 "g_journal mirror/gm"
curpcb       = 0xfffffe0c4ea36cc0
fpcurthread  = none
idlethread   = 0xfffff8000b343960: tid 100007 "idle: cpu4"
curpmap      = 0xffffffff80efa9d8
tssp         = 0xffffffff80f11ab0
commontssp   = 0xffffffff80f11ab0

From the kernel dump:

(kgdb) bt
#0  doadump (textdump=-1319329552) at pcpu.h:219
#1  0xffffffff80340f05 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:568
#2  0xffffffff80340bed in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3  0xffffffff80344e7f in db_script_exec (scriptname=0xfffffe0c4ea360c0 "kdb.enter.panic", warnifnotfound=<value optimized out>) at /usr/src/sys/ddb/db_script.c:302
#4  0xffffffff80344cc3 in db_script_kdbenter (eventname=0x0) at /usr/src/sys/ddb/db_script.c:324
#5  0xffffffff803432eb in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:230
#6  0xffffffff806df959 in kdb_trap (type=3, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:656
#7  0xffffffff809391d3 in trap (frame=0xfffffe0c4ea363f0) at /usr/src/sys/amd64/amd64/trap.c:561
#8  0xffffffff8091eb6c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:238
#9  0xffffffff806df0be in kdb_enter (why=0xffffffff80a79f84 "panic", msg=<value optimized out>) at cpufunc.h:63
#10 0xffffffff806a12e6 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:882
#11 0xffffffff806a1193 in panic (fmt=0xffffffff80d24350 "PèŠ\200ÿÿÿÿ\001") at /usr/src/sys/kern/kern_shutdown.c:818
#12 0xffffffff8112bbc9 in g_journal_flush (sc=<value optimized out>) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:344
#13 0xffffffff8112bca1 in g_journal_flush_send (sc=0xfffff80011dee400) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:1443
#14 0xffffffff8112f8ea in g_journal_worker (arg=0xfffff80011dee400) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:2211
#15 0xffffffff8066f99a in fork_exit (callout=0xffffffff8112e1e0 <g_journal_worker>, arg=0xfffff80011dee400, frame=0xfffffe0c4ea36c00) at /usr/src/sys/kern/kern_fork.c:1030
#16 0xffffffff8091f0ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:613

(kgdb) f 13
#13 0xffffffff8112bca1 in g_journal_flush_send (sc=0xfffff80011dee400)
    at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:1443
1443                    g_journal_add_current(sc, bp);
(kgdb) p *sc
$21 = {sc_id = 1510160743, sc_type = 3 '\003', sc_orig_type = 1 '\001', sc_geom = 0xfffff80011de6b00,
  sc_flags = 64, sc_mtx = {lock_object = {lo_name = 0xffffffff811319f4 "gjournal", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, sc_mediasize = 4294966784, sc_sectorsize = 512,
  sc_bio_flush = 3, sc_journal_id = 114030489, sc_journal_next_id = 3023870498, sc_journal_copying = 0,
  sc_journal_offset = 1950095360, sc_journal_previous_id = 2305361505, sc_back_queue = {queue = {
      tqh_first = 0xfffff8094ce6e1f0, tqh_last = 0xfffff8065935cd10}, last_offset = 1945507328,
    insert_point = 0xfffff8065935cc98}, sc_regular_queue = {queue = {tqh_first = 0xfffff808536333e0,
      tqh_last = 0xfffff80836392740}, last_offset = 478609408, insert_point = 0xfffff808363926c8},
  sc_delayed_queue = {queue = {tqh_first = 0xfffff805bf282d90, tqh_last = 0xfffff805f2023360},
    last_offset = 475987968, insert_point = 0xfffff805f20232e8}, sc_delayed_count = 108,
  sc_current_queue = 0x0, sc_current_count = 0, sc_flush_queue = 0xfffff805f14fc3e0, sc_flush_count = 84,
  sc_flush_in_progress = 16, sc_copy_queue = 0x0, sc_copy_in_progress = 0,
  sc_dconsumer = 0xfffff80011dea880, sc_jconsumer = 0xfffff80011deb600, sc_inactive = {jj_queue = 0x0,
    jj_offset = 1948974592}, sc_active = {jj_queue = 0xfffff80a4748baa8, jj_offset = 863204864},
  sc_jstart = 0, sc_jend = 2147483136, sc_callout = {c_links = {le = {le_next = 0x0,
        le_prev = 0xfffffe0000806768}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0xfffffe0000806768}}, c_time = 55532158374, c_precision = 1342177187,
    c_arg = 0xfffff80011dee400, c_func = 0xffffffff8112e150 <g_journal_timeout>, c_lock = 0x0,
    c_flags = 0, c_iflags = 16, c_cpu = 0}, sc_worker = 0xfffff80011df7500, sc_rootmount = 0x0}

(kgdb) p vm_kmem_size
$1 = 50189914112
(kgdb) p g_journal_cache_divisor
$2 = 2
(kgdb) p g_journal_cache_low
$3 = 3258108450
(kgdb) p g_journal_cache_limit
$4 = 3620120576
(kgdb) p g_journal_cache_switch
$5 = 90
(kgdb) p g_journal_force_switch
$6 = 70
(kgdb) p g_journal_cache_used
$7 = 1098408960
(kgdb) p g_journal_switch_time
$8 = 10
(kgdb) p g_journal_switcher_wokenup
$9 = 1
(kgdb) p g_journal_cache_alloc_failures
$10 = 0
(kgdb) p g_journal_cache_misses
$11 = 4
(kgdb) p g_journal_switcher_state
$12 = 0
(kgdb) p g_journal_sync_requested
$13 = 0
(kgdb) p g_journal_stats_journal_full
$14 = 2
(kgdb) p g_journal_switch_time
$15 = 10
(kgdb) p sc->sc_id
$22 = 1510160743
(kgdb) p sc->sc_journal_offset
$23 = 1950095360
(kgdb) p sc->sc_active.jj_offset
$24 = 863204864
(kgdb) p sc->sc_inactive.jj_offset
$25 = 1948974592
(kgdb) p  g_journal_stats_wait_for_copy
$28 = 0

(kgdb) thread 465
[Switching to thread 465 (Thread 100169)]#0  sched_switch (td=0xfffff800117c1960, newtd=<value optimized out>, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1945
1945                    cpuid = PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=0xfffff800117c1960, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff806aabc1 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff806ee73a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:624
#3  0xffffffff806aa5d7 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
    sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#4  0xffffffff80910c58 in getpbuf (pfreecnt=<value optimized out>) at /usr/src/sys/vm/vm_pager.c:336
#5  0xffffffff8073f453 in cluster_wbuild (vp=0xfffff80b7e1acb10, size=32768, start_lbn=<value optimized out>, len=<value optimized out>, gbflags=<value optimized out>)
    at /usr/src/sys/kern/vfs_cluster.c:840
#6  0xffffffff808d8748 in ffs_syncvnode (vp=0xfffff80b7e1acb10, waitfor=1, flags=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_vnops.c:298
#7  0xffffffff808d5f6d in ffs_sync (mp=0xfffff8002ba4b000, waitfor=Cannot access memory at address 0x1
) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1591
#8  0xffffffff807618fc in vfs_write_suspend (mp=0xfffff8002ba4b000, flags=<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1877
#9  0xffffffff8112a3e4 in g_journal_switcher (arg=0xffffffff81132f28) at /usr/src/sys/modules/geom/geom_journal/../../../geom/journal/g_journal.c:2964
#10 0xffffffff8066f99a in fork_exit (callout=0xffffffff81129a40 <g_journal_switcher>, arg=0xffffffff81132f28, frame=0xfffffe0c4e49cc00) at /usr/src/sys/kern/kern_fork.c:1030
#11 0xffffffff8091f0ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:613
#12 0x0000000000000000 in ?? ()
(kgdb) f 4
#4  0xffffffff80910c58 in getpbuf (pfreecnt=<value optimized out>) at /usr/src/sys/vm/vm_pager.c:336
336                                     msleep(pfreecnt, &pbuf_mtx, PVM, "wswbuf0", 0);
(kgdb) list
331             mtx_lock(&pbuf_mtx);
332
333             for (;;) {
334                     if (pfreecnt) {
335                             while (*pfreecnt == 0) {
336                                     msleep(pfreecnt, &pbuf_mtx, PVM, "wswbuf0", 0);
337                             }
338                     }

The panic was triggered by a problem with vi: I tried to insert some bytes via cut/paste into an open file, and vi started (I do not know why) to loop, writing my pasted string over and over into its own address space until it dumped core. The kernel wrote vi's core dump to /var/core, and doing so triggered the gjournal panic.

(kgdb) thread 541 (vi)
[Switching to thread 541 (Thread 101086)]#0  sched_switch (td=0xfffff805564a7960, newtd=<value optimized out>, flags=<value optimized out>)
    at /usr/src/sys/kern/sched_ule.c:1945
1945                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffff805564a7960, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff806aabc1 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff806ee73a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:624
#3  0xffffffff806aa5d7 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>,
    sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#4  0xffffffff80761487 in vn_start_write_locked (mp=0xfffff8002ba4b000, flags=257) at /usr/src/sys/kern/vfs_vnops.c:1673
#5  0xffffffff8075fc23 in vn_start_write (vp=<value optimized out>, mpp=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1725
#6  0xffffffff807605db in vn_rdwr (rw=UIO_WRITE, vp=0xfffff80b7e1acb10, base=<value optimized out>, len=<value optimized out>, offset=<value optimized out>,
    segflg=<value optimized out>, ioflg=<value optimized out>, active_cred=0xfffff8055623b280, file_cred=0x1, aresid=<value optimized out>, td=<value optimized out>)
    at /usr/src/sys/kern/vfs_vnops.c:530
#7  0xffffffff80760d47 in vn_rdwr_inchunks (rw=UIO_WRITE, vp=0xfffff80b7e1acb10, base=<value optimized out>, len=<value optimized out>, offset=<value optimized out>,
    segflg=UIO_USERSPACE, aresid=<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:630
#8  0xffffffff80645e14 in elf64_coredump (td=<value optimized out>, vp=0xfffff80b7e1acb10, limit=<value optimized out>, flags=<value optimized out>)
    at /usr/src/sys/kern/imgact_elf.c:1179
#9  0xffffffff806a4a6c in sigexit (td=0xfffff805564a7960, sig=11) at /usr/src/sys/kern/kern_sig.c:3366
#10 0xffffffff806a50c6 in postsig (sig=<value optimized out>) at /usr/src/sys/kern/kern_sig.c:2911
#11 0xffffffff806f40f7 in ast (framep=<value optimized out>) at /usr/src/sys/kern/subr_trap.c:316
#12 0xffffffff8091fe89 in doreti_ast () at /usr/src/sys/amd64/amd64/exception.S:685
#13 0x0000000801853c00 in ?? ()
#14 0x0000000801853c80 in ?? ()
#15 0x0000062e00000008 in ?? ()
#16 0x0000000000000604 in ?? ()
#17 0x0000000000000002 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000801815400 in ?? ()
#20 0x0000000000000008 in ?? ()
#21 0x00007fffffffe190 in ?? ()
#22 0x0000000000000000 in ?? ()

(kgdb) f 4
#4  0xffffffff80761487 in vn_start_write_locked (mp=0xfffff8002ba4b000, flags=257) at /usr/src/sys/kern/vfs_vnops.c:1673
1673                            error = msleep(&mp->mnt_flag, MNT_MTX(mp), mflags,
(kgdb) list
1668                    while ((mp->mnt_kern_flag & MNTK_SUSPEND) != 0) {
1669                            if (flags & V_NOWAIT) {
1670                                    error = EWOULDBLOCK;
1671                                    goto unlock;
1672                            }
1673                            error = msleep(&mp->mnt_flag, MNT_MTX(mp), mflags,
1674                                "suspfs", 0);
1675                            if (error)
1676                                    goto unlock;
1677                    }

After the reboot I saw (inode 18 is the inode the g_journal switcher had locked):
== root@dssbkp4 (pts/1) -> ls -li /var/core
total 1704864
18 -rw-------  1 root  wheel  1745289216 29 Mär 22:34 vi-70628.core

=== root@dssbkp4 (pts/1) -> ls -li /var/tmp/vi.recover/
total 395232
80384 -rw-------  1 root  wheel  404533248 29 Mär 22:34 vi.CqtHml

This server has faster disks than my others, so maybe the journal space is too small.
But the g_journal switcher sits waiting on wswbuf0 (why?), and while it waits all the journal space gets filled, bringing on the overflow.
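
For illustration, here is a tiny user-land model of the journal ring built from the numbers in the panic message and the "p *sc" output above. The helper names and the simplified overflow condition are my own assumptions, not the real check in g_journal_flush():

  #include <stdio.h>
  #include <stdint.h>

  int
  main(void)
  {
          /* Values taken from the panic message and "p *sc" above. */
          uint64_t jstart   = 0;             /* sc_jstart               */
          uint64_t jend     = 2147483136;    /* sc_jend, ~2 GB journal  */
          uint64_t inactive = 1948974592;    /* sc_inactive.jj_offset   */
          uint64_t joffset  = 1950095360;    /* sc_journal_offset       */

          printf("journal ring size: %ju bytes\n", (uintmax_t)(jend - jstart));
          /*
           * Simplified view: the active journal has grown around the ring
           * and reached the inactive journal, whose data has not been
           * committed to the data provider yet, so there is no room left.
           */
          if (joffset >= inactive)
                  printf("overflow: joffset %ju has reached inactive %ju\n",
                      (uintmax_t)joffset, (uintmax_t)inactive);
          return (0);
  }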

Looking at the gjournal variables in /sys/geom/journal/g_journal.c, we have
  static u_int g_journal_cache_limit = 64 * 1024 * 1024;
  static u_int g_journal_cache_divisor = 2;
  static u_int g_journal_cache_switch = 90;
  static u_int g_journal_cache_low = 0;
and in g_journal_init()  
  if (g_journal_cache_limit > 0) {
           g_journal_cache_limit = vm_kmem_size / g_journal_cache_divisor;
           g_journal_cache_low =
               (g_journal_cache_limit / 100) * g_journal_cache_switch;
  }
But vm_kmem_size is defined as u_long while g_journal_cache_limit is only u_int, so the assignment truncates the 64-bit result to 32 bits. Therefore we get
  g_journal_cache_limit = 3,620,120,576,
but we should have
  g_journal_cache_limit = 50,189,914,112 / 2 = 25,094,957,056.
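
A small stand-alone program shows the truncation; the variable names only mirror those in g_journal.c, the program itself is just an illustration:

  #include <stdio.h>

  /* Same types as the kernel variables: u_int limit, u_long vm_kmem_size. */
  static unsigned int g_journal_cache_limit = 64 * 1024 * 1024;
  static unsigned int g_journal_cache_divisor = 2;

  int
  main(void)
  {
          unsigned long vm_kmem_size = 50189914112UL;  /* value from the dump (amd64) */

          /* The 64-bit result is silently truncated to 32 bits here. */
          g_journal_cache_limit = vm_kmem_size / g_journal_cache_divisor;
          printf("expected %lu, got %u\n",
              vm_kmem_size / g_journal_cache_divisor, g_journal_cache_limit);
          /* prints: expected 25094957056, got 3620120576 */
          return (0);
  }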

This looks to me like a programming error in g_journal.c, relevant for servers with more than 4 GB of RAM.

Can this be responsible for the panic?
Comment 1 Peter Holm 2017-04-04 17:41:54 UTC
All of the stress2 tests I have panic with "Journal overflow".
Here is an example with an 8 GB disk and an excessively large 50% journal:
https://people.freebsd.org/~pho/stress/log/gjournal5.txt
Comment 2 longwitz 2017-04-06 16:33:51 UTC
Thanks for the extensive example; the information from your kernel dump looks nearly identical to mine.

I think the problem with u_int/u_long variables in g_journal.c is not the reason for the panic; it is already covered by PR kern/198500.

The panic has to do with the extra physical buffers (pbufs) described in the pbuf(9) manual page.
I see the following users of pbufs (I do not use NFS or FUSE):

nbuf = 105931
nswbuf = min(nbuf / 4, 256) = 256;

Users of pbufs:                boottime       kerneldump
   md_vnode_pbuf_freecnt          25              25
   smbfs_pbuf_freecnt            129             129
   ncl_pbuf_freecnt              129             129
   cluster_pbuf_freecnt          128               0
   vnode_pbuf_freecnt            129             129
   nsw_rcount                    128             128
   nsw_wcount_sync                64              64

The g_journal_switcher (pid 7) is waiting on channel "wswbuf0"
because all the cluster pbufs are in use. Looking at the 256 swbufs I found that 128 are free and 128 are in use by the g_journal_switcher itself. All
the buffers have the same b_bufobj and the same b_iodone address "cluster_callback". One example:

(kgdb) p swbuf[126]
$352 = {b_bufobj = 0xfffff80b7e1acbe0, b_bcount = 131072, b_caller1 = 0x0, b_data = 0xfffffe0c1c2ec000 "", b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0',
  b_iooffset = 472973312, b_resid = 0, b_iodone = 0xffffffff8073fcb0 <cluster_callback>, b_blkno = 923776, b_offset = 2841903104, b_bobufs = {tqe_next = 0x0,
    tqe_prev = 0x0}, b_vflags = 0, b_freelist = {tqe_next = 0xfffffe0bafe182b8, tqe_prev = 0xffffffff80eaa460}, b_qindex = 0, b_flags = 1677721636,
  b_xflags = 0 '\0', b_lock = {lock_object = {lo_name = 0xffffffff80a829a7 "bufwait", lo_flags = 108199936, lo_data = 0, lo_witness = 0x0},
    lk_lock = 18446744073709551600, lk_exslpfail = 0, lk_timo = 0, lk_pri = 96}, b_bufsize = 131072, b_runningbufspace = 131072,
  b_kvabase = 0xfffffe0c1c2ec000 "", b_kvaalloc = 0x0, b_kvasize = 131072, b_lblkno = 86728, b_vp = 0xfffff80b7e1acb10, b_dirtyoff = 0, b_dirtyend = 131072,
  b_rcred = 0x0, b_wcred = 0x0, b_saveaddr = 0xfffffe0c1c2ec000, b_pager = {pg_reqpage = 0}, b_cluster = {cluster_head = {tqh_first = 0xfffffe0bb1bec410,
      tqh_last = 0xfffffe0bb1bebe20}, cluster_entry = {tqe_next = 0xfffffe0bb1bec410, tqe_prev = 0xfffffe0bb1bebe20}}, b_pages = {0xfffff80c097f97c0,
    0xfffff80c097f9828, 0xfffff80c097f9890, 0xfffff80c097f98f8, 0xfffff80c097f9960, 0xfffff80c097f99c8, 0xfffff80c097f9a30, 0xfffff80c097f9a98,
    0xfffff80c097f9b00, 0xfffff80c097f9b68, 0xfffff80c097f9bd0, 0xfffff80c097f9c38, 0xfffff80c097f9ca0, 0xfffff80c097f9d08, 0xfffff80c097f9d70,
    0xfffff80c097f9dd8, 0xfffff80c097f9e40, 0xfffff80c097f9ea8, 0xfffff80c097f9f10, 0xfffff80c097f9f78, 0xfffff80c097f9fe0, 0xfffff80c097fa048,
    0xfffff80c097fa0b0, 0xfffff80c097fa118, 0xfffff80c11547980, 0xfffff80c115479e8, 0xfffff80c11547a50, 0xfffff80c11547ab8, 0xfffff80c11547b20,
    0xfffff80c11547b88, 0xfffff80c11547bf0, 0xfffff80c11547c58}, b_npages = 32, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0,
  b_fsprivate3 = 0x0, b_pin_count = 0}

Therefore the g_journal_switcher has all of its cluster pbufs in use and waits forever for another one. So the worker thread must panic with the overflow.

In cluster_wbuild() I cannot see a check for "cluster_pbuf_freecnt > 0" that would avoid the hang on "wswbuf0". I wonder why this seems to be a problem only with gjournal; other components in the kernel also use VFS_SYNC.
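
To illustrate the situation, here is a tiny user-land model of a counted pbuf pool whose only consumer has every buffer tied up in outstanding writes and then asks for one more. The names mimic the kernel ones, but the logic is simplified; in the kernel the buffers only come back through the I/O completion path (cluster_callback -> relpbuf):

  #include <stdio.h>

  #define NSWBUF          256             /* as computed at boot time      */
  #define CLUSTER_PBUFS   (NSWBUF / 2)    /* 128 cluster pbufs, as in dump */

  static int cluster_pbuf_freecnt = CLUSTER_PBUFS;

  /* Simplified getpbuf(): 0 on success, -1 where the kernel would msleep. */
  static int
  getpbuf_model(void)
  {
          if (cluster_pbuf_freecnt == 0) {
                  printf("would msleep on \"wswbuf0\" now\n");
                  return (-1);
          }
          cluster_pbuf_freecnt--;
          return (0);
  }

  int
  main(void)
  {
          int issued;

          /*
           * The switcher issues clustered writes for the dirty buffers of
           * inode 18; as long as none of them completes, no pbuf is freed.
           */
          for (issued = 0; getpbuf_model() == 0; issued++)
                  ;
          printf("%d writes in flight, freecnt %d -> the switcher has to wait\n",
              issued, cluster_pbuf_freecnt);
          return (0);
  }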

I would like to know if this problem can be fixed.
Comment 3 longwitz 2017-04-08 13:40:59 UTC
From my kernel dump I see this backtrace for the g_journal_switcher:

#4  0xffffffff80910c58 in getpbuf (pfreecnt=<value optimized out>) at /usr/src/sys/vm/vm_pager.c:336
#5  0xffffffff8073f453 in cluster_wbuild (vp=0xfffff80b7e1acb10, size=32768, start_lbn=<value optimized out>,
    len=<value optimized out>, gbflags=<value optimized out>) at /usr/src/sys/kern/vfs_cluster.c:840
#6  0xffffffff808d8748 in ffs_syncvnode (vp=0xfffff80b7e1acb10, waitfor=1, flags=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:298
#7  0xffffffff808d5f6d in ffs_sync (mp=0xfffff8002ba4b000, waitfor=Cannot access memory at address 0x1
) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1591
#8  0xffffffff807618fc in vfs_write_suspend (mp=0xfffff8002ba4b000, flags=<value optimized out>)
    at /usr/src/sys/kern/vfs_vnops.c:1877
#9  0xffffffff8112a3e4 in g_journal_switcher (arg=0xffffffff81132f28)

In the source /usr/src/sys/ufs/ffs/ffs_vfsops.c on line 1591 I see the call of ffs_syncvnode() as stated above (frame #7 --> frame #6). But in the source /usr/src/sys/ufs/ffs/ffs_vnops.c on line 298 I see a call of vfs_bio_awrite() and not a call of cluster_wbuild() as stated above (frame #6 --> frame #5).
Looking at the source /usr/src/sys/kern/vfs_bio.c line 1971 I see the function vfs_bio_awrite(), which calls cluster_wbuild(). So I feel there is a frame missing from the backtrace given by kgdb for the g_journal_switcher.
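
One common reason for a frame to vanish is that the compiler turned the helper's final call into a plain jump (a tail call) or inlined it completely; whether that is what happened with vfs_bio_awrite() here I cannot tell. A tiny illustration of the effect, unrelated to the kernel sources:

  /*
   * Build with optimization, e.g. "cc -O2 -g frames.c", and set a
   * breakpoint in leaf(): the debugger shows main() -> leaf(); the frame
   * for middle() is missing because its call to leaf() became a tail call.
   */
  #include <stdio.h>

  __attribute__((noinline)) static int
  leaf(int x)
  {
          return (x + 1);
  }

  __attribute__((noinline)) static int
  middle(int x)
  {
          return (leaf(x));       /* optimized into "jmp leaf" at -O2 */
  }

  int
  main(void)
  {
          printf("%d\n", middle(41));
          return (0);
  }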

In my ddb output - and also in Peter's ddb output - I see the same:
getpbuf() at getpbuf+0xb8/frame 0xfffffe0c4e49c8b0
cluster_wbuild() at cluster_wbuild+0x83/frame 0xfffffe0c4e49c950
ffs_syncvnode() at ffs_syncvnode+0x308/frame 0xfffffe0c4e49c9e0
ffs_sync() at ffs_sync+0x1fd/frame 0xfffffe0c4e49ca90
vfs_write_suspend() at vfs_write_suspend+0x19c/frame 0xfffffe0c4e49cad0
g_journal_switcher() at g_journal_switcher+0x9a4/frame 0xfffffe0c4e49cbb0

Can somebody please explain to me what is going on here?
Comment 4 longwitz 2017-05-08 19:12:21 UTC
With the help of Peter's stress2 test I could work out that the reason for the journal overflow panic was the 2 GB journal, which is too small for my new hardware. After going to an 8 GB journal the server does not crash anymore. I have also understood that the g_journal_switcher sleeping on the wswbuf0 lock is woken up when the interrupt routine for the outstanding I/O becomes active.

This PR can be closed.