Bug 127420 - geom: panic: journal overflow on gmirrored gjournal
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.1-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
Keywords: crash, needs-qa
 
Reported: 2008-09-16 12:10 UTC by ruben
Modified: 2023-06-11 13:01 UTC

Attachments
crashinfo dump of gmirror'ed gjournal panic (139.25 KB, text/plain)
2022-12-11 16:01 UTC, ruben

Description ruben 2008-09-16 12:10:02 UTC
Crash 1

panic: Journal overflow (joffset=180955342336 active=180735900160 inactive=180952868864)
cpuid = 1
Uptime: 40m34s
Physical memory: 4085 MB
Dumping 625 MB:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x200
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x8:0x200
stack pointer           = 0x10:0xffffffffae1ece40
frame pointer           = 0x10:0xffffffffae1ece70
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 47 (g_journal mirror/gm)
trap number             = 12

Crash 2 (with debug kernel)

panic: Journal overflow (joffset=180542946816 active=181305220608 inactive=180542008320)
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17d
g_journal_flush() at g_journal_flush+0x8cb
g_journal_worker() at g_journal_worker+0x14ce
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffae1edd30, rbp = 0 ---
panic: BUF_UNLOCK 0xffffffff9a26e220 while B_REMFREE is still set.
cpuid = 1
panic: BUF_UNLOCK 0xffffffff9a04b420 while B_REMFREE is still set.
cpuid = 1
Uptime: 20m24s
Physical memory: 4084 MB
Dumping 625 MB:

Unfortunately, the dump no longer succeeds at this stage.
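
Both panic messages are consistent with the journal write offset (joffset) running into the inactive journal before that journal has been flushed. The sketch below is a hypothetical model of such a ring-buffer check, written only to test that reading against the numbers in the two panics. It is not the code from sys/geom/journal/g_journal.c; the function name journal_overflowed and the exact comparisons are assumptions.

```shell
#!/bin/sh
# Hypothetical model of a ring-buffer overflow check (an assumption, not
# kernel code). Arguments: joffset active inactive, as printed by the panic.
journal_overflowed() {
    joffset=$1 active=$2 inactive=$3
    if [ "$active" -lt "$inactive" ]; then
        # The active journal starts below the inactive one: writes overflow
        # once they reach the start of the inactive journal.
        [ "$joffset" -ge "$inactive" ]
    else
        # The active journal starts above the inactive one, so writes have
        # wrapped around; overflow means joffset lies in [inactive, active).
        [ "$joffset" -ge "$inactive" ] && [ "$joffset" -lt "$active" ]
    fi
}

# Values from crash 1 and crash 2.
journal_overflowed 180955342336 180735900160 180952868864 && echo "crash 1: overflow"
journal_overflowed 180542946816 181305220608 180542008320 && echo "crash 2: overflow"
```

Under this model both sets of values count as an overflow: crash 1 matches the non-wrapped case and crash 2 the wrapped case.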

Kernel config: the -DEBUG version just includes the base config file, with
these extra options:

options         BREAK_TO_DEBUGGER
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_KDB
options         DIAGNOSTIC

(I had to disable some KASSERTs in sys/geom/geom_io.c, because gjournal
seems to alter some data there; see also
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-08/msg00648.html
)

http://ruben.is.verweg.com/stuff/gjournal-panic/CHASSIS
http://ruben.is.verweg.com/stuff/gjournal-panic/dmesg.boot

The machine is a Sun X2100M2 with 2 x 250 GB SATA drives.

Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NOFAILSYNC
GenID: 0
SyncID: 1
ID: 4042519102
Providers:
1. Name: mirror/gm0
   Mediasize: 250055999488 (233G)
   Sectorsize: 512
   Mode: r6w6e8
Consumers:
1. Name: ad4
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2820405034
2. Name: ad6
   Mediasize: 250056000000 (233G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 933275518

Geom name: gjournal 243051746
ID: 243051746
Providers:
1. Name: mirror/gm0s1a.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1a
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3027218344
ID: 3027218344
Providers:
1. Name: mirror/gm0s1d.journal
   Mediasize: 33285996032 (31G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1d
   Mediasize: 34359738368 (32G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 34359737856
   Jstart: 33285996032
   Role: Data,Journal

Geom name: gjournal 1964026446
ID: 1964026446
Providers:
1. Name: mirror/gm0s1e.journal
   Mediasize: 3221224960 (3.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1e
   Mediasize: 4294967296 (4.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 4294966784
   Jstart: 3221224960
   Role: Data,Journal

Geom name: gjournal 3220754734
ID: 3220754734
Providers:
1. Name: mirror/gm0s1f.journal
   Mediasize: 7516192256 (7.0G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1f
   Mediasize: 8589934592 (8.0G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 8589934080
   Jstart: 7516192256
   Role: Data,Journal

Geom name: gjournal 1120739874
ID: 1120739874
Providers:
1. Name: mirror/gm0s1g.journal
   Mediasize: 180255252480 (168G)
   Sectorsize: 512
   Mode: r1w1e2
Consumers:
1. Name: mirror/gm0s1g
   Mediasize: 181328994816 (169G)
   Sectorsize: 512
   Mode: r1w1e1
   Jend: 181328994304
   Jstart: 180255252480
   Role: Data,Journal
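
The offsets in the panic messages can be matched against the Jstart/Jend ranges listed above. The sketch below hard-codes those ranges and reports which consumer's journal area contains a given offset. The function name find_journal is hypothetical, and treating Jend as exclusive is an assumption. Offsets are relative to each consumer, so a match is only suggestive, but mirror/gm0s1g is the only consumer large enough to hold offsets around 180 GB. It backs ufs/opt, where bonnie++ was run.

```shell
#!/bin/sh
# Consumer name, Jstart and Jend, copied from the gjournal list output above.
JOURNALS="
mirror/gm0s1a 3221224960 4294966784
mirror/gm0s1d 33285996032 34359737856
mirror/gm0s1e 3221224960 4294966784
mirror/gm0s1f 7516192256 8589934080
mirror/gm0s1g 180255252480 181328994304
"

# Print every consumer whose journal area [Jstart, Jend) contains the offset.
find_journal() {
    offset=$1
    echo "$JOURNALS" | while read -r name jstart jend; do
        [ -n "$name" ] || continue    # skip the blank lines around the table
        if [ "$offset" -ge "$jstart" ] && [ "$offset" -lt "$jend" ]; then
            echo "$name"
        fi
    done
}

# joffset, active and inactive from crash 1, then from crash 2.
for off in 180955342336 180735900160 180952868864 \
           180542946816 181305220608 180542008320; do
    echo "$off -> $(find_journal "$off")"
done

# Size in bytes of the journal area on mirror/gm0s1g (Jend - Jstart).
echo "journal size: $((181328994304 - 180255252480))"
```

All six offsets resolve to mirror/gm0s1g, and the journal area there is 1073741824 bytes (1 GiB), the same size as on the other four consumers.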

      Name  Status  Components
label/swap     N/A  mirror/gm0s1b
  ufs/root     N/A  mirror/gm0s1a.journal
   ufs/var     N/A  mirror/gm0s1d.journal
   ufs/tmp     N/A  mirror/gm0s1e.journal
   ufs/usr     N/A  mirror/gm0s1f.journal
   ufs/opt     N/A  mirror/gm0s1g.journal


******* Working on device /dev/ad4 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

******* Working on device /dev/ad6 *******
parameters extracted from in-core disklabel are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=484514 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 488375937 (238464 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 703/ head 254/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

# /dev/mirror/gm0s1:
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:  8388608       16    4.2BSD     2048 16384 28528 
  b: 33554432  8388624      swap                    
  c: 488375937        0    unused        0     0         # "raw" part, don't edit
  d: 67108864 41943056    4.2BSD     2048 16384 28528 
  e:  8388608 109051920    4.2BSD     2048 16384 28528 
  f: 16777216 117440528    4.2BSD     2048 16384 28528 
  g: 354158193 134217744    4.2BSD     2048 16384 28528 

/dev/ufs/root on / (ufs, asynchronous, local, gjournal)
devfs on /dev (devfs, local)
/dev/ufs/opt on /opt (ufs, asynchronous, local, gjournal)
/dev/ufs/tmp on /tmp (ufs, asynchronous, local, gjournal)
/dev/ufs/usr on /usr (ufs, asynchronous, local, gjournal)
/dev/ufs/var on /var (ufs, asynchronous, local, gjournal)

Fix: 

Maybe don't run a mirrored gjournal on FreeBSD/amd64?
How-To-Repeat: 
In /opt/bonnie, run two instances of the following command in parallel:

bonnie++ -c 4 -s 4096 -r 4096 -u nobody  -d $PWD

Both bonnie++ processes stall the system in the suspfs/wdrain states until it
panics.
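
A minimal sketch of the parallel run, assuming bonnie++ is installed and the script is started as root (needed for -u nobody). The instance count of 2 is inferred from the reference to both bonnie processes. The function name run_parallel_bonnie and the BONNIE override variable are hypothetical conveniences. On an affected system this is expected to end in a panic, so it belongs on a scratch machine only.

```shell
#!/bin/sh
# Sketch: start several copies of the reporter's bonnie++ command at once.
BONNIE=${BONNIE:-bonnie++}    # set BONNIE=echo to preview the command lines

run_parallel_bonnie() {
    dir=$1
    count=${2:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same options as in the report; -d points at the test directory.
        $BONNIE -c 4 -s 4096 -r 4096 -u nobody -d "$dir" &
        i=$((i + 1))
    done
    wait    # block until every background instance has exited
}

# Example (not run automatically):
#   run_parallel_bonnie /opt/bonnie 2
```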

Building a 1 GB nanobsd image also locks up in suspfs/wdrain during the disk
install phase, but that is not always reproducible: the build succeeds about
50% of the time.

It looks like the panic takes longer to trigger with the debugging options
enabled.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-09-17 16:16:09 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 ruben 2008-09-23 10:29:59 UTC
Hi,

I managed to trigger a new panic. I still cannot get a proper dump,
but this is a capture from the serial console.

I had been running a couple of bonnie++ instances beforehand to "exercise" the
system. At the time of the panic, one bonnie++ and one nanobsd build were running.

I had enabled the geom mirror and journal debug sysctls.
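
The comment does not name the sysctls or the levels used. On FreeBSD the debug knobs for these two classes are kern.geom.mirror.debug and kern.geom.journal.debug. The sketch below sets both. The level (2) is chosen arbitrarily, and the function name enable_geom_debug and the SYSCTL override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: raise the debug level of the gmirror and gjournal GEOM classes.
# The level is an assumption; the report does not say which one was used.
SYSCTL=${SYSCTL:-sysctl}    # set SYSCTL=echo to preview without changing anything

enable_geom_debug() {
    level=${1:-2}
    $SYSCTL kern.geom.mirror.debug="$level"
    $SYSCTL kern.geom.journal.debug="$level"
}

# Example (as root on the test machine, not run automatically):
#   enable_geom_debug 2
```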

Some minutes before the actual panic, fsync logged a complaint and gjournal
reported that it could not suspend a filesystem.

http://ruben.is.verweg.com/stuff/gjournal-panic/gjournal-textdump-text-only.txt

Regards,
	Ruben
Comment 3 admin 2009-05-28 18:45:24 UTC
I have the same problem on my system:
HP$ uname -a
FreeBSD HP.lissyara.su 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri May 22 22:14:24 MSD 2009 lissyara@HP.lissyara.su:/usr/obj/usr/src/sys/GENERIC  amd64
HP$
To reproduce, just run make buildkernel.
HP# gjournal list
Geom name: gjournal 1458850558
ID: 1458850558
Providers:
1. Name: ad4s1a.journal
    Mediasize: 158913789440 (148G)
    Sectorsize: 512
    Mode: r1w1e1
Consumers:
1. Name: ad4s1a
    Mediasize: 158913789952 (148G)
    Sectorsize: 512
    Mode: r1w1e1
    Role: Data
2. Name: ad4s1d
    Mediasize: 129303552 (123M)
    Sectorsize: 512
    Mode: r1w1e1
    Jend: 129303040
    Jstart: 0
    Role: Journal

HP#
Comment 4 spartak 2009-07-09 15:22:55 UTC
I have the same problem: FreeBSD 7.2-RELEASE amd64, gjournal on a
gmirrored volume (a local drive mirrored with a geom_gate device). I am
trying to build something like an HA cluster using freevrrpd, ggate,
gmirror and gjournal. It generally works, but every time the server
running ggated goes down (I use a hardware reset for testing), the ggate0
device is first removed from the gmirrored volume on the master, as it
should be, and then the master panics with a "gjournal overflow" message.
Comment 5 Alexander Best freebsd_committer freebsd_triage 2010-09-16 19:56:53 UTC
Responsible Changed
From-To: freebsd-fs->freebsd-geom

This one looks more geom than fs related.
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:27 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 7 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:18:22 UTC
Keyword: 

    crash

– in lieu of summary line prefix: 

    [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
Comment 8 Graham Perrin freebsd_committer freebsd_triage 2022-12-10 20:15:45 UTC
Is this reproducible with modern versions of the OS?
Comment 9 ruben 2022-12-11 16:01:41 UTC
Created attachment 238711
crashinfo dump of gmirror'ed gjournal panic

(In reply to Graham Perrin from comment #8)
Unfortunately, it does.

Inside a FreeBSD 13.1 ZFS VM I created two zvols of 5G each, gmirrored them, and then gjournaled the mirror. The same bonnie++ invocation was used, resulting in the attached crashinfo.
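
The setup described in this comment could look roughly like the sketch below, which prints the commands rather than running them. Only the two 5G zvols, the mirror and the journal on top of it come from the comment. The pool name, volume and mirror names, the mount point, and the newfs and mount steps are assumptions.

```shell
#!/bin/sh
# Sketch of the test setup; prints the commands instead of executing them.
# POOL, MNT and the "gjtest" names are hypothetical placeholders.
POOL=${POOL:-zroot}
MNT=${MNT:-/mnt/gjtest}

print_setup() {
    cat <<EOF
zfs create -V 5G $POOL/gjtest0
zfs create -V 5G $POOL/gjtest1
gmirror load
gmirror label gjtest /dev/zvol/$POOL/gjtest0 /dev/zvol/$POOL/gjtest1
gjournal load
gjournal label /dev/mirror/gjtest
newfs -J /dev/mirror/gjtest.journal
mkdir -p $MNT
mount -o async /dev/mirror/gjtest.journal $MNT
EOF
}

print_setup
```

After reviewing the printed commands, they can be run as root by piping the output to sh. The bonnie++ command from the original report is then started in parallel inside the mount point.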

For triage purposes, though, the situation is somewhat different nowadays: ZFS was still experimental in 2008, and much of the functionality once covered by gmirror, gjournal, etc. is now available with ZFS in a much more stable fashion.

It is an open question whether these geom classes should continue to be maintained, given that ZFS can be used instead.

If geom is still considered a first-class citizen, the bug should probably be explored...