Bug 244685 - gjournal: kernel: panic: Journal overflow
Summary: gjournal: kernel: panic: Journal overflow
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-geom (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2020-03-09 00:36 UTC by Paul
Modified: 2020-10-29 08:21 UTC
CC List: 3 users

See Also:


Attachments
dmesg output (6.30 KB, text/plain)
2020-03-09 03:05 UTC, Paul
`pciconf -lv` output (3.22 KB, text/plain)
2020-03-09 03:06 UTC, Paul

Description Paul 2020-03-09 00:36:03 UTC
Copying a huge file to a gjournaled filesystem causes a reboot after just a few seconds if vfs.write_behind=0; it works fine with vfs.write_behind=1.

Tested under 12.1-RELEASE and 11.3-RELEASE with the following setup:

#/dev/gpt/ada0-data - 200 GB on a SATA disk
#/dev/gpt/nvd0-log - 1 GB on an NVMe disk

gjournal label -vh /dev/gpt/ada0-data /dev/gpt/nvd0-log

mount -o "async,atime=off" /dev/gpt/ada0-data.journal /dst
mount -o "async,atime=off" /dev/gpt/nvd0-data /src

dd if=/dev/random of=/src/60gb bs=1m count=60000
sysctl vfs.write_behind=0
cp /src/60gb /dst/


I didn't test it with either a slow /src device or a fast /dst device.

With a bigger log device the panic happens later, and with a large enough log device (I tried 30 GB) it works fine without reboots.
Comment 1 Paul 2020-03-09 01:08:02 UTC
(In reply to Paul from comment #0)
Sorry, it seems I jumped to conclusions.
I have just caught a reboot with vfs.write_behind=1.
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2020-03-09 02:53:53 UTC
@Paul Could you provide additional information including:

- /var/run/dmesg.boot output (as an attachment)
- pciconf -lv output as an attachment

Also, 

- Do you see any panic messages on screen prior to the reboot?

You can try setting kern.panic_reboot_wait_time=-1 as a sysctl or loader tunable.

- Are there any relevant error messages in dmesg or /var/log/messages you can include?
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2020-03-09 02:55:03 UTC
You can also try debug.debugger_on_panic=1 if not already set
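Both settings suggested above can be applied at runtime with sysctl(8) and persisted in /boot/loader.conf. The sketch below assumes a root shell on FreeBSD; `add_loader_tunable` is a hypothetical helper, and the config path is a parameter so it can be tried on a scratch file first. Note that kern.panic_reboot_wait_time=-1 makes the machine wait for a key press on the console after a panic, so a server without console access stays down until someone intervenes.

```sh
#!/bin/sh
# Hypothetical helper (not from this report): append NAME="VALUE" to a
# loader.conf-style file unless a line for NAME is already present.
add_loader_tunable() {
	conf=$1 name=$2 value=$3
	# Escape dots so the sysctl name is matched literally by grep.
	pattern="^$(printf '%s' "$name" | sed 's/\./\\./g')="
	if grep -q "$pattern" "$conf" 2>/dev/null; then
		echo "$name already set in $conf; leaving it unchanged"
	else
		printf '%s="%s"\n' "$name" "$value" >> "$conf"
	fi
}

# Usage (as root):
#   sysctl kern.panic_reboot_wait_time=-1   # takes effect now, lost on reboot
#   sysctl debug.debugger_on_panic=1
#   add_loader_tunable /boot/loader.conf kern.panic_reboot_wait_time -1
#   add_loader_tunable /boot/loader.conf debug.debugger_on_panic 1
```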
Comment 4 Paul 2020-03-09 03:05:46 UTC
Created attachment 212271
dmesg output
Comment 5 Paul 2020-03-09 03:06:49 UTC
Created attachment 212272
`pciconf -lv` output
Comment 6 Paul 2020-03-09 03:29:25 UTC
(In reply to Kubilay Kocak from comment #2)
Sorry, but I have no access to the console; it is a remote server.
Usually there are no messages in /var/log/messages.
Only twice have I found a message like this:
Mar  9 01:44:12 test kernel: panic: Journal overflow (id = 617901877 joffset=450395648 active=260258304 inactive=447780864)
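The three offsets in the panic message can be read as positions in the journal, which gjournal writes as a ring: `active` is where the current journal starts, `inactive` is where the previous journal (whose data may not yet be copied to the data provider) starts, and `joffset` is the current write position. The sketch below is an approximation, written in sh arithmetic, of the overflow test as I understand it from the gjournal kernel code; it is not a copy of that code, and `journal_overflowed` is a hypothetical name. With the values above, active < inactive and joffset >= inactive, i.e. the write position was about 2.5 MiB (2614784 bytes) past the start of the unflushed inactive journal.

```sh
#!/bin/sh
# Hypothetical helper approximating gjournal's overflow test: returns 0
# (overflow) if the write position JOFFSET has reached the start of the
# inactive journal, whose data may not be on the data provider yet.
journal_overflowed() {
	joffset=$1 active=$2 inactive=$3
	# Active journal starts before the inactive one: overflow once the
	# write position reaches the inactive journal's start.
	if [ "$active" -lt "$inactive" ] && [ "$joffset" -ge "$inactive" ]; then
		return 0
	fi
	# Active journal starts after the inactive one, so writes must wrap
	# past the end of the journal before reaching the inactive start; a
	# write position between the two starts means they already have.
	if [ "$active" -gt "$inactive" ] && [ "$joffset" -ge "$inactive" ] &&
	    [ "$joffset" -lt "$active" ]; then
		return 0
	fi
	return 1
}

# Values from the panic message above:
journal_overflowed 450395648 260258304 447780864 && echo overflow   # prints: overflow
```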
Comment 7 Paul 2020-03-09 03:47:47 UTC
With vfs.write_behind=1 it seems to work mostly OK, but I can reproduce a crash easily just by overwriting the same file:
cp /src/60gb /dst/ # done ok
cp /src/60gb /dst/ # rebooted after 5 seconds
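Since the server has no console and the panic usually leaves nothing in /var/log/messages, a loop like the sketch below can at least show which overwrite iteration was running when the machine rebooted. `copy_loop` is a hypothetical helper, not something from this report. LOG is assumed to live on a filesystem other than the gjournaled one, and `sync` flushes every mounted filesystem, which may slightly change the timing of the reproduction.

```sh
#!/bin/sh
# Hypothetical helper: copy SRC into DSTDIR COUNT times (overwriting the
# same destination file each time) and record progress in LOG.
# `sync` after each log line pushes it to disk, so after an unexpected
# reboot the last line shows which iteration was running.
copy_loop() {
	src=$1 dstdir=$2 count=$3 log=$4
	i=1
	while [ "$i" -le "$count" ]; do
		echo "$(date '+%Y-%m-%d %H:%M:%S') start $i" >> "$log"
		sync
		cp "$src" "$dstdir/" || return 1
		echo "$(date '+%Y-%m-%d %H:%M:%S') done $i" >> "$log"
		sync
		i=$((i + 1))
	done
}

# Usage, mirroring the steps above (log kept on the root filesystem):
#   copy_loop /src/60gb /dst 5 /var/tmp/gjournal-repro.log
```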


I also tried to reproduce this with the same file, but using as the destination an NVMe disk of the same size, with a gjournal log of the same size.

With vfs.write_behind=1 I repeated `cp /src/60gb /fast/` several times, and everything was OK.
With vfs.write_behind=0 it is also mostly stable, but I have seen at least one reboot.
Comment 8 Yuran 2020-05-11 09:53:41 UTC
Good afternoon.
I ran into a similar problem. While investigating it, I looked at the following kernel sysctls:

kern.geom.journal.cache.limit
kern.geom.journal.switch_time

The variable "kern.geom.journal.cache.limit" sets the size of the buffer used for journaling, and its default value seems to be very large.
The second one, "kern.geom.journal.switch_time", specifies how often the journal is switched. I tried to disable it altogether, because journal switching is also controlled by "kern.geom.journal.force_switch".

As a result, I use the following values for the sysctls described above:

kern.geom.journal.cache.limit=134217728
kern.geom.journal.switch_time=0
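For reference, 134217728 bytes is 128 MiB (128 × 1024 × 1024). The sketch below prints these two settings in /etc/sysctl.conf form so they survive a reboot; `gjournal_sysctl_lines` is a hypothetical helper. That switch_time=0 disables time-based switching is taken from the comment above, not verified here, and applying the values at runtime assumes both sysctls are writable on your release.

```sh
#!/bin/sh
# Hypothetical helper: print sysctl.conf lines for the gjournal tuning
# described above. CACHE_MIB is the cache limit in MiB; SWITCH_TIME is
# the value for kern.geom.journal.switch_time.
gjournal_sysctl_lines() {
	cache_mib=$1 switch_time=$2
	printf 'kern.geom.journal.cache.limit=%d\n' $((cache_mib * 1024 * 1024))
	printf 'kern.geom.journal.switch_time=%d\n' "$switch_time"
}

# Usage (as root):
#   gjournal_sysctl_lines 128 0 >> /etc/sysctl.conf   # persist
#   sysctl kern.geom.journal.cache.limit=134217728    # apply now
#   sysctl kern.geom.journal.switch_time=0
```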