244685 – gjournal: kernel: panic: Journal overflow

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	12.1-RELEASE
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	freebsd-geom (Nobody)

URL:
Keywords:	crash, needs-qa

Depends on:
Blocks:

Reported:	2020-03-09 00:36 UTC by Paul
Modified:	2024-06-03 02:28 UTC
CC List:	4 users

See Also:

Attachments
dmesg output (6.30 KB, text/plain) 2020-03-09 03:05 UTC, Paul	no flags
`pciconf -lv` output (3.22 KB, text/plain) 2020-03-09 03:06 UTC, Paul	no flags

Description Paul 2020-03-09 00:36:03 UTC

Copying a huge file to a gjournaled filesystem causes a reboot after just a few seconds if vfs.write_behind=0; it works fine with vfs.write_behind=1.

Tested under 12.1-RELEASE and 11.3-RELEASE with the following setup:

# /dev/gpt/ada0-data: 200 GB on a SATA disk
# /dev/gpt/nvd0-log: 1 GB on an NVMe disk

gjournal label -vh /dev/gpt/ada0-data /dev/gpt/nvd0-log

mount -o "async,atime=off" /dev/gpt/ada0-data.journal /dst
mount -o "async,atime=off" /dev/gpt/nvd0-data /src

dd if=/dev/random of=/src/60gb bs=1m count=60000
sysctl vfs.write_behind=0
cp /src/60gb /dst/


I tested neither with a slow /src device nor with a fast /dst device.

With a bigger log device the panic occurs later, and with a large enough log device (I tried 30 GB) it can work fine without reboots.

Comment 1 Paul 2020-03-09 01:08:02 UTC

(In reply to Paul from comment #0)
Sorry, it seems I jumped to conclusions.
I have just caught a reboot with vfs.write_behind=1 as well.

Comment 2 Kubilay Kocak freebsd_committer 2020-03-09 02:53:53 UTC

@Paul Could you provide additional information including:

- /var/run/dmesg.boot output (as an attachment)
- pciconf -lv output as an attachment

Also, 

- Do you see any panic messages on screen prior to the reboot?

You can try setting kern.panic_reboot_wait_time=-1 as a sysctl or loader tunable

- Are there any relevant error messages in dmesg or /var/log/messages you can include?
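
As a rough aid for the last question, a minimal sketch that lists panic lines from one or more log files. It assumes that a logged panic contains the literal text "panic:"; `find_panics` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: list kernel panic lines from the given log files.
# Assumption: a logged panic contains the literal text "panic:".
find_panics() {
    # -h suppresses file names so only the log lines are printed.
    grep -h 'panic:' "$@" 2>/dev/null
}

# Example run against the default log location, if it is readable.
if [ -r /var/log/messages ]; then
    find_panics /var/log/messages || echo "no panic lines found"
fi
```

Several files can be passed at once, for example `find_panics /var/log/messages /var/run/dmesg.boot`.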

Comment 3 Kubilay Kocak freebsd_committer 2020-03-09 02:55:03 UTC

You can also try debug.debugger_on_panic=1 if not already set

Comment 4 Paul 2020-03-09 03:05:46 UTC

Created attachment 212271
dmesg output

Comment 5 Paul 2020-03-09 03:06:49 UTC

Created attachment 212272
`pciconf -lv` output

Comment 6 Paul 2020-03-09 03:29:25 UTC

(In reply to Kubilay Kocak from comment #2)
Sorry, but I have no access to the console; it is a remote server.
Usually there are no messages in /var/log/messages.
Only twice have I found a message like this:
Mar  9 01:44:12 test kernel: panic: Journal overflow (id = 617901877 joffset=450395648 active=260258304 inactive=447780864)
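
To make the numbers in such a message easier to read, here is a small sketch that extracts the fields from the log line and converts them to MiB. It assumes the values are byte offsets, which the message itself does not state; `panic_field` and `to_mib` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for reading a gjournal "Journal overflow" panic line.

# Print the numeric value of field $1 (id, joffset, active, inactive)
# found in log line $2. The field name must follow a space or "(" so that
# "active" does not match inside "inactive".
panic_field() {
    printf '%s\n' "$2" | sed -n "s/.*[ (]$1 *= *\([0-9][0-9]*\).*/\1/p"
}

# Convert a byte count to whole MiB (integer division, rounds down).
# Assumption: the panic values are byte offsets.
to_mib() {
    echo $(( $1 / 1048576 ))
}

line='Mar  9 01:44:12 test kernel: panic: Journal overflow (id = 617901877 joffset=450395648 active=260258304 inactive=447780864)'

for f in joffset active inactive; do
    v=$(panic_field "$f" "$line")
    echo "$f: $v bytes, about $(to_mib "$v") MiB"
done
```

Under that assumption, all three values lie below 512 MiB on the 1 GB log device used here.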

Comment 7 Paul 2020-03-09 03:47:47 UTC

With vfs.write_behind=1 it seems to work mostly OK, but I can easily reproduce a crash just by overwriting the same file:
cp /src/60gb /dst/ # done ok
cp /src/60gb /dst/ # rebooted after 5 seconds


I also tried to reproduce this with the same file, but using as the destination an NVMe disk of the same size with a gjournal log of the same size.

With vfs.write_behind=1 I repeated `cp /src/60gb /fast/` several times, and everything was OK.
With vfs.write_behind=0 it is more stable, but I have seen a reboot at least once.

Comment 8 Catwoolfii 2020-05-11 09:53:41 UTC

Good afternoon.
I ran into a similar problem. While looking into it, I examined the following kernel sysctl variables:

kern.geom.journal.cache.limit
kern.geom.journal.switch_time

The variable "kern.geom.journal.cache.limit" sets the size of the buffer used for journaling, and it seems that the default size is very large.
The second variable, "kern.geom.journal.switch_time", specifies how often the journal is switched. I tried disabling this option altogether, because journal switching is also controlled by the parameter "kern.geom.journal.force_switch".

As a result, I use the following values for the variables described above:

kern.geom.journal.cache.limit=134217728
kern.geom.journal.switch_time=0
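
For reference, the first value is 128 MiB expressed in bytes. A minimal sketch that derives it and prints both settings in `name=value` form follows; `mib_to_bytes` and `print_gjournal_settings` are hypothetical helper names, and whether both variables can be changed at runtime on a given release is an assumption to verify on the target system.

```shell
#!/bin/sh
# Hypothetical helper: convert MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the two settings listed above in name=value form.
# Nothing is applied to the running system; the output is only text.
print_gjournal_settings() {
    printf 'kern.geom.journal.cache.limit=%s\n' "$(mib_to_bytes 128)"
    printf 'kern.geom.journal.switch_time=%s\n' 0
}

print_gjournal_settings
```

Each printed line could be passed to sysctl(8) as an argument or appended to /etc/sysctl.conf to persist across reboots.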