Bug 244685

Summary:

gjournal: kernel: panic: Journal overflow

Product:

Base System

Reporter:

Paul <spy>

Component:

kern

Assignee:

freebsd-geom (Nobody) <geom>

Status:

Open ---

Severity:

Affects Some People

CC:

fs, lwhsu, yyv83

Priority:

---

Keywords:

crash, needs-qa

Version:

12.1-RELEASE

Hardware:

Any

OS:

Any

Attachments:

Description	Flags
dmesg output	none
`pciconf -lv` output	none

Description Paul 2020-03-09 00:36:03 UTC

Copying a huge file to a gjournaled filesystem causes a reboot after just a few seconds if vfs.write_behind=0, and works fine with vfs.write_behind=1.

Tested under the following conditions on 12.1-RELEASE and 11.3-RELEASE:

# /dev/gpt/ada0-data - 200 GB on SATA disk
# /dev/gpt/nvd0-log - 1 GB on NVMe disk

gjournal label -vh /dev/gpt/ada0-data /dev/gpt/nvd0-log

mount -o "async,atime=off" /dev/gpt/ada0-data.journal /dst
mount -o "async,atime=off" /dev/gpt/nvd0-data /src

dd if=/dev/random of=/src/60gb bs=1m count=60000
sysctl vfs.write_behind=0
cp /src/60gb /dst/


I have tested it neither with a slow /src device nor with a fast /dst device.

With a bigger log device the panic occurs later, and with a large enough log device (I tried 30 GB) the copy can complete without a reboot.

Comment 1 Paul 2020-03-09 01:08:02 UTC

(In reply to Paul from comment #0)
Sorry, it seems I jumped to conclusions.
I have just caught a reboot with vfs.write_behind=1 as well.

Comment 2 Kubilay Kocak freebsd_committer 2020-03-09 02:53:53 UTC

@Paul Could you provide additional information including:

- /var/run/dmesg.boot output (as an attachment)
- pciconf -lv output as an attachment

Also, 

- Do you see any panic messages on screen prior to the reboot?

You can try setting kern.panic_reboot_wait_time=-1 as a sysctl or loader tunable

- Are there any relevant error messages in dmesg or /var/log/messages you can include?

Comment 3 Kubilay Kocak freebsd_committer 2020-03-09 02:55:03 UTC

You can also try debug.debugger_on_panic=1 if not already set

Comment 4 Paul 2020-03-09 03:05:46 UTC

Created attachment 212271 [details]
dmesg output

Comment 5 Paul 2020-03-09 03:06:49 UTC

Created attachment 212272 [details]
`pciconf -lv` output

Comment 6 Paul 2020-03-09 03:29:25 UTC

(In reply to Kubilay Kocak from comment #2)
Sorry, but I have no access to the console. It is a remote server.
Usually there are no messages in /var/log/messages.
Only twice have I found a message like this:
Mar  9 01:44:12 test kernel: panic: Journal overflow (id = 617901877 joffset=450395648 active=260258304 inactive=447780864)
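
To make the numbers in that message easier to read, here is a minimal sketch that extracts the fields and converts them to MiB. It assumes, going only by the field names, that joffset is the current journal write position and that active and inactive are the start offsets of the two journals; the thread itself does not confirm this interpretation. The helper name `field` is made up for this example.

```shell
#!/bin/sh
# Extract one numeric field (e.g. joffset) from a gjournal panic line.
# $1 = field name, $2 = log line. "field" is a hypothetical helper.
field() {
    printf '%s\n' "$2" | sed -n "s/.*[ (]$1 *= *\([0-9][0-9]*\).*/\1/p"
}

line='Mar  9 01:44:12 test kernel: panic: Journal overflow (id = 617901877 joffset=450395648 active=260258304 inactive=447780864)'

joffset=$(field joffset "$line")
active=$(field active "$line")
inactive=$(field inactive "$line")

# Assumption: the overflow is how far the write position has run past
# the start of the inactive journal.
overshoot=$((joffset - inactive))

mib=1048576
echo "active starts at   $((active / mib)) MiB"
echo "inactive starts at $((inactive / mib)) MiB"
echo "write position at  $((joffset / mib)) MiB"
echo "overshoot          $overshoot bytes"
```

Under that reading, the write position had moved about 2.5 MiB past the start of the inactive journal on the 1 GB log device when the panic fired.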

Comment 7 Paul 2020-03-09 03:47:47 UTC

With vfs.write_behind=1 it seems to work mostly OK, but I can easily reproduce a crash just by overwriting the same file:
cp /src/60gb /dst/ # done ok
cp /src/60gb /dst/ # rebooted after 5 seconds


I also tried to reproduce this with the same file, but using an NVMe disk of the same size, with a gjournal log of the same size, as the destination.

With vfs.write_behind=1 I repeated `cp /src/60gb /fast/` several times, and all was OK.
With vfs.write_behind=0 it is more stable, but I have seen a reboot at least once.

Comment 8 Catwoolfii 2020-05-11 09:53:41 UTC

Good afternoon.
I have run into a similar problem. While looking into it, I examined the following kernel sysctls:

kern.geom.journal.cache.limit
kern.geom.journal.switch_time

The variable "kern.geom.journal.cache.limit" sets the size of the buffer used for journaling, and the default size seems to be very large.
The second one, "kern.geom.journal.switch_time", specifies how often the journal is switched. I tried disabling this option altogether, because journal switching is also controlled by the parameter "kern.geom.journal.force_switch".

As a result, I use the following values for the settings described above:

kern.geom.journal.cache.limit=134217728
kern.geom.journal.switch_time=0
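
For reference, 134217728 bytes is 128 MiB. The sketch below derives that value and prints both settings in sysctl.conf syntax. Whether they were applied at runtime, via /etc/sysctl.conf, or as loader tunables is not stated in this comment, so where the output goes is an assumption left to the reader; the `settings` function name is hypothetical.

```shell
#!/bin/sh
# Cache limit expressed in MiB, converted to the byte value used above.
cache_limit_mib=128
cache_limit=$((cache_limit_mib * 1024 * 1024))

# Print the two values from this comment in "name=value" form.
settings() {
    printf 'kern.geom.journal.cache.limit=%s\n' "$cache_limit"
    printf 'kern.geom.journal.switch_time=%s\n' 0
}

settings
```

Writing the limit as a MiB count makes it easier to try other sizes without retyping long byte values.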