Bug 239352

Summary:

kernel panic - cause as yet undetermined

Product:

Base System

Reporter:

Ronald F. Guilmette <rfg-freebsd>

Component:

kern

Assignee:

freebsd-bugs (Nobody) <bugs>

Status:

New ---

Severity:

Affects Some People

CC:

grahamperrin, markj, mckusick

Priority:

---

Keywords:

crash

Version:

12.0-RELEASE

Hardware:

amd64

OS:

Any

Attachments:

Description	Flags
/v/crash/core.txt.0	none
/v/crash/core.txt.1 file dated 2019-08-04	none
/v/crash/core.txt.2	none
/v/crash/core.txt.3	none
/v/crash/core.txt.4	none
/v/crash/core.txt.5	none
pciconf -lv output	none

Description Ronald F. Guilmette 2019-07-20 20:47:40 UTC

I got a kernel panic on 12.0-RELEASE a couple of days ago.  Fortunately, I was already set up to get the crash file(s).

That's the good news.

The bad news is that now I have no idea what I'm supposed to do with this stuff.  Somebody please instruct me.  I want to be of help to get this bug out of the kernel, if I can.

Comment 1 Mark Johnston freebsd_committer

2019-07-20 20:59:37 UTC

If your system successfully saved a crash dump, you'll see files called /var/crash/core.txt.*.  Attaching them here would be a useful start.

Comment 2 Ronald F. Guilmette 2019-07-20 21:20:13 UTC

Just to be 100% clear, my personal /var partition doesn't have much space on it, so I had previously set the following in my /etc/rc.conf file:

dumpdir="/v/crash"

and that dir is where all of the relevant files from my recent kernel panic are.

There is only one file matching the filename pattern you gave and it is called /v/crash/core.txt.0

Its current contents are as follows:

'version' has unknown type; cast it to its declared type
'version' has unknown type; cast it to its declared type
Unable to find matching kernel for /v/crash/vmcore.0

That's it.  Just those three lines.

I don't have any idea what this stuff is actually *supposed* to look like, but offhand I would guess that this ain't it.

What went wrong?  My kernel is 100% stock 12.0-RELEASE.  It has not been fiddled by me at all.  uname -a gives the following:

FreeBSD segfault.tristatelogic.com 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  amd64

OK, so now what?

Comment 3 Mark Johnston freebsd_committer

2019-07-21 15:09:03 UTC

Do you have kernel debug symbols available under /usr/lib/debug/boot/kernel/?  Do you install the kernel to a non-default location?

Comment 4 Ronald F. Guilmette 2019-07-21 19:25:09 UTC

(In reply to Mark Johnston from comment #3)

The directory /usr/lib/debug/boot/kernel/ exists on my system, but it is devoid of any content.

I did not install the kernel in any non-default location.

Comment 5 Mark Johnston freebsd_committer

2019-07-30 13:41:26 UTC

If you are running stock 12.0-RELEASE, i.e., no patches, you can fetch debug symbols from here: ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/12.0-RELEASE/kernel-dbg.txz

Then crashinfo(8) should be able to produce a usable summary of the panic.
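The steps described in this comment can be sketched roughly as follows. This is a minimal sketch, not a verified procedure: the ftp:// scheme, the /tmp download location, and the function names install_symbols and summarize_dump are assumptions made for illustration, and it assumes the archive unpacks relative to / into /usr/lib/debug/boot/kernel/. It only applies to an unpatched 12.0-RELEASE GENERIC kernel.

```sh
#!/bin/sh
# Sketch only: fetch the release debug symbols and re-run crashinfo(8).
# RELEASE and ARCH must match the running, unpatched kernel.
RELEASE="12.0-RELEASE"
ARCH="amd64"
URL="ftp://ftp.freebsd.org/pub/FreeBSD/releases/${ARCH}/${ARCH}/${RELEASE}/kernel-dbg.txz"

# Hypothetical helper: download the symbol archive and unpack it under /.
# Assumes the archive contains paths below usr/lib/debug/boot/kernel/.
install_symbols() {
    fetch -o /tmp/kernel-dbg.txz "$URL" &&
    tar -C / -xpf /tmp/kernel-dbg.txz
}

# Hypothetical helper: summarize dump number $2 stored in directory $1.
summarize_dump() {
    crashinfo -d "$1" -n "$2"
}
```

With the symbols installed (as root), something like `summarize_dump /v/crash 0` should regenerate core.txt.0 for the first dump in the reporter's non-default crash directory.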

Comment 6 Ronald F. Guilmette 2019-08-03 19:33:39 UTC

Created attachment 206249
/v/crash/core.txt.0

Comment 7 Ronald F. Guilmette 2019-08-03 19:36:14 UTC

(In reply to Mark Johnston from comment #5)

As requested, I have fetched and unxz'd and untarred the symbols file and then run crashinfo and have now attached the resulting core.txt.0 file.

Please let me know if there's anything else I should do to help with getting to the bottom of this kernel panic.

Comment 8 Mark Johnston freebsd_committer

2019-08-19 16:33:17 UTC

Sorry for the delay in replying.

We panicked because the disk returned an error in response to a write to the UFS journal:

g_vfs_done():ada3p4[WRITE(offset=-512, length=512)]error = 5
panic: cannot reassign paging buffer

That offset however seems strange and suggests a software issue.  Is the panic reproducible, or have you only seen it the one time?
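To illustrate why the offset looks suspicious, here is a small sketch that pulls the fields out of the logged line quoted above. The variable names are illustrative only. Error 5 is EIO on FreeBSD, and a byte offset of -512 cannot correspond to a real location on the provider, which is why the request looks malformed before it ever reaches the disk.

```sh
#!/bin/sh
# Sketch: extract the offset and error number from the g_vfs_done() message.
line='g_vfs_done():ada3p4[WRITE(offset=-512, length=512)]error = 5'

# Byte offset of the failed request (may carry a leading minus sign).
offset=$(printf '%s\n' "$line" | sed -n 's/.*offset=\(-\{0,1\}[0-9][0-9]*\),.*/\1/p')
# Error number returned for the request (5 is EIO, "Input/output error").
err=$(printf '%s\n' "$line" | sed -n 's/.*error = \([0-9][0-9]*\).*/\1/p')

if [ "$offset" -lt 0 ]; then
    echo "offset $offset is negative: the request itself is bogus (error $err)"
else
    echo "offset $offset looks plausible: error $err may be a genuine media error"
fi
```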

Comment 9 Ronald F. Guilmette 2019-08-19 21:17:37 UTC

Actually, I have had 1 more kernel panic also on August 4.  And once again I got a dump file.  I'm sorry that I have not had time to deal with this since then, but I still do have that dump file.  I will try now to do the steps that I was instructed to do for the last one, and will try to get the new info uploaded so that you can take a look.  Maybe it is the same problem.  Maybe a different one.  I am sure that I do not know.  I hope you will tell me.

Comment 10 Ronald F. Guilmette 2019-08-19 21:24:36 UTC

Created attachment 206712
/v/crash/core.txt.1 file dated 2019-08-04

Apparently, having a proper set of symbols files already installed on my system means that I don't (and didn't) have to manually run crashinfo myself.  This file just appeared automagically following the last kernel panic on August 4.

Comment 11 Ronald F. Guilmette 2019-11-01 06:26:28 UTC

Is anybody ever going to fix this?

I am frankly kind of shocked that apparently there is no manpower available to even fix kernel panics.

I've already posted all of the information that was requested of me, twice.  I've had three or four more kernel panics since then, and the only reason that I haven't posted that additional info is that it is starting to seem like an utter waste of my time to do so.

Is it?

Time was when kernel panics were treated like minor national emergencies.  I guess not so much these days.

A suggestion:  If nobody is going to fix the kernel bugs that already exist, some or all of which may possibly have security implications, then perhaps it would be best to freeze all further kernel development, thereby at least ensuring that the problem won't be made any worse than it already is.

Don't get me wrong.  New kernel features are great and always appreciated... at least right up until the moment when the console suddenly goes black.

Comment 12 Mark Johnston freebsd_committer

2019-11-11 11:10:49 UTC

(In reply to Ronald F. Guilmette from comment #11)
The second panic report you posted is different, and is occurring in the graphics driver.  It would indeed help to see any other crash reports that were collected.

I see one other report of the same panic, but it doesn't appear to be widespread.  Some more investigation will be needed to track down what's going on.  In the meantime, can you please also:
- provide the version of the DRM drivers you are using ("pkg info | grep drm" output)
- tell us which driver you are using (radeonkms or amdgpu?)
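One way to answer the second question is to look at the list of loaded kernel modules. The sketch below is a hedged illustration: detect_drm_driver is a hypothetical helper name, and it assumes the driver shows up in kldstat(8) output under the module file name radeonkms.ko or amdgpu.ko.

```sh
#!/bin/sh
# Sketch: report which DRM driver appears in kldstat(8) output.
# Reads the module listing on stdin so that it can be tried on saved output.
detect_drm_driver() {
    awk '/radeonkms\.ko/ { print "radeonkms" }
         /amdgpu\.ko/    { print "amdgpu" }'
}

# Typical use on the affected machine:
#   kldstat | detect_drm_driver
```

If nothing is printed, neither module is loaded under those names, and the [drm] lines in the boot messages are the next place to look.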

Comment 13 Mark Johnston freebsd_committer

2019-11-11 11:14:03 UTC

Also, would you be willing to share one or more of the kernel dumps with me?  I would basically need a copy of a vmcore from /var/crash and the contents of /boot/modules.

Comment 14 Ronald F. Guilmette 2019-11-11 20:47:33 UTC

Output from: "pkg info | grep drm"

drm-fbsd12.0-kmod-4.16.g20190624 DRM modules for the linuxkpi-based KMS components
drm-kmod-g20181126             Metaport of DRM modules for the linuxkpi-based KMS components
libdrm-2.4.96,1                Userspace interface to kernel Direct Rendering Module services

Comment 15 Ronald F. Guilmette 2019-11-11 20:56:13 UTC

In answer to the question regarding which driver I am using, I do not know how to unambiguously determine the correct answer to this question.  However, I do have a recollection, from back when I installed this system, that there was something special I had to do which related to "radeonkms", so I do believe that that is the one I am using.

That having been said, I looked at my /boot/loader.conf and my /etc/rc.local and my /etc/rc.conf file and there is nothing in any of those that mentions radeon.

Here are some lines from /var/log/dmesg.yesterday that may perhaps be relevant:

[drm] radeon kernel modesetting enabled.
[drm:radeon_device_init] Unable to find PCI I/O BAR
[drm:radeon_atombios_init] Unable to find PCI I/O BAR; using MMIO for ATOM IIO
[drm] radeon: 1024M of VRAM memory ready
[drm] radeon: 1024M of GTT memory ready.
drmn0: successfully loaded firmware image with name: radeon/CEDAR_pfp.bin
drmn0: successfully loaded firmware image with name: radeon/CEDAR_me.bin
drmn0: successfully loaded firmware image with name: radeon/CEDAR_rlc.bin
drmn0: successfully loaded firmware image with name: radeon/CEDAR_smc.bin
[drm] radeon: dpm initialized
drmn0: successfully loaded firmware image with name: radeon/CYPRESS_uvd.bin
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
drmn0: radeon: MSI limited to 32-bit
[drm] radeon: irq initialized.
drmn0: fb0: radeondrmfb frame buffer device
[drm] Initialized radeon 2.50.0 20080528 for drmn0 on minor 0

Comment 16 Ronald F. Guilmette 2019-11-11 21:01:45 UTC

Regarding the several other kernel panics that I have gotten since "upgrading" to 12.0-RELEASE:  I have my crash directory set to /v/crash (where I have a lot of space available) and here is a record of the dates of all such:

-rw-r--r--  1 root  rfg      339183 Aug  3 12:29 core.txt.0
-rw-r--r--  1 root  rfg      393230 Aug  4 11:42 core.txt.1
-rw-r--r--  1 root  rfg      337581 Sep  2 22:43 core.txt.2
-rw-r--r--  1 root  rfg      361049 Sep 23 02:34 core.txt.3
-rw-r--r--  1 root  rfg      345748 Sep 28 17:13 core.txt.4
-rw-r--r--  1 root  rfg      377760 Oct  2 06:35 core.txt.5

I have already attached to this bugreport the first two of those.  I will now also attach the final four, numbered 2-5.

Please do let me know if there is any more information I can provide.  These crashes are becoming quite tedious, especially as I seem to lose all of my Firefox open tabs whenever one happens.

Comment 17 Ronald F. Guilmette 2019-11-11 21:05:14 UTC

Created attachment 209075
/v/crash/core.txt.2

Comment 18 Ronald F. Guilmette 2019-11-11 21:06:42 UTC

Created attachment 209076
/v/crash/core.txt.3

Comment 19 Ronald F. Guilmette 2019-11-11 21:08:03 UTC

Created attachment 209077
/v/crash/core.txt.4

Comment 20 Ronald F. Guilmette 2019-11-11 21:08:54 UTC

Created attachment 209078
/v/crash/core.txt.5

Comment 21 Ronald F. Guilmette 2019-11-11 21:17:35 UTC

(In reply to Mark Johnston from comment #13)

I am willing to share one or more of the vmcore files, as may be useful; however, this may be somewhat arduous to achieve.

I am on the end of a puny DSL line which gives me something like 756kbps upload speed.  And each of the vmcore files is on the order of 1 GB.

If you can email me direct rfg(at)tristatelogic.com and give me the hostname and directory name for some writable FTP directory, then I will try to copy one or two of these vmcore files to that overnight, while I sleep. (I feel quite sure that each one will take hours and hours to upload.)

Of course I will be happy to provide whatever you need from /boot/modules also.
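The upload estimate in this comment can be checked with rough arithmetic. The sketch below assumes 1 GB means 10^9 bytes and 756kbps means 756,000 bits per second, and it ignores protocol overhead, so the real transfer would take somewhat longer.

```sh
#!/bin/sh
# Sketch: best-case upload time for one vmcore over the DSL uplink.
size_bytes=1000000000     # roughly 1 GB per vmcore (assumed to be 10^9 bytes)
rate_bps=756000           # 756 kbps uplink (assumed to be 756,000 bit/s)

seconds=$(( size_bytes * 8 / rate_bps ))
hours=$(( seconds / 3600 ))
minutes=$(( seconds % 3600 / 60 ))

echo "about ${hours}h ${minutes}m per vmcore"
```

Under these assumptions each file needs close to three hours at full line rate, which is consistent with planning the transfer overnight.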

Comment 22 Mark Johnston freebsd_committer

2019-11-12 15:26:45 UTC

(In reply to Ronald F. Guilmette from comment #15)
Indeed, you are using radeonkms.  This driver is considered legacy by upstream and has been superseded by amdgpu, which is quite stable both in my experience and based on bug reports.

Can you please also provide pciconf -lv output?  Some cards are supported by both radeonkms and amdgpu; I believe Xorg will automatically load radeonkms, but perhaps you can try using amdgpu instead.

(In reply to Ronald F. Guilmette from comment #21)
I will first try using radeonkms on my workstation to see if I can reproduce any crashes.

Comment 23 Ronald F. Guilmette 2019-11-12 21:19:49 UTC

Created attachment 209113
pciconf -lv output

Comment 24 Ronald F. Guilmette 2019-11-12 21:22:44 UTC

(In reply to Mark Johnston from comment #22)

I have provided pciconf -lv output as requested.

I have no idea how to switch from radeonkms to amdgpu so specific instructions would be appreciated.

Are 100% of the crashes I have reported due to my use of radeonkms ?

Comment 25 Mark Johnston freebsd_committer

2019-11-13 15:08:58 UTC

(In reply to Ronald F. Guilmette from comment #24)
Thanks.  Unfortunately it seems that amdgpu won't support this card.

On my end, I have yet to trigger any panics in radeonkms.  There is some discussion of similar crashes happening on -current now, and I'm hoping we can make some progress there.

Most of your crashes are in radeonkms.  Crash 0 appears to be a bug in our handling of a disk write error.  I'm not sure about crash 4.  It could be the result of inconsistent UFS metadata following the other crashes you've seen, so you might consider running a full fsck on the system's filesystems.