Bug 253990 - NTB driver causes panic: page fault
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2021-03-03 16:48 UTC by KO
Modified: 2021-06-22 20:38 UTC

Attachments
Core Text Dump (83.39 KB, text/plain)
2021-03-03 16:48 UTC, KO
Description KO 2021-03-03 16:48:15 UTC
Created attachment 222945
Core Text Dump

The NTB driver panics the system both when booting from the FreeBSD 13.0-BETA4 USB image and after installation, unless the driver is disabled.

FreeBSD xbox6 13.0-BETA4 FreeBSD 13.0-BETA4 #0 releng/13.0-n244592-e32bc253629: Fri Feb 26 06:17:34 UTC 2021 


Workaround:
Add hint.ntb_hw.0.disabled="1" to /boot/loader.conf
Comment 1 Mark Johnston freebsd_committer 2021-06-16 15:39:00 UTC
Looks like the problem is that amd_ntb_init_isr() modifies ntb->hw_info->db_count, but ntb->hw_info is a pointer to read-only memory.

The bug seems to have come in with:
https://cgit.freebsd.org/src/commit/?id=e67b122307344b9583d75cca2e9a292df76c0a19
It probably went unnoticed since we did not enforce mapping protections for amd64 kernel modules until:
https://cgit.freebsd.org/src/commit/?id=1d9eae9fb2e2253ca3d3764a5cc7f124b10e358b

But since the hw_info table is global it seems incorrect for a driver attach routine to modify it.
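
For illustration only, here is a minimal user-space sketch of one way to avoid writing through the shared table: copy db_count into per-instance state at attach time and adjust only that copy. All struct, function, and variable names below are hypothetical stand-ins, the value 16 is made up, and this is not the actual amd_ntb code or a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's shared hardware description table. */
struct hw_info {
	uint16_t db_count;	/* doorbells advertised by the hardware */
};

/* Shared by every instance; const, so it may live in read-only memory. */
static const struct hw_info example_hw_info = { .db_count = 16 };

/* Hypothetical per-device state (a softc in driver terms). */
struct ntb_softc {
	const struct hw_info *hw_info;	/* shared table, never written */
	uint16_t db_count;		/* per-instance, writable copy */
};

/* Attach: remember the shared table and take a private copy of db_count. */
static void
ntb_attach_sketch(struct ntb_softc *sc, const struct hw_info *hw)
{
	sc->hw_info = hw;
	sc->db_count = hw->db_count;
}

/*
 * ISR setup: if fewer interrupt vectors were granted than there are
 * doorbells, shrink only this instance's count.
 */
static void
ntb_init_isr_sketch(struct ntb_softc *sc, uint16_t vectors_granted)
{
	assert(sc->hw_info != NULL);
	if (vectors_granted < sc->db_count)
		sc->db_count = vectors_granted;
}
```

With this layout, a second device attached against the same table still sees the original count, and nothing ever stores into the const table.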
Comment 2 Alexander Motin freebsd_committer 2021-06-16 18:04:37 UTC
I definitely agree that the ntb->hw_info->db_count assignment in amd_ntb_init_isr() is logically incorrect.  Maybe it could be per-instance.  I have documentation, not hardware, for the AMD NTB, but I guess the proper solution may instead be to implement multiple doorbells with a single or a few interrupt vectors instead of reducing their count.  For example, the PLX NTB driver uses a single legacy IRQ to implement 16 doorbells.  Changing the number of doorbells depending on attach errors may cause a problem for upper layers, when they try to use the expected number of doorbells and don't find them.
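
As a rough illustration of the multiplexing idea, the user-space sketch below dispatches several doorbells from one shared interrupt by scanning a pending bitmask. It assumes the hardware exposes a doorbell status register, modelled here as a plain integer. Every name is hypothetical, and it is not taken from the PLX or AMD drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DB_COUNT 16	/* doorbells multiplexed over one interrupt */

typedef void (*db_handler_t)(void *arg, unsigned db);

/* Hypothetical demultiplexer state. */
struct db_demux {
	uint32_t pending;	/* stand-in for a doorbell status register */
	db_handler_t handler;	/* upper-layer callback, called per doorbell */
	void *arg;
};

/* Example callback: record which doorbells fired in a bitmask. */
static void
db_record(void *arg, unsigned db)
{
	*(uint32_t *)arg |= 1u << db;
}

/*
 * Single shared interrupt handler: snapshot the pending bits,
 * acknowledge them, then call the handler once per set bit.
 * Returns the number of doorbells dispatched.
 */
static unsigned
db_demux_intr(struct db_demux *d)
{
	uint32_t bits;
	unsigned db, n = 0;

	assert(d != NULL);
	bits = d->pending & ((1u << DB_COUNT) - 1);
	d->pending &= ~bits;		/* acknowledge what we will service */
	for (db = 0; db < DB_COUNT; db++) {
		if ((bits & (1u << db)) != 0) {
			if (d->handler != NULL)
				d->handler(d->arg, db);
			n++;
		}
	}
	return (n);
}
```

Under this scheme the number of doorbells seen by upper layers stays fixed no matter how many interrupt vectors were granted.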