Bug 268835 - Use BUS_PASS_SUPPORTDEV + BUS_PASS_ORDER_MIDDLE (updated: as committed) for bcm_dma (for RPi* contexts)
Summary: Use BUS_PASS_SUPPORTDEV + BUS_PASS_ORDER_MIDDLE (updated: as committed) for b...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Some People
Assignee: Mitchell Horne
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-08 23:54 UTC by Mark Millard
Modified: 2023-08-20 02:14 UTC (History)
4 users (show)

See Also:


Attachments
main [so: 14]: Use BUS_PASS_INTERRUPT+BUS_PASS_ORDER_LATE for bcm_dma (530 bytes, patch)
2023-01-08 23:54 UTC, Mark Millard
no flags Details | Diff
stable/13: Use BUS_PASS_INTERRUPT+BUS_PASS_ORDER_LATE for bcm_dma (563 bytes, patch)
2023-01-08 23:56 UTC, Mark Millard
no flags Details | Diff
An attempt to declare dependency with MODULE_DEPEND (25.20 KB, patch)
2023-02-19 21:00 UTC, Dmitry Salychev
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Millard 2023-01-08 23:54:57 UTC
Created attachment 239352 [details]
main [so: 14]: Use BUS_PASS_INTERRUPT+BUS_PASS_ORDER_LATE for bcm_dma

Use BUS_PASS_INTERRUPT+BUS_PASS_ORDER_LATE for bcm_dma
for both main [so: 14] and stable/13 .

I'll also be adding an attachment for the stable/13 case
since stable/13 and main do not match in the very few lines
involved.

This avoids crashes for booting with newer RPi* firmware
than is in sysutils/rpi-firmware . But this is not a proposal
for updating sysutils/rpi-firmware at all.

I've my own interest in testing something with newer firmware.
I've also seen reports about *.dtb files that do not exist
in sysutils/rpi-firmware . At some point someone might put in
the work to add support for such RPi* devices. If anyone does,
they likely would like to avoid the stage if dealing with the
boot-crash that now happens.

My test context had:

"C0T" 8 GiByte Rev1.5 RPi4B
"B0T" 8 GiByte Rev1.4 and 4 GiByte Rev1.1 RPI4Bs
RPi3B
RPi2B v1.2
RPi2B v1.1 (so: armv7)

and some non-RPi*'s:

ThreadRipper 1950X (so an amd64)
HoneyComb
MACCHIATObin Double Shot
Rock64
OrangePi+2ed (so: armv7)

For aarch64 and armv7, I test the same boot media both
on RPi*'s and at least one non-RPi*  My testing spanned
main and stable/13 examples.

I've tried multiple tagged RPI* firmware releases and none
lead to a crash with the change in place. The non-RPI*'s
continued to work normally.

For reference, all the boot media are now at (picking an
example system):

# uname -apKU # Long output line split for readability
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #75
main-n259950-5723e5ac6d76-dirty: Fri Jan  6 01:50:53 PST 2023     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400077 1400077
Comment 1 Mark Millard 2023-01-08 23:56:35 UTC
Created attachment 239353 [details]
stable/13: Use BUS_PASS_INTERRUPT+BUS_PASS_ORDER_LATE for bcm_dma
Comment 2 Mark Millard 2023-01-18 01:34:44 UTC
I probably should have selected the arm component, which is more
specific than kernel in some resects. So change it now.

(I used a modern firmware context to ask a question, instead of
trying to ask based on the vintage that FreeBSD currently uses.
This was based on that I've got the patches in place to allow
such a modern context to boot and operate.)
Comment 3 Mitchell Horne freebsd_committer freebsd_triage 2023-02-13 15:06:49 UTC
Hi Mark,

The change itself is sensible. I trust that the fix works for you, but I wouldn't be able to write a commit message describing "why" based on what's written here.

You did not describe the failure mode. Does "crash" mean a boot-time hang, or a panic? If it's a panic, did you collect a backtrace? If it's a hang, where does it hang?

In other words: how did you discover what needed to be changed?
Comment 4 Mark Millard 2023-02-13 18:34:56 UTC
(In reply to Mitchell Horne from comment #3)

There is a history of my messages to the lists about the
problem, going back into 2022-Apr, mostly freebsd-arm for
the problem and, more recently, freebsd-hackers asking for
help in solving the problem. (No replies.) From knowing crash
details published in in 2022-Apr to knowing the fix here took
me something like 8 months of trying to figure it out.

The 2022-Apr freebsd-arm messages are somewhat messy as I
explored firmware versions and made operator errors.

Crash here means the attempt to dereference a small offset
from a NULL pointer.

https://lists.freebsd.org/archives/freebsd-arm/2022-April/001274.html
has the original report with many console logs with
backtraces, various firmware versions being tested. (Later
messages for a while repeated this material.)

https://lists.freebsd.org/archives/freebsd-arm/2022-April/001278.html
is where I first showed the failing behavior at the source
code level and the related (textual form of) DTB content
for what was different between older firmware that did not
lead to a crash vs. newer firmware that did.

The message sequence also has material about other issues
as I isolated things. (Part of the messy status.) Also a
false report from accidentally deleting a file without
noticing at the time --followed later by message(s) about
that mistake.

There is nothing to show for how I figured out the change.
With what I'd gradually partially understood, a message
about some other fix that I read turned out to be somewhat
related in how it fixed something (not at all specific to
RPi*'s). It lead to me the technique and that it applied
to Device Tree (fdt) contexts for having resources ready
in time. So: Found by analogy to something else that I saw.

My first try was too early and produced messages about that.
I then figured out the BUS_PASS_INTERRUPT dependency and
and that BUS_PASS_ORDER_LATE was no longer too early.
Comment 5 Mark Millard 2023-02-17 19:47:14 UTC
(In reply to Mitchell Horne from comment #3)

Were my notes and referenced material in comment #4
sufficient for your purposes?
Comment 6 Mitchell Horne freebsd_committer freebsd_triage 2023-02-17 20:20:30 UTC
(In reply to Mark Millard from comment #5)

Yes, thank you very much. 

I took a look at the relevant code and it should be upgraded to handle this error gracefully, rather than the NULL-dereference it does now. I will handle this follow-up change.

Did you test with anything later than BUS_PASS_INTERRUPT? It seems like the only requirement is for the bcm_dma driver to attach before sdhci_bcm, so BUS_PASS_SUPPORTDEV would be slightly more appropriate, if it works.
Comment 7 Mark Millard 2023-02-17 23:06:58 UTC
(In reply to Mitchell Horne from comment #6)

> Did you test with anything later than BUS_PASS_INTERRUPT?

No.

> It seems like the only requirement is for the bcm_dma driver
> to attach before sdhci_bcm, so BUS_PASS_SUPPORTDEV would be
> slightly more appropriate, if it works.

My view was "as early as reasonable" would cover all current and
potential future uses of bcm_dma, in part covering my overall,
fairly-general, ignorance. I do not have the background to
make detailed judgments of best-placement after that: I
barely discovered BUS_PASS_INTERRUPT and BUS_PASS_ORDER_LATE as
it was. I've no general knowledge of the sequencing or its
criteria.

To me, the mere presence that of that "if it works" part still
suggests the existing place: needing the experiment suggests
more risk, and leaves me wondering how general such an experiment
would be, a question I do not have the background to answer on
my own.

If you want to do better than I've done, I've no objection.
Comment 8 Emmanuel Vadot freebsd_committer freebsd_triage 2023-02-17 23:21:11 UTC
(In reply to Mitchell Horne from comment #6)

BUS_PASS_SUPPORTDEV was created for this so yes I think that this would be better to use that :)
Comment 9 Mark Millard 2023-02-17 23:48:10 UTC
(In reply to Mitchell Horne from comment #6)
(In reply to Emmanuel Vadot from comment #8)

Just for my edification . . .

So I gather from both your comments that for:

#define BUS_PASS_ROOT           0       /* Used to attach root0. */
#define BUS_PASS_BUS            10      /* Buses and bridges. */
#define BUS_PASS_CPU            20      /* CPU devices. */
#define BUS_PASS_RESOURCE       30      /* Resource discovery. */
#define BUS_PASS_INTERRUPT      40      /* Interrupt controllers. */
#define BUS_PASS_TIMER          50      /* Timers and clocks. */
#define BUS_PASS_SCHEDULER      60      /* Start scheduler. */
#define BUS_PASS_SUPPORTDEV     100000  /* Drivers which support DEFAULT drivers. */
. . .

BUS_PASS_SUPPORTDEV is not intended to indicate or require
any dependence on BUS_PASS_TIMER or any on BUS_PASS_SCHEDULER.
(Sound right?)

(My first attempt had used BUS_PASS_RESOURCE, on generic
thinking of dma as a resource, but that was too early because
of dependency on context from BUS_PASS_INTERRUPT .)

Also sounds like the expectation is that no timer and no
scheduler would be likely to depend on bcm_dma. Similarly,
BUS_PASS_ORDER_* would be used to cover if any other
BUS_PASS_SUPPORTDEV instances use needed to use bcm_dma.
(Sound right?)
Comment 10 Mark Millard 2023-02-18 22:54:41 UTC
(In reply to Mitchell Horne from comment #6)

This time I've not gone through much of the:

(6 RPi variants) * (12 RPi firmware versions) == 72 tests for main
(and similarly for stable/13)

But, for the little bit that I've tested, BUS_PASS_SUPPORTDEV
seems to have worked just fine. I've only tested main this
time. Only a BOT 8 GiByte RPi4B. Only a couple versions of
firmware.


(The 6 accessible RPi variants do not actually cover much of
a range. Half are RPi4B 's: 4 GiByte vs. 8 GiByte and, for
8 GiByte, B0T vs. C0T. The others: RPi3B, RPi2B v1.2, and
RPi2B v1.1 [so: armv7].)

(12 was originally from after ignoring the likes of
firmware versions that were less than 2 weeks before the
next version (avoiding quickly replaced). Also ignoring
earlier versions that had the same *.dtb files by content,
even if the time frame between versions was longer. Also,
nothing older than the sysutils/rpi-firmware port contents
[1.20210303] was ever tested.)
Comment 11 Mark Millard 2023-02-18 23:00:55 UTC
(In reply to Mark Millard from comment #10)

I should have listed that the context I used was:
(long output line split for readability)

# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #76
main-n259950-5723e5ac6d76-dirty: Sat Feb 18 13:17:54 PST 2023
root@CA72_UFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400077 1400077

5723e5ac6d76 dates back to 2023-Jan-06.
Comment 12 Mitchell Horne freebsd_committer freebsd_triage 2023-02-19 15:00:41 UTC
(In reply to Mark Millard from comment #9)

> Also sounds like the expectation is that no timer and no
> scheduler would be likely to depend on bcm_dma. Similarly,
> BUS_PASS_ORDER_* would be used to cover if any other
> BUS_PASS_SUPPORTDEV instances use needed to use bcm_dma.
> (Sound right?)

Yes, exactly right. At present we have one device (sdhci_bcm) that depends on bcm_dma, and it attaches at BUS_PASS_DEFAULT. Therefore any earlier pass will ensure the dependency is met, even BUS_PASS_DEFAULT - 1 (but we would not use that).

If someone were to add or change a driver in the future that requires bcm_dma to be attached earlier, then it will be their responsibility to make that tweak.

(In reply to Mark Millard from comment #10)

Thank you for your thorough testing. I will commit the change this week.
Comment 13 Mark Millard 2023-02-19 16:02:51 UTC
(In reply to Mark Millard from comment #10)

I should have noted that booting 8 GiByte RPi4B's was
broken by the change to use 2023.01 of U-Boot (without
extra patching). So my testing was dependent on my
local enabling of booting by having a different U-Boot
context.

(U-Boot itself is what gives up for FreeBSD standard
2023.01 and the UEFI style of booting, later stages do
not get a chance.)

Thus there is more involved in trying to do more such
testing.
Comment 14 Mark Millard 2023-02-19 17:05:53 UTC
(In reply to Mitchell Horne from comment #12)

> If someone were to add or change a driver in the future
> that requires bcm_dma to be attached earlier, then it
> will be their responsibility to make that tweak.

I find it interesting that the normal criteria are to
structure things such that they require such rework to be
involved, even when there is an alternative that makes
such rework less likely across potential future changes
(so far as I can tell, anyway).
Comment 15 Dmitry Salychev freebsd_committer freebsd_triage 2023-02-19 19:35:31 UTC
I've been following your discussion for some time already and there is one thing which concerns me. There's a MODULE_DEPEND(9) macro which can be used to hint the kernel that sdhci_bcm depends on bcm_dma without tweaking their initialization order. Am I missing something?
Comment 16 Mark Millard 2023-02-19 19:52:10 UTC
(In reply to Dmitry Salychev from comment #15)

The lack of the required initialization ordering is
the original problem at run time. Unless MODULE_DEPEND
controled that ordering appropriately, the run-time
problem would still occur as far as I can tell.
Comment 17 Dmitry Salychev freebsd_committer freebsd_triage 2023-02-19 21:00:32 UTC
Created attachment 240265 [details]
An attempt to declare dependency with MODULE_DEPEND

Could you give the attached patch a try? I've RPi 2B v1.1, but don't have different device trees at hand to check quickly.
Comment 18 Mark Millard 2023-02-19 21:55:18 UTC
(In reply to Dmitry Salychev from comment #17)

Quoting: https://man.freebsd.org/cgi/man.cgi?query=MODULE_DEPEND&apropos=0&sektion=0&manpath=FreeBSD+14.0-CURRENT&arch=default&format=html

     The MODULE_DEPEND() macro provides	hints to the kernel loader(8) and to
     the kernel	linker to ensure that the named	dependency is loaded prior to
     the existing module.  It does not change or dictate the order in which
     modules are initialized at	runtime.

Note that last statement: no contribution to order of runtime
initialization.

bcm_dma has to be run-time initialized before it can be put to
use by anything else. Otherwise there is a global holding a
NULL pointer still in place and that NULL ends up being used as
the basis for a dereference that fails.

The material in question is all builtin to the kernel, none of
what is involved is loaded via an external module. For example,
teh below is from the system in question running:

# kldstat
you have mail
Id Refs Address                Size Name
 1    1 0xffff000000000000  12e9c80 kernel

This is not a load-order or load-time issue at all: everything
is loaded and present before the code at issue is started.

I'm in the middle of building and upgrading the FreeBSD systems
to get past a problem that I see has been reported that I could
easily hit in my normal use. I do not plan on doing the requested
experiment.
Comment 19 Mark Millard 2023-02-19 22:14:12 UTC
(In reply to Dmitry Salychev from comment #17)

Also . . .

As far as I know, testing on a RPi2B or RPi3B variant
with official .dtb files from the RPi* folks will not
happen to show the problem, just like older *.dtb files
for BCM2711 based parts do not lead to the problem
showing up. Only newer *.dtb files for the BCM2711 parts
changed the ordering in the *.dtb files such that the
initialization order actually happens out of order in
the FreeBSD kernel.

(Both orders for the *.dtb files are valid orders as
far as I could find: no .dtb file content constraints
violated by either of them.)

I have to use RPi4B's that I have access to in order
to see failure, the RPi2B v1.1, RPi2B v1.2, and RPi3B
do not ever show failure as I remember. This is driven
by .dtb file content and its ordering.
Comment 20 commit-hook freebsd_committer freebsd_triage 2023-02-24 17:23:45 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9873b171697033f9f19608d98bcd1c16cacb92af

commit 9873b171697033f9f19608d98bcd1c16cacb92af
Author:     Mark Millard <marklmi26-fbsd@yahoo.com>
AuthorDate: 2023-02-17 20:30:35 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2023-02-24 17:20:40 +0000

    bcm_dma: attach at an earlier bus pass

    The sdhci_bcm driver attach routine relies on bcm_dma already being
    attached, in order to allocate a DMA channel. However, both drivers
    attached at the default pass so this is not guaranteed. Newer RPI
    firmware exposes this assumption, and the result is a NULL-dereference
    in bcm_dma_allocate().

    To fix this, use BUS_PASS_SUPPORTDEV for bcm_dma.

    PR:             268835
    Reviewed by:    mhorne
    MFC after:      1 week

 sys/arm/broadcom/bcm2835/bcm2835_dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 21 Mark Millard 2023-03-03 02:23:58 UTC
(In reply to commit-hook from comment #20)

Summary: Testing today's snapshot worked fine for
booting with some modern RPi* firmware that replaced
the vintage that FreeBSD has in the snapshot.


Note: The following was done with a 4 GiByte RPI4B in
order to avoid having a 8 GiByte RPi4B used with the
U-Boot 2023.01 that FreeBSD is using: 2023.01 fails
to load FreeBSD's EFI loader for 8 GiByte RPi4B,
reporting a message that is a misnomer for the actual
internal problem.

I first downloaded:

http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/FreeBSD-14.0-CURRENT-arm64-aarch64-RPI-20230302-005cca8361a4-261233.img.xz

and put the .img on a USB3 SSD via dd and booted the
result on a RPi4B. That worked fine.

I then shutdown and replaced the FreeBSD vintage of
RPi* firmware with the firmware tagged 1.20230106
and booted that.

It worked fine.
Comment 22 commit-hook freebsd_committer freebsd_triage 2023-03-07 16:55:48 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5b6edfc577fbb4e9703c112713c7fb472e144346

commit 5b6edfc577fbb4e9703c112713c7fb472e144346
Author:     Mark Millard <marklmi26-fbsd@yahoo.com>
AuthorDate: 2023-02-17 20:30:35 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2023-03-07 16:55:01 +0000

    bcm_dma: attach at an earlier bus pass

    The sdhci_bcm driver attach routine relies on bcm_dma already being
    attached, in order to allocate a DMA channel. However, both drivers
    attached at the default pass so this is not guaranteed. Newer RPI
    firmware exposes this assumption, and the result is a NULL-dereference
    in bcm_dma_allocate().

    To fix this, use BUS_PASS_SUPPORTDEV for bcm_dma.

    PR:             268835
    Reviewed by:    mhorne
    MFC after:      1 week

    (cherry picked from commit 9873b171697033f9f19608d98bcd1c16cacb92af)

 sys/arm/broadcom/bcm2835/bcm2835_dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 23 Mark Millard 2023-03-07 17:58:12 UTC
(In reply to commit-hook from comment #22)

Thanks much.

It ended up being just a little late for releng/13.2 to avoid
the firmware version limitation, not a lot late like I'd
originally expected.
Comment 24 dfr 2023-05-20 13:28:18 UTC
Would it make sense to add a variant of the sysutils/rpi-firmware port which can be pinned to the last version before the devices were re-ordered upstream (1.20210805, I think)? This would allow FreeBSD versions without the fix (12.4, 13.1 and 13.2) to use compatible firmware while also allowing more recent firmware for 13-stable and 14.
Comment 25 Mark Millard 2023-08-20 02:14:51 UTC
It has been discovered that the modern RPI3B+'s with the
newer Power Management I.C. (PMIC) can not use the older
RPi* firmware ( start*.elf and fixup*.dat files ) but
the FreeBSD kernel for the likes of 13.2-RELEASE and
before can not handle the *.dtb's that are in the modern
RPi* firmware distribution. (They lack the fix in this
bugzilla.)

Thus there is the extra step updating the msdosfs
start*.elf and fixup*.dat files on the 13.2-RELEASE
media but not updating the *.dtb files on that media.

It is not obvious which overlays/* can tolerate the
combination.  Some might have to be from the older
RPi* firmware and others from the modern RPi* firmware
and neither vintage may work for yet others.