Bug 278234 - panic: pfs_add_node(): homonymous siblings, because linsysfs assumes pci domain 0000
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-emulation (Nobody)
URL:
Keywords:
Duplicates: 279581
Depends on:
Blocks: 247219
Reported: 2024-04-07 13:40 UTC by Zeev Zilberman
Modified: 2025-02-27 09:13 UTC
CC List: 4 users

See Also:


Attachments
Fix for the panic, but the layout does not match Linux (10.08 KB, patch)
2024-04-07 20:29 UTC, Konstantin Belousov

Description Zeev Zilberman 2024-04-07 13:40:59 UTC
When booting a coherent arm64 system with two PCI domains (3 and 11) and the Linux compatibility layer enabled,
a crash with the following signature was observed:

Mounting local filesystems:.
panic: pfs_add_node(): homonymous siblings
cpuid = 150
time = 1711898170
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
pfs_add_node() at pfs_add_node+0x16c
pfs_create_dir() at pfs_create_dir+0xc0
linsysfs_run_bus() at linsysfs_run_bus+0x36c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_init() at linsysfs_init+0x1b4
pfs_init() at pfs_init+0xb0
vfs_modevent() at vfs_modevent+0x368
module_register_init() at module_register_init+0xb4
linker_load_module() at linker_load_module+0xac8
kern_kldload() at kern_kldload+0x190
sys_kldload() at sys_kldload+0x64
do_el0_sync() at do_el0_sync+0x59c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 115 tid 101679 ]
Stopped at      kdb_enter+0x4c: str     xzr, [x19, #1408]
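The panic string says pfs_add_node() was asked to add a node whose name already exists among the children of the same parent directory. As an illustration only, here is a toy model of such a sibling-name check. struct toy_node and toy_add_node() are hypothetical names, not the pseudofs data structures, and where the kernel panics, this model returns EEXIST instead.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Toy model, NOT pseudofs code: a directory node with a singly linked
 * list of children, standing in for a pseudofs directory.
 */
struct toy_node {
	const char *name;
	struct toy_node *children;	/* first child */
	struct toy_node *sibling;	/* next child of the same parent */
};

/*
 * Attach child under parent unless an existing child already has the
 * same name.  The kernel panics in this situation; the model returns
 * EEXIST so the caller can react.
 */
static int
toy_add_node(struct toy_node *parent, struct toy_node *child)
{
	for (struct toy_node *n = parent->children; n != NULL; n = n->sibling)
		if (strcmp(n->name, child->name) == 0)
			return (EEXIST);	/* "homonymous siblings" */
	child->sibling = parent->children;
	parent->children = child;
	return (0);
}
```

Adding two nodes named 0000:00:00.0 under the same parent makes the second call fail, which is the situation the kernel reports as homonymous siblings.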

The panic is caused by linsysfs_init() in sys/compat/linsysfs/linsysfs.c hardcoding PCI domain 0:

...
        /* /sys/devices/... */
        dir = pfs_create_dir(root, "devices", NULL, NULL, NULL, 0);
        pci = pfs_create_dir(dir, "pci0000:00", NULL, NULL, NULL, 0);

        devclass = devclass_find("root");
        if (devclass == NULL) {
                return (0);
        }

        dev = devclass_get_device(devclass, 0);
        linsysfs_run_bus(dev, pci, scsi, chardev, drm, "/pci0000:00", "0000");
...
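If device directory names are built from the fixed "0000" prefix passed to linsysfs_run_bus() above, devices such as 0003:00:00.0 and 000b:00:00.0 presumably both end up named 0000:00:00.0, and they collide whenever they are created under the same parent. Below is a minimal userspace sketch of deriving the names from the real domain instead, assuming the Linux naming visible in comment 3 (pciDDDD:BB for a host bridge, DDDD:BB:SS.F for a device). sysfs_root_name() and sysfs_dev_name() are hypothetical helpers, not linsysfs functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helpers, NOT linsysfs code: build Linux-style
 * sysfs names from the device's real PCI domain instead of "0000".
 */

/* Host bridge directory name, e.g. domain 3, bus 0 -> "pci0003:00". */
static void
sysfs_root_name(char *buf, size_t len, unsigned domain, unsigned bus)
{
	int n = snprintf(buf, len, "pci%04x:%02x", domain, bus);

	assert(n > 0 && (size_t)n < len);	/* no truncation */
}

/* Device directory name, e.g. domain 11, bus 2, slot 10 -> "000b:02:0a.0". */
static void
sysfs_dev_name(char *buf, size_t len, unsigned domain, unsigned bus,
    unsigned slot, unsigned func)
{
	int n = snprintf(buf, len, "%04x:%02x:%02x.%x", domain, bus, slot,
	    func);

	assert(n > 0 && (size_t)n < len);	/* no truncation */
}
```

For domain 11 (0xb) and bus 0 this yields pci000b:00, matching the Linux listing. This only addresses naming; as comment 2 points out, reproducing the Linux directory hierarchy is a separate problem.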
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2024-04-07 20:20:54 UTC
Yes, this is kind of known.

I tried to look at the real Linux sysfs layout for a multi-domain machine, and
it was not a simple change to add other top-level nodes for non-zero domains.
Worse, I was not able to find a definitive document describing the expected
sysfs layout in this case (I tried reading the in-tree Linux doc folder).

Do you have a pointer to a better explanation of sysfs?
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2024-04-07 20:29:03 UTC
Created attachment 249811 [details]
Fix for the panic, but the layout does not match Linux

This is the patch I have.  As noted, the non-zero domains seem to be handled
wrong: real Linux does not create 000X top-level nodes, and the 0000 node is
there only for legacy reasons.  Also, I am not sure about the HBA nodes.
Comment 3 Zeev Zilberman 2024-04-08 14:26:01 UTC
This seems like a good reference for sysfs: https://cromwell-intl.com/open-source/sysfs.html

For reference, on a Linux system with two domains, a switch connected to each domain, two devices behind one switch, and one device behind the other, the layout looks like the following:

# tree -L 2 -F /sys/bus/pci*
/sys/bus/pci/
├── devices/
│   ├── 0003:00:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/
│   ├── 0003:01:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/
│   ├── 0003:02:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.0/
│   ├── 0003:02:00.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.1/
│   ├── 0003:02:00.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.2/
│   ├── 0003:02:00.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.3/
│   ├── 0003:02:00.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.4/
│   ├── 0003:02:00.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.5/
│   ├── 0003:02:00.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.6/
│   ├── 0003:02:00.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.7/
│   ├── 0003:02:01.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.0/
│   ├── 0003:02:01.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.1/
│   ├── 0003:02:01.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.2/
│   ├── 0003:02:01.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.3/
│   ├── 0003:02:01.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.4/
│   ├── 0003:02:01.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.5/
│   ├── 0003:02:01.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.6/
│   ├── 0003:02:01.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.7/
│   ├── 0003:02:02.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.0/
│   ├── 0003:02:02.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.1/
│   ├── 0003:02:02.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.2/
│   ├── 0003:02:02.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.3/
│   ├── 0003:02:02.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.4/
│   ├── 0003:02:02.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.5/
│   ├── 0003:02:02.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.6/
│   ├── 0003:02:02.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.7/
│   ├── 0003:02:03.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.0/
│   ├── 0003:02:03.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.1/
│   ├── 0003:02:03.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.2/
│   ├── 0003:02:03.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.3/
│   ├── 0003:02:03.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.4/
│   ├── 0003:02:03.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.5/
│   ├── 0003:02:03.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.6/
│   ├── 0003:02:03.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.7/
│   ├── 0003:02:04.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.0/
│   ├── 0003:02:04.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.1/
│   ├── 0003:02:04.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.2/
│   ├── 0003:02:04.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.3/
│   ├── 0003:02:04.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.4/
│   ├── 0003:02:04.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.5/
│   ├── 0003:02:04.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.6/
│   ├── 0003:02:04.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.7/
│   ├── 0003:02:05.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.0/
│   ├── 0003:02:05.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.1/
│   ├── 0003:02:05.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.2/
│   ├── 0003:02:05.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.3/
│   ├── 0003:02:05.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.4/
│   ├── 0003:02:05.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.5/
│   ├── 0003:02:05.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.6/
│   ├── 0003:02:05.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.7/
│   ├── 0003:02:06.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.0/
│   ├── 0003:02:06.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.1/
│   ├── 0003:02:06.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.2/
│   ├── 0003:02:06.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.3/
│   ├── 0003:02:06.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.4/
│   ├── 0003:02:06.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.5/
│   ├── 0003:02:06.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.6/
│   ├── 0003:02:06.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.7/
│   ├── 0003:02:07.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.0/
│   ├── 0003:02:07.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.1/
│   ├── 0003:02:07.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.2/
│   ├── 0003:02:07.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.3/
│   ├── 0003:02:07.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.4/
│   ├── 0003:02:07.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.5/
│   ├── 0003:02:07.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.6/
│   ├── 0003:02:07.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.7/
│   ├── 0003:02:08.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.0/
│   ├── 0003:02:08.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.1/
│   ├── 0003:02:08.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.2/
│   ├── 0003:02:08.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.3/
│   ├── 0003:02:08.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.4/
│   ├── 0003:02:08.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.5/
│   ├── 0003:02:08.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.6/
│   ├── 0003:02:08.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.7/
│   ├── 0003:02:09.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/
│   ├── 0003:02:09.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.1/
│   ├── 0003:02:09.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.2/
│   ├── 0003:02:09.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.3/
│   ├── 0003:02:09.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.4/
│   ├── 0003:02:09.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.5/
│   ├── 0003:02:09.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.6/
│   ├── 0003:02:09.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.7/
│   ├── 0003:03:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.0/0003:03:00.0/
│   ├── 0003:04:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.1/0003:04:00.0/
│   ├── 000b:00:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/
│   ├── 000b:01:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/
│   ├── 000b:02:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:00.0/
│   ├── 000b:02:01.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:01.0/
│   ├── 000b:02:02.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:02.0/
│   ├── 000b:02:03.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:03.0/
│   ├── 000b:02:04.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:04.0/
│   ├── 000b:02:05.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:05.0/
│   ├── 000b:02:06.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:06.0/
│   ├── 000b:02:07.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:07.0/
│   ├── 000b:02:08.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:08.0/
│   ├── 000b:02:09.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:09.0/
│   ├── 000b:02:0a.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0a.0/
│   ├── 000b:02:0b.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0b.0/
│   ├── 000b:02:0c.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0c.0/
│   ├── 000b:02:0d.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0d.0/
│   ├── 000b:02:0e.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0e.0/
│   ├── 000b:02:0f.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0f.0/
│   └── 000b:04:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:01.0/000b:04:00.0/
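Each symlink in /sys/bus/pci/devices above points to ../../../devices/pciDDDD:BB followed by one path component per device on the path from the root port down to the device itself. A sketch of constructing that target, assuming the caller already has the ancestor chain: struct pci_addr and sysfs_link_target() are hypothetical names, and the trailing slash in the listing presumably comes from tree -F rather than from the link itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A PCI address as shown in the Linux listing: domain:bus:slot.func. */
struct pci_addr {
	unsigned domain, bus, slot, func;
};

/*
 * Hypothetical helper: build the /sys/bus/pci/devices/<addr> symlink
 * target from the chain of devices from the root port (chain[0], on the
 * host bridge's root bus) down to the device itself (chain[depth - 1]).
 * Returns the string length, or -1 if depth is 0 or buf is too small.
 */
static int
sysfs_link_target(char *buf, size_t len, const struct pci_addr *chain,
    size_t depth)
{
	size_t off;
	int n;

	if (depth == 0)
		return (-1);
	/* From /sys/bus/pci/devices, /sys is three levels up. */
	n = snprintf(buf, len, "../../../devices/pci%04x:%02x",
	    chain[0].domain, chain[0].bus);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	off = (size_t)n;
	/* One directory per device on the path, root port first. */
	for (size_t i = 0; i < depth; i++) {
		n = snprintf(buf + off, len - off, "/%04x:%02x:%02x.%x",
		    chain[i].domain, chain[i].bus, chain[i].slot,
		    chain[i].func);
		if (n < 0 || (size_t)n >= len - off)
			return (-1);
		off += (size_t)n;
	}
	return ((int)off);
}
```

For the chain 0003:00:00.0, 0003:01:00.0, 0003:02:00.0, 0003:03:00.0 this produces the target shown for 0003:03:00.0 in the listing, without the trailing slash.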
Comment 4 Colin Percival freebsd_committer freebsd_triage 2024-04-25 02:44:11 UTC
kib: Would it be possible to reproduce the linsysfs tree given the output zeev has provided?

zeev: Do you actually want linsysfs?  The backtrace looks like you're kldloading it; I'm not sure if this is deliberate or not since I can't imagine Amazon needing to run Linux binaries on FreeBSD.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2024-04-25 21:23:22 UTC
(In reply to Colin Percival from comment #4)
In principle yes.  I would not have time for this until Nvidia needs the
fix again.
Comment 6 Zeev Zilberman 2024-05-13 07:12:17 UTC
Colin, no immediate need for linsysfs. I enabled it while trying to set up something in my test environment and then encountered this bug after fixing the issue reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278233
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2024-06-07 23:02:29 UTC
*** Bug 279581 has been marked as a duplicate of this bug. ***
Comment 8 Dave Cottlehuber freebsd_committer freebsd_triage 2024-06-08 10:00:41 UTC
Also seeing this on an Ampere eMAG with mlx5en NICs, so linuxkpi is also required.

kib, thanks: this boots fine with the patch.
Comment 9 Dave Cottlehuber freebsd_committer freebsd_triage 2024-06-24 08:12:18 UTC
kib@, this would be a great fix to get in for stab week, if your patch is generic enough?
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2024-06-24 13:03:30 UTC
(In reply to Dave Cottlehuber from comment #9)
As I noted, the patch is wrong.  The resulting sysfs nodes' structure is
different from the structure observed on Linux.
Comment 11 Dave Cottlehuber freebsd_committer freebsd_triage 2024-07-22 22:04:17 UTC
This no longer recurs for me (Ampere eMAG) as of 9ae91f59c500, the latest CURRENT stab week.
Comment 12 Dave Cottlehuber freebsd_committer freebsd_triage 2025-02-27 09:13:55 UTC
The panic is back as of 8a85584785e3:

linprocfs registered
panic: pfs_add_node(): homonymous siblings
cpuid = 12
time = 1740647518
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
pfs_add_node() at pfs_add_node+0x158
pfs_create_dir() at pfs_create_dir+0xc0
linsysfs_run_bus() at linsysfs_run_bus+0x368
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_init() at linsysfs_init+0x1b4
pfs_init() at pfs_init+0xb0
vfs_modevent() at vfs_modevent+0x36c
module_register_init() at module_register_init+0xb4
linker_load_module() at linker_load_module+0xb0c
kern_kldload() at kern_kldload+0x18c
sys_kldload() at sys_kldload+0x6c
do_el0_sync() at do_el0_sync+0x608
handle_el0_sync() at handle_el0_sync+0x4c
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 153 tid 101601 ]
Stopped at      kdb_enter+0x48: str     xzr, [x19, #2048]
db>