When booting on a system with 2 PCI domains (3 and 11) and the Linux compatibility layer enabled on an arm64 coherent system, a crash was observed with the following signature:

Mounting local filesystems:.
panic: pfs_add_node(): homonymous siblings
cpuid = 150
time = 1711898170
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
pfs_add_node() at pfs_add_node+0x16c
pfs_create_dir() at pfs_create_dir+0xc0
linsysfs_run_bus() at linsysfs_run_bus+0x36c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_init() at linsysfs_init+0x1b4
pfs_init() at pfs_init+0xb0
vfs_modevent() at vfs_modevent+0x368
module_register_init() at module_register_init+0xb4
linker_load_module() at linker_load_module+0xac8
kern_kldload() at kern_kldload+0x190
sys_kldload() at sys_kldload+0x64
do_el0_sync() at do_el0_sync+0x59c
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 115 tid 101679 ]
Stopped at kdb_enter+0x4c: str xzr, [x19, #1408]

This is caused by sys/compat/linsysfs/linsysfs.c:linsysfs_init() hardcoding PCI domain 0:

...
	/* /sys/devices/... */
	dir = pfs_create_dir(root, "devices", NULL, NULL, NULL, 0);
	pci = pfs_create_dir(dir, "pci0000:00", NULL, NULL, NULL, 0);

	devclass = devclass_find("root");
	if (devclass == NULL) {
		return (0);
	}

	dev = devclass_get_device(devclass, 0);
	linsysfs_run_bus(dev, pci, scsi, chardev, drm, "/pci0000:00", "0000");
...
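To illustrate why a fixed "0000" prefix could lead to the "homonymous siblings" panic, here is a minimal userland sketch. It is not the pseudofs or linsysfs code: struct toy_dir, toy_add_node() and toy_node_name() are hypothetical stand-ins, and the assumption is that node names are built from prefix, bus, slot and function, so that two devices with the same bus:slot.func in different domains end up with the same name under one parent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SIBLINGS 16

/* Toy stand-in for a directory node: a flat list of child names. */
struct toy_dir {
	char names[MAX_SIBLINGS][32];
	int count;
};

/*
 * Add a child name. Returns 0 on success, -1 if the directory is full
 * or a sibling with the same name already exists (the condition that
 * the real kernel code reports by panicking).
 */
static int
toy_add_node(struct toy_dir *dir, const char *name)
{
	assert(dir != NULL && name != NULL);
	for (int i = 0; i < dir->count; i++)
		if (strcmp(dir->names[i], name) == 0)
			return (-1);
	if (dir->count == MAX_SIBLINGS)
		return (-1);
	snprintf(dir->names[dir->count], sizeof(dir->names[0]), "%s", name);
	dir->count++;
	return (0);
}

/* Format a node name; "domain" is whatever is placed in the first field. */
static void
toy_node_name(char *buf, size_t len, unsigned domain, unsigned bus,
    unsigned slot, unsigned func)
{
	snprintf(buf, len, "%04x:%02x:%02x.%x", domain, bus, slot, func);
}
```

With the domain field forced to 0, the root ports of domains 3 and 11 both format as 0000:00:00.0 and the second insertion is rejected. Passing the real domain gives 0003:00:00.0 and 000b:00:00.0, which can coexist.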
Yes, this is kind of known. I tried to look at the real Linux sysfs layout for a multi-domain machine, and it was not a simple change to add other top-level nodes for non-zero domains. Worse, I was not able to find a definitive document describing the expected sysfs layout in this case (I tried reading the in-tree Linux doc folder). Do you have a pointer to a better explanation of sysfs?
Created attachment 249811
Fix for the panic, but the layout does not match Linux

This is the patch I have. As noted, the non-zero domains seem to be handled wrong: real Linux does not create 000X top-level nodes, and the 0000 node is there only for legacy reasons. Also, I am not sure about the HBA nodes.
This seems like a good reference for sysfs: https://cromwell-intl.com/open-source/sysfs.html

For reference, on a Linux system with 2 domains, a switch connected to each domain, 2 devices behind one switch and 1 device behind the second switch, it looks like the following:

# tree -L 2 -F /sys/bus/pci*
/sys/bus/pci/
├── devices/
│   ├── 0003:00:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/
│   ├── 0003:01:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/
│   ├── 0003:02:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.0/
│   ├── 0003:02:00.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.1/
│   ├── 0003:02:00.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.2/
│   ├── 0003:02:00.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.3/
│   ├── 0003:02:00.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.4/
│   ├── 0003:02:00.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.5/
│   ├── 0003:02:00.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.6/
│   ├── 0003:02:00.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.7/
│   ├── 0003:02:01.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.0/
│   ├── 0003:02:01.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.1/
│   ├── 0003:02:01.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.2/
│   ├── 0003:02:01.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.3/
│   ├── 0003:02:01.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.4/
│   ├── 0003:02:01.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.5/
│   ├── 0003:02:01.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.6/
│   ├── 0003:02:01.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:01.7/
│   ├── 0003:02:02.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.0/
│   ├── 0003:02:02.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.1/
│   ├── 0003:02:02.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.2/
│   ├── 0003:02:02.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.3/
│   ├── 0003:02:02.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.4/
│   ├── 0003:02:02.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.5/
│   ├── 0003:02:02.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.6/
│   ├── 0003:02:02.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:02.7/
│   ├── 0003:02:03.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.0/
│   ├── 0003:02:03.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.1/
│   ├── 0003:02:03.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.2/
│   ├── 0003:02:03.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.3/
│   ├── 0003:02:03.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.4/
│   ├── 0003:02:03.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.5/
│   ├── 0003:02:03.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.6/
│   ├── 0003:02:03.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:03.7/
│   ├── 0003:02:04.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.0/
│   ├── 0003:02:04.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.1/
│   ├── 0003:02:04.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.2/
│   ├── 0003:02:04.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.3/
│   ├── 0003:02:04.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.4/
│   ├── 0003:02:04.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.5/
│   ├── 0003:02:04.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.6/
│   ├── 0003:02:04.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:04.7/
│   ├── 0003:02:05.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.0/
│   ├── 0003:02:05.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.1/
│   ├── 0003:02:05.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.2/
│   ├── 0003:02:05.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.3/
│   ├── 0003:02:05.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.4/
│   ├── 0003:02:05.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.5/
│   ├── 0003:02:05.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.6/
│   ├── 0003:02:05.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:05.7/
│   ├── 0003:02:06.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.0/
│   ├── 0003:02:06.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.1/
│   ├── 0003:02:06.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.2/
│   ├── 0003:02:06.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.3/
│   ├── 0003:02:06.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.4/
│   ├── 0003:02:06.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.5/
│   ├── 0003:02:06.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.6/
│   ├── 0003:02:06.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:06.7/
│   ├── 0003:02:07.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.0/
│   ├── 0003:02:07.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.1/
│   ├── 0003:02:07.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.2/
│   ├── 0003:02:07.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.3/
│   ├── 0003:02:07.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.4/
│   ├── 0003:02:07.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.5/
│   ├── 0003:02:07.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.6/
│   ├── 0003:02:07.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:07.7/
│   ├── 0003:02:08.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.0/
│   ├── 0003:02:08.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.1/
│   ├── 0003:02:08.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.2/
│   ├── 0003:02:08.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.3/
│   ├── 0003:02:08.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.4/
│   ├── 0003:02:08.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.5/
│   ├── 0003:02:08.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.6/
│   ├── 0003:02:08.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:08.7/
│   ├── 0003:02:09.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.0/
│   ├── 0003:02:09.1 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.1/
│   ├── 0003:02:09.2 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.2/
│   ├── 0003:02:09.3 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.3/
│   ├── 0003:02:09.4 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.4/
│   ├── 0003:02:09.5 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.5/
│   ├── 0003:02:09.6 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.6/
│   ├── 0003:02:09.7 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:09.7/
│   ├── 0003:03:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.0/0003:03:00.0/
│   ├── 0003:04:00.0 -> ../../../devices/pci0003:00/0003:00:00.0/0003:01:00.0/0003:02:00.1/0003:04:00.0/
│   ├── 000b:00:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/
│   ├── 000b:01:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/
│   ├── 000b:02:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:00.0/
│   ├── 000b:02:01.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:01.0/
│   ├── 000b:02:02.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:02.0/
│   ├── 000b:02:03.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:03.0/
│   ├── 000b:02:04.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:04.0/
│   ├── 000b:02:05.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:05.0/
│   ├── 000b:02:06.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:06.0/
│   ├── 000b:02:07.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:07.0/
│   ├── 000b:02:08.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:08.0/
│   ├── 000b:02:09.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:09.0/
│   ├── 000b:02:0a.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0a.0/
│   ├── 000b:02:0b.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0b.0/
│   ├── 000b:02:0c.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0c.0/
│   ├── 000b:02:0d.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0d.0/
│   ├── 000b:02:0e.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0e.0/
│   ├── 000b:02:0f.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:0f.0/
│   └── 000b:04:00.0 -> ../../../devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:01.0/000b:04:00.0/
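The naming pattern visible in that listing can be sketched in plain C: each domain gets its own pciDDDD:BB root under devices/, and every device below it is named DDDD:BB:SS.F and nested under the bridge it sits behind. This is only an illustration derived from the output above, not Linux or linsysfs code; struct pci_bsf and sysfs_device_path() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One hop in the bridge chain: bus/slot/function (hypothetical type). */
struct pci_bsf {
	unsigned bus, slot, func;
};

/*
 * Build "devices/pciDDDD:BB/DDDD:BB:SS.F/..." for a chain of devices
 * that starts on the root bus and ends at the target device.
 * Returns 0 on success, -1 if the chain is empty or buf is too small.
 */
static int
sysfs_device_path(char *buf, size_t len, unsigned domain,
    const struct pci_bsf *chain, size_t nchain)
{
	size_t off;
	int n;

	assert(buf != NULL && chain != NULL);
	if (nchain == 0)
		return (-1);
	/* Per-domain root node, named after the domain and root bus. */
	n = snprintf(buf, len, "devices/pci%04x:%02x", domain, chain[0].bus);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	/* Append one path component per device in the chain. */
	for (size_t i = 0; i < nchain; i++) {
		off = strlen(buf);
		n = snprintf(buf + off, len - off, "/%04x:%02x:%02x.%x",
		    domain, chain[i].bus, chain[i].slot, chain[i].func);
		if (n < 0 || (size_t)n >= len - off)
			return (-1);
	}
	return (0);
}
```

For domain 11 and the chain 00:00.0, 01:00.0, 02:01.0, 04:00.0 this produces devices/pci000b:00/000b:00:00.0/000b:01:00.0/000b:02:01.0/000b:04:00.0, which matches the symlink target of 000b:04:00.0 in the listing.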
kib: Would it be possible to reproduce the linsysfs tree given the output zeev has provided?

zeev: Do you actually want linsysfs? The backtrace looks like you're kldloading it; I'm not sure if this is deliberate or not since I can't imagine Amazon needing to run Linux binaries on FreeBSD.
(In reply to Colin Percival from comment #4) In principle yes. I would not have time for this until Nvidia needs the fix again.
Colin, no immediate need for linsysfs. I enabled it while trying to set up something in my test environment and then encountered this bug after fixing the issue reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278233
*** Bug 279581 has been marked as a duplicate of this bug. ***
Also seeing this on Ampere eMAG with mlx5en NICs, so linuxkpi is also required.

kib: thanks, this boots fine with the patch.
kib@: this would be a great fix to go in for stab week, if your patch is generic enough?
(In reply to Dave Cottlehuber from comment #9) As I noted, the patch is wrong. The resulting sysfs nodes' structure is different from the structure observed on Linux.
this doesn't recur for me (ampere emag) since 9ae91f59c500, the latest CURRENT stab week.
back since #8a85584785e3

linprocfs registered
panic: pfs_add_node(): homonymous siblings
cpuid = 12
time = 1740647518
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
pfs_add_node() at pfs_add_node+0x158
pfs_create_dir() at pfs_create_dir+0xc0
linsysfs_run_bus() at linsysfs_run_bus+0x368
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_run_bus() at linsysfs_run_bus+0x23c
linsysfs_init() at linsysfs_init+0x1b4
pfs_init() at pfs_init+0xb0
vfs_modevent() at vfs_modevent+0x36c
module_register_init() at module_register_init+0xb4
linker_load_module() at linker_load_module+0xb0c
kern_kldload() at kern_kldload+0x18c
sys_kldload() at sys_kldload+0x6c
do_el0_sync() at do_el0_sync+0x608
handle_el0_sync() at handle_el0_sync+0x4c
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 153 tid 101601 ]
Stopped at kdb_enter+0x48: str xzr, [x19, #2048]
db>