I'm testing -ALPHA3 on a packet.net ThunderX. When I boot GENERIC-NODEBUG, the kernel panics right about the time it gets to the login prompt:

(kgdb) bt
#0  doadump (textdump=0)
    at /usr/src/sys/kern/kern_shutdown.c:366
#1  0xffff00000018f520 in db_dump (dummy=-281474967580032, dummy2=false, dummy3=-1, dummy4=0xffff00014d3cdb4c "")
    at /usr/src/sys/ddb/db_command.c:574
#2  0xffff00000018f298 in db_command (last_cmdp=0xffff000001018258 <db_last_command>, cmd_table=0x0, dopager=1)
    at /usr/src/sys/ddb/db_command.c:481
#3  0xffff00000018edc8 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:534
#4  0xffff0000001951e0 in db_trap (type=37, code=0)
    at /usr/src/sys/ddb/db_main.c:252
#5  0xffff0000007050c0 in kdb_trap (type=37, code=0, tf=0xffff00014d3ce1e0)
    at /usr/src/sys/kern/subr_kdb.c:693
#6  0xffff000000c8bec8 in data_abort (td=0xfffffd006112f000, frame=0xffff00014d3ce1e0, esr=2516582404, far=16777259, lower=0)
    at /usr/src/sys/arm64/arm64/trap.c:261
#7  0xffff000000c8b858 in do_el1h_sync (td=0xfffffd006112f000, frame=0xffff00014d3ce1e0)
    at /usr/src/sys/arm64/arm64/trap.c:341
#8  <signal handler called>
#9  0xffff0000008b5280 in in_pcbremlbgrouphash (inp=0xfffffd00e975a9b0)
    at /usr/src/sys/netinet/in_pcb.c:414
#10 0xffff0000008b504c in in_pcbdrop (inp=0xfffffd00e975a9b0)
    at /usr/src/sys/netinet/in_pcb.c:1687
#11 0xffff0000009d4eb4 in tcp_close (tp=0xfffffd00e975d3d0)
    at /usr/src/sys/netinet/tcp_subr.c:1991
#12 0xffff0000009c13c0 in tcp_do_segment (m=0xfffffd0049dfe100, th=0xfffffd0049e6b0a8, so=0xfffffd007bbfd000, tp=0xfffffd00e975d3d0, drop_hdrlen=52, tlen=31, iptos=0 '\000')
    at /usr/src/sys/netinet/tcp_input.c:2306
#13 0xffff0000009be02c in tcp_input (mp=0xffff00014d3ceff8, offp=0xffff00014d3cefd0, proto=6)
    at /usr/src/sys/netinet/tcp_input.c:1392
#14 0xffff0000008c203c in ip_input (m=0x0)
    at /usr/src/sys/netinet/ip_input.c:827
#15 0xffff000000877330 in netisr_dispatch_src (proto=1, source=0, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/netisr.c:1122
#16 0xffff000000877ac4 in netisr_dispatch (proto=1, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/netisr.c:1213
#17 0xffff0000008468a0 in ether_demux (ifp=0xfffffd0049a02000, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/if_ethersubr.c:874
#18 0xffff000000848fbc in ether_input_internal (ifp=0xfffffd0049a02000, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/if_ethersubr.c:662
#19 0xffff0000008487e0 in ether_nh_input (m=0xfffffd0049dfe100)
    at /usr/src/sys/net/if_ethersubr.c:692
#20 0xffff000000877330 in netisr_dispatch_src (proto=5, source=0, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/netisr.c:1122
#21 0xffff000000877ac4 in netisr_dispatch (proto=5, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/netisr.c:1213
#22 0xffff000000847100 in ether_input (ifp=0xfffffd00498e4800, m=0xfffffd0049dfe100)
    at /usr/src/sys/net/if_ethersubr.c:782
#23 0xffff0000009c5d6c in tcp_lro_flush (lc=0xffff000149546788, le=0xfffffd000ae25bf0)
    at /usr/src/sys/netinet/tcp_lro.c:397
#24 0xffff0000009c6c78 in tcp_lro_rx2 (lc=0xffff000149546788, m=0xfffffd0049dfe000, csum=56586, use_hash=1)
    at /usr/src/sys/netinet/tcp_lro.c:785
#25 0xffff0000009c7414 in tcp_lro_rx (lc=0xffff000149546788, m=0xfffffd0049dfe000, csum=0)
    at /usr/src/sys/netinet/tcp_lro.c:952
#26 0xffff000000ce1b80 in nicvf_rcv_pkt_handler (nic=0xfffffd00330d1000, cq=0xffff000149547480, cqe_rx=0xffff00016f402800, cqe_type=2)
    at /usr/src/sys/dev/vnic/nicvf_queues.c:678
#27 0xffff000000ce181c in nicvf_cq_intr_handler (nic=0xfffffd00330d1000, cq_idx=4 '\004')
    at /usr/src/sys/dev/vnic/nicvf_queues.c:774
#28 0xffff000000ce1424 in nicvf_cmp_task (arg=0xffff000149547480, pending=1)
    at /usr/src/sys/dev/vnic/nicvf_queues.c:887
#29 0xffff00000072817c in taskqueue_run_locked (queue=0xfffffd004b261800)
    at /usr/src/sys/kern/subr_taskqueue.c:465
#30 0xffff00000072a304 in taskqueue_thread_loop (arg=0xffff000149547500)
    at /usr/src/sys/kern/subr_taskqueue.c:757
#31 0xffff00000061d680 in fork_exit (callout=0xffff00000072a1a4 <taskqueue_thread_loop>, arg=0xffff000149547500, frame=0xffff00014d3cf960)
    at /usr/src/sys/kern/kern_fork.c:1057
#32 <signal handler called>

Interestingly, the panic does not occur under GENERIC. It does occur if I recompile GENERIC-NODEBUG with -O0, so I'm able to get a usable kernel dump. Clearly "grp" is a bogus pointer, but it's not clear where it comes from:

(kgdb) frame 9
#9  0xffff0000008b5280 in in_pcbremlbgrouphash (inp=0xfffffd00e975a9b0)
    at /usr/src/sys/netinet/in_pcb.c:414
414             for (i = 0; i < grp->il_inpcnt; ++i) {
(kgdb) info local
pcbinfo = 0xffff0000e9851820
hdr = 0xffff000148a3bbb0
grp = 0xffffff
i = 0
(kgdb) p *hdr
$1 = {lh_first = 0x0}
It looks like the lbgroup hash table is getting corrupted: many of the list heads are equal to 0xffffff00ffffff or 0xffffff. Nothing on the system uses SO_REUSEPORT_LB, so no entries should ever be inserted into the hash table. I tried making the hash table read-only using pmap_protect(), but that doesn't catch the problem; the system still panics the same way. That, plus the fact that the bug appears to be sensitive to memory layout (it goes away with the GENERIC config or when KSTACK_PAGES is increased), makes me suspect this isn't an ordinary kernel software bug.

That said, there are some bugs in the SO_REUSEPORT_LB implementation:
- Lookups are protected with epoch, but the hash table doesn't use CK_ lists, and we don't defer frees of the hash table entries.
- in_pcblbgroup_free() uses the wrong malloc type.
- Lots of style bugs.
I discovered that the hash table is "corrupted" immediately after it is allocated and initialized. In my case, the table is allocated starting at physical address 0x10000000 and is physically contiguous. It appears that this collides with an address range used by the vgapci device:

pcib9
  Device Memory:
    0x87e0c0000000-0x87e0c0ffffff
pci9
  PCI domain 0 bus numbers:
    32
pcib10
  PCI domain 0 bus numbers:
    33
  PEM PCIe Memory:
    0x10000000-0x110fffff
  PEM PCIe IO:
    0x0-0xfff
pci10
  pcib10 bus numbers:
    33
vgapci0
  pcib10 memory window:
    0x10000000-0x10ffffff
    0x11000000-0x1101ffff

Indeed, this range isn't excluded from the EFI map; it lies inside the ConventionalMemory entry that starts at 0xd00000 and spans 0xff2ec pages (up to 0xfffebfff):

Type                Physical      Virtual       #Pages    Attr
RuntimeServicesData 000000500000  500000        00000800  UC WC WT WB RUNTIME
ConventionalMemory  000000d00000  0             000ff2ec  UC WC WT WB
RuntimeServicesData 0000fffec000  fffec000      00000014  UC WC WT WB RUNTIME
ConventionalMemory  000100000000  0             00ef0100  UC WC WT WB
BootServicesData    000ff0100000  0             00000020  UC WC WT WB
ConventionalMemory  000ff0120000  0             0000eacc  UC WC WT WB
BootServicesData    000ffebec000  0             00000514  UC WC WT WB
ConventionalMemory  010000400000  0             00fea102  UC WC WT WB
LoaderData          010fea502000  0             00008001  UC WC WT WB
LoaderCode          010ff2503000  0             00000086  UC WC WT WB
LoaderData          010ff2589000  0             0000218b  UC WC WT WB
LoaderCode          010ff4714000  0             00000015  UC WC WT WB
BootServicesData    010ff4729000  0             0000965d  UC WC WT WB
ConventionalMemory  010ffdd86000  0             000001de  UC WC WT WB
BootServicesCode    010ffdf64000  0             00000779  UC WC WT WB
ConventionalMemory  010ffe6dd000  0             00000040  UC WC WT WB
ACPIReclaimMemory   010ffe71d000  0             0000000b  UC WC WT WB
ACPIMemoryNVS       010ffe728000  0             00000060  UC WC WT WB
RuntimeServicesData 010ffe788000  10ffe788000   00000c30  UC WC WT WB RUNTIME
RuntimeServicesCode 010fff3b8000  10fff3b8000   00000c47  UC WC WT WB RUNTIME
BootServicesData    010ffffff000  0             00000001  UC WC WT WB
MemoryMappedIO      803000000000  803000000000  00001000  UC RUNTIME
MemoryMappedIO      804000001000  804000001000  00002000  UC RUNTIME
MemoryMappedIO      87e006001000  87e006001000  00001000  UC RUNTIME
MemoryMappedIO      87e024000000  87e024000000  00001000  UC RUNTIME
MemoryMappedIO      87e0d0001000  87e0d0001000  00000001  UC RUNTIME
MemoryMappedIO      903000000000  903000000000  00001000  UC RUNTIME
MemoryMappedIO      904000001000  904000001000  00002000  UC RUNTIME
MemoryMappedIO      97e006001000  97e006001000  00001000  UC RUNTIME

Physical memory chunk(s):
  0x00500000 - 0xfff0fffff, 65516 MB (16772096 pages)
  0x10000400000 - 0x10ffe71cfff, 65507 MB (16769821 pages)
  0x10ffe788000 - 0x10fff3b7fff, 12 MB ( 3120 pages)
  0x10ffffff000 - 0x10fffffffff, 0 MB ( 1 pages)
Excluded memory regions:
  0x00500000 - 0x00cfffff, 8 MB ( 2048 pages) NoAlloc
  0xfffec000 - 0xffffffff, 0 MB ( 20 pages) NoAlloc
  0x10fea600000 - 0x10fec38cfff, 29 MB ( 7565 pages) NoAlloc
  0x10ffe71d000 - 0x10fffffefff, 24 MB ( 6370 pages) NoAlloc
  0x803000000000 - 0x803000ffffff, 16 MB ( 4096 pages) NoAlloc
  0x804000001000 - 0x804002000fff, 32 MB ( 8192 pages) NoAlloc
  0x87e006001000 - 0x87e007000fff, 16 MB ( 4096 pages) NoAlloc
  0x87e024000000 - 0x87e024ffffff, 16 MB ( 4096 pages) NoAlloc
  0x87e0d0001000 - 0x87e0d0001fff, 0 MB ( 1 pages) NoAlloc
  0x903000000000 - 0x903000ffffff, 16 MB ( 4096 pages) NoAlloc
You can try excluding that physical memory from the map by calling arm_physmem_exclude_region() from initarm(). It takes the start address, the length, and a flags argument (here EXFLAG_NOALLOC), e.g. arm_physmem_exclude_region(0x10000000, 0x01020000, EXFLAG_NOALLOC). If you put the call just after pmap_bootstrap(), we can still access the memory via the DMAP to see what data is being put into the range.
(In reply to Andrew Turner from comment #3) Indeed, that works around the panic. (I actually had to bump MAX_EXCNT for this to work.) I'm not sure how best to handle this for 12.0 though.
It looks like the issue is really just that the EFI framebuffer is reported as ConventionalMemory in the EFI map. On the x86 EFI systems that I can easily check, the framebuffer is excluded from the map. Anyway, this seems easy enough to work around.
https://reviews.freebsd.org/D17073
A commit references this bug:

Author: markj
Date: Sat Sep 8 21:51:47 UTC 2018
New revision: 338537
URL: https://svnweb.freebsd.org/changeset/base/338537

Log:
  Bump MAX_HWCNT and MAX_EXCNT.

  These limits are hit on the ThunderX. Also make arm_physmem_exclude_region()
  panic rather than fail silently if the limit on excluded regions is reached.

  PR:                    231064
  Reviewed by:           andrew
  Approved by:           re (kib)
  MFC after:             1 week
  Sponsored by:          The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D17073

Changes:
  head/sys/arm/arm/physmem.c
A commit references this bug:

Author: markj
Date: Sat Sep 8 21:52:45 UTC 2018
New revision: 338538
URL: https://svnweb.freebsd.org/changeset/base/338538

Log:
  Exclude the EFI framebuffer from phys_avail[] on arm64.

  On the ThunderX the region occupied by the framebuffer is included in the
  EFI map, so explicitly add it to the set of regions that aren't managed by
  the physical memory allocator.

  PR:                    231064
  Reviewed by:           andrew
  Approved by:           re (gjb)
  MFC after:             1 week
  Sponsored by:          The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D17073

Changes:
  head/sys/arm64/arm64/machdep.c
A commit references this bug:

Author: markj
Date: Sat Sep 15 18:02:28 UTC 2018
New revision: 338695
URL: https://svnweb.freebsd.org/changeset/base/338695

Log:
  MFC r338538:
  Exclude the EFI framebuffer from phys_avail[] on arm64.

  PR:                    231064

Changes:
_U  stable/11/
  stable/11/sys/arm64/arm64/machdep.c
A commit references this bug:

Author: markj
Date: Sat Sep 15 18:47:08 UTC 2018
New revision: 338696
URL: https://svnweb.freebsd.org/changeset/base/338696

Log:
  Revert r338695: it depends on r334032, which was not MFCed.

  PR:                    231064

Changes:
_U  stable/11/
  stable/11/sys/arm64/arm64/machdep.c