I am unable to complete a boot of an old PowerMac G5 so-called "Quad Core" under the 32-bit powerpc FreeBSD version head -r320482. (It was a large jump to that from my prior version.)

It looks like the 2 anonymous structs in the union in the new "struct socket" are being abused such that the ->sol_upcall from the 2nd struct is being accessed when it has a value that was apparently assigned via ->so_rcv->sb_sel in the first anonymous struct.

[Manually entered from camera pictures of the screen but with notes added.]

fatal kernel trap
   exception = 0x700 (program) (for "illegal instruction")
   srr0 = 0x70bf878 (note: this varies, for example: 0x5e37230)
          (note: r0 always matches srr0)
          (note: ctr always matches srr0)
   srr1 = 0x89032 (stays the same)
   lr = 0x5b7b94 (note: solisten_wakeup+0x4c) (stays the same)
   curthread = 0x5ab8ae0 (varies)
   pid = 920 (varies), comm = mountd (stays the same)

Tracing command mountd pid 920 tid 100119 (varies) td 0x5ab8ae0 (varies)(CPU 1)
(stack addr range varies)
0xd250a500: at soisconnected+0x21c (at stays the same)
0xd250a540: at unp_connect2+0xf0 (at stays the same)
0xd250a560: at unp_connectat+0x658 (at stays the same)
0xd250a770: at unp_connect+0x2c (at stays the same)
0xd250a790: at uipc_connect+0xc0 (at stays the same)
0xd250a7d0: at soconnectat+0xa0 (at stays the same)
0xd250a800: at soconnect+0x2c (at stays the same)
0xd250a820: at kern_connect+0134 (at stays the same)
0xd250a870: at sys_connect+0x64 (at stays the same)
0xd250a8b0: at trap+0x638 (at stays the same)
0xd250aa50: at powerpc_interrupt+0x1a0 (at stays the same)
0xd250aa80: at user SC trap (at stays the same)
            by 0x419db168 (stays the same)
            srr1=0xf032 (stays the same)
            r1 =0xffffd5e0 (stays the same)
            cr =0x24440840 (stays the same)
            xer =0x20000000 (stays the same)
            ctr =0x419db160 (stays the same)

005b7b84 <solisten_wakeup+0x3c> lwz     r4,236(r3)
005b7b88 <solisten_wakeup+0x40> li      r5,1
005b7b8c <solisten_wakeup+0x44> mtctr   r0
005b7b90 <solisten_wakeup+0x48> bctrl
lr:
005b7b94 <solisten_wakeup+0x4c> b       005b7bb4 <solisten_wakeup+0x6c>

Note: r3 reported as: 0x70bf860

void
solisten_wakeup(struct socket *sol)
{

        if (sol->sol_upcall != NULL)
                (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
        else {
                selwakeuppri(&sol->so_rdsel, PSOCK);
                KNOTE_LOCKED(&sol->so_rdsel.si_note, 0);
        }
        SOLISTEN_UNLOCK(sol);
        wakeup_one(&sol->sol_comp);
}

[Note: I've had to do some work to get a kgdb working this much on powerpc. This is not from a minidump.]

(kgdb) print/x &((struct socket*)0x70bf860)->sol_upcall
$3 = 0x70bf948
(kgdb) print/x ((struct socket*)0x70bf860)->sol_upcall
$2 = 0x70bf878

(That is the address of the illegal instruction reported.)

(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel
$7 = 0x70bf878
(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel.si_tdlist
$8 = 0x70bf878
(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel.si_tdlist.tqh_first
$9 = 0x70bf878

But comparing to the first anonymous struct in the union in the new "struct socket":

(kgdb) print/x &((struct socket*)0x70bf860)->sol_upcall
$15 = 0x70bf948
(kgdb) print/x &((struct socket*)0x70bf860)->so_rcv->sb_sel
$22 = 0x70bf948

->so_rcv is a struct sockbuf and ->so_rcv->sb_sel is a struct selinfo * . So ->so_rcv->sb_sel pointing back to ->so_rdsel might well make sense for that struct in the union. But it also appears to be the source of the 32-bit powerpc crash: the same storage is later read as ->sol_upcall, found to be non-NULL, and called as a function.
(In reply to Mark Millard from comment #0)

Some other supporting code details follow.

static struct socket *
soalloc(struct vnet *vnet)
{
        struct socket *so;

        so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
. . .
        so->so_rcv.sb_sel = &so->so_rdsel;
        so->so_snd.sb_sel = &so->so_wrsel;
. . .

That so->so_rcv.sb_sel assignment makes so->sol_upcall non-NULL, so it appears to be defined for use. And that makes the following code problematical:

void
solisten_wakeup(struct socket *sol)
{

        if (sol->sol_upcall != NULL)
                (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
        else {
. . .

And this code is what is failing on production 32-bit powerpc kernels.

There could be more anonymous struct field problems in the union that is in struct socket. I've not checked.

I'll note that the only references to sol_upcall are:

# grep -r "\<sol_upcall" /usr/src/sys/* | more
/usr/src/sys/kern/uipc_socket.c:        if (sol->sol_upcall != NULL)
/usr/src/sys/kern/uipc_socket.c:                (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
/usr/src/sys/kern/uipc_socket.c:        so->sol_upcall = func;
/usr/src/sys/kern/uipc_socket.c:        so->sol_upcallarg = arg;
/usr/src/sys/sys/socketvar.h:                   so_upcall_t     *sol_upcall;    /* (e) */
/usr/src/sys/sys/socketvar.h:                   void            *sol_upcallarg; /* (e) */

None of those assign NULL. If NULL were assigned, then ->so_rcv.sb_sel would also become NULL in value.
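To make the aliasing concrete, here is a minimal sketch in C of the same pattern. Everything in it (struct toy_socket, struct toy_selinfo, toy_init, toy_upcall_looks_set) is a hypothetical stand-in invented for illustration and is not the real socketvar.h layout. It assumes, as on the platforms discussed here, that object pointers and function pointers have the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real FreeBSD definitions. */
struct toy_selinfo {
	void *tqh_first;
};

typedef int toy_upcall_t(void *so, void *arg, int waitflag);

struct toy_socket {
	struct toy_selinfo rdsel;	/* outside the union, always valid */
	union {
		struct {		/* data-flow view */
			struct toy_selinfo *rcv_sel;
		};
		struct {		/* listening view */
			toy_upcall_t *upcall;
		};
	};
};

/* Mimics the soalloc() step: only the data-flow view is initialised. */
void
toy_init(struct toy_socket *so)
{
	assert(so != NULL);
	so->rcv_sel = &so->rdsel;
}

/*
 * Mimics the test in solisten_wakeup(): returns nonzero when the
 * listening view would consider an upcall to be registered.
 */
int
toy_upcall_looks_set(const struct toy_socket *so)
{
	return (so->upcall != NULL);
}
```

In this sketch, a zeroed toy_socket reports no upcall, but after toy_init() the same check reports one, even though nothing ever assigned upcall. Calling through that pointer would jump to the address of rdsel, which matches the srr0 == &so_rdsel observation in comment #0.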
(In reply to Mark Millard from comment #1)

FYI: I do understand that the specific aliasing I list may well be 32-bit powerpc specific. But other aliasing between the anonymous structs in that union is just as likely to be problematical for the sol_upcall field of the second anonymous struct.
(In reply to Mark Millard from comment #2) It looks like the third head i386 -r320202 backtrace in: https://lists.freebsd.org/pipermail/freebsd-current/2017-June/066323.html might also be from the anonymous struct field aliasing in the new union in struct socket and the mishandling of keeping things from interfering with each other when aliased. At least some of the same call chain shows in the backtrace in that report.
I'm observing similar panics on i386 CURRENT r310466

__curthread () at ./machine/pcpu.h:225
225             __asm("movl %%fs:%1,%0" : "=r" (td)
(kgdb) #0  __curthread () at ./machine/pcpu.h:225
#1  doadump (textdump=-968633856) at ../../../kern/kern_shutdown.c:318
#2  0xc06e88c4 in kern_reboot (howto=<optimized out>) at ../../../kern/kern_shutdown.c:386
#3  0xc06e8c5b in vpanic (fmt=<optimized out>, ap=0xefd5c73c "\340\334\235\300\310\370\266\306\001") at ../../../kern/kern_shutdown.c:779
#4  0xc06e8b1b in panic (fmt=0xc092e18e "%s") at ../../../kern/kern_shutdown.c:710
#5  0xc08eed21 in trap_fatal (frame=0xefd5c878, eva=<optimized out>) at ../../../i386/i386/trap.c:978
#6  0xc08eea38 in trap (frame=<optimized out>) at ../../../i386/i386/trap.c:704
#7  <signal handler called>
#8  0xc6bcda1b in ?? ()
#9  0xc0770281 in unp_connect2 (so=<optimized out>, so2=<optimized out>, req=<optimized out>) at ../../../kern/uipc_usrreq.c:1497
#10 0xc076ff17 in unp_connectat (fd=<optimized out>, so=<optimized out>, nam=<optimized out>, td=<optimized out>) at ../../../kern/uipc_usrreq.c:1446
#11 0xc076d510 in unp_connect (so=0xc71c9400, nam=0xc662d500, td=<optimized out>) at ../../../kern/uipc_usrreq.c:1310
#12 uipc_connect (so=0xc71c9400, nam=0xc662d500, td=<optimized out>) at ../../../kern/uipc_usrreq.c:587
#13 0xc076a042 in kern_connectat (td=<optimized out>, dirfd=-100, fd=<optimized out>, sa=0xc662d500) at ../../../kern/uipc_syscalls.c:505
#14 0xc0769f49 in sys_connect (td=0xc6bcda18, uap=0xc6b6f988) at ../../../kern/uipc_syscalls.c:470
#15 0xc08ef679 in syscallenter (td=<optimized out>) at ../../../i386/i386/../../kern/subr_syscall.c:132
#16 syscall (frame=<optimized out>) at ../../../i386/i386/trap.c:1103
#17 <signal handler called>
#18 0x283a4747 in ?? ()
Backtrace stopped: Cannot access memory at address 0xbfbfe794
(kgdb)
(In reply to Mark Millard from comment #3)

It is a separate issue, but I'll note that (some?) C++ compilers reject the use of the enum syntax that is in one of the anonymous structs in the union. A report about this appeared on the lists before I figured out what I reported in 220404. Likely the header in question should be usable from C and from C++.
(In reply to oleg.nauman from comment #4)

The code that I've reported on predates the i386 listing's "HEAD/i386 r320212" reference: it dates from -r319722 and so could be involved. But it cannot be involved for something from:

"I'm observing similar panics on i386 CURRENT r310466"

Maybe -r320466 was intended in comment #4? A typo? But if -r310466 is correct, then comment #4 describes a different problem.
(In reply to Mark Millard from comment #6) i386 CURRENT r320466 , of course
I am wondering if this is related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220452
(In reply to Sylvain Garrigues from comment #8) Your 220452 is likely a duplicate of this 220404 report: just another example. I've only done detailed analysis for the 32-bit powerpc context.
I can confirm that r319722 causes panics on i386; r319721 is stable.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220358 seems related too
I've changed Hardware to Any since the problem is observed on each of: 32-bit powerpc, armv6/7, and i386.
Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220452 and comment #5 and #8 - i.e. the kernel panic seems to disappear when INVARIANTS is used in the kernel config, or more precisely when one #ifdef INVARIANTS is removed in sys/kern/uipc_socket.c
Related symptom that dtrace can't figure out the struct?

lockstat: failed to compile program: "/usr/lib/dtrace/tcp.d", line 239: so_snd is not a member of struct socket
Despite symptom differences, the same fix seems to cover 220358 and this report (220404), as I understand it. 220452 is another. 220358 was first, so I am marking this one as a duplicate of 220358.

head -r320752 is a checked in fix for these reports.

Side note: head -r320652 's checkin says in part:

  Verifying the offset locations mentioned above are identical
  is left as an exercise to the reader.

This report has an example of doing so.

*** This bug has been marked as a duplicate of bug 220358 ***
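The kgdb session in comment #0 did the offset comparison at run time on one machine. A rough compile-time equivalent can be sketched in C11 as follows. The layouts (struct demo_sockbuf, struct demo_socket) and the MEMBER_END / MEMBERS_OVERLAP macros are hypothetical, invented here for illustration; they are not from the FreeBSD tree and do not reproduce the real struct socket. The nested member designator in offsetof is accepted by GCC and clang, which is assumed here.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */

/* One past the last byte occupied by member m of type. */
#define MEMBER_END(type, m) \
	(offsetof(type, m) + sizeof(((type *)0)->m))

/* Nonzero when members a and b of type share at least one byte. */
#define MEMBERS_OVERLAP(type, a, b) \
	(offsetof(type, a) < MEMBER_END(type, b) && \
	 offsetof(type, b) < MEMBER_END(type, a))

/* Hypothetical cut-down layouts, NOT the real kernel structures. */
struct demo_sockbuf {
	void		*sb_sel;
	unsigned int	 sb_flags;
};

struct demo_socket {
	int	so_count;
	union {
		struct {	/* data-flow view */
			struct demo_sockbuf so_rcv;
		};
		struct {	/* listening view */
			void	(*sol_upcall)(void);
			void	 *sol_upcallarg;
		};
	};
};

/* Documents (and enforces) which fields alias in this demo layout. */
static_assert(MEMBERS_OVERLAP(struct demo_socket, so_rcv.sb_sel, sol_upcall),
    "demo: so_rcv.sb_sel and sol_upcall share storage");
static_assert(!MEMBERS_OVERLAP(struct demo_socket, so_count, sol_upcall),
    "demo: so_count lies outside the union");
```

A check of this kind fails the build on any architecture where the layout assumption changes, instead of surfacing as a panic at boot. Whether the committed fix relies on anything similar is not stated in this report.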