Bug 220404 - head -r319722 related changes break powerpc production-style kernel operation: bad function pointer (related i386 and armv6/7 issue likely too)
Status: Closed DUPLICATE of bug 220358
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
 
Reported: 2017-07-01 01:48 UTC by Mark Millard
Modified: 2017-07-04 19:36 UTC

Description Mark Millard 2017-07-01 01:48:33 UTC
I am unable to complete a boot of an
old PowerMac G5 so-called "Quad Core"
under the 32-bit powerpc FreeBSD
version head -r320482 . (It was a
large jump to that from my prior
version.)

It looks like the 2 anonymous structs
in the union in the new "struct socket"
are being abused such that the ->sol_upcall
from the 2nd struct is being accessed when it
has a value that was apparently assigned
via ->so_rcv.sb_sel in the first
anonymous struct.

[Manually entered from camera pictures
of the screen but with notes added.]

fatal kernel trap
exception = 0x700 (program) (for "illegal instruction")
srr0      = 0x70bf878 (note: this varies, for example: 0x5e37230)
           (note:  r0 always matches srr0)
           (note: ctr always matches srr0)
srr1      = 0x89032   (stays the same)
lr        = 0x5b7b94  (note: solisten_wakeup+0x4c) (stays the same)
curthread = 0x5ab8ae0 (varies)
pid = 920 (varies), comm = mountd (stays the same)

Tracing command mountd pid 920 tid 100119 (varies) td 0x5ab8ae0 (varies)(CPU 1)
(stack addr
range varies)
0xd250a500: at soisconnected+0x21c     (at stays the same)
0xd250a540: at unp_connect2+0xf0       (at stays the same)
0xd250a560: at unp_connectat+0x658     (at stays the same)
0xd250a770: at unp_connect+0x2c        (at stays the same)
0xd250a790: at uipc_connect+0xc0       (at stays the same)
0xd250a7d0: at soconnectat+0xa0        (at stays the same)
0xd250a800: at soconnect+0x2c          (at stays the same)
0xd250a820: at kern_connect+0x134      (at stays the same)
0xd250a870: at sys_connect+0x64        (at stays the same)
0xd250a8b0: at trap+0x638              (at stays the same)
0xd250aa50: at powerpc_interrupt+0x1a0 (at stays the same)
0xd250aa80: at user SC trap (at stays the same)
           by 0x419db168   (stays the same)
           srr1=0xf032     (stays the same)
           r1  =0xffffd5e0 (stays the same)
           cr  =0x24440840 (stays the same)
           xer =0x20000000 (stays the same)
           ctr =0x419db160 (stays the same)

005b7b84 <solisten_wakeup+0x3c> lwz     r4,236(r3)
005b7b88 <solisten_wakeup+0x40> li      r5,1
005b7b8c <solisten_wakeup+0x44> mtctr   r0
005b7b90 <solisten_wakeup+0x48> bctrl
lr:
005b7b94 <solisten_wakeup+0x4c> b       005b7bb4 <solisten_wakeup+0x6c>


Note: r3 reported as: 0x70bf860

void
solisten_wakeup(struct socket *sol)
{

       if (sol->sol_upcall != NULL)
               (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
       else {
               selwakeuppri(&sol->so_rdsel, PSOCK);
               KNOTE_LOCKED(&sol->so_rdsel.si_note, 0);
       }
       SOLISTEN_UNLOCK(sol);
       wakeup_one(&sol->sol_comp);
}

[Note: I've had to do some work to get a kgdb
working this much on powerpc. This is not
from a minidump.]

(kgdb) print/x &((struct socket*)0x70bf860)->sol_upcall
$3 = 0x70bf948

(kgdb) print/x ((struct socket*)0x70bf860)->sol_upcall
$2 = 0x70bf878

(That is the address of the illegal instruction
reported.)

(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel
$7 = 0x70bf878
(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel.si_tdlist
$8 = 0x70bf878
(kgdb) print/x &((struct socket*)0x70bf860)->so_rdsel.si_tdlist.tqh_first
$9 = 0x70bf878

But comparing to the first anonymous struct in
the union in the new "struct socket":

(kgdb) print/x &((struct socket*)0x70bf860)->sol_upcall
$15 = 0x70bf948
(kgdb) print/x &((struct socket*)0x70bf860)->so_rcv->sb_sel
$22 = 0x70bf948

->so_rcv is a struct sockbuf and ->so_rcv.sb_sel
is a struct selinfo* .

So ->so_rcv.sb_sel pointing back to ->so_rdsel
might well make sense for that struct in the union.

But it appears to be the source of the 32-bit powerpc
crash during an attempted use of ->sol_upcall as well.
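
The overlap described above can be modeled in user space.
What follows is only a sketch under stated assumptions:
toy_socket, toy_sockbuf, toy_selinfo, toy_upcall_t and the
padding fields are made-up stand-ins, and the layout is
chosen so that the two fields land on the same bytes. It is
not the real struct socket layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real FreeBSD definitions or layout. */
struct toy_selinfo {
        void *si_first;
};
typedef int toy_upcall_t(void *so, void *arg, int waitflag);

struct toy_sockbuf {
        long                 sb_pad;    /* placeholder for earlier fields */
        struct toy_selinfo  *sb_sel;
};

struct toy_socket {
        struct toy_selinfo so_rdsel;            /* outside the union */
        union {
                struct {                        /* data-flow view */
                        struct toy_sockbuf so_rcv;
                };
                struct {                        /* listening view */
                        long          sol_pad;
                        toy_upcall_t *sol_upcall;
                        void         *sol_upcallarg;
                };
        };
};

/*
 * Zero the socket, store &so_rdsel through the data-flow view, then
 * read the listening view. Returns 1 when sol_upcall is non-NULL and
 * holds the same bytes as the stored data pointer.
 */
int
toy_upcall_aliases_rdsel(struct toy_socket *so)
{
        struct toy_selinfo *stored;

        assert(so != NULL);
        memset(so, 0, sizeof(*so));             /* zero-filled allocation */
        so->so_rcv.sb_sel = &so->so_rdsel;      /* store via data-flow view */
        stored = so->so_rcv.sb_sel;

        return (so->sol_upcall != NULL &&
            memcmp(&so->sol_upcall, &stored, sizeof(stored)) == 0);
}
```

With this toy layout the function is expected to return 1
on common ILP32 and LP64 targets, which matches the kgdb
observation that the value of ->sol_upcall equals the
address of ->so_rdsel.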
Comment 1 Mark Millard 2017-07-01 02:08:39 UTC
(In reply to Mark Millard from comment #0)

Some other supporting code details follow.

static struct socket *
soalloc(struct vnet *vnet)
{
        struct socket *so;
 
        so = uma_zalloc(socket_zone, M_NOWAIT | M_ZERO);
. . .
        so->so_rcv.sb_sel = &so->so_rdsel;
        so->so_snd.sb_sel = &so->so_wrsel;
. . .

That so->so_rcv.sb_sel assignment makes so->sol_upcall
non-NULL and so it appears to be defined for use.

And that makes the following code problematic:

void
solisten_wakeup(struct socket *sol)
{
 
        if (sol->sol_upcall != NULL)
                (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
        else {
. . .

And this code is what is failing on production 32-bit
powerpc kernels.

There could be more anonymous struct field problems in
the union that is in struct socket . I've not checked.

I'll note that the only references to sol_upcall are:

# grep -r "\<sol_upcall" /usr/src/sys/* | more
/usr/src/sys/kern/uipc_socket.c:        if (sol->sol_upcall != NULL)
/usr/src/sys/kern/uipc_socket.c:                (void )sol->sol_upcall(sol, sol->sol_upcallarg, M_NOWAIT);
/usr/src/sys/kern/uipc_socket.c:        so->sol_upcall = func;
/usr/src/sys/kern/uipc_socket.c:        so->sol_upcallarg = arg;
/usr/src/sys/sys/socketvar.h:                   so_upcall_t     *sol_upcall;    /* (e) */
/usr/src/sys/sys/socketvar.h:                   void            *sol_upcallarg; /* (e) */

None of those assign NULL.

If NULL was assigned then ->so_rcv.sb_sel would
also become NULL in value.
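
Both directions of that interference can be shown with a
reduced model. This is a sketch only: the toy_ names are
hypothetical, the union is simplified so that the two
pointers share one slot directly, and toy_wakeup_branch()
mirrors nothing more than the NULL test at the top of
solisten_wakeup().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real FreeBSD definitions. */
struct toy_selinfo {
        int si_dummy;
};
typedef int toy_upcall_t(void *so, void *arg, int waitflag);

/* Simplified: the two views share one pointer-sized slot. */
struct toy_socket {
        struct toy_selinfo so_rdsel;
        union {
                struct toy_selinfo *sb_sel;     /* data-flow view */
                toy_upcall_t       *sol_upcall; /* listening view */
        };
};

enum toy_branch { TOY_SELWAKEUP, TOY_UPCALL };

/* Which branch the NULL test would select; the upcall is never called. */
enum toy_branch
toy_wakeup_branch(const struct toy_socket *sol)
{
        assert(sol != NULL);
        return (sol->sol_upcall != NULL ? TOY_UPCALL : TOY_SELWAKEUP);
}

/* Models the zeroed allocation plus the sb_sel setup from soalloc(). */
void
toy_soalloc_init(struct toy_socket *so)
{
        assert(so != NULL);
        memset(so, 0, sizeof(*so));     /* like M_ZERO */
        so->sb_sel = &so->so_rdsel;     /* also fills the sol_upcall slot */
}
```

In this model a freshly initialized socket selects
TOY_UPCALL even though no upcall was ever set, and storing
NULL into sol_upcall afterwards leaves sb_sel NULL as well.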
Comment 2 Mark Millard 2017-07-01 02:35:45 UTC
(In reply to Mark Millard from comment #1)

FYI: I do understand that the specific aliasing
I list may well be 32-bit powerpc specific. But
other aliasing between the anonymous structs in
that union is just as likely to be problematic
for the sol_upcall field of the second
anonymous struct.
Comment 3 Mark Millard 2017-07-01 06:05:00 UTC
(In reply to Mark Millard from comment #2)

It looks like the third head i386 -r320202
backtrace in:

https://lists.freebsd.org/pipermail/freebsd-current/2017-June/066323.html

might also be from the anonymous struct field
aliasing in the new union in struct socket,
with the aliased fields not being kept from
interfering with each other.

At least some of the same call chain shows
in the backtrace in that report.
Comment 4 oleg.nauman 2017-07-01 14:41:31 UTC
I'm observing similar panics on i386 CURRENT r310466

__curthread () at ./machine/pcpu.h:225
225             __asm("movl %%fs:%1,%0" : "=r" (td)
(kgdb) #0  __curthread () at ./machine/pcpu.h:225
#1  doadump (textdump=-968633856) at ../../../kern/kern_shutdown.c:318
#2  0xc06e88c4 in kern_reboot (howto=<optimized out>)
    at ../../../kern/kern_shutdown.c:386
#3  0xc06e8c5b in vpanic (fmt=<optimized out>,
    ap=0xefd5c73c "\340\334\235\300\310\370\266\306\001")
    at ../../../kern/kern_shutdown.c:779
#4  0xc06e8b1b in panic (fmt=0xc092e18e "%s")
    at ../../../kern/kern_shutdown.c:710
#5  0xc08eed21 in trap_fatal (frame=0xefd5c878, eva=<optimized out>)
    at ../../../i386/i386/trap.c:978
#6  0xc08eea38 in trap (frame=<optimized out>)
    at ../../../i386/i386/trap.c:704
#7  <signal handler called>
#8  0xc6bcda1b in ?? ()
#9  0xc0770281 in unp_connect2 (so=<optimized out>, so2=<optimized out>,
    req=<optimized out>) at ../../../kern/uipc_usrreq.c:1497
#10 0xc076ff17 in unp_connectat (fd=<optimized out>, so=<optimized out>,
    nam=<optimized out>, td=<optimized out>)
    at ../../../kern/uipc_usrreq.c:1446
#11 0xc076d510 in unp_connect (so=0xc71c9400, nam=0xc662d500,
    td=<optimized out>) at ../../../kern/uipc_usrreq.c:1310
#12 uipc_connect (so=0xc71c9400, nam=0xc662d500, td=<optimized out>)
    at ../../../kern/uipc_usrreq.c:587
#13 0xc076a042 in kern_connectat (td=<optimized out>, dirfd=-100,
    fd=<optimized out>, sa=0xc662d500) at ../../../kern/uipc_syscalls.c:505
#14 0xc0769f49 in sys_connect (td=0xc6bcda18, uap=0xc6b6f988)
    at ../../../kern/uipc_syscalls.c:470
#15 0xc08ef679 in syscallenter (td=<optimized out>)
    at ../../../i386/i386/../../kern/subr_syscall.c:132
#16 syscall (frame=<optimized out>) at ../../../i386/i386/trap.c:1103
#17 <signal handler called>
#18 0x283a4747 in ?? ()
Backtrace stopped: Cannot access memory at address 0xbfbfe794
(kgdb)
Comment 5 Mark Millard 2017-07-01 15:56:03 UTC
(In reply to Mark Millard from comment #3)

It is a separate issue but I'll note that
(some?) C++ compilers reject the use of the
enum syntax that is in one of the anonymous
structs in the union. A report about this
appeared on the lists before I figured out
what I reported in 220404.

Likely the header in question should be
usable from C and from C++.
Comment 6 Mark Millard 2017-07-01 15:59:51 UTC
(In reply to oleg.nauman from comment #4)

The change that I've reported on (-r319722) predates
the "HEAD/i386 r320212" referenced in that i386 listing,
and so could be involved.

But it cannot be involved for something from:

"I'm observing similar panics on i386 CURRENT r310466"

Maybe -r320466 was intended in comment #4? A typo?

But if the -r310466 is correct then it is a different
problem for comment #4.
Comment 7 oleg.nauman 2017-07-01 16:33:16 UTC
(In reply to Mark Millard from comment #6)
i386 CURRENT r320466 , of course
Comment 8 Sylvain Garrigues 2017-07-03 17:48:03 UTC
I am wondering if this is related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220452
Comment 9 Mark Millard 2017-07-03 18:03:27 UTC
(In reply to Sylvain Garrigues from comment #8)

Your 220452 is likely a duplicate of this 220404
report: just another example.

I've only done detailed analysis for the 32-bit
powerpc context.
Comment 10 oleg.nauman 2017-07-03 19:52:34 UTC
I can confirm that r319722 is causing panics on i386; r319721 is stable
Comment 11 oleg.nauman 2017-07-03 19:53:13 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220358 seems related too
Comment 12 Mark Millard 2017-07-03 20:00:15 UTC
I've changed Hardware to Any since the problem
is observed on each of: 32-bit powerpc, armv6/7,
and i386.
Comment 13 Sylvain Garrigues 2017-07-03 21:02:18 UTC
Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220452 and comment #5 and #8 - i.e. the kernel panic seems to disappear when INVARIANTS is used in the kernel config, or more precisely when one #ifdef INVARIANTS is removed in sys/kern/uipc_socket.c
Comment 14 Kevin Bowling freebsd_committer freebsd_triage 2017-07-04 06:37:04 UTC
Related symptom that dtrace can't figure out the struct?

lockstat: failed to compile program: "/usr/lib/dtrace/tcp.d", line 239: so_snd is not a member of struct socket
Comment 15 Mark Millard 2017-07-04 19:36:56 UTC
Despite symptom differences the same fix
seems to cover 220358 and this (220404),
as I understand it. 220452 is another.

220358 was first so mark this one as
a duplicate of 220358.

head -r320652 is a checked in fix for
these reports.


Side note:

head -r320652 's checkin says in part:

Verifying the offset locations mentioned above are identical is left
 as an exercise to the reader.

This report has an example of doing so.
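
The same kind of offset comparison that the kgdb prints
perform at run time can be written with offsetof. The
following is a sketch on a made-up layout, not the real
struct socket: every toy_ and TOY_ name is hypothetical,
and the padding fields exist only to make the two members
coincide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real FreeBSD layout. */
struct toy_selinfo {
        void *si_first;
};
typedef int toy_upcall_t(void *so, void *arg, int waitflag);

struct toy_sockbuf {
        long                 sb_pad;
        struct toy_selinfo  *sb_sel;
};

struct toy_socket {
        struct toy_selinfo so_rdsel;
        union {
                struct { struct toy_sockbuf so_rcv; };
                struct { long sol_pad; toy_upcall_t *sol_upcall; };
        };
};

/* Offset and size of a (possibly nested) member of struct toy_socket. */
#define TOY_OFF(m)      offsetof(struct toy_socket, m)
#define TOY_SIZE(m)     sizeof(((struct toy_socket *)0)->m)

/* so_rdsel is shared by both views, so it must end before the union. */
_Static_assert(TOY_OFF(so_rdsel) + TOY_SIZE(so_rdsel) <= TOY_OFF(so_rcv),
    "so_rdsel overlaps the union");

/* Returns 1 when byte ranges [off, off + size) of a and b intersect. */
int
toy_ranges_overlap(size_t off_a, size_t size_a, size_t off_b, size_t size_b)
{
        assert(size_a > 0 && size_b > 0);
        return (off_a < off_b + size_b && off_b < off_a + size_a);
}

/* Returns 1 when sol_upcall shares bytes with so_rcv.sb_sel. */
int
toy_upcall_overlaps_sb_sel(void)
{
        return (toy_ranges_overlap(
            TOY_OFF(so_rcv.sb_sel), TOY_SIZE(so_rcv.sb_sel),
            TOY_OFF(sol_upcall), TOY_SIZE(sol_upcall)));
}
```

Checking ranges rather than only equal offsets also catches
partial overlaps, which could matter when pointer and long
sizes differ between targets.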

*** This bug has been marked as a duplicate of bug 220358 ***