Bug 213778 - stable/11 -r307797 on BPi-M3 (cortex-a7): truss gets segmentation fault for handling SIGSYS
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.0-STABLE
Hardware: arm Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2016-10-25 18:38 UTC by Mark Millard
Modified: 2017-06-03 15:59 UTC
Description Mark Millard 2016-10-25 18:38:24 UTC
While trying to build lang/gcc6, xgcc's cc1 hit several SIGSYS failures. While tracking that down I found that truss itself gets a SIGSEGV when it tries to handle the situation. . .

In truss's enter_syscall there is (from a live gdb on truss, after the segmentation fault):

380		t->cs.name = sysdecode_syscallname(t->proc->abi->abi, t->cs.number);
381		if (t->cs.name == NULL)
(gdb) 
382			fprintf(info->outfile, "-- UNKNOWN %s SYSCALL %d --\n",
383			    t->proc->abi->type, t->cs.number);
384	
385		sc = get_syscall(t->cs.name, narg);
386		t->cs.nargs = sc->nargs;
387		assert(sc->nargs <= nitems(t->cs.s_args));
388	
389		t->cs.sc = sc;

(gdb) print *t
$2 = {entries = {le_next = 0x0, le_prev = 0x20617070}, proc = 0x20617060, tid = 100150, in_syscall = 1, cs = {sc = 0x0, name = 0x0, number = 580828064, args = 0x2061b0c0, nargs = 0, 
    s_args = 0x2061b0ec}, before = {tv_sec = 1477418265, tv_nsec = 492342263}, after = {tv_sec = 1477418265, tv_nsec = 492496630}}

(gdb) print sc
$3 = (struct syscall *) 0x0

So line 386 listed above gets the segmentation fault on sc->nargs: when t->cs.name is a NULL pointer, get_syscall returns NULL, and sc is dereferenced without a check.
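
As a minimal sketch of the missing check (not the change that was later committed, which this report does not show), the lookup result can be tested before it is dereferenced. Everything below is hypothetical and for illustration only: struct syscall_desc, known_syscalls, lookup_syscall and nargs_or_zero are stand-ins, not truss's real definitions, and the argument counts in the table are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for truss's struct syscall; only the fields used here. */
struct syscall_desc {
	const char *name;
	unsigned int nargs;
};

/* Tiny illustrative table, not truss's real one. */
static const struct syscall_desc known_syscalls[] = {
	{ "mmap", 6 },
	{ "close", 1 },
};

/* Hypothetical lookup: returns NULL for a NULL or unrecognized name. */
static const struct syscall_desc *
lookup_syscall(const char *name)
{
	size_t i;

	if (name == NULL)
		return (NULL);
	for (i = 0; i < sizeof(known_syscalls) / sizeof(known_syscalls[0]); i++) {
		if (strcmp(known_syscalls[i].name, name) == 0)
			return (&known_syscalls[i]);
	}
	return (NULL);
}

/* Check the lookup result before dereferencing; 0 means "could not decode". */
static unsigned int
nargs_or_zero(const char *name)
{
	const struct syscall_desc *sc = lookup_syscall(name);

	if (sc == NULL)
		return (0);
	return (sc->nargs);
}
```

Calling nargs_or_zero(NULL) corresponds to the failing case above (name == NULL) and returns 0 instead of faulting.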

Looking at the two things that the fprintf on lines 382 and 383 would report:

(gdb) print t->proc->abi->type
$4 = 0x10166 "FreeBSD ELF32"

(gdb) print t->cs.number
$5 = 580828064

(gdb) print narg
$6 = 0

(That last value is context for the get_syscall arguments.)

FYI: 580828064 = 0x229EBBA0


Context:

root@bananapi-m3:/usr/ports # uname -apKU
FreeBSD bananapi-m3 11.0-STABLE FreeBSD 11.0-STABLE #0 r307797M: Mon Oct 24 00:41:16 PDT 2016     markmi@FreeBSDx64:/usr/local/src/crochet/work/obj/arm.armv6/usr/src/sys/ALLWINNER  arm armv6 1100505 1100505
Comment 1 Mark Millard 2016-10-26 22:33:09 UTC
(In reply to Mark Millard from comment #0)

The following is from a report about a different issue than truss, but what it says about the value reported by gdb for t->cs.number when truss gets the segmentation fault may be relevant to truss's behavior. . .


Using "ktrace -i -t +fw" it looks like every repeat of the problem ends up with the following sort of sequence (a variation is shown later):

34629 cc1      CALL  mmap(0,0x4000,0x3<PROT_READ|PROT_WRITE>,0x1002<MAP_PRIVATE|MAP_ANON>,0xffffffff,0x1c,0,0)
34629 cc1      RET   mmap 568225792/0x21de7000
34629 cc1      PFLT  0x21de7000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x21de8000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x21de9000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x21dea000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x229e8000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x229e9000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x229ea000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      CSW   stop user "ast"
34629 cc1      CSW   resume user "ast"
34629 cc1      PFLT  0x229eb000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      PFLT  0x229ec000 VM_PROT_WRITE
34629 cc1      PRET  KERN_SUCCESS
34629 cc1      CALL  [-17504]
34629 cc1      RET   [-17504] -1 errno 78 Function not implemented
34629 cc1      PSIG  SIGSYS SIG_DFL code=SI_KERNEL
34629 cc1      NAMI  "cc1.core"
34630 as       CSW   stop kernel "piperd"
34630 as       Events dropped.
34630 as       RET   read 0
34630 as       CALL  close(0)
34630 as       RET   close 0
. . .

I'll note that for the source file being compiled here I ran truss under gdb (run -feH -o truss.log) and it reported:

(gdb) print t->cs.number
$5 = 580828064

FYI: 580828064 = 0x229EBBA0

where the truss segmentation fault was at line 386 of the following (sc==NULL in the context, as returned by the get_syscall call on line 385):

380		t->cs.name = sysdecode_syscallname(t->proc->abi->abi, t->cs.number);
381		if (t->cs.name == NULL)
(gdb) 
382			fprintf(info->outfile, "-- UNKNOWN %s SYSCALL %d --\n",
383			    t->proc->abi->type, t->cs.number);
384	
385		sc = get_syscall(t->cs.name, narg);
386		t->cs.nargs = sc->nargs;
387		assert(sc->nargs <= nitems(t->cs.s_args));
388	
389		t->cs.sc = sc;

The 229E matches the upper 16 bits of the nearby PFLT addresses around the user "ast" CSW's, including the one just before the bad call.

But the details do vary some based on the source file being compiled. For example here the user "ast" CSW's are just before the mmap but are still just after the 0x229ea000 PFLT:

34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0xbfbf2000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x229e7000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x229e8000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x229e9000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x229ea000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      CSW   stop user "ast"
34698 cc1      CSW   resume user "ast"
34698 cc1      CALL  mmap(0,0x4000,0x3<PROT_READ|PROT_WRITE>,0x1002<MAP_PRIVATE|MAP_ANON>,0xffffffff,0,0,0)
34698 cc1      RET   mmap 568225792/0x21de7000
34698 cc1      PFLT  0x21de7000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x21de8000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x21de9000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x21dea000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      PFLT  0x229eb000 VM_PROT_WRITE
34698 cc1      PRET  KERN_SUCCESS
34698 cc1      CALL  [-25840]
34698 cc1      RET   [-25840] -1 errno 78 Function not implemented
34698 cc1      PSIG  SIGSYS SIG_DFL code=SI_KERNEL
34698 cc1      NAMI  "cc1.core"
34699 as       CSW   stop kernel "piperd"
34699 as       Events dropped.
34699 as       RET   read 0
34699 as       CALL  close(0)
34699 as       RET   close 0

-25840 in 2's complement is: 0xF...F9B10

Here running truss under gdb instead reports:

(gdb) print t->cs.number
$1 = 580819728

and 580819728 = 0x229E9B10

and the 229E part matches several PFLT's in the area, including just before the bad call as well as just before the user "ast"s. Between them are some PFLT's that do not match.

I would guess that the 229E in t->cs.number in truss is from the PFLT just before the failing syscall in each case.
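
The arithmetic relating the two views can be checked with a small sketch. In both cases above, the number kdump prints equals the low 16 bits of gdb's t->cs.number read as a signed value, and the full 32-bit value lies in one of the 0x229e.... pages seen in the PFLT records. The helper names low16_signed and page_base are hypothetical, and the 4 KiB page size is an assumption taken from the 0x1000 stride of the PFLT addresses; this only restates the observed numbers and does not claim why the two tools differ.

```c
#include <stdint.h>

/* Keep only the low 16 bits and reinterpret them as a signed 16-bit value. */
static int
low16_signed(uint32_t number)
{
	uint32_t low = number & 0xFFFFu;

	return (low >= 0x8000u ? (int)low - 0x10000 : (int)low);
}

/* Round an address down to its page base, assuming 4 KiB pages. */
static uint32_t
page_base(uint32_t addr)
{
	return (addr & ~(uint32_t)0xFFFu);
}
```

With these, low16_signed(580828064) gives -17504 and low16_signed(580819728) gives -25840, matching the two CALL records, while page_base gives 0x229eb000 and 0x229e9000 respectively, both of which appear as PFLT addresses in the traces.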
Comment 2 Mark Millard 2016-10-31 05:33:10 UTC
(In reply to Mark Millard from comment #1)

FYI notes:

See bugzilla 213936 for material about why lang/gcc6's xgcc's cc1 for armv6/cortex-a7 gets the SIGSYS (or whatever) in the first place: clang 3.8.0's occasional code generation problems lead to messed up stack handling in cc1 and a stack address in the armv6/cortex-a7 pc register. (lang/gcc6's build had not gotten far enough to have built and be using its own self-hosted cc1 yet.)

Still the cc1 produced by clang 3.8.0 shows crash problems in truss for handling processes with errors. So the odd code is useful for testing/improving truss (and possibly more). For now I'm leaving the BPI-M3 set up for such activity.
Comment 3 Mark Millard 2017-01-10 19:42:26 UTC
(In reply to Mark Millard from comment #2)

The attribution to clang 3.8.0 for the original problem
was wrong: it was SSD bit corruption instead. While
the linked output was wrong, the original .o from clang
was correct.

(I noticed this long ago but forgot to note it here
at the time.)
Comment 4 Mark Millard 2017-01-10 19:50:28 UTC
It does not look like head's 309589 check-in (2016-Dec-06) has been
MFC'd yet. So stable/11 likely still has this problem.
Comment 5 John Baldwin freebsd_committer freebsd_triage 2017-06-03 15:59:28 UTC
Fixed in r309589 and r312084.