Summary: | head (e.g.) -r315870 for arm64: sh vs. jemalloc asserts: include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" once swapped in after being swapped out (comment 10); fix is a kernel patch (comment 37) | ||
---|---|---|---|
Product: | Base System | Reporter: | Mark Millard <marklmi26-fbsd> |
Component: | bin | Assignee: | Konstantin Belousov <kib> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | amd64, emaste, freebsd-arm, gonzo, jasone, lidl |
Priority: | --- | ||
Version: | CURRENT | ||
Hardware: | arm64 | ||
OS: | Any | ||
Attachments: |
Description
Mark Millard
2017-02-16 12:05:54 UTC
(In reply to Mark Millard from comment #0) It turns out that the sh failure during buildworld also gets to __je_tsd_get (but a different way) and then fails the same assertion for "tsd_booted": <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" A back trace is: (lldb) bt * thread #1: tid = 100194, 0x0000000040554e18 libc.so.7`_thr_kill + 8, name = 'sh', stop reason = signal SIGABRT * frame #0: 0x0000000040554e18 libc.so.7`_thr_kill + 8 frame #1: 0x0000000040554ddc libc.so.7`__raise(s=6) + 64 at raise.c:52 frame #2: 0x0000000040554d50 libc.so.7`abort + 84 at abort.c:65 frame #3: 0x0000000040528790 libc.so.7`__je_tsd_fetch [inlined] __je_tsd_get + 248 at tsd.h:687 frame #4: 0x000000004052876c libc.so.7`__je_tsd_fetch [inlined] __je_tsd_fetch_impl(init=true) at tsd.h:692 frame #5: 0x000000004052876c libc.so.7`__je_tsd_fetch + 212 at tsd.h:717 frame #6: 0x0000000040550cc0 libc.so.7`__free(ptr=0x0000000040a17720) + 64 at jemalloc_jemalloc.c:2011 frame #7: 0x0000000000411328 sh`ckfree(p=<unavailable>) + 32 at memalloc.c:88 frame #8: 0x0000000000407cd8 sh`clearcmdentry + 76 at exec.c:505 frame #9: 0x0000000000406bfc sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 3476 at eval.c:1182 frame #10: 0x0000000000405570 sh`evaltree(n=0x0000000040a1cde8, flags=<unavailable>) + 212 at eval.c:290 frame #11: 0x000000000041105c sh`cmdloop(top=<unavailable>) + 252 at main.c:231 frame #12: 0x0000000000410ed0 sh`main(argc=<unavailable>, argv=<unavailable>) + 660 at main.c:178 frame #13: 0x0000000000402f30 sh`__start + 360 frame #14: 0x0000000040434658 ld-elf.so.1`.rtld_start + 24 at rtld_start.S:41 It appears that setvar was not used but clearcmdentry (indirectly) gets the same sort of problem when this happens: (lldb) up 9 frame #9: 0x0000000000406bfc sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 3476 at eval.c:1182 1179 if (lastarg) 1180 setvar("_", lastarg, 0); 1181 if (do_clearcmdentry) -> 1182 clearcmdentry(); 1183 } If one is going to look into this in a amd64 context it is important to be using head -r313772 or later in order to avoid fork sometimes not preserving the stack pointer on the child-process side of things --at least if experimenting with port or buildworld buildkernel builds as a means of testing. Getting past that stack pointer problem is what allowed me to see this problem during build activity, which started me down this exploration. [My tests for aborting in sh`forkshell if fork changes the stack pointer are still in place but there have been no failures so far.] Before starting a round of updates to a newer version of head I got a couple of sh core dumps that showed the same sort of failures. But in these I'd added recording the pid that was returned by fork. The result was that both the core dumps were from the parent- process side of the most recent fork. [It will be a bit before I'm done with updating to a newer version of head.] I got a somewhat different trace back this time: (lldb) bt * thread #1: tid = 100105, 0x0000000040554e18 libc.so.7`_thr_kill + 8, name = 'sh', stop reason = signal SIGABRT * frame #0: 0x0000000040554e18 libc.so.7`_thr_kill + 8 frame #1: 0x0000000040554ddc libc.so.7`__raise(s=6) + 64 at raise.c:52 frame #2: 0x0000000040554d50 libc.so.7`abort + 84 at abort.c:65 frame #3: 0x0000000040528790 libc.so.7`__je_tsd_fetch [inlined] __je_tsd_get + 248 at tsd.h:687 frame #4: 0x000000004052876c libc.so.7`__je_tsd_fetch [inlined] __je_tsd_fetch_impl(init=true) at tsd.h:692 frame #5: 0x000000004052876c libc.so.7`__je_tsd_fetch + 212 at tsd.h:717 frame #6: 0x0000000040550cc0 libc.so.7`__free(ptr=0x0000000040a390e0) + 64 at jemalloc_jemalloc.c:2011 frame #7: 0x0000000000411330 sh`ckfree(p=<unavailable>) + 32 at memalloc.c:88 frame #8: 0x000000000040e6c8 sh`freejob(jp=<unavailable>) + 108 at jobs.c:475 frame #9: 0x000000000040dd0c sh`waitforjob(jp=<unavailable>, origstatus=<unavailable>) + 352 at jobs.c:1066 frame #10: 0x0000000000406988 sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 2848 at eval.c:1166 frame #11: 0x0000000000405570 sh`evaltree(n=0x0000000040ab9060, flags=<unavailable>) + 212 at eval.c:290 frame #12: 0x0000000000411064 sh`cmdloop(top=<unavailable>) + 252 at main.c:231 frame #13: 0x0000000000410ed8 sh`main(argc=<unavailable>, argv=<unavailable>) + 660 at main.c:178 frame #14: 0x0000000000402f30 sh`__start + 360 frame #15: 0x0000000040434658 ld-elf.so.1`.rtld_start + 24 at rtld_start.S:41 (lldb) up 10 frame #10: 0x0000000000406988 sh`evalcommand(cmd=<unavailable>, flags=<unavailable>, backcmd=<unavailable>) + 2848 at eval.c:1166 1163 parent: /* parent process gets here (if we forked) */ 1164 if (mode == FORK_FG) { /* argument to fork */ 1165 INTOFF; -> 1166 exitstatus = waitforjob(jp, &realstatus); 1167 INTON; 1168 if (iflag && loopnest > 0 && WIFSIGNALED(realstatus)) { 1169 evalskip = SKIPBREAK; [Note: the kernels after -r313266 are messed up for at least PowerMac so-called "Quad-Core"s so the below is based on a -r313864 buildworld and a -r313266 buildkernel because I had to revert the kernel to about the last that would avoid randomly timed panics.] I did a buildworld of -r313864 for powerpc64 without having MALLOC_PRODUCTION defined and installed and booted with it in order to see if powerpc64 had the same sort of problem that aarch64 is having with jemalloc, in particular during self-hosted buildworld buildkernel activity (my test case). powerpc64 worked fine with no aborts. So. . . include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" seems to be aarch64 (arm64) specific in some way. See also bugzilla 217239 for another type of arm64 specific problem with jemalloc without MALLOC_PRODUCTION. The following describes a reproducible sequence in my context, unfortunately involving hours of buildworld activity. It fails every time that I have tried it and at the same places each time. I give a contrast to a working context as well. Context: doing buildworld buidlkernel on a pine64+ with 2 GiBytes of RAM. Multiple head revisions, most recently: # uname -apKU FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT r313999M arm64 aarch64 1200021 1200021 The pine64 is running what was a cross build that had MALLOC_PRODUCTION not defined. (Unlike my usual way of building.) Problem: sh core dumps via failing an assert. (script core dumps as well for other reasons for one stage but I'm focused on the earliest failures for now: sh failures.) The following happens when I buildworld buildkernel on the pine64+ using: WITH_CLANG= WITH_CLANG_IS_CC= WITH_CLANG_FULL= WITH_CLANG_EXTRAS= WITH_LLD= WITH_LLDB= but not when using: WITHOUT_CLANG= WITHOUT_CLANG_IS_CC= WITHOUT_CLANG_FULL= WITHOUT_CLANG_EXTRAS= WITHOUT_LLD= WITHOUT_LLDB= (The rest being the same, starting after using cleanworld in both cases.) But note that the first failures happen long after the those have built what they contribute to the _generic_libs stage. (I have not yet tried isolating subsets.) Similarly for the later 2nd stage: well after "everything" did its llvm related activity. I've tried the failing case under both: 2 GiBytes RAM + 3 GiBytes swap and: 2 GiBytes RAM + 6 GiBytes swap It made no difference and there have been no messages about running out of swap space or other forms of resource limitation based process killing or the like. From sysutils/DTraceToolkit 's /usr/local/share/dtrace-toolkit/execsnoop : . . . 2017 Feb 22 16:37:02 0 61019 61018 make install DIRPRFX=lib/libusb/\0 2017 Feb 22 16:37:02 0 61020 61019 sh -e\0 2017 Feb 22 16:37:02 0 61021 61019 sh -e\0 2017 Feb 22 16:37:02 0 61022 61019 sh -e\0 2017 Feb 22 16:37:02 0 61023 61020 sh /usr/src/tools/install.sh -C -o root -g wheel -m 444 libusb.a /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61024 61021 sh /usr/src/tools/install.sh -o root -g wheel -m 444 /usr/src/lib/libusb/libusb-0.1.pc /usr/src/lib/libusb/libusb-1.0.pc /usr/src/lib/libusb/libusb-2.0.pc /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/libdata/pkgconfig/\0 2017 Feb 22 16:37:02 0 61025 61022 sh /usr/src/tools/install.sh -C -o root -g wheel -m 444 /usr/src/lib/libusb/libusb20.h /usr/src/lib/libusb/libusb20_desc.h /usr/src/lib/libusb/usb.h /usr/src/lib/libusb/libusb.h /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/include/\0 2017 Feb 22 16:37:02 0 61023 61020 sh /usr/src/tools/install.sh -C -o root -g wheel -m 444 libusb.a /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61023 61020 install -p libusb.a /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61024 61021 sh /usr/src/tools/install.sh -o root -g wheel -m 444 /usr/src/lib/libusb/libusb-0.1.pc /usr/src/lib/libusb/libusb-1.0.pc /usr/src/lib/libusb/libusb-2.0.pc /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/libdata/pkgconfig/\0 2017 Feb 22 16:37:02 0 61025 61022 sh /usr/src/tools/install.sh -C -o root -g wheel -m 444 /usr/src/lib/libusb/libusb20.h /usr/src/lib/libusb/libusb20_desc.h /usr/src/lib/libusb/usb.h /usr/src/lib/libusb/libusb.h /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/include/\0 2017 Feb 22 16:37:02 0 61024 61021 install -p /usr/src/lib/libusb/libusb-0.1.pc /usr/src/lib/libusb/libusb-1.0.pc /usr/src/lib/libusb/libusb-2.0.pc /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/libdata/pkgconfig/\0 2017 Feb 22 16:37:02 0 61025 61022 install -p /usr/src/lib/libusb/libusb20.h /usr/src/lib/libusb/libusb20_desc.h /usr/src/lib/libusb/usb.h /usr/src/lib/libusb/libusb.h /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/include/\0 2017 Feb 22 16:37:02 0 61026 61020 sh /usr/src/tools/install.sh -s -o root -g wheel -m 444 libusb.so.3 /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61026 61020 sh /usr/src/tools/install.sh -s -o root -g wheel -m 444 libusb.so.3 /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61026 61020 install -p libusb.so.3 /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/\0 2017 Feb 22 16:37:02 0 61027 61020 sh /usr/src/tools/install.sh -o root -g wheel -m 444 libusb.so.3.debug /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/debug/usr/lib/\0 2017 Feb 22 16:37:02 0 61027 61020 sh /usr/src/tools/install.sh -o root -g wheel -m 444 libusb.so.3.debug /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/debug/usr/lib/\0 2017 Feb 22 16:37:02 0 61027 61020 install -p libusb.so.3.debug /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/debug/usr/lib/\0 2017 Feb 22 16:37:02 0 61028 61020 sh /usr/src/tools/install.sh -l rs libusb.so.3 /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/libusb.so\0 2017 Feb 22 16:37:02 0 61029 61028 ln -fsn libusb.so.3 /usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp/usr/lib/libusb.so\0 (That last was it for the build.) That is the end of the exec activity for the _generic_libs part of the build (and since the build stops: the last for the build overall). (The below ps -daux output is from some time before the problem happened but with later, related core files listed as well.) root 91353 0.0 0.1 6856 1500 u0 I+ 10:28 0:00.02 `-- /bin/sh /root/sys_build_scripts.pine64-host/make_pine64_nodebug_clang_bootstrap-pine64-host.sh -j 4 buildworld buildkernel root 91356 0.0 0.1 6204 1560 u0 S+ 10:28 0:06.59 `-- script /root/sys_typescripts/typescript_make_pine64_nodebug_clang_bootstrap-pine64-host-2017-02-22:10:28:28 env __MAKE_CONF=/ -rw------- 1 root wheel 4657152 Feb 22 16:37:04 2017 script.91356.core (from: ls -ltTU) root 91357 0.0 0.0 4948 204 1 Ss+ 10:28 0:01.87 `-- make -j 4 buildworld buildkernel root 91373 0.0 0.1 6856 1500 1 I 10:28 0:00.01 `-- sh -ev -rw------- 1 root wheel 4702208 Feb 22 16:37:03 2017 sh.91373.core (from: ls -ltTU) root 91374 0.0 0.0 4948 204 1 S 10:28 0:01.69 `-- make -m /usr/src/share/mk -f Makefile.inc1 TARGET=arm64 TARGET_ARCH=aarch64 buildworld root 10803 0.0 0.1 6856 1500 1 I 10:43 0:00.01 `-- sh -ev -rw------- 1 root wheel 4702208 Feb 22 16:37:02 2017 sh.10803.core (from: ls -ltTU) root 10804 0.0 0.0 4948 200 1 S 10:43 3:00.18 `-- make -f Makefile.inc1 DESTDIR=/usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp -DNO_FSCHG MK_HTML=no -DNO_LINT MK_MA root 10811 0.0 0.1 6856 1500 1 I 10:43 0:00.01 `-- sh -ev root 38075 0.0 0.0 4948 204 1 S 11:14 0:00.75 `-- make -f Makefile.inc1 _generic_libs root 38085 0.0 0.1 6856 1500 1 I 11:14 0:00.01 `-- sh -ev . . . "shutdown -r now" here makes no difference to the below when done after the reboot. (Of course there is some llvm related build activity during the "everything" stage below.) Doing another buildworld buildkernel to continue the build results in: . . . 2017 Feb 22 18:48:46 0 51772 51454 sh -e\0 2017 Feb 22 18:48:46 0 51773 51772 sed -E s,(^| |B|`)svn,\\1svnlite,g /usr/src/contrib/subversion/subversion/svn/svn.1\0 2017 Feb 22 18:48:46 0 51774 51454 sh -e\0 2017 Feb 22 18:48:46 0 51775 51774 gzip -cn svnlite.1\0 2017 Feb 22 18:48:48 0 51776 51454 sh -e\0 2017 Feb 22 18:48:48 0 51777 51776 \0 2017 Feb 22 18:48:48 0 51778 51777 \0 2017 Feb 22 18:48:49 0 51779 51454 sh -e\0 2017 Feb 22 18:48:49 0 51780 51779 /usr/local/aarch64-freebsd/bin/objcopy --only-keep-debug svnlite.full svnlite.debug\0 2017 Feb 22 18:48:50 0 51781 51454 sh -e\0 2017 Feb 22 18:48:50 0 51782 51781 /usr/local/aarch64-freebsd/bin/objcopy --strip-debug --add-gnu-debuglink=svnlite.debug svnlite.full svnlite\0 The above is the end of the "everything" exec activity but before the buildworld_epilogue (that does not happen). Again it is the last exec activity for the build because the build stops. (Again ps -daux from sometime before the failure mixed with core file ls -ltTU information below:) root 61122 0.0 0.1 6856 1500 u0 I+ 17:13 0:00.01 `-- /bin/sh /root/sys_build_scripts.pine64-host/make_pine64_nodebug_clang_bootstrap-pine64-host.sh -j 4 buildworld buildkernel root 61125 0.0 0.1 6204 1560 u0 S+ 17:13 0:09.56 `-- script /root/sys_typescripts/typescript_make_pine64_nodebug_clang_bootstrap-pine64-host-2017-02-22:17:13:45 env __MAKE_CONF= root 61126 0.0 0.0 4948 204 1 Ss+ 17:13 0:02.36 `-- make -j 4 buildworld buildkernel root 61142 0.0 0.1 6856 1500 1 I 17:13 0:00.01 `-- sh -ev -rw------- 1 root wheel 4702208 Feb 22 18:48:51 2017 sh.61142.core root 61143 0.0 0.0 4948 204 1 S 17:13 0:02.08 `-- make -m /usr/src/share/mk -f Makefile.inc1 TARGET=arm64 TARGET_ARCH=aarch64 buildworld root 81104 0.0 0.1 6856 1500 1 I 17:19 0:00.01 `-- sh -ev -rw------- 1 root wheel 4702208 Feb 22 18:48:50 2017 sh.81104.core root 81105 0.0 0.0 4948 220 1 S 17:19 0:02.57 `-- make -f Makefile.inc1 DESTDIR=/usr/obj/pine64_clang/arm64.aarch64/usr/src/tmp all root 13358 0.0 0.1 6856 1500 1 I 17:49 0:00.01 |-- sh -e . . . (Yep: script does not core dump for this 2nd stage context.) A 3rd buildworld buildkernel finishes the build, with buildworld being essentially a large no-op and then doing the buildkernel. Context details: # more ~/src.configs/make.conf CFLAGS.gcc+= -v (But this was not a gcc based build.) # more ~/src.configs/src.conf.pine64-clang-bootstrap.pine64-host TO_TYPE=aarch64 TOOLS_TO_TYPE=${TO_TYPE} # KERNCONF=GENERIC-NODBG TARGET=arm64 .if ${.MAKE.LEVEL} == 0 TARGET_ARCH=${TO_TYPE} .export TARGET_ARCH .endif # #WITH_CROSS_COMPILER= WITH_SYSTEM_COMPILER= # #CPUTYPE=soft WITH_LIBCPLUSPLUS= WITHOUT_BINUTILS_BOOTSTRAP= WITHOUT_ELFTOOLCHAIN_BOOTSTRAP= #WITHOUT_CLANG_BOOTSTRAP= WITH_CLANG= WITH_CLANG_IS_CC= WITH_CLANG_FULL= WITH_CLANG_EXTRAS= WITH_LLD= WITH_LLDB= # WITH_BOOT= WITHOUT_LIB32= WITHOUT_LIBSOFT= # WITHOUT_GCC_BOOTSTRAP= WITHOUT_GCC= WITHOUT_GCC_IS_CC= WITHOUT_GNUCXX= # NO_WERROR= #WERROR= MALLOC_PRODUCTION= # WITH_REPRODUCIBLE_BUILD= WITH_DEBUG_FILES= # CROSS_BINUTILS_PREFIX=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/ AS=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/as AR=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/ar LD=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/ld NM=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/nm OBJCOPY=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/objcopy OBJDUMP=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/objdump RANLIB=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/ranlib SIZE=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/size STRINGS=/usr/local/${TOOLS_TO_TYPE}-freebsd/bin/strings .export AS .export AR .export LD .export NM .export OBJCOPY .export OBJDUMP .export RANLIB .export SIZE .export STRINGS # svnlite status /usr/src/ | sort ? /usr/src/sys/amd64/conf/GENERIC-DBG ? /usr/src/sys/amd64/conf/GENERIC-NODBG ? /usr/src/sys/arm/conf/BPIM3-DBG ? /usr/src/sys/arm/conf/BPIM3-NODBG ? /usr/src/sys/arm/conf/RPI2-DBG ? /usr/src/sys/arm/conf/RPI2-NODBG ? /usr/src/sys/arm64/conf/GENERIC-DBG ? /usr/src/sys/arm64/conf/GENERIC-NODBG ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG ? /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG ? /usr/src/sys/powerpc/conf/GENERICvtsc-DBG ? /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG M /usr/src/bin/sh/jobs.c M /usr/src/bin/sh/miscbltin.c M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp M /usr/src/lib/csu/powerpc64/Makefile M /usr/src/libexec/rtld-elf/Makefile M /usr/src/sys/arm/arm/gic.c M /usr/src/sys/boot/ofw/Makefile.inc M /usr/src/sys/boot/powerpc/Makefile.inc M /usr/src/sys/boot/powerpc/kboot/Makefile M /usr/src/sys/boot/uboot/Makefile.inc M /usr/src/sys/conf/Makefile.powerpc M /usr/src/sys/conf/kmod.mk M /usr/src/sys/ddb/db_main.c M /usr/src/sys/ddb/db_script.c M /usr/src/sys/powerpc/ofw/ofw_machdep.c The . . ./conf/*-*DBG files include the standard files and then make adjustments to have a production style kernel build, including the arm64 case. Below the first two files are as they were used to isolate forks' original lack of preserving the sp value for the child process side when interrupts happen. (Since fixed in head but not in stable/11 last I looked.) # svnlite diff /usr/src/bin/sh/jobs.c /usr/src/bin/sh/miscbltin.c /usr/src/sys/arm/arm/gic.c Index: /usr/src/bin/sh/jobs.c =================================================================== --- /usr/src/bin/sh/jobs.c (revision 313999) +++ /usr/src/bin/sh/jobs.c (working copy) @@ -51,6 +51,9 @@ #include <stdlib.h> #include <unistd.h> +/* JUST FOR TESTING */ +#include <stdint.h> + #include "shell.h" #if JOBS #include <termios.h> @@ -833,6 +836,13 @@ * in a pipeline). */ +extern uintptr_t example_stack_address(void); + +uintptr_t stack_address_before_fork = 0; +uintptr_t stack_address_after_fork = 0; + +pid_t pid_from_fork = -1; + pid_t forkshell(struct job *jp, union node *n, int mode) { @@ -845,7 +855,10 @@ if (mode == FORK_BG && (jp == NULL || jp->nprocs == 0)) checkzombies(); flushall(); - pid = fork(); + stack_address_before_fork = example_stack_address(); + pid_from_fork = pid = fork(); + stack_address_after_fork = example_stack_address(); + if (stack_address_after_fork != stack_address_before_fork) abort(); if (pid == -1) { TRACE(("Fork failed, errno=%d\n", errno)); INTON; @@ -946,7 +959,6 @@ return pid; } - pid_t vforkexecshell(struct job *jp, char **argv, char **envp, const char *path, int idx, int pip[2]) { Index: /usr/src/bin/sh/miscbltin.c =================================================================== --- /usr/src/bin/sh/miscbltin.c (revision 313999) +++ /usr/src/bin/sh/miscbltin.c (working copy) @@ -64,6 +64,15 @@ #undef eflag + +/* JUST FOR TESTING */ +uintptr_t example_stack_address(void) +{ + volatile uintptr_t test = 0; + return (uintptr_t)(void*)&test; +} + + int readcmd(int, char **); int umaskcmd(int, char **); int ulimitcmd(int, char **); Index: /usr/src/sys/arm/arm/gic.c =================================================================== --- /usr/src/sys/arm/arm/gic.c (revision 313999) +++ /usr/src/sys/arm/arm/gic.c (working copy) @@ -672,9 +672,13 @@ if (irq >= sc->nirqs) { #ifdef GIC_DEBUG_SPURIOUS +#define EXPECTED_SPURIOUS_IRQ 1023 + if (irq != EXPECTED_SPURIOUS_IRQ) { device_printf(sc->gic_dev, - "Spurious interrupt detected: last irq: %d on CPU%d\n", + "Spurious interrupt %d detected of %d: last irq: %d on CPU%d\n", + irq, sc->nirqs, sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid)); + } #endif return (FILTER_HANDLED); } @@ -720,6 +724,16 @@ if (irq < sc->nirqs) goto dispatch_irq; + if (irq != EXPECTED_SPURIOUS_IRQ) { +#undef EXPECTED_SPURIOUS_IRQ +#ifdef GIC_DEBUG_SPURIOUS + device_printf(sc->gic_dev, + "Spurious end interrupt %d detected of %d: last irq: %d on CPU%d\n", + irq, sc->nirqs, + sc->last_irq[PCPU_GET(cpuid)], PCPU_GET(cpuid)); +#endif + } + return (FILTER_HANDLED); } The gic.c change just avoids getting uninteresting spurious interrupt messages on the console. Other changes are generally tied to my powerpc64 and powerpc investigations. (In reply to Mark Millard from comment #7) I should have explicitly mentioned that the processes that get the failures were created and initialized long before the clang/llvm related parts of the build occur, even though the failures occur after some of the potential clang/llvm build activity. (In reply to Mark Millard from comment #8) A note on script failures: I'll note that the script failures are part of the activity reported in bugzilla 217239, where for script the failures look in some respects like the powerd and su failures reported there: they are not td_booted asserts like reported here. Still there might be a common cause since the script failures are repeatable as reported here in 217138. I've reduced the testing context to the following type of example (no longer involving buildworld buildkernel): # sh # sh # sh . . . # sh (So a bunch of nested sh's in an ssh session that will mostly be swapped out. I happened to have done this under 2 users for the example material below: on I did an su first. Plus I had another session without such a nesting but with an su.) After forcing these to mostly swap out (see below) I use ^D to exit a currently interactive one. Then the rest of the swapped out ones from the nesting get the tsd_booted failure. So, for example: # stress -m 1 --vm-bytes 1536M The context is a PINE64+ with 2GB of RAM. The above was enough in my context to cause the needed swapouts: # ps -aOuser,flags PID USER F TT STAT TIME COMMAND 688 root 4102 u0 IWs 0:00.00 login [pam] (login) 689 root 10004002 u0 I+ 0:00.06 -sh (sh) 72611 markmi 4002 0 IWs 0:00.00 -sh (sh) 72613 markmi 4002 0 IW 0:00.00 sh 72614 markmi 4002 0 IW 0:00.00 sh 72615 markmi 4002 0 IW 0:00.00 sh 72616 markmi 4002 0 IW 0:00.00 sh 72617 markmi 4002 0 IW 0:00.00 sh 72618 markmi 4002 0 IW 0:00.00 sh 72619 markmi 4002 0 IW 0:00.00 sh 72620 markmi 4002 0 IW 0:00.00 sh 72621 markmi 4002 0 IW 0:00.00 sh 72622 markmi 4002 0 IW 0:00.00 sh 72623 markmi 4002 0 IW 0:00.00 sh 72624 markmi 4002 0 IW 0:00.00 sh 72625 markmi 4002 0 IW 0:00.00 sh 72626 markmi 10004002 0 I+ 0:00.01 sh 167 markmi 4002 1 IWs 0:00.00 -sh (sh) 169 root 4102 1 IW 0:00.00 su 170 root 4002 1 IW 0:00.00 su (sh) 171 root 4002 1 IW 0:00.00 sh 172 root 4002 1 IW 0:00.00 sh 173 root 4002 1 IW 0:00.00 sh 174 root 4002 1 IW 0:00.00 sh 175 root 4002 1 IW 0:00.00 sh 176 root 4002 1 IW 0:00.00 sh 177 root 4002 1 IW 0:00.00 sh 178 root 4002 1 IW 0:00.00 sh 179 root 10004002 1 I+ 0:00.01 sh 60961 root 4002 2 IW 0:00.00 stress -m 1 --vm-bytes 1536M 60962 root 10000002 2 R 0:29.41 stress -m 1 --vm-bytes 1536M 60964 root 10004002 2 R+ 0:00.01 ps -aOuser,flags 82389 markmi 4002 2 IWs 0:00.00 -sh (sh) 82391 root 4102 2 IW 0:00.00 su 82392 root 10004002 2 S 0:00.22 su (sh) So with that swapped out context established: (The markmi user case:) $ ^D<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" (The su'd user case:) # ^D<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" The one without the deeply nested sh's (but an su): # exit <jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted" (In reply to Mark Millard from comment #10) I'll note that the sh.*.core files show that __je_tsd_booted and a very large area around it is all zero. (I've been seeing this for a long time but until the swapped-out-before discovery I had little information about when the big block of zeros showed up.) Some possibilities: A) The memory was not restored at all. B) Incorrect content was restored (possibly incorrect content was swapped out earlier). C) The memory and caches and such were not forced to be coherent in one or both directions of transfer. There may be other things that could be involved. Other folks on the lists have reported getting the Failed assertion: "tsd_booted" notices. So I'm changing the Importance field to track that now that a fairly simple, quick way to reproduce the problem is known. (Of course my results do involve only the equipment that I have access to: one PINE64+, powered USB hub, USB SSD, heat-sink, fan, case. So it is possible that I'm assuming too much about reproduction based on comments 10 and later.) (In reply to Mark Millard from comment #12) Here is an even stronger reproduction claim from the lists (for an rpi3 instead of for a pine64+): I can confirm process corruption on RPi3 as of r313567. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. (In reply to Mark Millard from comment #11) On reading about FreeBSD's swapping I see that swap-in does certain essentials, including setting up to demand-page most of the restored material if/when referenced. Thus another possibility for the problem is: D) The configuration to enable this demand-paging might be incorrect (possibly incomplete?) such that some of demand paging does not happen. (In reply to Mark Millard from comment #10) If I try just after booting stress -m 1 --vm-bytes 1800M is more like what I need to cause the swap-outs. 1700M is too little in that context. Also: Trying a debug kernel got the same results with only one notice: a vfs_bio.c bufwait vs. ufs_dirhash.c dirhash lock order reversal that likely is a well known one. (I guess the LOR is from writing core files into /var/crash/ .) (In reply to Mark Millard from comment #15) The problem only seems to occur when top shows RES 0K and a positive SWAP for the failing sh process before it executes. When top shows both a positive RES and a positive SWAP before the execution the sh process does not die. For example the following two did not die: PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND 11167 root 1 20 0 6880K 908K 1216K ttyin 0 0:00 0.00% su (sh) . . . 11144 root 1 20 0 6880K 568K 1556K ttyin 2 0:00 0.00% su (sh) https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015869.html shows an about 110 line C program that shows data corruption for fork-then-swap-out-then-swap-in. The known-corrupted data is a memory region that was malloc'd and that passed its byte pattern test before the fork. I'll also add the source as an attachment. Created attachment 180808 [details] C source for showing fork/swap memory corruption failure I'm still at a loss about how to figure out what stages are messed up. (Memory coherency? Some memory not swapped out? Bad data swapped out? Wrong data swapped in?) But at least I've found a much smaller/simpler example to demonstrate some problem with in my Pine64+_ 2GB context. The Pine64+ 2GB is the only arm64 context that I have access to. The attached program fails its check for data having its expected byte pattern in dynamically allocated memory after a fork/swap-out/swap-in sequence. The bytes end up all zero instead. I'll note that the program sleeps for 60s after forking to give time to do something else to cause the parent and child processes to swap out (RES=0 as seen in top). A point is the size of the region matters: <= 14K Bytes fails and > 14K Bytes works for as much has I have tested. https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015869.html has more material, including lldb's disassembly of the compiled and linked C code. (In reply to Mark Millard from comment #18) https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015869.html Should have had a subject/title saying arm64, not amd64. https://lists.freebsd.org/pipermail/freebsd-current/2017-March/065219.html (for example) has that corrected. (The correction has not shown up on the arm list yet for some reason.) Created attachment 180836 [details] Updated source that shows a tie to a 4KByte sized unit of failure See the comments tied to partial_test_check. See: https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015880.html I my test program I reported: #define region_size (4u*1024u+1u) // Bad dyn_region pattern, parent and child // processes: // 256u, 2u*1024u, 4u*1024u, 8u*1024u, // 9u*1024u, 12u*1024u, 14u*1024u // (but see the partial_test_check() call // notes above). // Works: // 14u*1024u+1u, 15u*1024u, 16u*1024u, // 32u*1024u, 256u*1024u*1024u It appears to turn out that the 14u*1024u vs. 14u*1024+1u boundary matches up with jemalloc: contrib/jemalloc/include/jemalloc/internal/size_classes.h : . . . size == 1<<lg_grp + ndelta<<lg_delta . . .\ * bin: 'yes' if a small bin size class, 'no' otherwise. . . . #if (LG_SIZEOF_PTR == 3 && LG_TINY_MIN == 3 && LG_QUANTUM == 3 && LG_PAGE == 12) #define SIZE_CLASSES \ /* index, lg_grp, lg_delta, ndelta, psz, bin, lg_delta_lookup */ \ . . . SC( 38, 13, 11, 3, no, yes, no) \ SC( 39, 13, 11, 4, yes, no, no) \ . . . #define SMALL_MAXCLASS ((((size_t)1) << 13) + (((size_t)3) << 11)) So SMALL_MAXCLASS==(14u*1024u). The problem in the example program is tied to small bin size handling in jemalloc. As far as my test program goes using ln -s 'tcache:false' /etc/malloc.conf prevents the memory corruptions for the fork-then-swap-out-then-swap-in sequence. So it is doing more than just avoid an assert. Note: I now have access to both a pine64+ 2GB and a rpi3 with no hardware in common (USB hub or SSD or such). (The rpi3 has less RAM and so does not need as big of a stress run.) The behaviors happen in both contexts: it was not some sort of pine64+ 2GB quirk (or the hub/SSD that I typically use in that context). (In reply to Mark Millard from comment #22) Being explicit about the combined conditions for failure for my test program: Both tcache enabled and allocations fitting in SMALL_MAXCLASS are required in order to make the program fail. Note: lldb) print __je_tcache_maxclass (size_t) $0 = 32768 which is larger than SMALL_MAXCLASS. I've not observed failures for sizes above SMALL_MAXCLASS but not exceeding __je_tcache_maxclass. Thus tcache use by itself does not seen sufficient for my program to get corruption of its dynamically allocated memory: the small allocation size also matters. Here are procstat -v results from while the two processes from the fork of my test program have already been swapped-out (before they run and fail during swap-in). I also show the results from looking at the matching core files. The start of the address range of that ends up all-zero when none of it should be is: (lldb) print dyn_region (region *volatile) $0 = 0x0000000040616000 # procstat -v 1954 PID START END PRT RES PRES REF SHD FLAG TP PATH 1954 0x10000 0x11000 r-- 1 51 5 1 CN-- vn /root/c_tests/a.out 1954 0x20000 0x21000 r-x 1 51 5 1 CN-- vn /root/c_tests/a.out 1954 0x30000 0x40000 rw- 16 0 1 0 C--- vn /root/c_tests/a.out 1954 0x40000 0x41000 r-- 0 0 2 0 CN-- sw 1954 0x41000 0x54000 rw- 0 0 2 0 CN-- sw 1954 0x40030000 0x4004a000 r-x 26 29 34 0 CN-- vn /libexec/ld-elf.so.1 1954 0x4004a000 0x40051000 rw- 3 0 1 0 C--- sw 1954 0x4005a000 0x4005c000 rw- 1 0 1 0 C--- sw 1954 0x4005c000 0x401b3000 r-x 343 384 61 27 CN-- vn /lib/libc.so.7 1954 0x401b3000 0x401c2000 --- 0 0 2 0 CN-- df 1954 0x401c2000 0x401cf000 rw- 13 0 2 0 CN-- vn /lib/libc.so.7 1954 0x401cf000 0x40201000 rw- 1 1 2 0 CN-- sw 1954 0x40400000 0x40800000 rw- 0 0 2 0 CN-- sw 1954 0xfffffffdf000 0xfffffffff000 rw- 0 0 1 0 C--D sw 1954 0xfffffffff000 0x1000000000000 r-x 1 1 37 0 ---- ph # procstat -v 1955 PID START END PRT RES PRES REF SHD FLAG TP PATH 1955 0x10000 0x11000 r-- 1 51 5 1 CN-- vn /root/c_tests/a.out 1955 0x20000 0x21000 r-x 1 51 5 1 CN-- vn /root/c_tests/a.out 1955 0x30000 0x40000 rw- 16 0 1 0 C--- vn /root/c_tests/a.out 1955 0x40000 0x41000 r-- 0 0 2 0 CN-- sw 1955 0x41000 0x54000 rw- 0 0 2 0 CN-- sw 1955 0x40030000 0x4004a000 r-x 26 29 34 0 CN-- vn /libexec/ld-elf.so.1 1955 0x4004a000 0x40051000 rw- 3 0 1 0 C--- sw 1955 0x4005a000 0x4005c000 rw- 1 0 1 0 C--- sw 1955 0x4005c000 0x401b3000 r-x 343 384 61 27 CN-- vn /lib/libc.so.7 1955 0x401b3000 0x401c2000 --- 0 0 2 0 CN-- df 1955 0x401c2000 0x401cf000 rw- 13 0 2 0 CN-- vn /lib/libc.so.7 1955 0x401cf000 0x40201000 rw- 1 1 2 0 CN-- sw 1955 0x40400000 0x40800000 rw- 0 0 2 0 CN-- sw 1955 0xfffffffdf000 0xfffffffff000 rw- 0 0 1 0 C--D sw 1955 0xfffffffff000 0x1000000000000 r-x 1 1 37 0 ---- ph The core file results are: # procstat -v /var/crash/a.out.1954.core PID START END PRT RES PRES REF SHD FLAG TP PATH 1954 0x10000 0x11000 r-- 1 51 3 1 CN-- vn /root/c_tests/a.out 1954 0x20000 0x21000 r-x 1 51 3 1 CN-- vn /root/c_tests/a.out 1954 0x30000 0x40000 rw- 16 0 1 0 C--- vn /root/c_tests/a.out 1954 0x40000 0x41000 r-- 1 1 1 0 CN-- sw 1954 0x41000 0x54000 rw- 4 4 1 0 C--- sw 1954 0x40030000 0x4004a000 r-x 26 29 30 0 CN-- vn /libexec/ld-elf.so.1 1954 0x4004a000 0x40051000 rw- 7 7 1 0 C--- sw 1954 0x4005a000 0x4005c000 rw- 2 2 1 0 C--- sw 1954 0x4005c000 0x401b3000 r-x 343 384 55 25 CN-- vn /lib/libc.so.7 1954 0x401b3000 0x401c2000 --- 0 0 1 0 CN-- df 1954 0x401c2000 0x401cf000 rw- 13 0 1 0 C--- vn /lib/libc.so.7 1954 0x401cf000 0x40201000 rw- 50 50 1 0 CN-- sw 1954 0x40400000 0x40800000 rw- 1024 1024 1 0 CN-- sw 1954 0xfffffffdf000 0xfffffffff000 rw- 3 3 1 0 C--D sw 1954 0xfffffffff000 0x1000000000000 r-x 1 1 33 0 ---- ph # procstat -v /var/crash/a.out.1955.core PID START END PRT RES PRES REF SHD FLAG TP PATH 1955 0x10000 0x11000 r-- 1 51 5 1 CN-- vn /root/c_tests/a.out 1955 0x20000 0x21000 r-x 1 51 5 1 CN-- vn /root/c_tests/a.out 1955 0x30000 0x40000 rw- 16 0 1 0 C--- vn /root/c_tests/a.out 1955 0x40000 0x41000 r-- 0 0 2 0 CN-- sw 1955 0x41000 0x54000 rw- 4 0 1 0 C--- sw 1955 0x40030000 0x4004a000 r-x 26 29 31 0 CN-- vn /libexec/ld-elf.so.1 1955 0x4004a000 0x40051000 rw- 4 0 1 0 C--- sw 1955 0x4005a000 0x4005c000 rw- 2 0 1 0 C--- sw 1955 0x4005c000 0x401b3000 r-x 343 384 56 25 CN-- vn /lib/libc.so.7 1955 0x401b3000 0x401c2000 --- 0 0 2 0 CN-- df 1955 0x401c2000 0x401cf000 rw- 13 0 1 0 C--- vn /lib/libc.so.7 1955 0x401cf000 0x40201000 rw- 1 1 2 0 CN-- sw 1955 0x40400000 0x40800000 rw- 1 1 2 0 CN-- sw 1955 0xfffffffdf000 0xfffffffff000 rw- 1 0 1 0 C--D sw 1955 0xfffffffff000 0x1000000000000 r-x 1 1 34 0 ---- ph Using procstat -v on one of the sh.*.core files from a process that directly reported the Failed assertion: "tds_booted" shows the following. But first the address of the tsd_booted storage: (lldb) print &__je_tsd_booted (bool *) $0 = 0x0000000040618520 # procstat -v /var/crash/sh.1766.core PID START END PRT RES PRES REF SHD FLAG TP PATH 1766 0x400000 0x423000 r-x 35 37 7 0 CN-- vn /bin/sh 1766 0x433000 0x436000 rw- 3 3 1 0 C--- sw 1766 0x40433000 0x4044d000 r-x 26 29 32 0 CN-- vn /libexec/ld-elf.so.1 1766 0x4044d000 0x40454000 rw- 3 3 1 0 C--- sw 1766 0x4045d000 0x4045f000 rw- 2 2 1 0 C--- sw 1766 0x4045f000 0x4048b000 r-x 44 48 14 7 CN-- vn /lib/libedit.so.7 1766 0x4048b000 0x4049b000 --- 0 0 1 0 CN-- df 1766 0x4049b000 0x4049d000 rw- 2 0 1 0 CN-- vn /lib/libedit.so.7 1766 0x4049d000 0x404a1000 rw- 0 0 1 0 CN-- sw 1766 0x404a1000 0x405f8000 r-x 343 384 58 26 CN-- vn /lib/libc.so.7 1766 0x405f8000 0x40607000 --- 0 0 1 0 CN-- df 1766 0x40607000 0x40614000 rw- 13 0 1 0 C--- vn /lib/libc.so.7 1766 0x40614000 0x4062d000 rw- 5 5 1 0 C--- sw 1766 0x4062d000 0x4067a000 r-x 77 82 16 8 CN-- vn /lib/libncursesw.so.8 1766 0x4067a000 0x40689000 --- 0 0 1 0 CN-- df 1766 0x40689000 0x4068e000 rw- 5 0 1 0 CN-- vn /lib/libncursesw.so.8 1766 0x4068e000 0x406ad000 rw- 0 0 1 0 C--- sw 1766 0x40800000 0x40c00000 rw- 9 9 1 0 C--- sw 1766 0xfffffffdf000 0xfffffffff000 rw- 6 6 1 0 C--D sw 1766 0xfffffffff000 0x1000000000000 r-x 1 1 34 0 ---- ph Similarly for an example of su.*.core that reported the assertion failure: (lldb) print &__je_tsd_booted (bool *) $0 = 0x000000004061d520 # procstat -v /var/crash/su.1765.core PID START END PRT RES PRES REF SHD FLAG TP PATH 1765 0x400000 0x404000 r-x 4 5 3 0 CN-- vn /usr/bin/su 1765 0x413000 0x414000 rw- 1 1 1 0 C--- sw 1765 0x40413000 0x4042d000 r-x 26 29 30 0 CN-- vn /libexec/ld-elf.so.1 1765 0x4042d000 0x40435000 rw- 8 8 1 0 C--- sw 1765 0x4043d000 0x4043f000 rw- 2 2 1 0 C--- sw 1765 0x4043f000 0x4044e000 r-x 15 16 33 16 CN-- vn /lib/libutil.so.9 1765 0x4044e000 0x4045e000 --- 0 0 1 0 CN-- df 1765 0x4045e000 0x4045f000 rw- 1 0 1 0 CN-- vn /lib/libutil.so.9 1765 0x4045f000 0x40461000 rw- 0 0 1 0 CN-- sw 1765 0x40461000 0x4046c000 r-x 11 13 21 9 CN-- vn /usr/lib/libpam.so.6 1765 0x4046c000 0x4047c000 --- 0 0 1 0 CN-- df 1765 0x4047c000 0x4047d000 rw- 1 0 1 0 C--- vn /usr/lib/libpam.so.6 1765 0x4047d000 0x40494000 r-x 8 8 19 8 CN-- vn /usr/lib/libbsm.so.3 1765 0x40494000 0x404a4000 --- 0 0 1 0 CN-- df 1765 0x404a4000 0x404a6000 rw- 0 0 1 0 CN-- vn /usr/lib/libbsm.so.3 1765 0x404a6000 0x405fd000 r-x 343 384 54 24 CN-- vn /lib/libc.so.7 1765 0x405fd000 0x4060c000 --- 0 0 1 0 CN-- df 1765 0x4060c000 0x40619000 rw- 13 0 1 0 CN-- vn /lib/libc.so.7 1765 0x40619000 0x40653000 rw- 4 4 1 0 C--- sw 1765 0x40653000 0x40654000 r-x 0 0 6 3 CN-- vn /usr/lib/pam_rootok.so.6 1765 0x40654000 0x40663000 --- 0 0 1 0 CN-- df 1765 0x40663000 0x40664000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_rootok.so.6 1765 0x40664000 0x40665000 r-x 0 0 8 4 CN-- vn /usr/lib/pam_self.so.6 1765 0x40665000 0x40674000 --- 0 0 1 0 CN-- df 1765 0x40674000 0x40675000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_self.so.6 1765 0x40675000 0x40676000 r-x 0 0 6 3 CN-- vn /usr/lib/pam_group.so.6 1765 0x40676000 0x40685000 --- 0 0 1 0 CN-- df 1765 0x40685000 0x40686000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_group.so.6 1765 0x40686000 0x40687000 r-x 0 0 17 7 CN-- vn /usr/lib/pam_opie.so.6 1765 0x40687000 0x40697000 --- 0 0 1 0 CN-- df 1765 0x40697000 0x40698000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_opie.so.6 1765 0x40698000 0x4069e000 r-x 0 0 17 7 CN-- vn /usr/lib/libopie.so.8 1765 0x4069e000 0x406ae000 --- 0 0 1 0 CN-- df 1765 0x406ae000 0x406b1000 rw- 0 0 1 0 CN-- vn /usr/lib/libopie.so.8 1765 0x406b1000 0x406d1000 r-x 0 0 17 7 CN-- vn /lib/libmd.so.6 1765 0x406d1000 0x406e0000 --- 0 0 1 0 CN-- df 1765 0x406e0000 0x406e1000 rw- 0 0 1 0 CN-- vn /lib/libmd.so.6 1765 0x406e1000 0x406e2000 r-x 0 0 17 7 CN-- vn /usr/lib/pam_opieaccess.so.6 1765 0x406e2000 0x406f1000 --- 0 0 1 0 CN-- df 1765 0x406f1000 0x406f2000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_opieaccess.so.6 1765 0x406f2000 0x406f5000 r-x 0 0 45 18 CN-- vn /usr/lib/pam_unix.so.6 1765 0x406f5000 0x40704000 --- 0 0 1 0 CN-- df 1765 0x40704000 0x40705000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_unix.so.6 1765 0x40705000 0x40710000 r-x 0 0 19 8 CN-- vn /lib/libcrypt.so.5 1765 0x40710000 0x40720000 --- 0 0 1 0 CN-- df 1765 0x40720000 0x40721000 rw- 0 0 1 0 CN-- vn /lib/libcrypt.so.5 1765 0x40721000 0x40732000 rw- 0 0 0 0 ---- -- 1765 0x40732000 0x40736000 r-x 0 0 17 7 CN-- vn /usr/lib/libypclnt.so.4 1765 0x40736000 0x40745000 --- 0 0 1 0 CN-- df 1765 0x40745000 0x40746000 rw- 0 0 1 0 CN-- vn /usr/lib/libypclnt.so.4 1765 0x40746000 0x4074f000 rw- 0 0 1 0 CN-- sw 1765 0x4074f000 0x40751000 r-x 0 0 17 7 CN-- vn /usr/lib/pam_login_access.so.6 1765 0x40751000 0x40760000 --- 0 0 1 0 CN-- df 1765 0x40760000 0x40761000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_login_access.so.6 1765 0x40761000 0x40764000 r-x 0 0 45 18 CN-- vn /usr/lib/pam_unix.so.6 1765 0x40764000 0x40773000 --- 0 0 1 0 CN-- df 1765 0x40773000 0x40774000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_unix.so.6 1765 0x40774000 0x40775000 r-x 0 0 21 9 CN-- vn /usr/lib/pam_permit.so.6 1765 0x40775000 0x40784000 --- 0 0 1 0 CN-- df 1765 0x40784000 0x40785000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_permit.so.6 1765 0x40785000 0x40786000 r-x 0 0 21 9 CN-- vn /usr/lib/pam_permit.so.6 1765 0x40786000 0x40795000 --- 0 0 1 0 CN-- df 1765 0x40795000 0x40796000 rw- 0 0 1 0 CN-- vn /usr/lib/pam_permit.so.6 1765 0x40800000 0x40c00000 rw- 8 8 1 0 C--- sw 1765 0xfffffffdf000 0xfffffffff000 rw- 2 2 1 0 C--D sw 1765 0xfffffffff000 0x1000000000000 r-x 1 1 33 0 ---- ph The earlier procstat -v comments have something interesting in common other than the procstat results themselves: the 3 separate programs all end up with zeros in the same general memory area of each process: (lldb) print dyn_region (region *volatile) $0 = 0x0000000040616000 (lldb) print &__je_tsd_booted (bool *) $0 = 0x0000000040618520 (lldb) print &__je_tsd_booted (bool *) $0 = 0x0000000040618520 The first is from dynamic allocation ending up in the area. The other two are from libc.so.7 globals/statics ending up in the general area. It looks like something is trashing a specific memory area for some reason. (In reply to Mark Millard from comment #27) I incorrectly copy/pasted one of the addresses. The last one below is the correction: (lldb) print dyn_region (region *volatile) $0 = 0x0000000040616000 (lldb) print &__je_tsd_booted (bool *) $0 = 0x0000000040618520 (lldb) print &__je_tsd_booted (bool *) $0 = 0x000000004061d520 Created attachment 181044 [details]
Program showing posix_madvise with POSIX_MADV_WILLNEED can prevent the failure
posix_madvise(.,.,POSIX_MADV_WILLNEED) in the child process
after the fork but before the sleep/swap-out avoids the failure.
In the parent process it makes no difference.
The .,. is the address and size for a memory region that is
to avoid the problem.
(In reply to Mark Millard from comment #29) It turns out that POSIX_MADV_WILLNEED can instead be POSIX_MADV_DONTNEED and it still prevents the failures. (In reply to Mark Millard from comment #30) Ignore the POSIX_MADV_DONTNEED: the example child had not reached the zero RES(ident memory) status. When it does it fails (and so to the ancestor processes if they also reached such). I have confirmed that -r315870 still trashes memory for the sequence: allocations (tcache and <= SMALL_MAXCLASS) fork sleep/wait then swap-out [zero RES(ident memory)] swap-in Created attachment 181258 [details]
Test program with thousands of 14 KiByte allocations
The test program that has thousands of 14 KiByte
allocations usually ends up with them all being
zeroed in the parent and child of the fork.
But I've had a couple of runs where a much smaller
prefix was messed up and then there were normal,
expected values.
#define region_size (14u*1024u)
. . .
#define num_regions (256u*1024u*1024u/region_size)
So num_regions==18724, using up most of 256 MiBytes.
Note: each region has its own 14 KiByte allocation.
But dyn_regions[1296].array[0] in one example was
the first normal value.
In another example dyn_regions[2180].array[4096] was
the first normal value.
The last is interesting for being part way through
an allocation's space. That but aligning with a 4
KiByte page size would seem odd for a pure-jemalloc
issue.
Created attachment 181498 [details]
Source for allowing swap-before-fork in test in addition to swap-after-fork
I've discovered that if between initializing the allocated memory
and the fork() the process swaps-out then the problem does not
happen later, even if the parent and child are swapped-out after
the fork() as well.
I've found that for the "Source for allowing" attachment I can make variations that change how much of the allocated region prefix ends up zero vs. stays good. I vary the sleep time between testing the initialized allocations and doing the fork. The longer the sleep the more zero pages show up: # diff swap_testing[56].c 1c1 < // swap_testing5.c --- > // swap_testing6.c 5c5 < // cc -g -std=c11 -Wpedantic -o swaptesting5 swap_testing5.c --- > // cc -g -std=c11 -Wpedantic -o swaptesting5 swap_testing6.c 33c33 < sleep(30); // Potentialy force swap-out here. --- > sleep(150); // Potentialy force swap-out here. 37a38,48 > // For no-swap-out here cases: > // > // The longer the sleep here the more allocations > // that end up as zero. > // > // top's Mem Active, Inact, Wired, Bug, Free and > // Swap Total, Used, and Free stay unchanged. > // What does change is the process RES decreases > // while the process SIZE and SWAP stay unchanged > // during this sleep. > NOTE: On other architectures that I've tried (such as armv6/v7) RES does not decrease during the sleep --and the problem does not happen even for as long of sleeps as I've tried. (I use "stress -m 2 --vm-bytes 900M" on armv6/v7 instead of -m 1 --vm-bytes 1800M because that large in one process is not allowed.) So watching top's RES during the sleep (longer than a few seconds) predicts the later fails-vs.-not status: If RES decreases (while other things associated with the process status stay the same) then there will be a failure. (In reply to Mark Millard from comment #35) I've been able to identify what code sequence is gradually removing the "small_mappings" via some breakpointing in the kernel after reaching the "should be just sleeping" status. Specifically I started with breakpointing when pmap_resident_count_dec was on the call stack in order to see the call chain(s) that lead to it being called while RES(ident memory) is gradually decreasing during the sleep that is just before forking. (tid 100067 is [pagedaemon{pagedaemon}], which is in vm_pageout_worker. bt does not show inlined layers.) [ thread pid 17 tid 100067 ] Breakpoint at $x.1: undefined d65f03c0 db> bt Tracing pid 17 tid 100067 td 0xfffffd0001c4aa00 . . . handle_el1h_sync() at pmap_remove_l3+0xdc pc = 0xffff000000604870 lr = 0xffff000000611158 sp = 0xffff000083a49980 fp = 0xffff000083a49a40 pmap_remove_l3() at pmap_ts_referenced+0x580 pc = 0xffff000000611158 lr = 0xffff000000615c50 sp = 0xffff000083a49a50 fp = 0xffff000083a49ac0 pmap_ts_referenced() at vm_pageout+0xe60 pc = 0xffff000000615c50 lr = 0xffff0000005d1f74 sp = 0xffff000083a49ad0 fp = 0xffff000083a49b50 vm_pageout() at fork_exit+0x94 pc = 0xffff0000005d1f74 lr = 0xffff0000002e01c0 sp = 0xffff000083a49b60 fp = 0xffff000083a49b90 fork_exit() at fork_trampoline+0x10 pc = 0xffff0000002e01c0 lr = 0xffff0000006177b4 sp = 0xffff000083a49ba0 fp = 0x0000000000000000 It turns out that pmap_ts_referenced is on its: small_mappings: . . . path for the above so the pmap_remove_l3 call is the one from that execution path. (Found by more breakpointing after enabling such on the paths.) So this is the path with: (breakpoint hook not shown) /* * Wired pages cannot be paged out so * doing accessed bit emulation for * them is wasted effort. We do the * hard work for unwired pages only. */ pmap_remove_l3(pmap, pte, pv->pv_va, tpde, &free, &lock); pmap_invalidate_page(pmap, pv->pv_va); cleared++; if (pvf == pv) pvf = NULL; pv = NULL; . . . From what I can tell this code is eliminating the content of pages that in the failing tests, ones with no backing store yet (not swapped-out yet). The observed behavior is that the pages that have the above happen end up as zero pages once swapped-out and back in. I do not see anything putting the pages that this happens to into any other lists to keep track of the contents of the page content. The swap-out and swap-in seem to have ignored these pages and to have been based on automatically zeroed pages instead. Note that the (or a) question might be if these pages should have ever gotten to this code at all. (I'm no expert overall.) But that might get into why POSIX_MADV_WILLNEED spanning each page is sufficient to avoid the zeros issue for work-then-swapout-and-back-in. I'll only write here about what the backtrace code seems to be doing if I'm interpreting correctly. One oddity here is that pmap_remove_l3 does its own pmap_invalidate_page to invalidate the same tlb entry as the above pmap_invalidate_page, so a double-invalidate. (I've no clue if such is just suboptimal vs. a form of error.) pmap_remove_l3 here does things that the analogous sys/arm/arm/pmap-v6.c's pmap_ts_referenced does not do and pmap-v6 does something this code does not. arm64's pmap_remove_l3 does (in summary): pmap_invalidate_page decrements the resident_count pmap_unwire_l3 (then pmap_ts_referenced's small_mappings code does another pmap_invalidate_page for the same argument values) arm pmap-v6's pmap_ts_referenced's small_mappings code does: conditional vm_page_dirty pte2_clear_bit for PTE2_A pmap_tlb_flush There is, for example, no decrement of the resident_count involved (that I found anyway). But I've no clue just what should be analogous vs. what should not between pmap-v6 and arm64's pmap code in this area. I'll also note that the code before the arm64 small_mappings code also uses pmap_remove_l3 but does not do the decrement nor the extra pmap_invalidate_page (for example). But again I do not know how analogous the two paths should be. Only the small_mappings path seems to have the end-up-with-zeros problem for the later fork-then-swap-out and then swap-back-in context. Konstantin Belousov provided a patch that seems to have fixed the issue. . . On 2017-Apr-9, at 5:27 AM, Konstantin Belousov <kostikbel@gmail.com> wrote: Hmm, could you try the following patch, I did not even compiled it. diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c index 3d5756ba891..55aa402eb1c 100644 --- a/sys/arm64/arm64/pmap.c +++ b/sys/arm64/arm64/pmap.c @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot) sva += L3_SIZE) { l3 = pmap_load(l3p); if (pmap_l3_valid(l3)) { + if ((l3 & ATTR_SW_MANAGED) && + pmap_page_dirty(l3)) { + vm_page_dirty(PHYS_TO_VM_PAGE(l3 & + ~ATTR_MASK)); + } pmap_set(l3p, ATTR_AP(ATTR_AP_RO)); PTE_SYNC(l3p); /* XXX: Use pmap_invalidate_range */ (In reply to Mark Millard from comment #37) buildworld buildkernel completed non-stop for the first time on a BPI-M3 board. Looks good for a check-in to svn to me (head and stable/11). This combined with 2017-Feb-15's -r313772's fix to the fork trampline code's updating of sp_el0 makes arm64 far more stable for my purposes. -r313772 was never MFC'd to stable/11. In my view it should be. *** Bug 217239 has been marked as a duplicate of this bug. *** A commit references this bug: Author: kib Date: Mon Apr 10 15:32:26 UTC 2017 New revision: 316679 URL: https://svnweb.freebsd.org/changeset/base/316679 Log: Do not lose dirty bits for removing PROT_WRITE on arm64. Arm64 pmap interprets accessed writable ptes as modified, since ARMv8.0 does not track Dirty Bit Modifier in hardware. If writable bit is removed, page must be marked as dirty for MI VM. This change is most important for COW, where fork caused losing content of the dirty pages which were not yet scanned by pagedaemon. Reviewed by: alc, andrew Reported and tested by: Mark Millard <markmi@dsl-only.net> PR: 217138, 217239 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Changes: head/sys/arm64/arm64/pmap.c (In reply to Mark Millard from comment #38) I had been thinking of the BPI-M3 for other reasons and typed that instead of the correct: Pine64+ 2GB. (True elsewhere as well.) I do really mean arm64 here, not armv7. (In reply to commit-hook from comment #40) stable/11 -r317354 completes the fixes there: Author: kib Date: Mon Apr 24 07:52:44 2017 New Revision: 317354 URL: https://svnweb.freebsd.org/changeset/base/317354 Log: MFC r316679: Do not lose dirty bits for removing PROT_WRITE on arm64. Modified: stable/11/sys/arm64/arm64/pmap.c Directory Properties: stable/11/ (props changed) . . . See: https://lists.freebsd.org/pipermail/svn-src-stable-11/2017-April/003041.html So now the old rpi3 wiki page material is only for masking some of the problems in older contexts for 12 or for older contexts for 11. Note: -r313772 from head is now -r317147 in stable/11 . So interrupts will no longer trash the sp_el0 register during the fork-trampoline code. This was the other issue with fork behavior that caused me build problems. With both in place for head I no loner have problems with buildworld and the like ,in head. [I've not been running 11, just 12.] Fix has now been merged to stable/11 and will be in the 11.1 release. Mark thank you for testing/confirming! |