Failed runs:
https://ci.freebsd.org/job/FreeBSD-head-i386-build/9799/console
https://ci.freebsd.org/job/FreeBSD-head-armv7-build/2129/console

Output (from FreeBSD-head-i386-build/9799). The order looks correct, but the timing may be too close, or there may be a path issue.

00:10:50.832 --- lib/libgcc_s__L ---
00:10:50.832 sh /usr/src/tools/install.sh -l rs -o root -g wheel -m 755 /usr/obj/usr/src/i386.i386/tmp/lib/libgcc_s.so.1 /usr/obj/usr/src/i386.i386/tmp/usr/lib/libgcc_s.so
...
00:10:50.836 --- lib/libgssapi__L ---
00:10:50.836 /usr/obj/usr/src/i386.i386/tmp/usr/bin/ld: error: unable to find library -lgcc_s
...
00:10:50.838 --- lib/libgssapi__L ---
00:10:50.838 cc: error: linker command failed with exit code 1 (use -v to see invocation)
Back on 2018-06-18, Bryan Drewery wrote, in response to something else that initially looked like a possible race:

If it was -lgcc_s then it's a known rare build race due to tools/install.sh not handling -S.
Occurred in a private CI build of mine: https://cirrus-ci.com/task/6381946174177280
Indeed, the build installs libgcc_s twice to the same location:

% grep 'install.*libgcc_s.so$' ~/Downloads/6381946174177280-main.log
sh /tmp/cirrus-ci-build/tools/install.sh -l rs -o root -g wheel -m 755 /usr/obj/tmp/cirrus-ci-build/amd64.amd64/tmp/lib/libgcc_s.so.1 /usr/obj/tmp/cirrus-ci-build/amd64.amd64/tmp/usr/lib/libgcc_s.so
sh /tmp/cirrus-ci-build/tools/install.sh -l rs -o root -g wheel -m 755 /usr/obj/tmp/cirrus-ci-build/amd64.amd64/tmp/lib/libgcc_s.so.1 /usr/obj/tmp/cirrus-ci-build/amd64.amd64/tmp/usr/lib/libgcc_s.so
In a successful build we actually install libgcc_s.so three times to the same location.
(In reply to Ed Maste from comment #4) Is there an installworld vs. buildworld confusion? (I'd expect ld to be a buildworld issue, not an installworld one.) I see from recent activity, using:

egrep -r '(--- installworld ---|install.*libgcc_s.so)' /root/sys_typescripts/
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:--- installworld ---
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:--- installworld ---
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -s -o root -g wheel -m 444 -S libgcc_s.so.1 /lib/
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -o root -g wheel -m 444 libgcc_s.so.1.debug /usr/lib/debug/lib/
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -l rs -o root -g wheel -m 755 /lib/libgcc_s.so.1 /usr/lib/libgcc_s.so
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -s -o root -g wheel -m 444 -S libgcc_s.so.1 /usr/lib32/
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -o root -g wheel -m 444 libgcc_s.so.1.debug /usr/lib/debug/usr/lib32/
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-21:16:45:49:install -l rs -o root -g wheel -m 755 libgcc_s.so.1 /usr/lib32/libgcc_s.so
/root/sys_typescripts/typescript_make_amd64_nodebug_clang-amd64-host-2018-10-22:15:07:14:Command: env __MAKE_CONF=/root/src.configs/make.conf SRCCONF=/dev/null SRC_ENV_CONF=/root/src.configs/src.conf.

(So all from the same log file: one for installworld.) No matches were found in files that do not show installworld. The same holds for all the log history the grep went through. (I run buildworld and installworld in separate runs, not together, so they have separate log files in the directory.) The logs span amd64, armv7, aarch64, powerpc64, and powerpc builds.

I see no evidence of buildworld doing installs of libgcc_s.so materials.
(In reply to Mark Millard from comment #5) The installs are done during buildworld as part of the libraries target. In a submake, the _prereq_libs, _startup_libs and _generic_libs targets are built in turn, and each of these builds and installs libgcc_s.so (see the ${_lib}__PL and ${_lib}__L targets in Makefile.inc1).
(In reply to Ed Maste from comment #6) Ahh, I needed to look in the .meta files. Sorry. What you are reporting may mean that some .meta files are written more than once, so I'd see content from only the last pass for each. Messy. For reference (for a first time build of a head -r345520 based tree):

# grep -lr 'CMD .*install.*libgcc_s\.so' /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind
/usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc/libgcc_s.so.1.meta
/usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc/_libinstall.meta
/usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc/libgcc_s.so.1.meta
/usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc/_libinstall.meta

So 4 meta files, 2 for lib32 activity and 2 not:

# Meta data file /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc/libgcc_s.so.1.meta
CMD /usr/local/powerpc64-unknown-freebsd13.0/bin/objcopy --strip-debug --add-gnu-debuglink=libgcc_s.so.1.debug libgcc_s.so.1.full libgcc_s.so.1
CMD @sh /usr/src/tools/install.sh -l s -o root -g wheel -m 444 libgcc_s.so.1 libgcc_s.so
CWD /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc
TARGET libgcc_s.so.1

# Meta data file /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc/_libinstall.meta
CMD sh /usr/src/tools/install.sh -o root -g wheel -m 444 -S libgcc_s.so.1 /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/lib/
CMD sh /usr/src/tools/install.sh -o root -g wheel -m 444 libgcc_s.so.1.debug /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/lib/debug/lib/
CMD sh /usr/src/tools/install.sh -l rs -o root -g wheel -m 755 /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/lib/libgcc_s.so.1 /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/lib/libgcc_s.so
CWD /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/gnu/lib/libgcc
TARGET _libinstall

# Meta data file /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc/libgcc_s.so.1.meta
CMD /usr/local/powerpc64-unknown-freebsd13.0/bin/objcopy --strip-debug --add-gnu-debuglink=libgcc_s.so.1.debug libgcc_s.so.1.full libgcc_s.so.1
CMD @sh /usr/src/tools/install.sh -l s -o root -g wheel -m 444 libgcc_s.so.1 libgcc_s.so
CWD /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc
TARGET libgcc_s.so.1

# Meta data file /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc/_libinstall.meta
CMD sh /usr/src/tools/install.sh -o root -g wheel -m 444 -S libgcc_s.so.1 /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/tmp/usr/lib32/
CMD sh /usr/src/tools/install.sh -o root -g wheel -m 444 libgcc_s.so.1.debug /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/tmp/usr/lib/debug/usr/lib32/
CMD sh /usr/src/tools/install.sh -l rs -o root -g wheel -m 755 libgcc_s.so.1 /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/tmp/usr/lib32/libgcc_s.so
CWD /usr/obj/powerpc64vtsc_clang_altbinutils-oldunwind/powerpc.powerpc64/usr/src/powerpc.powerpc64/obj-lib32/gnu/lib/libgcc
TARGET _libinstall
(In reply to Mark Millard from comment #7) And, finally, what I should have looked for in the overall logs, using an aarch64 build this time (no lib32):

# egrep '(libgcc_s\.so|libgcc_s.*_libinstall)' ~/sys_typescripts/typescript_make_cortexA72_nodebug_clang_bootstrap-amd64-host-2019-03-26:01:34:22
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/libgcc_s.so.1.full
--- libgcc_s.so.1.full ---
building shared library libgcc_s.so.1
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/libgcc_s.so.1.debug
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/libgcc_s.so.1
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_s/_libinstall

So 3 _libinstall's for libgcc_s. Looking at other things in the build, there are a lot of pairs of _libinstall's, and libgcc_eh has 3:

FBSDFSSD# egrep '/_libinstall' ~/sys_typescripts/typescript_make_cortexA72_nodebug_clang_bootstrap-amd64-host-2019-03-26:01:34:22 | sort | more
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libavl/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libavl/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libctf/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libctf/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libdtrace/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libnvpair/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/cddl/lib/libnvpair/_libinstall
. . .
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_eh/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_eh/_libinstall
Building /usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64/lib/libgcc_eh/_libinstall
. . .
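For anyone who wants to repeat the count above without eyeballing sorted egrep output, here is a minimal sketch that tallies the "Building .../_libinstall" lines per library directory. The function names and the threshold parameter are made up for illustration; the sample lines are copied from the log excerpt above.

```python
import re
from collections import Counter

# Matches lines such as "Building /usr/obj/.../lib/libgcc_s/_libinstall"
# and captures the library's object directory.
LIBINSTALL = re.compile(r"Building (\S+)/_libinstall\b")


def count_libinstalls(log_lines):
    """Return a Counter mapping each object directory to its _libinstall count."""
    counts = Counter()
    for line in log_lines:
        match = LIBINSTALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_installs(counts, threshold=3):
    """List the directories installed at least `threshold` times (hypothetical helper)."""
    return sorted(path for path, n in counts.items() if n >= threshold)


BASE = "/usr/obj/cortexA72_clang/arm64.aarch64/usr/src/arm64.aarch64"
SAMPLE_LOG = [
    f"Building {BASE}/cddl/lib/libavl/_libinstall",
    f"Building {BASE}/cddl/lib/libavl/_libinstall",
    f"Building {BASE}/cddl/lib/libdtrace/_libinstall",
    f"Building {BASE}/lib/libgcc_s/libgcc_s.so.1",
    f"Building {BASE}/lib/libgcc_s/_libinstall",
    f"Building {BASE}/lib/libgcc_s/_libinstall",
    f"Building {BASE}/lib/libgcc_s/_libinstall",
]

SAMPLE_COUNTS = count_libinstalls(SAMPLE_LOG)
```

On a real typescript, passing the open file object as log_lines should work the same way, since the function only iterates over lines.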
On Tue, 30 Jul 2019 at 13:38, <jenkins-admin@freebsd.org> wrote:
>
> FreeBSD-head-powerpcspe-build - Build #12186 (r350451) - Failure
>
> Build information: https://ci.freebsd.org/job/FreeBSD-head-powerpcspe-build/12186/
> Full change log: https://ci.freebsd.org/job/FreeBSD-head-powerpcspe-build/12186/changes
> Full build log: https://ci.freebsd.org/job/FreeBSD-head-powerpcspe-build/12186/console
> 13:37:03 /usr/obj/usr/src/powerpc.powerpcspe/tmp/usr/bin/ld: cannot find -lgcc_s

PR 233769 Possible build race: ld: error: unable to find library -lgcc_s
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233769
This is due to tools/install.sh not supporting -S. If that is fixed then this is easily fixed.
Just passing through -S in tools/install.sh is not enough.

1. Maybe this tool is expected to work on Linux, where -S does something different.
2. The symlink/hardlink support calls ln(1), which is not atomic for symlinks. It does unlink(2) followed by symlink(2).

Reworking the build to avoid double installs here is possible but would be more hackish IMO. We would need to muck with the SUBDIR lists for gnu/lib to fix this specific case so that it only installs during the right phase of buildworld (like we do for llvm, kinda). But that's JUST gnu/lib. The same problem could come up for other libraries eventually, given enough install-to-worldtmp threads.

META_MODE+filemon may eventually learn how to avoid the double installs, but it has not yet. DIRDEPS_BUILD learned how. It's not hard to implement, just risky, and it needs time devoted to testing. Of course this then requires filemon, which isn't the solution some people want to hear.

The double install/relink issue isn't causing a problem beyond this race, which is why I think we just need to make the install atomic rather than play whack-a-mole with SUBDIR hacks.
Oh and it's actually the symlink causing the problem here not the library itself.
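To make the difference between the two install strategies concrete, here is a minimal sketch in Python. It is not the code in ln(1), install(1) or tools/install.sh; the function names and the scratch-directory approach are assumptions chosen for illustration. The first function reproduces the unlink-then-symlink sequence, which leaves a moment where the link does not exist. The second creates the link under a temporary name and renames it over the destination, so a concurrent reader sees either the old link or the new one.

```python
import os
import tempfile


def replace_symlink_nonatomic(target: str, link_path: str) -> None:
    """Mimic the unlink(2)-then-symlink(2) sequence described above."""
    try:
        os.unlink(link_path)
    except FileNotFoundError:
        pass
    # Window: link_path does not exist here, so a linker running in
    # parallel and looking for it would fail.
    os.symlink(target, link_path)


def replace_symlink_atomic(target: str, link_path: str) -> None:
    """Create the link under a temporary name, then rename it into place."""
    directory = os.path.dirname(link_path) or "."
    # A private scratch directory next to the destination keeps the
    # temporary name on the same filesystem and free of collisions.
    scratch = tempfile.mkdtemp(prefix=".inst.", dir=directory)
    tmp_link = os.path.join(scratch, os.path.basename(link_path))
    try:
        os.symlink(target, tmp_link)
        # rename(2) swaps the directory entry in one step; it acts on the
        # symlink itself and does not follow it.
        os.replace(tmp_link, link_path)
    finally:
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(scratch)


def demo() -> str:
    """Install the same link twice, as the build does, and return its target."""
    with tempfile.TemporaryDirectory() as root:
        link = os.path.join(root, "libgcc_s.so")
        replace_symlink_nonatomic("libgcc_s.so.1", link)
        replace_symlink_atomic("libgcc_s.so.1", link)  # the repeated install
        return os.readlink(link)
```

Both functions end with the same link in place; the only difference is whether a parallel job can observe the path missing in between.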
So to be clear, the problem here is that we are installing libgcc_s.so.1 multiple times, and in doing that we introduce windows where the symlink is temporarily unlinked?
From what I can see install.sh is not being used when the problem occurs:

21:48:22 --- lib/libgcc_s__L ---
21:48:22 install -U -o root -g wheel -m 444 libgcc_s.so.1.debug /usr/obj/usr/src/powerpc.powerpc64/tmp/usr/lib/debug/lib/
21:48:22 install -U -l rs -o root -g wheel -m 755 /usr/obj/usr/src/powerpc.powerpc64/tmp/lib/libgcc_s.so.1 /usr/obj/usr/src/powerpc.powerpc64/tmp/usr/lib/libgcc_s.so
21:48:22 --- lib/ofed/libibverbs__L ---
21:48:22 ld: error: unable to find library -lgcc_s

and we are not specifying -S for creating the symlink. Why not?
A possible fix: https://reviews.freebsd.org/D26453 buildworld completes for me with this change. I don't think I've ever seen the build failure in question in my own builds, so I can't easily verify that this actually solves anything. Li-Wen, is it possible to try this patch in CI without committing? I'm not sure how many builds we'd need in order to verify whether the issue is gone.
(In reply to Mark Johnston from comment #15) It's a bit hard to inject a patch into every CI build. The easier way would be to put buildworld/buildkernel in a loop with this patch applied and see if the issue happens again. I guess maybe 50 successful clean builds would be sufficient.
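The loop suggested above could be sketched roughly as follows. This is only an illustration: the function name, the log file naming, and the default of 50 runs (taken from the guess above) are assumptions, and the command passed in is responsible for cleaning the object tree so that each run really is a clean build.

```python
import os
import subprocess


def first_failing_run(command, runs=50, log_dir=None):
    """Run `command` up to `runs` times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeds. When log_dir is given, each run's
    output is written to run-NN.log in that directory.
    """
    for attempt in range(1, runs + 1):
        if log_dir is None:
            result = subprocess.run(command)
        else:
            log_path = os.path.join(log_dir, f"run-{attempt:02d}.log")
            with open(log_path, "wb") as log:
                result = subprocess.run(
                    command, stdout=log, stderr=subprocess.STDOUT
                )
        if result.returncode != 0:
            return attempt
    return None
```

A hypothetical invocation from /usr/src would be first_failing_run(["make", "-j32", "buildworld"], log_dir="/tmp/logs"). Keeping per-run logs matters here because the failure is intermittent and the interesting evidence is the timing of the install lines around the ld error.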
(In reply to Li-Wen Hsu from comment #16) Ok, trying it.
(In reply to Mark Johnston from comment #15) I'm not sure I've ever seen it in local builds either, but I have encountered it more than once on cirrus-ci builds.
(In reply to Ed Maste from comment #18) I also don't remember seeing this in my local builds. I suspect this is a race condition that happens more easily on a slow or heavily loaded machine.
It looks like r169717 https://reviews.freebsd.org/rS169717 is where this originated
(In reply to Ed Maste from comment #18) I have seen this on rare occasions. I've seen it on a ThreadRipper 1950X doing personal builds, including cross builds. I've seen it during lib32 build activity as well (for amd64). I normally use -j32 on the ThreadRipper for buildworld buildkernel. I appear to have sent list notices in 2018-Apr and 2019-Jul. The latter had a second notice after I figured out that I'd seen the problem before, something I did not initially remember.
No luck attempting to reproduce the problem. I did however just get email about yet another instance of the bug:

13:16:31 --- lib/libgcc_s__L ---
13:16:31 install -U -l rs -o root -g wheel -m 755 /usr/obj/usr/src/mips.mips/tmp/lib/libgcc_s.so.1 /usr/obj/usr/src/mips.mips/tmp/usr/lib/libgcc_s.so
13:16:31 --- lib/libpam/libpam__L ---
13:16:31 ld: error: unable to find library -lgcc_s
13:16:31 cc: error: linker command failed with exit code 1 (use -v to see invocation)
13:16:31 *** [libpam.so.6.full] Error code 1

So again it looks like the error coincides with a parallel non-atomic install of the libgcc_s.so symlink.
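The coincidence pointed out above can be checked mechanically in a console log. The following is a rough sketch under a few assumptions: every relevant line starts with an HH:MM:SS timestamp (fractional seconds are ignored), the symlink install is recognizable by a separate "-l" argument and a destination named lib<name>.so, and the ld error appears after the install line in the log. The function names are invented for this example.

```python
import os
import re

STAMPED = re.compile(r"^(\d{2}:\d{2}:\d{2})(?:\.\d+)?\s+(.*)$")
LD_MISSING = re.compile(r"unable to find library -l(\w+)")


def symlink_install_lib(message):
    """Return 'gcc_s' for an install of a libgcc_s.so link, else None."""
    words = message.split()
    if "-l" not in words:
        return None
    # Accept both "install ..." and "sh /path/to/install.sh ...".
    if not any(os.path.basename(w) in ("install", "install.sh") for w in words[:3]):
        return None
    match = re.fullmatch(r"lib(\w+)\.so", os.path.basename(words[-1]))
    return match.group(1) if match else None


def find_suspect_races(log_lines):
    """Return (timestamp, library) pairs where ld could not find a library
    during the same second in which its .so symlink was being installed."""
    installs = {}  # library name -> set of timestamps with a link install
    suspects = []
    for line in log_lines:
        stamped = STAMPED.match(line)
        if not stamped:
            continue
        stamp, message = stamped.groups()
        lib = symlink_install_lib(message)
        if lib:
            installs.setdefault(lib, set()).add(stamp)
            continue
        missing = LD_MISSING.search(message)
        if missing and stamp in installs.get(missing.group(1), ()):
            suspects.append((stamp, missing.group(1)))
    return suspects


SAMPLE_LOG = [
    "13:16:31 --- lib/libgcc_s__L ---",
    "13:16:31 install -U -l rs -o root -g wheel -m 755 "
    "/usr/obj/usr/src/mips.mips/tmp/lib/libgcc_s.so.1 "
    "/usr/obj/usr/src/mips.mips/tmp/usr/lib/libgcc_s.so",
    "13:16:31 --- lib/libpam/libpam__L ---",
    "13:16:31 ld: error: unable to find library -lgcc_s",
]

SUSPECTS = find_suspect_races(SAMPLE_LOG)
```

A match only shows that the two events share a timestamp; it does not prove the race, but it is a cheap way to sort CI failures into "probably this bug" and "something else".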
A commit references this bug:

Author: markj
Date: Fri Sep 18 19:03:34 UTC 2020
New revision: 365889
URL: https://svnweb.freebsd.org/changeset/base/365889

Log:
  Install library symlinks atomically.

  As we do for shared library binaries, pass -S to install(1) when
  installing symlinks. Doing so helps avoid transient failures when
  libraries are being reinstalled, which seems to be the root cause of
  spurious libgcc_s.so link failures during CI builds.

  PR: 233769
  Reviewed by: emaste
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D26453

Changes:
  head/share/mk/bsd.lib.mk
I guess this is not quite sufficient. We can hit the same race during the lib32 build, which for some reason uses install.sh.
I've seen the -lgcc_s error for years, although mine appears to have a different root cause. I have / and /usr on separate mount points. When I run buildworld under my own account, I get the -lgcc_s error. On the other hand, when I run buildworld as the root user, this doesn't happen. By the way, the link error happens during the toolchain build, i.e. while building tools under /usr/obj/usr/src/i386.i386/tmp. After each installworld, I need to run the following:

cd /usr/lib
ls -l *.so | nawk '$NF ~ /..\/..\/lib/{cmd="ln -sf " substr($NF, 6) " " $(NF-2);system(cmd)}'

If a symbolic link crosses between / and /usr, I need to adjust it somehow.
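For readers who find the nawk one-liner above hard to parse, here is a read-only sketch of what it appears to do: find *.so symlinks whose target starts with ../../lib/ and compute the absolute target that "ln -sf" would be given (substr($NF, 6) drops the leading "../.."). The function name is hypothetical, the prefix test is stricter than the awk regular expression (where the dots match any character), and the sketch only reports the plan instead of changing any links.

```python
import os


def plan_relinks(libdir, prefix="../../lib/"):
    """Return (link_path, new_target) pairs for relative *.so links in libdir.

    Nothing is modified; the caller decides whether to act on the plan.
    """
    plan = []
    for name in sorted(os.listdir(libdir)):
        path = os.path.join(libdir, name)
        if not (name.endswith(".so") and os.path.islink(path)):
            continue
        target = os.readlink(path)
        if target.startswith(prefix):
            # "../../lib/libfoo.so.1" becomes "/lib/libfoo.so.1".
            plan.append((path, target[len("../.."):]))
    return plan
```

Calling plan_relinks("/usr/lib") on a system laid out like the one described would list entries such as /usr/lib/libcxxrt.so paired with /lib/libcxxrt.so.1.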
(In reply to ota from comment #25) Could you post a build log?
(In reply to Mark Johnston from comment #26) I recently installed 12.2-BETA for testing. My problem happens during bootstrapping, so the easiest way to reproduce it is to compile outside /usr/src. "%" marks a command run under my own user account; "$" marks a command run as the root user. The root user can compile without a problem, while all other users fail to link.

% uname -a
FreeBSD XXX 12.2-BETA2 FreeBSD 12.2-BETA2 #348 r365986M: Sat Sep 19 14:36:23 EDT 2020 hiro@XXX:/usr/obj/usr/src/i386.i386/sys/ZFS i386
% cd /tmp
% cat main.c
int main() { return 0; }
% cc main.c
ld: error: unable to find library -lgcc_s
ld: error: unable to find library -lgcc_s
cc: error: linker command failed with exit code 1 (use -v to see invocation)
% c++ main.c
c++: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
ld: error: cannot open /usr/lib/libcxxrt.so: Permission denied
ld: error: unable to find library -lgcc_s
ld: error: unable to find library -lgcc_s
c++: error: linker command failed with exit code 1 (use -v to see invocation)
% ls -lsd /usr/lib/libcxxrt.so
0 lrwxr-xr-x 1 root wheel 23 Sep 21 23:27 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1
% ls -lsd /lib/libcxxrt.so.1
90 -r--r--r-- 1 root wheel 90644 Sep 21 23:27 /lib/libcxxrt.so.1
% mount
/dev/ada0s4a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/ada0s4d on /usr (ufs, local, noatime, soft-updates)
/dev/ada0s4e on /usr/local (ufs, local, noatime, soft-updates)
$ cc main.c
$ ./a.out
$

I had run the re-link program from comment #25.

% ls -lsd /usr/lib/libcxxrt.so /lib/libcxxrt.so.1
90 -r--r--r-- 1 root wheel 90644 Sep 21 23:27 /lib/libcxxrt.so.1
0 lrwxr-xr-x 1 root wheel 23 Sep 21 23:27 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1
% ls -lsd /usr/lib/libcxxrt.so /lib/libcxxrt.so.1
90 -r--r--r-- 1 root wheel 90644 Sep 21 23:27 /lib/libcxxrt.so.1
0 lrwxr-xr-x 1 root wheel 18 Sep 22 22:58 /usr/lib/libcxxrt.so -> /lib/libcxxrt.so.1
% cc main.c
% ./a.out
%

I don't think this is the same problem as originally posted, and I wonder if we should create another PR.
(In reply to ota from comment #27) Yes, this looks like a separate issue that should be discussed in a separate PR. This PR is for inconsistent link errors that occur during a buildworld, and the bug you're describing is rather different.
A commit references this bug:

Author: markj
Date: Fri Sep 25 13:53:32 UTC 2020
New revision: 366157
URL: https://svnweb.freebsd.org/changeset/base/366157

Log:
  MFC r365889:
  Install library symlinks atomically.

  PR: 233769

Changes:
_U stable/12/
  stable/12/share/mk/bsd.lib.mk
I've been checking on CI periodically and haven't seen any spurious failures since the initial commit. Please re-open if I missed any.
Still happening:
https://ci.freebsd.org/job/FreeBSD-head-mips64-build/14732/console
https://ci.freebsd.org/job/FreeBSD-head-powerpc64-build/16971/console

00:27:26.087 ld: error: cannot open /usr/obj/usr/src/mips.mips64/obj-lib32/tmp/usr/lib32/libgcc_s.so: No such file or directory

The message is different than the title of this ticket (ld: error: unable to find library -lgcc_s) but it also occurred before base r365889.
(In reply to Li-Wen Hsu from comment #31) See comment #24 about lib32 contexts. It might be considered a known, technically-different problem with similar symptoms.
(In reply to Mark Millard from comment #32) Indeed, this is a problem with install.sh being used during 32-bit compat builds. We were discussing this on IRC yesterday with Warner, who's testing a patch to fix it. I'll keep this PR open until that's done.
(In reply to Mark Johnston from comment #33) Committed in r366541. So let's try again - please re-open if CI builds continue to fail with the -lgcc_s error.