Summary: | lang/mono net-p2p/sonarr Sonarr crashes on startup with SIGSEGV since base r296727 on 10.3-STABLE
---|---
Product: | Ports & Packages
Reporter: | Andrej Ebert <andrej>
Component: | Individual Port(s)
Assignee: | freebsd-mono (Nobody) <mono>
Status: | Closed FIXED
Severity: | Affects Many People
CC: | andrej, bdrewery, charl.lotter, dbn, feld, kib, pmichel, radovanovic, ultima, w.schwarzenfeld
Priority: | ---
Keywords: | crash
Version: | Latest
Flags: | bugzilla: maintainer-feedback? (mono); andrej: maintainer-feedback? (feld)
Hardware: | Any
OS: | Any
Description
Andrej Ebert
2016-07-25 16:19:10 UTC
Weird, that SmartOS bug is for their lx branded zones. We're running Sonarr/mono natively on FreeBSD. I haven't been able to reproduce this, and my box running Sonarr is a CURRENT machine with Sonarr in a 10.2-RELEASE jail.

    FreeBSD skeletor.feld.me 11.0-ALPHA6 FreeBSD 11.0-ALPHA6 #66 r302316: Sat Jul 2 10:05:43 CDT 2016 root@skeletor.feld.me:/usr/obj/usr/src/sys/GENERIC amd64
    root@skeletor:/usr/home/feld # jexec sonarr sh
    # freebsd-version
    10.2-RELEASE-p20

Yeah, it wouldn't manifest on 10.2, not even on 10.3 RELEASE, but try anything higher than 10-STABLE base r296649 and you'll see it :)

Is it strictly the 10-STABLE train or should I also see it on CURRENT?

(In reply to Mark Felder from comment #3) I don't have a CURRENT machine to test, but it's definitely (still) present on

    FreeBSD 199-SERVER 11.0-BETA2 FreeBSD 11.0-BETA2 #0 r303255: Sun Jul 24 11:45:56 CEST 2016 root@199-SERVER:/usr/obj/usr/src/sys/MASK amd64

so you should see it on your 11.0-ALPHA6 outside of the 10.2 jail, or inside a 10-STABLE (beginning with base r296727) or 11-STABLE jail.

There was a regression in r294373, fixed by r302908 in HEAD. stable/11 commit was r303193, 2016-07-22. Could you try if it changes anything WRT the issue?

(In reply to Konstantin Belousov from comment #5) As I mentioned in my initial report, I'm still seeing this on 11-STABLE r303255, which is after r303193. I'll try the latest revision in the 11-STABLE tree, but I doubt anything'll change.

(In reply to Andrej Ebert from comment #6) Can you provide me something self-contained (e.g. binary and all required libraries, no dependencies outside base system and content of the pack) which demonstrates the issue? I.e. the thing should work before the revision, and fail after.

(In reply to Konstantin Belousov from comment #7) Sorry, I really don't know how to do that.
But I did try different 10-STABLE revisions in jails, and it's fine up to base r296649, and starts crashing with base r296727 on 10-STABLE and keeps crashing at least up to 11-STABLE base r303255. I have a slight temperature problem at the moment (need a bigger case) and can't rebuild world to go to a newer revision of 11-STABLE, but maybe someone's willing to confirm by building and trying to start sonarr in a jail or on a machine that's on a revision higher than base r296649.

(In reply to Andrej Ebert from comment #8) Can you please try running this simple test case I am linking here (so we can determine if ProcessName getter is completely broken or it is some specific usage pattern that is causing Sonarr to crash). Link to test: https://github.com/radovanovic/monobsd/blob/freebsd/mono/tests/process_name.cs

(In reply to Andrej Ebert from comment #8) These recompilations do not provide any value. I need an isolated test case, then I will fix the problem.

(In reply to Ivan Radovanovic from comment #9) This is what I get on FreeBSD 199-SERVER 11.0-BETA4 FreeBSD 11.0-BETA4 #0 r303834 (I did get a newer world installed):

    root@199-SERVER ~# mcs process_name.cs
    root@199-SERVER ~# mono process_name.exe
    Stacktrace:
    at <unknown> <0xffffffff>
    at (wrapper managed-to-native) System.Diagnostics.Process.ProcessName_internal (intptr) <0x0005c>
    at System.Diagnostics.Process.get_ProcessName () <0x00082>
    at (wrapper remoting-invoke-with-check) System.Diagnostics.Process.get_ProcessName () <0x0006c>
    at Test.Main () <0x00036>
    at (wrapper runtime-invoke) <Module>.runtime_invoke_int (object,intptr,intptr,intptr) <0x000fb>
    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
    =================================================================
    [1] 4637 abort (core dumped) mono process_name.exe

This is with:

    #mono -V
    Mono JIT compiler version 4.4.1 (Nightly 4.4.1.0/4747417 Sun Jul 24 15:04:46 UTC 2016)
    Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS: __thread
    SIGSEGV: altstack
    Notification: kqueue
    Architecture: amd64
    Disabled: none
    Misc: softdebug
    LLVM: supported, not enabled.
    GC: sgen

installed with the patch from bug #211004. I'll check with the newest mono version committed to the tree in a moment.

The same with the mono version committed today to ports:

    root@199-SERVER ~# uname -a
    FreeBSD 199-SERVER 11.0-BETA4 FreeBSD 11.0-BETA4 #0 r303834: Mon Aug 8 20:42:14 CEST 2016 root@199-SERVER:/usr/obj/usr/src/sys/MASK amd64
    root@199-SERVER ~# mcs process_name.cs
    root@199-SERVER ~# mono -V
    Mono JIT compiler version 4.4.2 (Stable 4.4.2.11/f72fe45 Mon Aug 8 20:01:57 UTC 2016)
    Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS: __thread
    SIGSEGV: altstack
    Notification: kqueue
    Architecture: amd64
    Disabled: none
    Misc: softdebug
    LLVM: supported, not enabled.
    GC: sgen
    root@199-SERVER ~# mono process_name.exe
    Stacktrace:
    at <unknown> <0xffffffff>
    at (wrapper managed-to-native) System.Diagnostics.Process.ProcessName_internal (intptr) <0x0005c>
    at System.Diagnostics.Process.get_ProcessName () <0x00082>
    at (wrapper remoting-invoke-with-check) System.Diagnostics.Process.get_ProcessName () <0x0006c>
    at Test.Main () <0x00036>
    at (wrapper runtime-invoke) <Module>.runtime_invoke_int (object,intptr,intptr,intptr) <0x000fb>
    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
    =================================================================
    [1] 64394 abort (core dumped) mono process_name.exe

Fantastic, thanks for providing a simple reproducible case! Can you attach that small bit of code to the ticket? Thanks!

(In reply to Konstantin Belousov from comment #10) @Konstantin, I suppose you need some minimal test code in C? (I will try to create one for you)

Created attachment 173444 [details]
test case
Test case from https://github.com/radovanovic/monobsd/blob/freebsd/mono/tests/process_name.cs

For completeness, here's the testcase with base r296649, as you see, no crash. I'll attach the verbose output of mono separately.

    # uname -a
    FreeBSD 10mono296649-development 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE r296649 amd64
    # mcs process_name.cs
    # mono process_name.exe
    # mono -V
    Mono JIT compiler version 4.4.2 (Stable 4.4.2.11/f72fe45 Tue Aug 9 09:34:42 UTC 2016)
    Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
    TLS: __thread
    SIGSEGV: altstack
    Notification: kqueue
    Architecture: amd64
    Disabled: none
    Misc: softdebug
    LLVM: supported, not enabled.
    GC: sgen

(In reply to Ivan Radovanovic from comment #14) Test case in C would be ideal, but this is probably too much work. I am fine with some binary pack that reproduces the issue. The only requirement is that the pack should be self-contained. In other words, I have bare base system where I can, say, untar it into correct location and reproduce the issue. This would be enough.

Created attachment 173445 [details]
mono -v output from crash
mono -v output from crash on
FreeBSD 11.0-BETA4 #0 r303834: Mon Aug 8 20:42:14 CEST 2016 root@199-SERVER:/usr/obj/usr/src/sys/MASK amd64
Created attachment 173446 [details]
mono -v output non crashing
Here is the mono -v output from the test case not crashing on
10.3-PRERELEASE FreeBSD 10.3-PRERELEASE r296649 amd64
(In reply to Andrej Ebert from comment #12) Can you please try to pull stack backtrace from this crash dump - for example:

    $ gdb `which mono` mono.core
    (gdb) bt
    <STACKTRACE WILL BE HERE>

Problem is that mono's implementation of pulling process name is quite complicated, this would give me pointer where to look for problem

(In reply to Konstantin Belousov from comment #17) Unfortunately without me manually creating test case in C the only self contained test case for you would be to install mono and to try to run that attached test case (compile with "mcs process_name.cs" and run with "mono process_name.exe").

(In reply to Ivan Radovanovic from comment #20) Here, this is from a 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE r296727 amd64 jail, I hope that helps:

    # gdb `which mono-sgen` mono-sgen.core
    GNU gdb 6.1.1 [FreeBSD]
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
    Core was generated by `mono-sgen'.
    Program terminated with signal 6, Aborted.
    Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
    Loaded symbols for /lib/libm.so.5
    Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
    Loaded symbols for /lib/libthr.so.3
    Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.7
    Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
    Loaded symbols for /libexec/ld-elf.so.1
    #0 0x000000080107835a in thr_kill () from /lib/libc.so.7
    [New Thread 801807800 (LWP 100641/<unknown>)]
    [New Thread 801806c00 (LWP 100637/<unknown>)]
    [New Thread 801806400 (LWP 100929/<unknown>)]
    (gdb) bt
    #0 0x000000080107835a in thr_kill () from /lib/libc.so.7
    #1 0x0000000801078346 in raise () from /lib/libc.so.7
    #2 0x00000008010782c9 in abort () from /lib/libc.so.7
    #3 0x00000000004a6593 in mono_debugger_run_finally ()
    #4 0x00000000004f6ce6 in mono_breakpoint_clean_code ()
    #5 0x00000000004174c7 in mono_get_jit_info_from_method ()
    #6 0x0000000800da1b37 in pthread_sigmask () from /lib/libthr.so.3
    #7 0x0000000800da122c in pthread_getspecific () from /lib/libthr.so.3
    #8 <signal handler called>
    #9 0x000000000062b9bf in mono_gchandle_free ()
    #10 0x000000000062b704 in mono_gchandle_free ()
    #11 0x00000000005884e4 in mono_opcode_value ()
    #12 0x0000000000012e8d in ?? ()
    #13 0x00000008018c9b70 in ?? ()
    #14 0x0000000801868740 in ?? ()
    #15 0x0000000801828a60 in ?? ()
    #16 0x0000000801c00428 in ?? ()
    #17 0x0000000801c00428 in ?? ()
    #18 0x0000000801995f60 in ?? ()
    #19 0x0000000000012c7a in ?? ()
    #20 0x00007fffffffe3e0 in ?? ()
    #21 0x00007fffffffe350 in ?? ()
    #22 0x0000000000012a33 in ?? ()
    #23 0x0000000801c00428 in ?? ()
    #24 0x0000000801c00428 in ?? ()
    #25 0x000000000000322f in ?? ()
    #26 0x0000000801c00428 in ?? ()
    #27 0x00000008018fa101 in ?? ()
    #28 0x000000000001297b in ?? ()
    #29 0x000000000001297b in ?? ()
    #30 0x00000000000129b0 in ?? ()
    #31 0x00007fffffffe450 in ?? ()
    #32 0x000000000001296d in ?? ()
    #33 0x0000000000000000 in ?? ()

Ok, I realized that the issue was in r257811. Please try the patch attached.

Created attachment 173581 [details]
Fill phdr value for rtld itself when reporting it in dl_iterate_phdr.
A commit references this bug:

    Author: kib
    Date: Fri Aug 12 18:31:44 UTC 2016
    New revision: 304012
    URL: https://svnweb.freebsd.org/changeset/base/304012
    Log:
      Fill phdr and phsize for rtld object. It is needed for dl_iterate_phdr() reporting the correct values.
      PR: 211367
      Sponsored by: The FreeBSD Foundation
      MFC after: 1 week
    Changes:
      head/libexec/rtld-elf/rtld.c

(In reply to Konstantin Belousov from comment #10) Many thanks, your patch fixed it for me on:

    FreeBSD 11.0-PRERELEASE #0 r304040: Sat Aug 13 13:27:44 CEST 2016 root@199-SERVER:/usr/obj/usr/src/sys/MASK amd64

*** Bug 212642 has been marked as a duplicate of this bug. ***

I just swapped motherboards / CPUs from a single Xeon E3 to now 2x Xeon E5s. Nothing else needed configuring and everything works fine EXCEPT I now get a nearly identical error to the one reported here when running mono with not just NzbDrone.exe but any executable:

    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
    =================================================================

If I load the dumped core, I get:

    gdb mono-sgen mono-sgen.core
    GNU gdb 6.1.1 [FreeBSD]
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
    Core was generated by `mono-sgen'.
    Program terminated with signal 6, Aborted.
    Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
    Loaded symbols for /lib/libm.so.5
    Reading symbols from /usr/local/lib/libinotify.so.0...(no debugging symbols found)...done.
    Loaded symbols for /usr/local/lib/libinotify.so.0
    Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
    Loaded symbols for /lib/libthr.so.3
    Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
    Loaded symbols for /lib/libc.so.7
    Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
    Loaded symbols for /libexec/ld-elf.so.1
    #0 0x00000008012aa50a in thr_kill () from /lib/libc.so.7
    [New Thread 801816000 (LWP 100910/<unknown>)]
    (gdb) where
    #0 0x00000008012aa50a in thr_kill () from /lib/libc.so.7
    #1 0x00000008012aa4db in raise () from /lib/libc.so.7
    #2 0x00000008012aa449 in abort () from /lib/libc.so.7
    #3 0x00000000004a56e2 in mono_debugger_run_finally ()
    #4 0x0000000000417e00 in mono_get_jit_info_from_method ()
    #5 0x0000000800fca79d in pthread_sigmask () from /lib/libthr.so.3
    #6 0x0000000800fc9d6f in pthread_getspecific () from /lib/libthr.so.3
    #7 <signal handler called>
    #8 0x00000000006366b5 in mono_gchandle_free ()
    #9 0x00000000005bd734 in mono_init ()
    #10 0x0000000000418bec in mini_get_debug_options ()
    #11 0x0000000000473ffc in mono_main ()
    #12 0x00000000004156fa in _start ()
    #13 0x000000000041516f in _start ()
    #14 0x000000080098b000 in ?? ()
    #15 0x0000000000000000 in ?? ()

So it looks very similar to the bug reported above. I'm running 11.0-RELEASE-p2 (GENERIC) and mono 4.6.2 (latest pkg) in a jail. I've tried running it on the host system, with the same error.
I've also tried building from source (poudriere) and the build fails due to a similar SIGSEGV:

    if test -w /wrkdirs/usr/ports/lang/mono/work/mono-4.6.2/mcs; then :; else chmod -R +w /wrkdirs/usr/ports/lang/mono/work/mono-4.6.2/mcs; fi
    cd /wrkdirs/usr/ports/lang/mono/work/mono-4.6.2/mcs && gmake --no-print-directory -s NO_DIR_CHECK=1 PROFILES='binary_reference_assemblies net_4_x xbuild_12 xbuild_14 ' CC='cc' all-profiles
    mkdir -p -- build/deps
    gmake[7]: mcs: Command not found
    gmake[7]: *** [build/profiles/basic.make:93: build/deps/basic-profile-check.exe] Error 127
    *** The compiler 'mcs' doesn't appear to be usable.
    *** Trying the 'monolite' directory.
    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
    =================================================================
    gmake[9]: *** [build/profiles/basic.make:93: build/deps/basic-profile-check.exe] Abort trap (core dumped)

Any reason this bug would re-appear on a multi-processor system?

After some more playing around trying to run mono both on bare metal and a VM, I've narrowed it down to what seems to be a memory issue. I have 256G of ram in the server. If I restrict hw.physmem=140G, mono works fine. Anything above hw.physmem=150G or so seems to cause it to crash! Changing the cpu count or cpu type (in the VM) has no impact. Seems like something's off when there's too much memory available... Any thoughts?

Any way you can test to see if this is reproducible in Linux as well? This should probably be covered in a separate PR as well.

On Linux (Ubuntu Trusty or Zesty), mono-4.6.2.7 runs fine on a system with 256G of ram (just via apt-get). I also built it from source on that same machine and it built without issues.

We have version 5.2.0.215. Is this still a problem?
(In reply to w.schwarzenfeld from comment #31) Didn't have any issues until the moment version 5.2.0.215 landed on r460430. I've had to hold back on updating it in production. Even installing it returns a stack trace:

    The following 1 package(s) will be affected (of 0 checked):
    Installed packages to be REINSTALLED:
    mono-5.2.0.215
    Number of packages to be reinstalled: 1
    Proceed with this action? [y/N]: y
    [1/1] Reinstalling mono-5.2.0.215...
    [1/1] Extracting mono-5.2.0.215: 100%
    mono_w32file_find_first: error creating find handle
    Stacktrace:
    at <unknown> <0xffffffff>
    at System.IO.__Error.WinIOError (int,string) [0x00011] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.FileSystemEnumerableIterator`1<TSource_REF>.HandleError (int,string) [0x00006] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.FileSystemEnumerableIterator`1<TSource_REF>.CommonInit () [0x00054] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.FileSystemEnumerableIterator`1<TSource_REF>..ctor (string,string,string,System.IO.SearchOption,System.IO.SearchResultHandler`1<TSource_REF>,bool) [0x000d6] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.FileSystemEnumerableFactory.CreateFileNameIterator (string,string,string,bool,bool,System.IO.SearchOption,bool) [0x00009] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.Directory.InternalGetFileDirectoryNames (string,string,string,bool,bool,System.IO.SearchOption,bool) [0x00000] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.Directory.InternalGetFiles (string,string,System.IO.SearchOption) [0x00000] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at System.IO.Directory.GetFiles (string,string) [0x0001c] in <c5bcd0ec45b240acb20cfcfa5eee2246>:0
    at Mono.Security.X509.X509Store.BuildCertificatesCollection (string) [0x0001f] in <fb76ee468de246ca98b18301a125c185>:0
    at Mono.Security.X509.X509Store.get_Certificates () [0x00008] in <fb76ee468de246ca98b18301a125c185>:0
    at Mono.Tools.CertSync.ImportToStore (Mono.Security.X509.X509CertificateCollection,Mono.Security.X509.X509Store) [0x00000] in <0d57d21a7e454689b82f5a82dd0b9e48>:0
    at Mono.Tools.CertSync.Process () [0x00087] in <0d57d21a7e454689b82f5a82dd0b9e48>:0
    at Mono.Tools.CertSync.Main (string[]) [0x00022] in <0d57d21a7e454689b82f5a82dd0b9e48>:0
    at (wrapper runtime-invoke) <Module>.runtime_invoke_int_object (object,intptr,intptr,intptr) [0x00054] in <0d57d21a7e454689b82f5a82dd0b9e48>:0
    =================================================================
    Got a SIGSEGV while executing native code. This usually indicates a fatal error in the mono runtime or one of the native libraries used by your application.
    =================================================================
    Abort trap (core dumped)

There are all sorts of errors in the build log as well. I have no idea how this hasn't been reverted yet. This is my most recent build log on current and 11-stable.
https://poudriere.ultimasbox.com/data/111amd64-default/2018-04-08_09h37m49s/logs/mono-5.2.0.215.log
https://poudriere.ultimasbox.com/data/12amd64-default/2018-04-08_01h28m35s/logs/mono-5.2.0.215.log

FYI, the patch [1] to update mono to 5.10 by feld@ fixes this issue.
[1] https://reviews.freebsd.org/D15780

A commit references this bug:

    Author: feld
    Date: Sat Jun 16 15:56:44 UTC 2018
    New revision: 472555
    URL: https://svnweb.freebsd.org/changeset/ports/472555
    Log:
      Update Mono to 5.10.1.47
      This brings a more modern Mono release to the ports tree. After discussions with others in the Mono community I targeted the mono 5.10.1.47 release which is the latest release in the "Visual Studio" release channel. This is considered to be the most stable and widely tested, which makes it a good candidate for us. We may upgrade to 5.12 after additional testing or introduce another Mono package for users who require testing against a newer release; this has yet to be determined.
      - Build from official release tarballs
      - Now include BoringSSL per upstream guidelines [1]
      - Remove ACCEPTANCE_TESTS, not being updated by upstream
      - No long require glib; Mono includes their own replacement
      - USES=display:tests required for some tests
      - Remove broken for armv6, armv7: file now available [2]
      - Mark as LLD safe as mono changed how it handles TLS [3]
      Changelog: http://www.mono-project.com/docs/about-mono/releases/5.10.0/
      PR: 222271 [1]
      PR: 221236 [2]
      PR: 218885 [3]
      PR: 211367
      Approved by: dbn
      Differential Revision: https://reviews.freebsd.org/D15780
    Changes:
      head/lang/mono/Makefile
      head/lang/mono/distinfo
      head/lang/mono/files/patch-configure.ac
      head/lang/mono/files/patch-eglib_src_gfile-posix.c
      _U head/lang/mono/files/patch-mcs_class_Mono.Security_Mono.Security.Cryptography_KeyPairPersistence.cs
      _U head/lang/mono/files/patch-mcs_class_Mono.Security_Mono.Security.X509_X509StoreManager.cs
      _U head/lang/mono/files/patch-mcs_tools_mono-configuration-crypto_lib_Mono.Configuration.Crypto_KeyContainerCollection.cs
      _U head/lang/mono/files/patch-mcs_tools_xbuild_data_12.0_Microsoft.CSharp.targets
      _U head/lang/mono/files/patch-mcs_tools_xbuild_data_14.0_Microsoft.CSharp.targets
      head/lang/mono/files/patch-mono_eglib_gfile-posix.c
      head/lang/mono/files/patch-mono_mini_Makefile.am.in
      head/lang/mono/files/patch-mono_mini_mini-posix.c
      head/lang/mono/files/patch-mono_mini_tramp-amd64.c
      head/lang/mono/files/patch-mono_profiler_ptestrunner.pl
      head/lang/mono/files/patch-mono_utils_mono-context.h
      head/lang/mono/files/patch-mono_utils_mono-proclib.c
      head/lang/mono/files/patch-mono_utils_mono-threads.c
      head/lang/mono/files/patch-scripts_mono-heapviz
      head/lang/mono/pkg-plist
      head/lang/mono-basic/Makefile
      head/lang/mono-basic/distinfo
      head/lang/mono-basic/files/patch-configure
      head/x11-toolkits/gtk-sharp30/Makefile
      head/x11-toolkits/gtk-sharp30/files/
      head/x11-toolkits/gtk-sharp30/files/patch-gtk_gui-thread-check_profiler_gui-thread-check.c

Fixed in feld's commit.
Thank you for the report. Please report any issues encountered in the update to mono.