(Sorry about the empty description, accidentally pressed enter earlier.)

This is a clean install of 10.1-RELEASE on a new Dell Poweredge R730xd. After rebooting, this happens:

root@bsd-18:~ # man ls
Segmentation fault (core dumped)

It turns out this is because groff and tbl dump core:

(gdb) run
Starting program: /usr/bin/groff
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x0000000800aae004 in std::__1::__time_get_c_storage<wchar_t>::__weeks () from /usr/lib/libc++.so.1
(gdb) bt
#0 0x0000000800aae004 in std::__1::__time_get_c_storage<wchar_t>::__weeks () from /usr/lib/libc++.so.1
#1 0x0000000800a908de in std::__1::ios_base::Init::Init () from /usr/lib/libc++.so.1
#2 0x0000000800a90c89 in std::__1::ios_base::Init::~Init () from /usr/lib/libc++.so.1
#3 0x0000000800ae7f42 in operator delete[] () from /usr/lib/libc++.so.1
#4 0x0000000800a8d0c6 in _init () from /usr/lib/libc++.so.1
#5 0x00007fffffffe160 in ?? ()
#6 0x00000008006116bf in r_debug_state () from /libexec/ld-elf.so.1
#7 0x0000000800610d17 in __tls_get_addr () from /libexec/ld-elf.so.1
#8 0x000000080060f129 in .text () from /libexec/ld-elf.so.1
#9 0x0000000000000000 in ?? ()

(gdb) run
Starting program: /usr/bin/tbl
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
0x0000000800ab7004 in std::__1::__time_get_c_storage<wchar_t>::__weeks () from /usr/lib/libc++.so.1
(gdb) bt
#0 0x0000000800ab7004 in std::__1::__time_get_c_storage<wchar_t>::__weeks () from /usr/lib/libc++.so.1
#1 0x0000000800a998de in std::__1::ios_base::Init::Init () from /usr/lib/libc++.so.1
#2 0x0000000800a99c89 in std::__1::ios_base::Init::~Init () from /usr/lib/libc++.so.1
#3 0x0000000800af0f42 in operator delete[] () from /usr/lib/libc++.so.1
#4 0x0000000800a960c6 in _init () from /usr/lib/libc++.so.1
#5 0x00007fffffffe160 in ?? ()
#6 0x000000080061a6bf in r_debug_state () from /libexec/ld-elf.so.1
#7 0x0000000800619d17 in __tls_get_addr () from /libexec/ld-elf.so.1
#8 0x0000000800618129 in .text () from /libexec/ld-elf.so.1
#9 0x0000000000000000 in ?? ()

The only other binary I've found that does this is dtrace, which crashed in a different place:

(gdb) run
Starting program: /usr/sbin/dtrace
(no debugging symbols found)...[New LWP 100206]
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 100206]
0x0000000800a88350 in _dtrace_init () from /lib/libdtrace.so.2
(gdb) bt
#0 0x0000000800a88350 in _dtrace_init () from /lib/libdtrace.so.2
#1 0x0000000800ab10a2 in dt_list_delete () from /lib/libdtrace.so.2
#2 0x0000000800a60e3e in _init () from /lib/libdtrace.so.2
#3 0x00007fffffffe150 in ?? ()
#4 0x000000080060b6bf in r_debug_state () from /libexec/ld-elf.so.1
#5 0x000000080060ad17 in __tls_get_addr () from /libexec/ld-elf.so.1
#6 0x0000000800609129 in .text () from /libexec/ld-elf.so.1
#7 0x0000000000000000 in ?? ()

Running freebsd-update results in a non-functional system.
I assume something Went Very Wrong somewhere, but I have no clue what it might be. This behaviour is 100% repeatable.
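
For anyone trying to reproduce the search for crashing binaries, here is a minimal sketch of how the check could be scripted. The helper name killed_by_signal is made up for illustration, and it assumes a POSIX sh in which a command killed by a signal reports an exit status of 128 plus the signal number. A program that deliberately exits with a status above 128 would be misreported.

```shell
# Hypothetical helper: report whether a command was terminated by a signal.
# Prints the signal number and returns 0 if so; returns 1 otherwise.
killed_by_signal() {
    "$@" </dev/null >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
        return 0
    fi
    return 1
}

# Check a hand-picked list of binaries (paths taken from this report).
# Binaries that are not present on the system are skipped.
for bin in /usr/bin/groff /usr/bin/tbl /usr/sbin/dtrace; do
    if [ -x "$bin" ] && sig=$(killed_by_signal "$bin"); then
        echo "$bin: killed by signal $sig"
    fi
done
```

The list is kept short on purpose: blindly running everything under /usr/bin or /usr/sbin without arguments is not safe.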
Further investigation points to libc++ initialisation code.

[root@bsd-18 ~]# cat test.c
#include <stdio.h>

int main()
{
    printf("Hello world!\n");
}
[root@bsd-18 ~]# clang -o test test.c
[root@bsd-18 ~]# ./test
Hello world!
[root@bsd-18 ~]# clang -lc++ -o test test.c
[root@bsd-18 ~]# ./test
Segmentation fault (core dumped)

(gdb) bt
#0 0x0000000800879004 in std::__1::__time_get_c_storage<wchar_t>::__weeks () from /usr/lib/libc++.so.1
#1 0x000000080085b8de in std::__1::ios_base::Init::Init () from /usr/lib/libc++.so.1
#2 0x000000080085bc89 in std::__1::ios_base::Init::~Init () from /usr/lib/libc++.so.1
#3 0x00000008008b2f42 in operator delete[] () from /usr/lib/libc++.so.1
#4 0x00000008008580c6 in _init () from /usr/lib/libc++.so.1
#5 0x00007fffffffe150 in ?? ()
#6 0x00000008006046bf in r_debug_state () from /libexec/ld-elf.so.1
#7 0x0000000800603d17 in __tls_get_addr () from /libexec/ld-elf.so.1
#8 0x0000000800602129 in .text () from /libexec/ld-elf.so.1
#9 0x0000000000000000 in ?? ()
I found a working reference system and narrowed it down to a difference in /usr/lib/libc++.so.1. When I copy the lib from the working system, everything is fine. The two copies have the same size and timestamp but different checksums:

[root@bsd-18 ~]# ls -l
-r--r--r-- 1 root wheel 775544 Mar 10 18:45 libc++.so.1
-r--r--r-- 1 root wheel 775544 Mar 10 18:45 libc++.so.1.orig
[root@bsd-18 ~]# md5 *
MD5 (libc++.so.1) = e3a0faec125bbbc5032869fdbcff6e54
MD5 (libc++.so.1.orig) = d3cd3e49d79a9bd2ea46a7e180a603bf
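
Since both files are the same size, a byte-level comparison can show where they differ. The following is only a sketch: the function names diff_offsets and diff_blocks are invented here, and the 4096-byte block size is an assumption, not something measured on this system. It relies on the POSIX `cmp -l` output, which lists the 1-based offset of each differing byte.

```shell
# Hypothetical helper: print the 1-based offset of every byte that differs
# between two files (first column of POSIX `cmp -l` output).
diff_offsets() {
    cmp -l "$1" "$2" | awk '{ print $1 }'
}

# Hypothetical helper: print the 0-based number of each block that contains
# at least one differing byte. The 4096-byte block size is an assumption.
diff_blocks() {
    diff_offsets "$1" "$2" | awk '{ print int(($1 - 1) / 4096) }' | uniq
}

# Usage against the two copies from this report:
#   diff_blocks libc++.so.1 libc++.so.1.orig
```

If the differences turned out to cover whole, aligned blocks rather than scattered bytes, that might hint at a storage-level problem rather than a bad build, though this report does not include that comparison.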
Well. dd'ing the whole disk with zeroes and reinstalling fixed the problem. Either I'm suffering from a serious case of PEBKAC or this server is haunted.
Okay, I figured it out. The problem was using the default mfi driver instead of mrsas. mfi usually works but occasionally seems to go into a mode where it stops flushing data to disk. This of course causes all sorts of hilarious filesystem corruption. An additional complication was that this server (still) suffers from the "shutdown hangs after freebsd-update" problem described here: https://lists.freebsd.org/pipermail/freebsd-stable/2014-October/080599.html
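
The comment above does not say how the driver was switched. As a hedged sketch, the two loader tunables below are the ones documented in mrsas(4) and mfi(4) as far as I recall them; please verify them against the man pages for your release before relying on this. The function name enable_mrsas is made up for illustration. Note also that with mrsas the disks are expected to show up under da device names rather than mfid, so /etc/fstab may need adjusting before rebooting.

```shell
# Hypothetical helper: append the mrsas settings to a loader.conf-style file
# unless they are already present. Defaults to /boot/loader.conf.
# Assumption: these are the tunables described in mrsas(4) and mfi(4).
enable_mrsas() {
    conf=${1:-/boot/loader.conf}
    for line in 'mrsas_load="YES"' 'hw.mfi.mrsas_enable="1"'; do
        # -x matches whole lines, -F treats the pattern as a fixed string.
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}

# Usage (as root, after checking the man pages):
#   enable_mrsas /boot/loader.conf
```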
(In reply to Walter Heukels from comment #5)
> An additional complication was that this server (still) suffers from the
> "shutdown hangs after freebsd-update" described here:
> https://lists.freebsd.org/pipermail/freebsd-stable/2014-October/080599.html

See: https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/081959.html