Bug 204147 - net/rabbitmq: beam.smp exited on signal 11
Summary: net/rabbitmq: beam.smp exited on signal 11
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Jimmy Olgeni
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-30 14:29 UTC by elofu17
Modified: 2016-02-27 13:05 UTC

See Also:
bugzilla: maintainer-feedback? (olgeni)


Description elofu17 2015-10-30 14:29:53 UTC
Hi.
This is a pretty generic bug report, since I don't know where the problem is located or what causes it.

On random FreeBSD 9.3-RELEASE boxes, both i386 and amd64, beam.smp crashes with a segmentation fault.
Not always, but pretty often: if I run '/usr/local/etc/rc.d/rabbitmq restart', I get a crash roughly 1 time out of 20.

beam.smp sometimes crashes when I run '/usr/local/etc/rc.d/rabbitmq stop', and more often on 'rabbitmq start'.

When starting rabbit, things seem to be functioning even though beam.smp crashes. I guess it tries again, and this time the process is successfully started.
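
The reproduction described above can be scripted. This is a minimal sketch, assuming the crash shows up as a "Segmentation fault" line on the console (as in the example that follows); the function name count_segfault_runs is hypothetical and not part of the port.

```shell
#!/bin/sh
# count_segfault_runs N CMD [ARGS...]
# Runs CMD N times and prints how many runs emitted "Segmentation fault"
# on stdout or stderr. (Hypothetical helper, not part of net/rabbitmq.)
count_segfault_runs() {
    runs=$1
    shift
    # Capture output in a file rather than a pipe, so that a daemonized
    # child holding the descriptor open cannot block the loop.
    log=$(mktemp "${TMPDIR:-/tmp}/segv.XXXXXX") || return 1
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # The exit status of CMD is ignored; only its output is inspected.
        "$@" > "$log" 2>&1 || true
        if grep -q 'Segmentation fault' "$log"; then
            hits=$((hits + 1))
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "$hits"
}
```

For example, "count_segfault_runs 20 /usr/local/etc/rc.d/rabbitmq restart" would print the number of restarts, out of 20, that reported a segmentation fault.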


Example (FreeBSD 9.3 i386 machine):
/var/log/rabbitmq> /usr/local/etc/rc.d/rabbitmq start
Starting rabbitmq.
Segmentation fault (core dumped)




/var/log/rabbitmq> ls -l
total 40724
-rw-------  1 root      rabbitmq  41635840 Oct 30 13:36 beam.smp.core
-rw-r--r--  1 rabbitmq  rabbitmq         0 Oct 30 13:28 rabbit@ch13-sasl.log
-rw-r--r--  1 rabbitmq  rabbitmq      3780 Oct 30 13:36 rabbit@ch13.log

/var/log/rabbitmq> tail -50 rabbit\@ch13.log
=INFO REPORT==== 30-Oct-2015::13:36:21 ===
Starting RabbitMQ 3.5.6 on Erlang 18.1.3
Copyright (C) 2007-2015 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
node           : rabbit@ch13
home dir       : /var/db/rabbitmq
config file(s) : /usr/local/etc/rabbitmq/rabbitmq.config
cookie hash    : hcmp31d6kNGIQ123456789==
log            : /var/log/rabbitmq/rabbit@ch13.log
sasl log       : /var/log/rabbitmq/rabbit@ch13-sasl.log
database dir   : /var/db/rabbitmq/mnesia/rabbit@ch13

=WARNING REPORT==== 30-Oct-2015::13:36:21 ===
You are using a 32-bit version of Erlang: you may run into memory address
space exhaustion or statistic counters overflow.

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
Memory limit set to 196MB of 984MB total.

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
Disk free limit set to 50MB

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
Limiting to approx 28763 file handles (25884 sockets)

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
FHC read buffering:  ON
FHC write buffering: ON

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
started TCP Listener on [::]:5672

=INFO REPORT==== 30-Oct-2015::13:36:21 ===
started TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 30-Oct-2015::13:36:22 ===
Server startup complete; 2 plugins started.
 * rabbitmq_shovel
 * amqp_client


/var/log/rabbitmq> gdb /usr/local/lib/erlang/erts-7.1/bin/beam.smp beam.smp.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `beam.smp'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.8
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/librt.so.1
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x081e019f in sys_sigrelease ()
[New Thread 28807900 (LWP 101340/aux)]
[New Thread 28807600 (LWP 101339/2_scheduler)]
[New Thread 28807300 (LWP 101338/1_scheduler)]
[New Thread 28807000 (LWP 101337/child_waiter)]
[New Thread 28806d00 (LWP 101336/async_10)]
[New Thread 28806a00 (LWP 101335/async_9)]
[New Thread 28806700 (LWP 101334/async_8)]
[New Thread 28806400 (LWP 101333/async_7)]
[New Thread 28806100 (LWP 101332/async_6)]
[New Thread 28805e00 (LWP 101331/async_5)]
[New Thread 28805b00 (LWP 101330/async_4)]
[New Thread 28805800 (LWP 101329/async_3)]
[New Thread 28805500 (LWP 101328/async_2)]
[New Thread 28805200 (LWP 101327/async_1)]
[New Thread 28804f00 (LWP 101326/sys_msg_dispatc)]
[New Thread 28804c00 (LWP 101325/sys_sig_dispatc)]
[New Thread 28804300 (LWP 101156/beam.smp)]
(gdb) backtrace full
#0  0x081e019f in sys_sigrelease ()
No symbol table info available.
#1  0x0819658d in erts_check_async_ready ()
No symbol table info available.
(gdb) 

The above is from a FreeBSD 9.3 i386.

I got the same segfault (on rabbitmq startup) on a 9.3 amd64 box. Running gdb on that core file:
ch214:~> gdb /usr/local/lib/erlang/erts-7.1/bin/beam.smp beam.smp.core
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `beam.smp'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libutil.so.9...(no debugging symbols found)...done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libncurses.so.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.8
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/librt.so.1
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x000000000059b0c1 in sys_sigrelease ()
[New Thread 801c0bc00 (LWP 100625/aux)]
[New Thread 801c0b800 (LWP 100624/2_scheduler)]
[New Thread 801c0b400 (LWP 100623/1_scheduler)]
[New Thread 801c0b000 (LWP 100622/child_waiter)]
[New Thread 801c0ac00 (LWP 100621/async_10)]
[New Thread 801c0a800 (LWP 100620/async_9)]
[New Thread 801c0a400 (LWP 100619/async_8)]
[New Thread 801c0a000 (LWP 100618/async_7)]
[New Thread 801c09c00 (LWP 100617/async_6)]
[New Thread 801c09800 (LWP 100616/async_5)]
[New Thread 801c09400 (LWP 100615/async_4)]
[New Thread 801c09000 (LWP 100614/async_3)]
[New Thread 801c08c00 (LWP 100613/async_2)]
[New Thread 801c08800 (LWP 100612/async_1)]
[New Thread 801c08400 (LWP 100611/sys_msg_dispatc)]
[New Thread 801c07c00 (LWP 100610/sys_sig_dispatc)]
[New Thread 801c07400 (LWP 100088/beam.smp)]
(gdb) backtrace full
#0  0x000000000059b0c1 in sys_sigrelease ()
No symbol table info available.
#1  0x0000000000551005 in erts_check_async_ready ()
No symbol table info available.
#2  0x000000000060d1ea in ethr_thr_exit ()
No symbol table info available.
#3  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
No symbol table info available.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) info registers
rax            0x0      0
rbx            0x801e000b0      34391195824
rcx            0x1637   5687
rdx            0x1635   5685
rsi            0x801e000f0      34391195888
rdi            0x803c00110      34422653200
rbp            0x8021191c0      0x8021191c0
rsp            0x7fffff6c0ea0   0x7fffff6c0ea0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x246    582
r12            0x803c00110      34422653200
r13            0x0      0
r14            0x7fffff6c0f20   140737478659872
r15            0x8008c06c0      34368915136
rip            0x59b0c1 0x59b0c1 <sys_sigrelease+529>
eflags         0x10206  66054
cs             0x43     67
ss             0x3b     59
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) x/16i $pc
0x59b0c1 <sys_sigrelease+529>:  mov    (%rax),%rdi
0x59b0c4 <sys_sigrelease+532>:  callq  0x4c36e0 <driver_pdl_lock>
0x59b0c9 <sys_sigrelease+537>:  mov    (%r12),%rdi
0x59b0cd <sys_sigrelease+541>:  lea    0x14(%rsp),%rsi
0x59b0d2 <sys_sigrelease+546>:  callq  0x4c6b20 <driver_peekq>
0x59b0d7 <sys_sigrelease+551>:  cmpl   $0x400,0x14(%rsp)
0x59b0df <sys_sigrelease+559>:  mov    $0x400,%edx
0x59b0e4 <sys_sigrelease+564>:  mov    %rax,%rbx
0x59b0e7 <sys_sigrelease+567>:  cmovle 0x14(%rsp),%edx
0x59b0ec <sys_sigrelease+572>:  mov    0x33e3c5(%rip),%rsi        # 0x8d94b8 <erts_allctrs+88>
0x59b0f3 <sys_sigrelease+579>:  mov    $0x19,%edi
0x59b0f8 <sys_sigrelease+584>:  mov    %edx,0x14(%rsp)
0x59b0fc <sys_sigrelease+588>:  movslq %edx,%rdx
0x59b0ff <sys_sigrelease+591>:  shl    $0x4,%rdx
0x59b103 <sys_sigrelease+595>:  callq  *0x33e397(%rip)        # 0x8d94a0 <erts_allctrs+64>
0x59b109 <sys_sigrelease+601>:  test   %rax,%rax
(gdb) thread apply all backtrace

Thread 17 (Thread 801c07400 (LWP 100088/beam.smp)):
#0  0x00000008018530dc in select () from /lib/libc.so.7
#1  0x000000080132c204 in select () from /lib/libthr.so.3
#2  0x0000000000599e00 in erts_sys_main_thread ()
#3  0x0000000000485bb6 in erl_start ()
#4  0x000000000043ad59 in main ()

Thread 16 (Thread 801c07c00 (LWP 100610/sys_sig_dispatc)):
#0  0x000000080185315a in read () from /lib/libc.so.7
#1  0x000000080132c400 in read () from /lib/libthr.so.3
#2  0x000000000059b718 in sys_sigrelease ()
#3  0x000000000060d1ea in ethr_thr_exit ()
#4  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#5  0x0000000000000000 in ?? ()

Thread 15 (Thread 801c08400 (LWP 100611/sys_msg_dispatc)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060b209 in ethr_cond_wait ()
#4  0x00000000004a3c8c in erts_foreach_sys_msg_in_q ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 14 (Thread 801c08800 (LWP 100612/async_1)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 13 (Thread 801c08c00 (LWP 100613/async_2)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 12 (Thread 801c09000 (LWP 100614/async_3)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 11 (Thread 801c09400 (LWP 100615/async_4)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 10 (Thread 801c09800 (LWP 100616/async_5)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 9 (Thread 801c09c00 (LWP 100617/async_6)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 8 (Thread 801c0a000 (LWP 100618/async_7)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 7 (Thread 801c0a400 (LWP 100619/async_8)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 6 (Thread 801c0a800 (LWP 100620/async_9)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x0000000000550f94 in erts_check_async_ready ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 5 (Thread 801c0ac00 (LWP 100621/async_10)):
#0  0x000000000059b0c1 in sys_sigrelease ()
#1  0x0000000000551005 in erts_check_async_ready ()
#2  0x000000000060d1ea in ethr_thr_exit ()
#3  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 801c0b000 (LWP 100622/child_waiter)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060b209 in ethr_cond_wait ()
#4  0x000000000059b44b in sys_sigrelease ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 801c0b400 (LWP 100623/1_scheduler)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060da57 in ethr_thr_create ()
#4  0x00000000004e1661 in erts_empty_runq ()
#5  0x00000000004ed037 in schedule ()
#6  0x000000000043dd6b in process_main () 
#7  0x00000000004e92f0 in erts_get_total_context_switches ()
#8  0x000000000060d1ea in ethr_thr_exit ()
#9  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()

Thread 2 (Thread 801c0b800 (LWP 100624/2_scheduler)):
#0  0x00000008017ecfac in poll () from /lib/libc.so.7
#1  0x000000080132c51e in poll () from /lib/libthr.so.3
#2  0x00000000005a62fa in erts_poll_wait_nkp ()
#3  0x00000000005a98a9 in erts_check_io_nkp ()
#4  0x00000000004e2516 in erts_empty_runq ()
#5  0x00000000004ed037 in schedule ()
#6  0x000000000043dd6b in process_main ()
#7  0x00000000004e92f0 in erts_get_total_context_switches ()
#8  0x000000000060d1ea in ethr_thr_exit ()
#9  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()

Thread 1 (Thread 801c0bc00 (LWP 100625/aux)):
#0  0x000000080133159c in pthread_kill () from /lib/libthr.so.3
#1  0x000000080132b845 in pthread_getschedparam () from /lib/libthr.so.3
#2  0x00000008013339ad in pthread_cond_signal () from /lib/libthr.so.3
#3  0x000000000060d9ee in ethr_thr_create ()
#4  0x00000000004e2b20 in erts_empty_runq ()
#5  0x000000000060d1ea in ethr_thr_exit ()
#6  0x0000000801329dc4 in pthread_getprio () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()
#0  0x000000000059b0c1 in sys_sigrelease ()
(gdb)
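
For reference, in the amd64 session above the faulting instruction is "mov (%rax),%rdi" while rax is 0x0, which is consistent with a NULL pointer dereference. The same data can be collected without the interactive pager prompts by running gdb in batch mode. This is a sketch, assuming a gdb that accepts -batch and -x; core_backtraces and the GDB override variable are hypothetical names.

```shell
#!/bin/sh
# core_backtraces EXE CORE
# Prints registers, disassembly at the faulting PC, and all thread
# backtraces for a core file, non-interactively.
# (Hypothetical helper; set GDB to use a different debugger binary.)
core_backtraces() {
    cmds=$(mktemp "${TMPDIR:-/tmp}/gdbcmds.XXXXXX") || return 1
    # Same commands as in the interactive session, with paging disabled.
    cat > "$cmds" <<'EOF'
set height 0
info registers
x/16i $pc
thread apply all backtrace
EOF
    "${GDB:-gdb}" -batch -x "$cmds" "$1" "$2" && status=0 || status=$?
    rm -f "$cmds"
    return "$status"
}
```

With the paths from this report, the call would be "core_backtraces /usr/local/lib/erlang/erts-7.1/bin/beam.smp beam.smp.core".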

ch214:~> ps faxuwwwd | egrep "rabbit|erlang"
root     55553   0.0  0.0  12088      0 ??  IWs  -          0:00.00 |-- daemon: /usr/local/sbin/rabbitmq-server[55554] (daemon)
rabbitmq 55554   0.0  0.0  14540      0 ??  IW   -          0:00.00 | `-- /bin/sh -e /usr/local/sbin/rabbitmq-server
rabbitmq 55628   0.0  3.6 122572  36676 ??  S     2:07PM    0:07.18 |   `-- /usr/local/lib/erlang/erts-7.1/bin/beam.smp -W w -A 64 -P 1048576 -B i -- -root /usr/local/lib/erlang -progname erl -- -home /var/db/rabbitmq -- -pa /usr/local/lib/erlang/lib/rabbitmq_server-3.5.6/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@ch214 -boot start_sasl -config /usr/local/etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@ch214.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@ch214-sasl.log"} -rabbit enabled_plugins_file "/usr/local/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/local/lib/erlang/lib/rabbitmq_server-3.5.6/sbin/../plugins" -rabbit plugins_expand_dir "/var/db/rabbitmq/mnesia/rabbit@ch214-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/db/rabbitmq/mnesia/rabbit@ch214" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672 inet_dist_listen_max 25672
rabbitmq 55731   0.0  0.2  14268   1684 ??  Ss    2:07PM    0:00.01 |     `-- inet_gethost 4
rabbitmq 55732   0.0  0.2  14268   1768 ??  S     2:07PM    0:00.01 |       `-- inet_gethost 4
rabbitmq 55618   0.0  0.2  14264   1732 ??  S     2:07PM    0:00.03 |-- /usr/local/lib/erlang/erts-7.1/bin/epmd -daemon


Since it terminates with a segfault, nothing useful is logged in /var/log/rabbitmq/ or elsewhere under /var/log/; the only trace is the syslog message "kernel: pid 31445 (beam.smp), uid 0: exited on signal 11 (core dumped)".
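
Because that kernel message is the only reliable trace, counting it in the syslog file gives a rough crash tally per host. This is a sketch only; count_beam_sig11 is a hypothetical name, and the log path passed to it depends on the local syslog configuration.

```shell
#!/bin/sh
# count_beam_sig11 LOGFILE
# Prints the number of kernel messages reporting that beam.smp
# exited on signal 11. (Hypothetical helper.)
count_beam_sig11() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c 'pid [0-9]* (beam\.smp), uid [0-9]*: exited on signal 11' "$1" || true
}
```

On a default FreeBSD setup this might be run as "count_beam_sig11 /var/log/messages".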

I've tried 'pkg install -f rabbitmq erlang', but the problem persists.



This problem started after I updated my server park.
Unfortunately, that update included several changes:

* The FreeBSD base system (updated with 'freebsd-update') was updated. New libs, new ntp, etc.
* I build my own ports. After updating the builder machine and the ports tree, poudriere rebuilt *all* my ports. I suspect there was a change in some shared library.
* Apart from this rebuild, erlang was upgraded on the system from 18.0.2,3 to 18.1.3,3 and rabbitmq from 3.5.4 to 3.5.6.

...so I can't pinpoint the cause of the problem. :-(
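
One way to narrow this down is to diff the installed package list of a working box against a crashing one. This is a sketch under assumptions: each input file holds one "name-version" entry per line (for instance from pkg query '%n-%v'), and changed_packages is a hypothetical helper name.

```shell
#!/bin/sh
# changed_packages BEFORE AFTER
# Prints entries that appear in only one of the two package lists.
# Lines only in BEFORE are unindented; lines only in AFTER are
# tab-indented (standard comm -3 output). (Hypothetical helper.)
changed_packages() {
    before_sorted=$(mktemp "${TMPDIR:-/tmp}/pkgb.XXXXXX") || return 1
    after_sorted=$(mktemp "${TMPDIR:-/tmp}/pkga.XXXXXX") || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$before_sorted"
    LC_ALL=C sort "$2" > "$after_sorted"
    LC_ALL=C comm -3 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

Unchanged packages are dropped from the output, so only upgrades, additions, and removals remain to be reviewed.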

I don't even know if I should send this report to the rabbitmq people or the erlang people.
I'll start with rabbitmq, since that is the highest level.


Let me know if I should perform some debugging. It is easy to reproduce the signal 11 segfault.


PS: So far, beam.smp has not crashed on any of my FreeBSD 10.1 (amd64) boxes.

/Elof
Comment 1 elofu17 2015-12-04 13:41:54 UTC
This is still an issue.

I just ran /usr/local/etc/rc.d/rabbitmq restart on three 9.3-boxes. Two of them logged:
2015-12-04 14:37:30 +01:00 ch-12 kernel: pid 97154 (beam.smp), uid 0: exited on signal 11 (core dumped)
2015-12-04 14:37:38 +01:00 ch-11 kernel: pid 45105 (beam.smp), uid 0: exited on signal 11 (core dumped)


Please let me know if I should run any debug commands or if you need additional information.
Comment 2 Jimmy Olgeni freebsd_committer freebsd_triage 2015-12-18 14:38:00 UTC
Core dumps are definitely something bad on the Erlang VM side - RabbitMQ is only the trigger.

I just committed Erlang 18.2.1 and RabbitMQ 3.5.7. Could you check what happens with them in your setup?

Also, are your HIPE and SMP options enabled or disabled?
Comment 3 Jimmy Olgeni freebsd_committer freebsd_triage 2015-12-30 11:08:09 UTC
I managed to reproduce it and found something: it's in the status check, so it's ugly but relatively benign. Looking for a fix.
Comment 4 commit-hook freebsd_committer freebsd_triage 2015-12-30 22:45:24 UTC
A commit references this bug:

Author: olgeni
Date: Wed Dec 30 22:44:37 UTC 2015
New revision: 404880
URL: https://svnweb.freebsd.org/changeset/ports/404880

Log:
  Avoid calling "rabbitmqctl status" in a loop to make sure that RabbitMQ is
  started.

  "rabbitmqctl wait" alone should suffice, and the loop seems to cause some
  kind of race condition that causes a segfault in the Erlang VM.

  RabbitMQ would start anyway, but users would get a segmentation fault
  message on the console.

  We also wait on daemon(8)'s pid to make sure that restarts are synchronized
  (i.e. daemon(8) is stopped before starting it again with the same pidfile).

  PR:		204147
  Submitted by:	elofu17@hotmail.com

Changes:
  head/net/rabbitmq/Makefile
  head/net/rabbitmq/files/rabbitmq.in
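
The approach in the commit log above (rely on "rabbitmqctl wait" instead of polling "rabbitmqctl status", and wait for the old daemon(8) process to exit before restarting) could look roughly like this. It is an illustrative sketch, not the contents of rabbitmq.in; the helper names, the RABBITMQCTL override, and the default timeout are assumptions.

```shell
#!/bin/sh
# wait_for_exit PID [TIMEOUT_SECONDS]
# Returns 0 once PID no longer exists, 1 if it is still alive after
# the timeout. Assumes the caller may signal PID (e.g. runs as root).
wait_for_exit() {
    pid=$1
    remaining=${2:-30}
    # kill -0 delivers no signal; it only tests whether the PID exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# wait_for_broker PIDFILE
# Blocks until the RabbitMQ application in the node identified by
# PIDFILE has started: a single "rabbitmqctl wait" call, with no
# polling loop around "rabbitmqctl status".
wait_for_broker() {
    "${RABBITMQCTL:-rabbitmqctl}" wait "$1"
}
```

A restart would then stop the service, call wait_for_exit on the PID of the old daemon(8) process, start the service, and call wait_for_broker once on the pidfile.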
Comment 5 Jimmy Olgeni freebsd_committer freebsd_triage 2015-12-30 22:49:16 UTC
Just committed a fix. It happened on 10.2 amd64 too, but with the new startup script I could no longer reproduce the problem.
Comment 6 elofu17 2016-01-25 15:32:45 UTC
Thanks.

It's been a few weeks now without any segmentation faults, so the fix fixed it. :-)

/Elof
Comment 7 Jimmy Olgeni freebsd_committer freebsd_triage 2016-01-25 15:33:47 UTC
Good news :) thank you!
Comment 8 Alexey Lebedeff 2016-02-15 15:33:14 UTC
Will be properly fixed by rabbitmq 3.6.1 - https://github.com/rabbitmq/rabbitmq-common/pull/55
Comment 9 commit-hook freebsd_committer freebsd_triage 2016-02-16 22:20:33 UTC
A commit references this bug:

Author: olgeni
Date: Tue Feb 16 22:20:17 UTC 2016
New revision: 409020
URL: https://svnweb.freebsd.org/changeset/ports/409020

Log:
  Remove custom stderr formatting from net/rabbitmq.

  From upstream commit fecd0e5 in rabbitmq/rabbitmq-common:

      Opening several ports for single fd is considered undefined behaviour
      in erlang. It's safe to replace this whole function with 'io:format'.

      Because writing to standard_error with io:format is synchronous - after
      this call has returned data was definitely sent to the port. And
      `erlang:halt` guarantees that this data will be flushed afterwards.

  See also ba531a1 in erlang/otp:

      Instead of outputting a formatted message showing errors found, a core
      was (often) created.

  This commit should fix all issues related to core dumps with RabbitMQ on
  Erlang 18, which were most often observed when creating or joining
  clusters.

  MFH requested because a beam core dump would be most certainly interpreted
  as the symptom of something worse within the Erlang VM.

  PR:		204147
  Submitted by:	Alexey Lebedeff (follow up)
  MFH:		2016Q1

Changes:
  head/net/rabbitmq/Makefile
  head/net/rabbitmq/files/patch-src_rabbit__misc.erl
Comment 10 Jimmy Olgeni freebsd_committer freebsd_triage 2016-02-16 22:21:06 UTC
(In reply to Alexey Lebedeff from comment #8)

Thanks!
Comment 11 commit-hook freebsd_committer freebsd_triage 2016-02-27 13:05:58 UTC
A commit references this bug:

Author: olgeni
Date: Sat Feb 27 13:05:31 UTC 2016
New revision: 409663
URL: https://svnweb.freebsd.org/changeset/ports/409663

Log:
  MFH: r409020

  Remove custom stderr formatting from net/rabbitmq.

  From upstream commit fecd0e5 in rabbitmq/rabbitmq-common:

      Opening several ports for single fd is considered undefined behaviour
      in erlang. It's safe to replace this whole function with 'io:format'.

      Because writing to standard_error with io:format is synchronous - after
      this call has returned data was definitely sent to the port. And
      `erlang:halt` guarantees that this data will be flushed afterwards.

  See also ba531a1 in erlang/otp:

      Instead of outputting a formatted message showing errors found, a core
      was (often) created.

  This commit should fix all issues related to core dumps with RabbitMQ on
  Erlang 18, which were most often observed when creating or joining
  clusters.

  MFH requested because a beam core dump would be most certainly interpreted
  as the symptom of something worse within the Erlang VM.

  PR:		204147
  Submitted by:	Alexey Lebedeff (follow up)
  Approved by:	ports-secteam (miwi)

Changes:
_U  branches/2016Q1/
  branches/2016Q1/net/rabbitmq/Makefile
  branches/2016Q1/net/rabbitmq/files/patch-src_rabbit__misc.erl