Hi, I have a problem with ejabberd startup. After jail restart it won't start, I get those in ejabberd.log: 2021-06-06 21:17:00.704 [error] <0.171.0> External eimp process (pid=4614) has terminated unexpectedly, restarting in a few seconds 2021-06-06 21:17:00.704 [error] <0.169.0> External eimp process (pid=4613) has terminated unexpectedly, restarting in a few seconds 2021-06-06 21:17:00.704 [error] <0.195.0> External eimp process (pid=4626) has terminated unexpectedly, restarting in a few seconds It also creates erl_crash.dump file. If I restart ejabberd manually with service ejabberd restart, it works fine. Host: 13.0-RELEASE-p2 Jail: 13.0-RELEASE-p2 Ejabberd: ejabberd-21.04 I noticed this since moving my ejabberd service from 12.2 to 13.0 Host/Jail. Jail was not upgraded, but re-implemented from scratch, on another hardware. Host as well. Any idea on how to fix this? I can provide erl_crash.dump file if needed. Thank you in advance,
I don't use it in jail, although I'll check and get back to you by this weekend.
Created attachment 226171 [details] erlang crash dump Any news? I'm still experiencing the same problem - on system (jail) start, i get the attached dump. `service ejabberd restart` brings up ejabberd.
(In reply to Marko Cupać from comment #2) Sorry, I was stuck at the first step, aka building a jail with ejabberd setup. Could you share instructions to provision such a jail ? I could not get jail to work with my setup (one public IPv4 address, and a /64 IPv6 network block on the host). Thanks!
(In reply to Ashish SHUKLA from comment #3) Hi, I use standard "fat" jail. I have some articles in which I documented my jail setup (sorry for self-promotion), hopefully they will help you. First I prepare my jail host to be "jail friendly": https://www.mimar.rs/blog/host-your-own-services-with-freebsd-jails-part-1-prepare-jail-host Next, I install my jail with bsdinstall jail: https://www.mimar.rs/blog/host-your-own-services-with-freebsd-jails-part-2-initial-jail-setup After that, I use jail as it was standard host - ssh to it, install and configure packages, start services etc. If you need any further help just ask. Thank you for looking into this!
Thanks for sharing your articles. They are very helpful. I guess I was missing "fat jail" after which ejabberd started fine. Unfortunately, I'm not able to reproduce your issue here with my mostly pristine ejabberd configuration after restarting my jail couple of times. In all the restarts eimp processes came UP just fine, and stayed online for minutes. Could you share your ejabberd configuration (masking any identifying information), output of "pkg info -a" from within the jail, output of "pkg info ejabberd", and a recent crash dump file ? Thanks!
Created attachment 226229 [details] requested files and command outputs Hi, I have attached requested files and command outputs in shar archive. I can give you public key to my poudriere repo if needed for testing purposes. I have another ejabberd server which doesn't have the problem (12.2 jail, ejabberd 21.01), as opposed to the problematic one (13.0 jail, ejabberd 21.04). Thanks once again for looking into this.
Hi, I looked at provided information but didn't find anything that'll help us get bottom to this. I was using similar configuration except for the LDAP stuff which your configuration has. Do you know if the eimp dumps core ? If yes, can you install ejabberd (built with debug symbols, i.e. make -C /usr/ports/net-im/ejabberd -DWITH_DEBUG install), and then provide the backtrace here ? % lldb -c /path/to/core/file /usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp (lldb) bt all Thanks!
(In reply to Ashish SHUKLA from comment #7) Hi, I searched for all *core* and *dump* files in a jail, but couldn't find anything related to eimp, I don't think it dumps core. I recreated identical jail on 12.2-RELEASE and it doesn't have described problem. Interestingly, I have somewhat similar problem with amavisd-new startup as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256974 Is there a chance to bring some jail people into discussion? At this point I think problem could be related to some changes in jail implementation between 12.2 and 13.0.
(In reply to Marko Cupać from comment #8) 1. Replace `ejabberd_start' function in `/usr/local/etc/rc.d/ejabberd' with following: === ejabberd_start() { echo "Starting $name." #su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" $EJABBERDCTL --node $ejabberd_node start" su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" truss -s 200 -f $EJABBERDCTL --node $ejabberd_node start 2>/tmp/ejabberd.for-bug-256453.err" } === 2. Make sure that sysctl knob "security.bsd.unprivileged_proc_debug" is set to 1: # sysctl security.bsd.unprivileged_proc_debug=1 3. Restart the jail. Please note that jail command will not exit immediately, courtesy: truss. 4. In the meantime, in another terminal session you can keep an eye on /var/log/ejabberd.log for the messages "External eimp process (pid=XXXXX) has terminated unexpectedly, restarting in a few seconds". Once this appeared for couple of times, you can stop the `ejabberd' from within the jail, and kill the `epmd' process. After you kill the `epmd' process, `truss' will exit, and so will the jail command. 5. You can revert the changes made in step 1, and step 2. After this you can compress, and send over file "/tmp/ejabberd.for-bug-256453.err", or figure out yourself from the file, why "eimp" keeps dying (and share the findings). Please note that the file may contain stuff related to your setup, so you may want to mask the secret/identifying information before sending it over. Good luck! HTH
Created attachment 226421 [details] debug Hi, I have obtained and attached the file as instructed.
(In reply to Marko Cupać from comment #10) Looking at the provided log file, I don't see any sign of eimp being launched: λ fgrep execve ejabberd.for-bug-256453.err 30540: execve("/usr/bin/id",0x800a36810,0x800a36890) EJUSTRETURN 30554: execve("/usr/local/bin/erl",0x800a36518,0x800a366d8) EJUSTRETURN 30557: execve("/usr/bin/sed",0x800a36188,0x800a36228) EJUSTRETURN 30554: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a360c8,0x800a36278) EJUSTRETURN 30554: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a09000,0x800a2d000) EJUSTRETURN 30568: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffdda0,0x7fffffffea48) EJUSTRETURN 31867: execve("/bin/date",0x800a36138,0x800a36208) EJUSTRETURN 31871: execve("/usr/bin/sed",0x800a362c0,0x800a36498) EJUSTRETURN 31873: execve("/usr/bin/sed",0x800a362d8,0x800a36438) EJUSTRETURN 31874: execve("/usr/local/bin/epmd",0x800a7d410,0x800a7d4a0) EJUSTRETURN 31875: execve("/usr/bin/grep",0x800a7d410,0x800a7d4a0) EJUSTRETURN 31876: execve("/usr/local/bin/erl",0x800aa4810,0x800abe908) EJUSTRETURN 31879: execve("/usr/bin/sed",0x800a3a188,0x800a3a228) EJUSTRETURN 31876: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a69018,0x800a68608) EJUSTRETURN 31880: execve("/bin/sh",0x7fffffffadc0,0x800a30000) EJUSTRETURN 31880: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/epmd",0x800a480e0,0x800a48208) EJUSTRETURN 31884: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a90800,0x800a30000) EJUSTRETURN 31885: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffa840,0x7fffffffc958) EJUSTRETURN And it seems like the process crashed: 30532: read(3,"416 ===\nFile operation error: eacces. Target: ./global.beam. Function: get_file. Process: code_server.\n=ERROR REPORT==== 13-Jul-",128) = 128 (0x80) 30554: writev(1,[{"=ERROR REPORT==== 13-Jul-2021::12:37:48.859880 ===\nFile operation error: eacces. Target: .. Function: read_file_info. Process: code_server.\n",140},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.524325 ===\nFile operation error: eacces. Target: ./maps.beam. Function: get_file. Process: code_server.\n",144},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.757860 ===\nFile operation error: eacces. Target: ./string.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.784508 ===\nFile operation error: eacces. Target: ./unicode_util.beam. Function: get_file. Process: code_server.\n",152},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.727759 ===\nFile operation error: eacces. Target: ./rpc.beam. Function: get_file. Process: code_server.\n",143},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.878416 ===\nFile operation error: eacces. Target: ./global.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.845088 ===\nFile operation error: eacces. Target: ./io_lib.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.864122 ===\nFile operation error: eacces. Target: ./io_lib_format.beam. Function: get_file. Process: code_server.\n",153},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.974589 ===\nFile operation error: eacces. Target: ./inet_db.beam. Function: get_file. Process: code_server.\n",147},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.884210 ===\nFile operation error: eacces. Target: ./epp.beam. Function: get_file. Process: code_server.\n",143},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.109728 ===\nFile operation error: eacces. Target: ./inet_config.beam. Function: get_file. Process: code_server.\n",151},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.910855 ===\nFile operation error: eacces. Target: ./io.beam. Function: get_file. Process: code_server.\n",142},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.217600 ===\nFile operation error: eacces. Target: ./inet_udp.beam. Function: get_file. Process: code_server.\n",148},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.352064 ===\nFile operation error: eacces. Target: ./inet.beam. Function: get_file. Process: code_server.\n",144},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.502473 ===\nFile operation error: eacces. Target: ./inet_parse.beam.Function: get_file. Process: code_server.\n",150},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.589117 ===\nFile operation error: eacces. Target: ./erl_distribution.beam. Function: get_file. Process: code_server.\n",156}],16) = 2351 (0x92f) .... .... .... 31884: write(3,"=erl_crash_dump:0.5\nTue Jul 13 12:38:00 2021\nSlogan: could not start kernel pid (application_controller) ({bad_environment_value,"#{label=>{erl_prim_loader,file_error},report=>\\"File"})\nSystem ver"...,65536) = 65536 (0x10000) The cause of crash seems to be the some kind of permission errors, but that seems like a red herring to me. Are you using some kind of resource limits for your jail, or any special sysctl knobs ? Can you share your jail's rc.conf ? Maybe that can shed some light. FTR, following is the one from my jailed environment: λ fgrep execve ejabberd.err 56079: execve("/usr/bin/id",0x800a36810,0x800a36890) EJUSTRETURN 62756: execve("/bin/date",0x800a36138,0x800a36208) EJUSTRETURN 64426: execve("/usr/bin/sed",0x800a362c0,0x800a36498) EJUSTRETURN 70458: execve("/usr/bin/sed",0x800a362d8,0x800a36438) EJUSTRETURN 72781: execve("/usr/local/bin/epmd",0x800a49410,0x800a494a0) EJUSTRETURN 79487: execve("/usr/bin/grep",0x800a49410,0x800a494a0) EJUSTRETURN 83659: execve("/usr/local/bin/erl",0x800a498c8,0x800a49a18) EJUSTRETURN 89909: execve("/usr/bin/sed",0x800a3a188,0x800a3a228) EJUSTRETURN 83659: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a3a238,0x800a3a408) EJUSTRETURN 93156: execve("/bin/sh",0x7fffffffe060,0x800a29000) EJUSTRETURN 93156: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/epmd",0x800a480e0,0x800a48208) EJUSTRETURN 7607: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a30000,0x800a29000) EJUSTRETURN 11440: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffdae0,0x7fffffffe800) EJUSTRETURN 50748: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 53496: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 55343: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 55774: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 62227: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 66142: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 69352: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 73650: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN 78691: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 78691: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/inet_gethost",0x800a480b8,0x800a48208) EJUSTRETURN 95310: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 95310: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 98622: execve("/sbin/sysctl",0x800a43210,0x800a432a8) EJUSTRETURN 35: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 35: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 4407: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN 11970: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 11970: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 13251: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN 97556: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 97556: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 35: execve("/sbin/sysctl",0x800a43210,0x800a432a8) EJUSTRETURN 6123: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 6123: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 12506: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN 19993: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN 19993: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN 25076: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN Thanks!
Hi, both sysctl.conf and loader.conf on jail host are empty (no sysctl modification). No resource limits or special knobs. ---jail's-rc.conf--- clear_tmp_enable="YES" cron_flags="$cron_flags -J 15" syslogd_flags="-ss" sendmail_enable="NO" sendmail_submit_enable="NO" sendmail_outbound_enable="NO" sendmail_msp_queue_enable="NO" sshd_enable="YES" ejabberd_enable="YES" synapse_enable="YES" turnserver_enable="YES" ---jail's-rc.conf--- ---host's-jail.conf--- path = "/usr/jail/${host.hostname}"; exec.prestart = "/sbin/mount_nullfs -o ro /usr/local/etc/letsencrypt \ /usr/jail/${host.hostname}/usr/local/etc/letsencrypt"; exec.start = "/bin/sh /etc/rc"; exec.stop = "/bin/sh /etc/rc.shutdown jail"; exec.poststop = "/sbin/umount /usr/jail/${host.hostname}/usr/local/etc/letsencrypt/"; exec.clean; mount.devfs; im1_example_org { # MASKED host.hostname = im1.example.org; # MASKED host.domainname = example.org; # MASKED ip4.addr = 'lo2|127.0.2.113/32'; # MASKED ip4.addr += 'lagg0|192.0.2.133/32'; # MASKED } ---host's-jail.conf--- Same thing happens on two different physical jail hosts (both are older machines, HP gen6 and gen8): CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2266.79-MHz K8-class CPU) Origin="GenuineIntel" Id=0x106a5 Family=0x6 Model=0x1a Stepping=5 CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz (1995.04-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7 I also think that permissions issue is strange. If permissions were wrong, why would everything work with manual start, but not with jail start? Also, how come there are no problems with 12.2 jails? Anyway, I created ansible playbook which gives me ability to rebuild identical jail in a matter of seconds. It is no problem for me to create any combination of 12.2 / 13.0 jail with 2021Q2, 2021Q3 or main ports. If it can be of any help to take logs from different combinations of jails / ports it would be no problem for me. I don't have enough knowledge for troubleshooting this, and maybe this will make no sense, but could it be that ejabberd service on 13.0 jail somehow starts too early, before "everything is in place" (in layman's terms)? Thank you for debugging this, if there's anything else I can do please let me know.
Hi, I tried reproducing on jail in another host (DigitalOcean VM running 13.0-RELEASE with GENERIC kernel/configuration), and unable to reproduce with it. The only difference between my setup, and yours is: * no lagg device * no public IP address in jail * a RFC1918 IP address bound to the ethernet interface * pf NAT on host * local_unbound in jail used as resolver * only ejabberd is running in jail Is it possible for you to provision a separate jail with a bare minimum configuration with just ejabberd ? Also on the host are you running non-GENERIC kernel ? Any sysctl.conf(5) customizations you applied on the host ? > I also think that permissions issue is strange. If permissions were wrong, why would everything work with manual start, but not with jail start? Those were red-herring, so you can ignore that. > Also, how come there are no problems with 12.2 jails? I'm not quite sure about that yet. Also can you run commands in comment# 9 again ? Maybe this time there will be some success. Thanks!
(In reply to Marko Cupać from comment #12) Also do other daemons in the jail work fine at startup, i.e. sshd, synapse, turnserver ?
Here's one more data point. In a freshly created jail [1] and a nearly pristine ejabberd.conf [2] didn't see any problems with ejabberd starting when the jail was restarted. [1] # bsdinstall jail /usr/jails/ejabberd # cat /etc/jail.conf ejabberd { host.hostname = alaj1.ftfl.ca; ip4.addr = 'lo0|127.0.0.2/32'; ip4.addr += 'em0|192.168.2.9'; path = "/usr/jails/ejabberd"; mount.devfs; exec.start = "/bin/sh /etc/rc"; exec.stop = "/bin/sh /etc/rc.shutdown"; allow.raw_sockets = 1; } [2] # cp /usr/local/etc/ejabberd/ejabberd.yml.example /usr/local/etc/ejabberd/ejabberd.yml # sed -I -e 's/"::"/"0.0.0.0"/' /usr/local/etc/ejabberd/ejabberd.yml Could there be a problem with your jail? Maybe you could create a new jail with similar steps to verify?
Hi Marko, Is it fine to close this PR, since we are not able to reproduce the reported issue in FreeBSD jails ? Thanks!
(In reply to Ashish SHUKLA from comment #16) Hi, I still have the same problem with my configuration. If you want to close it I agree, but I propose to leave it for now. I was quite busy lately so I was working around with manual restarts, but I would like to find the issue, and for that I plan to create another jail with default configuration, see if there's no problem, and then slowly add features one by one (ldap etc.) to find where exactly problems start. I guess I will have to use quarterly ports and not main because there's another problem with latest ejabberd / erlang combination I reported in another PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259067 Best regards,
(In reply to Marko Cupać from comment #17) Fair enough. I'm looking at the other PR, and will follow up with you over there. Thanks!
(In reply to Ashish SHUKLA from comment #18) Hi, I think I made some progress. TL;DR: The problem appears to be in ejabberdctl.cfg's INET_DIST_INTERFACE. If it is defined to either public IP or loopback IP, ejabberd won't start with jail, but starts on manual restart. Longer version: My new server with mostly default config starts fine with jail. I like to have all the services bound to specific IP addresses so I changed config so that all the listeners in ejabberd.yml are bound to IP address of the server. Still, I had something listening on wildcard: tcp4 0 0 *.48012 *.* LISTEN I noticed that my problematic servers have ejabberdctl.cfg, while new one didn't (only .sample), so I set one up (I redacted IP address and domain below, ERL_EPMD_ADDRESS is server's public IP address, and it resolves, forward and reverse, to host specified in ERLANG_NODE, which I also redacted). FIREWALL_WINDOW=4200-4210 ERL_EPMD_ADDRESS=192.0.2.226 ERLANG_NODE=ejabberd@im.example.org EJABBERD_CONFIG_PATH=/usr/local/etc/ejabberd/ejabberd.yml At this point netstat still shows something listens on wildcard, but on another port: tcp4 0 0 *.4200 *.* LISTEN As port is the first one in FIREWALL_WINDOW, I concluded I should also set INET_DIST_INTERFACE in order to bind it to specific IP. However, setting it to either public IP address (192.0.2.226 - redacted), or jail's loopback IP address (127.0.2.226) results in failure to start ejabberd with jail, whereas it starts fine on later manual restart. My inetrc is default. Best regards,