Bug 256453 - net-im/ejabberd: External eimp process has terminated unexpectedly
Summary: net-im/ejabberd: External eimp process has terminated unexpectedly
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Ashish SHUKLA
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-06 19:29 UTC by Marko Cupać
Modified: 2021-07-27 14:48 UTC (History)
2 users (show)

See Also:
bugzilla: maintainer-feedback? (ashish)


Attachments
erlang crash dump (275.41 KB, text/plain)
2021-07-02 15:27 UTC, Marko Cupać
no flags Details
requested files and command outputs (478.44 KB, text/plain)
2021-07-05 09:29 UTC, Marko Cupać
no flags Details
debug (113.38 KB, application/gzip)
2021-07-13 11:08 UTC, Marko Cupać
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marko Cupać 2021-06-06 19:29:47 UTC
Hi,

I have a problem with ejabberd startup. After jail restart it won't start, I get those in ejabberd.log:

2021-06-06 21:17:00.704 [error] <0.171.0> External eimp process (pid=4614) has terminated unexpectedly, restarting in a few seconds
2021-06-06 21:17:00.704 [error] <0.169.0> External eimp process (pid=4613) has terminated unexpectedly, restarting in a few seconds
2021-06-06 21:17:00.704 [error] <0.195.0> External eimp process (pid=4626) has terminated unexpectedly, restarting in a few seconds

It also creates erl_crash.dump file.

If I restart ejabberd manually with service ejabberd restart, it works fine.

Host: 13.0-RELEASE-p2
Jail: 13.0-RELEASE-p2
Ejabberd: ejabberd-21.04

I noticed this since moving my ejabberd service from 12.2 to 13.0 Host/Jail. Jail was not upgraded, but re-implemented from scratch, on another hardware. Host as well.

Any idea on how to fix this? I can provide erl_crash.dump file if needed.

Thank you in advance,
Comment 1 Ashish SHUKLA freebsd_committer 2021-06-09 06:03:22 UTC
I don't use it in jail, although I'll check and get back to you by this weekend.
Comment 2 Marko Cupać 2021-07-02 15:27:26 UTC
Created attachment 226171 [details]
erlang crash dump

Any news? I'm still experiencing the same problem - on system (jail) start, i get the attached dump. `service ejabberd restart` brings up ejabberd.
Comment 3 Ashish SHUKLA freebsd_committer 2021-07-04 08:36:10 UTC
(In reply to Marko Cupać from comment #2)

Sorry, I was stuck at the first step, aka building a jail with ejabberd setup. Could you share instructions to provision such a jail ? I could not get jail to work with my setup (one public IPv4 address, and a /64 IPv6 network block on the host).

Thanks!
Comment 4 Marko Cupać 2021-07-04 09:51:45 UTC
(In reply to Ashish SHUKLA from comment #3)

Hi,

I use standard "fat" jail. I have some articles in which I documented my jail setup (sorry for self-promotion), hopefully they will help you.

First I prepare my jail host to be "jail friendly":

https://www.mimar.rs/blog/host-your-own-services-with-freebsd-jails-part-1-prepare-jail-host

Next, I install my jail with bsdinstall jail:

https://www.mimar.rs/blog/host-your-own-services-with-freebsd-jails-part-2-initial-jail-setup

After that, I use jail as it was standard host - ssh to it, install and configure packages, start services etc.

If you need any further help just ask.

Thank you for looking into this!
Comment 5 Ashish SHUKLA freebsd_committer 2021-07-04 13:56:31 UTC
Thanks for sharing your articles. They are very helpful.

I guess I was missing "fat jail" after which ejabberd started fine. Unfortunately, I'm not able to reproduce your issue here with my mostly pristine ejabberd configuration after restarting my jail couple of times. In all the restarts eimp processes came UP just fine, and stayed online for minutes.

Could you share your ejabberd configuration (masking any identifying information), output of "pkg info -a" from within the jail, output of "pkg info ejabberd", and a recent crash dump file ?

Thanks!
Comment 6 Marko Cupać 2021-07-05 09:29:55 UTC
Created attachment 226229 [details]
requested files and command outputs

Hi,

I have attached requested files and command outputs in shar archive.

I can give you public key to my poudriere repo if needed for testing purposes.

I have another ejabberd server which doesn't have the problem (12.2 jail, ejabberd 21.01), as opposed to the problematic one (13.0 jail, ejabberd 21.04).

Thanks once again for looking into this.
Comment 7 Ashish SHUKLA freebsd_committer 2021-07-11 11:31:58 UTC
Hi,

I looked at provided information but didn't find anything that'll help us get bottom to this. I was using similar configuration except for the LDAP stuff which your configuration has.

Do you know if the eimp dumps core ? If yes, can you install ejabberd (built with debug symbols, i.e. make -C /usr/ports/net-im/ejabberd -DWITH_DEBUG install), and then provide the backtrace here ?

   % lldb -c /path/to/core/file /usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp

   (lldb) bt all

Thanks!
Comment 8 Marko Cupać 2021-07-12 20:14:36 UTC
(In reply to Ashish SHUKLA from comment #7)

Hi,

I searched for all *core* and *dump* files in a jail, but couldn't find anything related to eimp, I don't think it dumps core.

I recreated identical jail on 12.2-RELEASE and it doesn't have described problem.

Interestingly, I have somewhat similar problem with amavisd-new startup as reported here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256974

Is there a chance to bring some jail people into discussion? At this point I think problem could be related to some changes in jail implementation between 12.2 and 13.0.
Comment 9 Ashish SHUKLA freebsd_committer 2021-07-13 05:52:23 UTC
(In reply to Marko Cupać from comment #8)


1. Replace `ejabberd_start' function in `/usr/local/etc/rc.d/ejabberd' with following: 


===

ejabberd_start()
{
    echo "Starting $name."
    #su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" $EJABBERDCTL --node $ejabberd_node start"
    su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" truss -s 200 -f $EJABBERDCTL --node $ejabberd_node start 2>/tmp/ejabberd.for-bug-256453.err"
}

===

2. Make sure that sysctl knob "security.bsd.unprivileged_proc_debug" is set to 1:

   # sysctl security.bsd.unprivileged_proc_debug=1

3. Restart the jail. Please note that jail command will not exit immediately, courtesy: truss. 

4. In the meantime, in another terminal session you can keep an eye on /var/log/ejabberd.log for the messages "External eimp process (pid=XXXXX) has terminated unexpectedly, restarting in a few seconds". Once this appeared for couple of times, you can stop the `ejabberd' from within the jail, and kill the `epmd' process. After you kill the `epmd' process, `truss' will exit, and so will the jail command.

5. You can revert the changes made in step 1, and step 2.


After this you can compress, and send over file "/tmp/ejabberd.for-bug-256453.err", or figure out yourself from the file, why "eimp" keeps dying (and share the findings).

Please note that the file may contain stuff related to your setup, so you may want to mask the secret/identifying information before sending it over.

Good luck!

HTH
Comment 10 Marko Cupać 2021-07-13 11:08:20 UTC
Created attachment 226421 [details]
debug

Hi,

I have obtained and attached the file as instructed.
Comment 11 Ashish SHUKLA freebsd_committer 2021-07-13 19:48:15 UTC
(In reply to Marko Cupać from comment #10)

Looking at the provided log file, I don't see any sign of eimp being launched:

λ fgrep execve ejabberd.for-bug-256453.err
30540: execve("/usr/bin/id",0x800a36810,0x800a36890) EJUSTRETURN
30554: execve("/usr/local/bin/erl",0x800a36518,0x800a366d8) EJUSTRETURN
30557: execve("/usr/bin/sed",0x800a36188,0x800a36228) EJUSTRETURN
30554: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a360c8,0x800a36278) EJUSTRETURN
30554: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a09000,0x800a2d000) EJUSTRETURN
30568: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffdda0,0x7fffffffea48) EJUSTRETURN
31867: execve("/bin/date",0x800a36138,0x800a36208) EJUSTRETURN
31871: execve("/usr/bin/sed",0x800a362c0,0x800a36498) EJUSTRETURN
31873: execve("/usr/bin/sed",0x800a362d8,0x800a36438) EJUSTRETURN
31874: execve("/usr/local/bin/epmd",0x800a7d410,0x800a7d4a0) EJUSTRETURN
31875: execve("/usr/bin/grep",0x800a7d410,0x800a7d4a0) EJUSTRETURN
31876: execve("/usr/local/bin/erl",0x800aa4810,0x800abe908) EJUSTRETURN
31879: execve("/usr/bin/sed",0x800a3a188,0x800a3a228) EJUSTRETURN
31876: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a69018,0x800a68608) EJUSTRETURN
31880: execve("/bin/sh",0x7fffffffadc0,0x800a30000) EJUSTRETURN
31880: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/epmd",0x800a480e0,0x800a48208) EJUSTRETURN
31884: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a90800,0x800a30000) EJUSTRETURN
31885: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffa840,0x7fffffffc958) EJUSTRETURN


And it seems like the process crashed:

30532: read(3,"416 ===\nFile operation error: eacces. Target: ./global.beam. Function: get_file. Process: code_server.\n=ERROR REPORT==== 13-Jul-",128) = 128 (0x80)
30554: writev(1,[{"=ERROR REPORT==== 13-Jul-2021::12:37:48.859880 ===\nFile operation error: eacces. Target: .. Function: read_file_info. Process: code_server.\n",140},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.524325 ===\nFile operation error: eacces. Target: ./maps.beam. Function: get_file. Process: code_server.\n",144},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.757860 ===\nFile operation error: eacces. Target: ./string.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.784508 ===\nFile operation error: eacces. Target: ./unicode_util.beam. Function: get_file. Process: code_server.\n",152},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.727759 ===\nFile operation error: eacces. Target: ./rpc.beam. Function: get_file. Process: code_server.\n",143},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.878416 ===\nFile operation error: eacces. Target: ./global.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.845088 ===\nFile operation error: eacces. Target: ./io_lib.beam. Function: get_file. Process: code_server.\n",146},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.864122 ===\nFile operation error: eacces. Target: ./io_lib_format.beam. Function: get_file. Process: code_server.\n",153},{"=ERROR REPORT==== 13-Jul-2021::12:37:49.974589 ===\nFile operation error: eacces. Target: ./inet_db.beam. Function: get_file. Process: code_server.\n",147},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.884210 ===\nFile operation error: eacces. Target: ./epp.beam. Function: get_file. Process: code_server.\n",143},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.109728 ===\nFile operation error: eacces. Target: ./inet_config.beam. Function: get_file. Process: code_server.\n",151},{"=ERROR REPORT==== 13-Jul-2021::12:37:52.910855 ===\nFile operation error: eacces. Target: ./io.beam. Function: get_file. Process: code_server.\n",142},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.217600 ===\nFile operation error: eacces. Target: ./inet_udp.beam. Function: get_file. Process: code_server.\n",148},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.352064 ===\nFile operation error: eacces. Target: ./inet.beam. Function: get_file. Process: code_server.\n",144},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.502473 ===\nFile operation error: eacces. Target: ./inet_parse.beam.Function: get_file. Process: code_server.\n",150},{"=ERROR REPORT==== 13-Jul-2021::12:37:50.589117 ===\nFile operation error: eacces. Target: ./erl_distribution.beam. Function: get_file. Process: code_server.\n",156}],16) = 2351 (0x92f)
....
....
....
31884: write(3,"=erl_crash_dump:0.5\nTue Jul 13 12:38:00 2021\nSlogan: could not start kernel pid (application_controller) ({bad_environment_value,"#{label=>{erl_prim_loader,file_error},report=>\\"File"})\nSystem ver"...,65536) = 65536 (0x10000)


The cause of crash seems to be the some kind of permission errors, but that seems like a red herring to me. Are you using some kind of resource limits for your jail, or any special sysctl knobs ?

Can you share your jail's rc.conf ? Maybe that can shed some light.


FTR, following is the one from my jailed environment:

λ fgrep execve ejabberd.err
56079: execve("/usr/bin/id",0x800a36810,0x800a36890) EJUSTRETURN
62756: execve("/bin/date",0x800a36138,0x800a36208) EJUSTRETURN
64426: execve("/usr/bin/sed",0x800a362c0,0x800a36498) EJUSTRETURN
70458: execve("/usr/bin/sed",0x800a362d8,0x800a36438) EJUSTRETURN
72781: execve("/usr/local/bin/epmd",0x800a49410,0x800a494a0) EJUSTRETURN
79487: execve("/usr/bin/grep",0x800a49410,0x800a494a0) EJUSTRETURN
83659: execve("/usr/local/bin/erl",0x800a498c8,0x800a49a18) EJUSTRETURN
89909: execve("/usr/bin/sed",0x800a3a188,0x800a3a228) EJUSTRETURN
83659: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erlexec",0x800a3a238,0x800a3a408) EJUSTRETURN
93156: execve("/bin/sh",0x7fffffffe060,0x800a29000) EJUSTRETURN
93156: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/epmd",0x800a480e0,0x800a48208) EJUSTRETURN
 7607: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/beam.smp",0x800a30000,0x800a29000) EJUSTRETURN
11440: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/erl_child_setup",0x7fffffffdae0,0x7fffffffe800) EJUSTRETURN
50748: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
53496: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
55343: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
55774: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
62227: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
66142: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
69352: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
73650: execve("/usr/local/lib/erlang/lib/ejabberd-21.04/lib/eimp-1.0.20/priv/bin/eimp",0x800a1d000,0x800a16000) EJUSTRETURN
78691: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
78691: execve("/usr/local/lib/erlang/erts-10.3.5.19/bin/inet_gethost",0x800a480b8,0x800a48208) EJUSTRETURN
95310: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
95310: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
98622: execve("/sbin/sysctl",0x800a43210,0x800a432a8) EJUSTRETURN
   35: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
   35: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
 4407: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN
11970: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
11970: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
13251: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN
97556: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
97556: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
   35: execve("/sbin/sysctl",0x800a43210,0x800a432a8) EJUSTRETURN
 6123: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
 6123: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
12506: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN
19993: execve("/bin/sh",0x7fffffffe590,0x800a1a000) EJUSTRETURN
19993: execve("/bin/sh",0x800a480e0,0x800a48208) EJUSTRETURN
25076: execve("/sbin/sysctl",0x800a43210,0x800a432b0) EJUSTRETURN


Thanks!
Comment 12 Marko Cupać 2021-07-13 20:53:55 UTC
Hi,

both sysctl.conf and loader.conf on jail host are empty (no sysctl modification). No resource limits or special knobs.

---jail's-rc.conf---
clear_tmp_enable="YES"
cron_flags="$cron_flags -J 15"
syslogd_flags="-ss"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
sshd_enable="YES"
ejabberd_enable="YES"
synapse_enable="YES"
turnserver_enable="YES"
---jail's-rc.conf---

---host's-jail.conf---
path           = "/usr/jail/${host.hostname}";
exec.prestart  = "/sbin/mount_nullfs -o ro /usr/local/etc/letsencrypt \
                  /usr/jail/${host.hostname}/usr/local/etc/letsencrypt";
exec.start     = "/bin/sh /etc/rc";
exec.stop      = "/bin/sh /etc/rc.shutdown jail";
exec.poststop  = "/sbin/umount /usr/jail/${host.hostname}/usr/local/etc/letsencrypt/";
exec.clean;
mount.devfs;

im1_example_org { # MASKED
  host.hostname    = im1.example.org; # MASKED
  host.domainname  = example.org; # MASKED
  ip4.addr         = 'lo2|127.0.2.113/32'; # MASKED
  ip4.addr        += 'lagg0|192.0.2.133/32'; # MASKED
}
---host's-jail.conf---

Same thing happens on two different physical jail hosts (both are older machines, HP gen6 and gen8):

CPU: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz (2266.79-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5

CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz (1995.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7

I also think that permissions issue is strange. If permissions were wrong, why would everything work with manual start, but not with jail start?

Also, how come there are no problems with 12.2 jails?

Anyway, I created ansible playbook which gives me ability to rebuild identical jail in a matter of seconds.

It is no problem for me to create any combination of 12.2 / 13.0 jail with 2021Q2, 2021Q3 or main ports.

If it can be of any help to take logs from different combinations of jails / ports it would be no problem for me.

I don't have enough knowledge for troubleshooting this, and maybe this will make no sense, but could it be that ejabberd service on 13.0 jail somehow starts too early, before "everything is in place" (in layman's terms)?

Thank you for debugging this, if there's anything else I can do please let me know.
Comment 13 Ashish SHUKLA freebsd_committer 2021-07-20 19:37:18 UTC
Hi,

I tried reproducing on jail in another host (DigitalOcean VM running 13.0-RELEASE with GENERIC kernel/configuration), and unable to reproduce with it. The only difference between my setup, and yours is:

 * no lagg device
 * no public IP address in jail
 * a RFC1918 IP address bound to the ethernet interface
 * pf NAT on host
 * local_unbound in jail used as resolver
 * only ejabberd is running in jail

Is it possible for you to provision a separate jail with a bare minimum configuration with just ejabberd ?

Also on the host are you running non-GENERIC kernel ? Any sysctl.conf(5) customizations you applied on the host ?


> I also think that permissions issue is strange. If permissions were wrong, why would everything work with manual start, but not with jail start?

Those were red-herring, so you can ignore that.

> Also, how come there are no problems with 12.2 jails?

I'm not quite sure about that yet.

Also can you run commands in comment# 9 again ? Maybe this time there will be some success.

Thanks!
Comment 14 Ashish SHUKLA freebsd_committer 2021-07-20 21:58:52 UTC
(In reply to Marko Cupać from comment #12)

Also do other daemons in the jail work fine at startup, i.e. sshd, synapse, turnserver ?
Comment 15 Joseph Mingrone freebsd_committer 2021-07-20 22:10:17 UTC
Here's one more data point.
In a freshly created jail [1] and a nearly pristine ejabberd.conf [2] didn't see any problems with ejabberd starting when the jail was restarted.

[1]
# bsdinstall jail /usr/jails/ejabberd
# cat /etc/jail.conf
ejabberd {
         host.hostname = alaj1.ftfl.ca;
         ip4.addr = 'lo0|127.0.0.2/32';
         ip4.addr += 'em0|192.168.2.9';
         path = "/usr/jails/ejabberd";
         mount.devfs;
         exec.start = "/bin/sh /etc/rc";
         exec.stop = "/bin/sh /etc/rc.shutdown";
         allow.raw_sockets = 1;
}

[2]
# cp /usr/local/etc/ejabberd/ejabberd.yml.example /usr/local/etc/ejabberd/ejabberd.yml
# sed -I -e 's/"::"/"0.0.0.0"/' /usr/local/etc/ejabberd/ejabberd.yml

Could there be a problem with your jail?  Maybe you could create a new jail with similar steps to verify?