| Summary: | net-im/ejabberd: ejabberdctl fails to communicate with ejabberd process | ||
|---|---|---|---|
| Product: | Ports & Packages | Reporter: | neil |
| Component: | Individual Port(s) | Assignee: | Ashish SHUKLA <ashish> |
| Status: | Closed Overcome By Events | ||
| Severity: | Affects Only Me | CC: | cs, miwi |
| Priority: | Normal | ||
| Version: | Latest | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
neil
2012-02-21 13:00:20 UTC
Responsible Changed From-To: freebsd-ports-bugs->ashish

Fix synopsis and assign.

Hi,

I've updated net-im/ejabberd port to 2.1.11. Could you please check if it solves your problem?

Thanks
--
Ashish SHUKLA | GPG: F682 CDCC 39DC 0FEA E116 20B6 C746 CFA9 E74F A4B0
freebsd.org!ashish | http://people.freebsd.org/~ashish/
Sent from my Emacs

Hi Ashish,

No, the update to ejabberd-2.1.11 has no effect. The problem still persists.

Regards,
Neil Darlow

Neil Darlow writes:
> Hi Ashish,
> No, the update to ejabberd-2.1.11 has no effect. The problem still persists.

Hi Neil,

Okay, thanks for the update. I'm away from my FreeBSD host. I'll look at it when I'm back next week.

Thanks
--
Ashish SHUKLA | GPG: F682 CDCC 39DC 0FEA E116 20B6 C746 CFA9 E74F A4B0
freebsd.org!ashish | http://people.freebsd.org/~ashish/
Sent from my Emacs

The problem with RPC failure appears to be related to hostname resolution.
Rather than using the default ejabberd_epmd_address of 127.0.0.1, it works when
this is set to the IP address that the hostname part of your ejabberd@hostname
node name resolves to (in my case, 192.168.1.2).
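
One way to find that address is to take the host part of the node name and look it up. This is a minimal sketch, assuming getent(1) is available on the host; `node_host` and `resolve_first` are hypothetical helper names, not part of the port, and `ejabberd@bumblebee` in the comment is only an example node name.

```shell
#!/bin/sh
# Hypothetical helper: print the host part of an Erlang node name,
# i.e. everything after the first '@' (ejabberd@bumblebee -> bumblebee).
node_host() {
    printf '%s\n' "${1#*@}"
}

# Hypothetical helper: print the first address the given host name
# resolves to, using getent(1). Prints nothing if the name is unknown.
resolve_first() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Example use (not run here):
#   addr=$(resolve_first "$(node_host ejabberd@bumblebee)")
#   echo "ejabberd_epmd_address=\"$addr\""
```

The printed line could then be copied into an rc.conf file by hand; whether that address is the right choice for a given host is the open question in this report.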
There are two problems I noticed at shutdown:
1) ejabberd spawns epmd if it is not already running but it does not kill it
at shutdown.
This can be fixed by using erlang's epmd script (add epmd_enable="YES" to a
preferred rc.conf file). To ensure correct startup and shutdown ordering it is
necessary to change ejabberd's # REQUIRE: line as follows:
# REQUIRE: DAEMON epmd
2) epmd will not respond to the -kill option in its shutdown command until all
registered Names are removed (the -relaxed_command_check option is not used
when epmd is started).
This is compounded because, once ejabberd has been signalled to stop by
ejabberdctl, the epmd script shutdown command can be executed before ejabberd
has unregistered its Name with epmd. This causes epmd to fail shutdown.
A solution is to query epmd for ejabberd's Name, during shutdown, repeatedly
for up to 10 seconds (on my 1.6GHz server it takes about 6 seconds) waiting
for ejabberd to unregister itself, after which epmd will shut down cleanly.
So, in summary:
1) Add epmd_enable="YES" to a preferred rc.conf file
2) Add ejabberd_epmd_address="a.real.ip.address" to a preferred rc.conf file
3) The ejabberd control script requires changes as per this after-install
patch:
--- /usr/local/etc/rc.d/ejabberd.orig 2013-03-28 10:19:25.000000000 +0000
+++ /usr/local/etc/rc.d/ejabberd 2013-03-28 10:20:36.000000000 +0000
@@ -2,7 +2,7 @@
 # $FreeBSD: ports/net-im/ejabberd/files/ejabberd.in,v 1.10 2012/11/17 06:00:26 svnexp Exp $
 # PROVIDE: ejabberd
-# REQUIRE: DAEMON
+# REQUIRE: DAEMON epmd
 # BEFORE: LOGIN
 # KEYWORD: shutdown
@@ -58,8 +58,12 @@
 {
     echo "Stopping $name."
     if su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" $EJABBERDCTL --node $ejabberd_node stop"; then
-# sleep 2
-# killall -u ejabberd -kill
+        SECS=0
+        while /usr/local/bin/epmd -names |grep 'ejabberd' >/dev/null; do
+            sleep 1
+            SECS=$(expr $SECS + 1)
+            [ $SECS -eq 10 ] && exit
+        done
     else
         _run_rc_notrunning
     fi
The timeout implementation could probably be cleaned up but the logic works
well for me.
Regards,
Neil Darlow
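
The timeout loop in the patch above could be factored into a small helper. What follows is a minimal sketch in POSIX sh, not the port's actual script: `wait_until_gone` and `ejabberd_registered` are hypothetical names, `EPMD` is an assumed override for the epmd path, and the match on "ejabberd" simply mirrors the grep used in the patch. Unlike the patch, the helper returns a status instead of calling `exit`, so the caller decides what to do when the timeout is reached.

```shell
#!/bin/sh
# wait_until_gone TIMEOUT CMD [ARGS...]
# Re-run CMD about once per second for as long as it keeps succeeding,
# i.e. while the thing being waited on is still present.
# Returns 0 as soon as CMD fails, or 1 if CMD is still succeeding
# after TIMEOUT seconds. (Variables are global; POSIX sh has no 'local'.)
wait_until_gone() {
    timeout=$1
    shift
    secs=0
    while "$@" >/dev/null 2>&1; do
        [ "$secs" -ge "$timeout" ] && return 1
        sleep 1
        secs=$((secs + 1))
    done
    return 0
}

# Hypothetical probe: succeeds while a name containing "ejabberd" is
# still listed by 'epmd -names'. EPMD is an assumed override variable.
ejabberd_registered() {
    "${EPMD:-/usr/local/bin/epmd}" -names 2>/dev/null | grep -q ejabberd
}
```

In the stop function this might be called as `wait_until_gone 10 ejabberd_registered || echo "ejabberd still registered with epmd" >&2`, keeping the 10 second limit from the patch.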
Hi,

First of all, thank you for your time in investigating this.

On Thu, 28 Mar 2013 11:10:01 GMT, Neil Darlow <neil@darlow.co.uk> said:
> The problem with RPC failure appears to be related to hostname resolution.
> Rather than using the ejabberd_epmd_address of 127.0.0.1, it works when this
> is set to the IP address that your ejabberd@hostname nodename resolves to (the
> hostname bit) in my case 192.168.1.2

I'm not sure if that's the real reason. From what I noticed in my testing few minutes ago with ejabberd 2.1.11, and erlang 16.b[1], that if I run system epmd (i.e. $PREFIX/rc.d/epmd), it's able to register its name intermittently, i.e.

#v+
% sudo /usr/local/etc/rc.d/epmd start
% sudo /usr/local/etc/rc.d/ejabberd start
% sudo /usr/local/etc/rc.d/ejabberd status
#v-

Two instances of epmd were running, system one (the one running under root user) has bound itself to *:4369, whereas the ejabberd one has bound itself to 127.0.0.1. Sometimes above steps succeed (ejabberd is running), and sometimes not (ejabberd is not running), whereas when I stopped all epmd, and then started ejabberd only, i.e.:

#v+
% sudo pkill epmd
% sudo /usr/local/etc/rc.d/ejabberd start
% sudo /usr/local/etc/rc.d/ejabberd status
#v-

Then it succeeded. And now only one epmd was running, which was spawned by ejabberd.

So from what I observed is that because two processes are listening on same port with overlapping addresses, it adds a level of non-determinism on which process is going to handle connection for 127.0.0.1:4369. Probably if we can instruct ejabberd to not start another epmd if one is already running, then may be we can achieve some level of determinism with this thing. It might be possible that epmd shipped with erlang 16b has some fixes which makes things slightly more reliable.

> There are two problems I noticed at shutdown:
> 1) ejabberd spawns epmd if it is not already running but it does not kill it
> at shutdown.
> This can be fixed by using erlang's epmd script (add epmd_enable="YES" to a
> preferred rc.conf file). To ensure correct startup and shutdown ordering it is
> necessary to change ejabberd's # REQUIRE: line as follows:
> # REQUIRE: DAEMON epmd

Right, but I'm not sure if it'll fix this:

#v+
root     90075 0.0 0.0 14228 1620 ?? S 1:06PM 0:00.00 /usr/local/bin/epmd -daemon
ejabberd 90115 0.0 0.0 14228 1632 ?? S 1:06PM 0:00.00 /usr/local/lib/erlang/erts-5.10.1/bin/epmd -daemon
#v-

> 2) epmd will not respond to the -kill option in its shutdown command until all
> registered Names are removed (the -relaxed_command_check option is not used
> when epmd is started).
> This is compounded because, once ejabberd has been signalled to stop by
> ejabberdctl, the epmd script shutdown command can be executed before ejabberd
> has unregistered its Name with epmd. This causes epmd to fail shutdown.
> A solution is to query epmd for ejabberd's Name, during shutdown, repeatedly
> for up to 10 seconds (on my 1.6GHz server it takes about 6 seconds) waiting
> for ejabberd to unregister itself, after which epmd will shutdown cleanly.
> So, in summary:
> 1) Add epmd_enable="YES" to a preferred rc.conf file
> 2) Add ejabberd_epmd_address="a.real.ip.address" to a preferred rc.conf file
> 3) The ejabberd control script requires changes as per this after-install
> patch:
> --- /usr/local/etc/rc.d/ejabberd.orig 2013-03-28 10:19:25.000000000 +0000
> +++ /usr/local/etc/rc.d/ejabberd 2013-03-28 10:20:36.000000000 +0000
> @@ -2,7 +2,7 @@
> # $FreeBSD: ports/net-im/ejabberd/files/ejabberd.in,v 1.10 2012/11/17 06:00:26 svnexp Exp $
> # PROVIDE: ejabberd
> -# REQUIRE: DAEMON
> +# REQUIRE: DAEMON epmd
> # BEFORE: LOGIN
> # KEYWORD: shutdown
> @@ -58,8 +58,12 @@
> {
> echo "Stopping $name."
> if su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" $EJABBERDCTL --node $ejabberd_node stop"; then
> -# sleep 2
> -# killall -u ejabberd -kill
> + SECS=0
> + while /usr/local/bin/epmd -names |grep 'ejabberd' >/dev/null; do
> + sleep 1
> + SECS=$(expr $SECS + 1)
> + [ $SECS -eq 10 ] && exit
> + done
> else
> _run_rc_notrunning
> fi

The patch seems good, though I've not tried yet. I'll try it and provide a feedback probably next week, though it can't be committed until ports freeze is over.

References:
[1] https://svn.redports.org/olgeni/lang/erlang/

Thanks
--
Ashish SHUKLA | GPG: F682 CDCC 39DC 0FEA E116 20B6 C746 CFA9 E74F A4B0
Sent from my Emacs

On Friday 29 March 2013 18:59:56 Ashish SHUKLA wrote:
> First of all, thank you for your time in investigating this.

You are welcome. I am aware that maintainers may not be able to reproduce specific problems experienced by users so it is only right that I should assist wherever possible.

> I'm not sure if that's the real reason. From what I noticed in my testing
> few minutes ago with ejabberd 2.1.11, and erlang 16.b[1], that if I run
> system epmd (i.e. $PREFIX/rc.d/epmd), it's able to register its name
> intermittently, i.e.
>
> #v+
> % sudo /usr/local/etc/rc.d/epmd start
> % sudo /usr/local/etc/rc.d/ejabberd start
> % sudo /usr/local/etc/rc.d/ejabberd status
> #v-
>
> Two instances of epmd were running, system one (the one running under root
> user) has bound itself to *:4369, whereas the ejabberd one has bound itself
> to 127.0.0.1.

I found this also, until I changed "ejabberd_epmd_address" to the physical IP address of my network interface (which is also what my hostname resolves to - coincidence?).

> So from what I observed is that because two processes are listening on same
> port with overlapping addresses, it adds a level of non-determinism on which
> process is going to handle connection for 127.0.0.1:4369.
> Probably if we can instruct ejabberd to not start another epmd if one is
> already running, then may be we can achieve some level of determinism with
> this thing.

Do you have faster hardware than my 1.6GHz system? Is it possible that epmd has not started before ejabberd starts and it does not see it as running? That might explain why you see two instances of epmd.

I must admit, in the heavy testing I performed, this situation was not arising. I did repeated "epmd start; ejabberd start", "ejabberd stop; epmd stop" and reboot sequences. They all resulted in the following:

786 ?? S 0:01.54 /usr/local/bin/epmd -daemon
788 ?? S 0:32.13 /usr/local/sbin/winbindd -s /usr/local/etc/smb.conf
789 ?? S 0:00.62 /usr/local/sbin/smbd -D -s /usr/local/etc/smb.conf
803 ?? I 0:24.54 /usr/local/lib/erlang/erts-5.9.3.1/bin/beam.smp -P 250000 -- -root /usr/local/lib/erlang -progname erl -- -home /var/spool/ejabberd -- -sname ejabberd@bumblebee -noshell -noinput -noshell -noinput -pa /usr/local/lib/erlang/lib/ejabberd-2.1.11/ebin -mnesia dir "/var/spool/ejabberd" -kernel inet_dist_use_interface {192,168,1,2} -s ejabberd -sasl sasl_error_logger {file,"/var/log/ejabberd/erlang.log"} -smp auto start start
807 ?? Is 0:00.02 inet_gethost 4
808 ?? I 0:00.08 inet_gethost 4

This is from a boot invocation. Manual startup results in beam.smp immediately following epmd in the process listing, as you would expect.

> Right, but I'm not sure if it'll fix this:
>
> #v+
> root     90075 0.0 0.0 14228 1620 ?? S 1:06PM 0:00.00 /usr/local/bin/epmd -daemon
> ejabberd 90115 0.0 0.0 14228 1632 ?? S 1:06PM 0:00.00 /usr/local/lib/erlang/erts-5.10.1/bin/epmd -daemon
> #v-

I have not seen two instances of epmd running since I set ejabberd_epmd_address from its default value of 127.0.0.1. If you have done that then there must be another cause like I mentioned earlier.

> The patch seems good, though I've not tried yet.
> I'll try it and provide a feedback probably next week, though it can't be
> committed until ports freeze is over.

Well, we still need to resolve the issue you experience that I currently do not before that commit also.

Regards,
Neil Darlow

Is this PR still relevant?

It still appears to be relevant. I apply my patch at each update.

Can you please add your patch as an attachment?

It is actually part of Comment 5.

I cannot usefully attach it as a patch because it modifies files/ejabberd.in

It really needs the maintainer to decide whether he wants to adopt the change and if he agrees with the logic behind it.

Regards,
Neil Darlow

Ok. Make sure you hammer the maintainer. If he does not respond, please get back to me.

(In reply to neil from comment #11)
> It is actually part of Comment 5.
>
> I cannot usefully attach it as a patch because it modifies files/ejabberd.in
>
> It really needs the maintainer to decide whether he wants to adopt the
> change and if he agrees with the logic behind it.
>
> Regards,
> Neil Darlow

Hi Neil,

Sorry for the extreme delay on my part.

The patch works as intended if its pre-requisite conditions (epmd enabled in rc.conf, and ejabberd_epmd_address) are set, which we don't control. So, with the help of your changeset, I prepared the following diff, which explicitly kills epmd after all names are unregistered, and does not require erlang's epmd:

Index: ejabberd.in
===================================================================
--- ejabberd.in (revision 29885)
+++ ejabberd.in (working copy)
@@ -58,8 +58,13 @@
 {
     echo "Stopping $name."
     if su $EJABBERDUSER -c "env ERL_EPMD_ADDRESS=\"${ejabberd_epmd_address}\" $EJABBERDCTL --node $ejabberd_node stop"; then
-# sleep 2
-# killall -u ejabberd -kill
+        SECS=0
+        while %%LOCALBASE%%/bin/epmd -names |fgrep -q ejabberd; do
+            sleep 1
+            SECS=$(expr $SECS + 1)
+            [ $SECS -eq 10 ] && exit
+        done
+        pkill -j none -u $EJABBERDUSER epmd
     else
         _run_rc_notrunning
     fi

How does this look to you?
Thanks,
Ashish

P.S. I'll only be able to commit it on/after Wednesday September 24, 2014.

Hi,

The problem I can see with that solution is that it assumes ejabberd spawned the epmd process.

What if another erlang-based port is using epmd launched from the epmd RC script?

I do not think it is a good idea that ejabberd's RC kills epmd unless it can be sure it is killing an instance launched by itself.

(In reply to neil from comment #14)
> Hi,
>
> The problem I can see with that solution is that it assumes ejabberd spawned
> the epmd process.
>
> What if another erlang-based port is using epmd launched from the epmd RC
> script?
>
> I do not think it is a good idea that ejabberd's RC kills epmd unless it can
> be sure it is killing an instance launched by itself.

Although, it's not guaranteed that it'll only kill the 'epmd' process spawned by ejabberd, but it'll only kill the `epmd' process owned by 'ejabberd' user.

The only place this won't work is when user runs multiple 'ejabberd' instances under same 'ejabberd' user[1] on same host. And since this rc.d script is used to manage single 'ejabberd' instance, so this would be useless anyways.

Do you still see problem?

References:
[1] No ideas if it's really possible though, courtesy: $HOME/.erlang.cookie

Thanks!
Ashish

(In reply to Ashish SHUKLA from comment #15)
> (In reply to neil from comment #14)
> > Hi,
> >
> > The problem I can see with that solution is that it assumes ejabberd spawned
> > the epmd process.
> >
> > What if another erlang-based port is using epmd launched from the epmd RC
> > script?
> >
> > I do not think it is a good idea that ejabberd's RC kills epmd unless it can
> > be sure it is killing an instance launched by itself.
>
> Although, it's not guaranteed that it'll only kill the 'epmd' process
> spawned by ejabberd, but it'll only kill the `epmd' process owned by
> 'ejabberd' user.
>
> The only place this won't work is when user runs multiple 'ejabberd'
> instances under same 'ejabberd' user[1] on same host. And since this rc.d
> script is used to manage single 'ejabberd' instance, so this would be
> useless anyways.
>
> Do you still see problem?

Hi Neil,

Sorry to bother you. I'm wondering if you got chance to look at it, and it's fine to commit this diff ?

Thanks,
Ashish

Hi Ashish,

I just had a quick look at your patch. Unfortunately it still presents the RPC error message and leaves epmd and ejabberd running.

The method I developed still seems to be the most reliable at shutting down ejabberd. Although it won't survive an Erlang update - you have to stop ejabberd before updating Erlang because ERTS version gets encoded into the runtime path.

Regards,
Neil Darlow

(In reply to neil from comment #17)
> Hi Ashish,
>
> I just had a quick look at your patch. Unfortunately it still presents the
> RPC error message and leaves epmd and ejabberd running.
>
> The method I developed still seems to be the most reliable at shutting down
> ejabberd. Although it won't survive an Erlang update - you have to stop
> ejabberd before updating Erlang because ERTS version gets encoded into the
> runtime path.
>
> Regards,
> Neil Darlow

Okay, I'm able to reproduce it now as well. I was missing `ejabberd_node' variable from rc.conf(5). I'll see if I can incorporate your solution cleanly in the rc.d script.

Ashish

Hi,

Any progress here?

I think we can close this as long as the following proviso is understood.

The erts version is encoded into paths used by executables. This means that when an underlying erlang port update changes the erts version it is necessary to stop ejabberd before erlang and ejabberd are updated. If this is not done then the epmd and beam processes will have to be terminated manually.

Actually, the epmd process spawned by ejabberd is uncontrolled anyway and requires manual killing whenever a restart of it is required.

Close per request.
Thank you. |