Bug 202978

Summary: After upgrade of net/syncthing 0.11.18 -> 0.11.23 in jail, the jail causes entire host to crash
Product: Base System
Reporter: Niklaas Baudet von Gersdorff <me>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed DUPLICATE
Severity: Affects Only Me
CC: brd, peter, swills, voltagex
Priority: ---
Keywords: crash
Version: 10.1-RELEASE   
Hardware: amd64   
OS: Any   
See Also: https://github.com/syncthing/syncthing/issues/2090
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201913
Attachments:
Description: /var/log/messages of the jail and its host with comments
Flags: none

Description Niklaas Baudet von Gersdorff 2015-09-08 20:06:57 UTC
Created attachment 160845
/var/log/messages of the jail and its host with comments

This report follows up on a thread about this issue on freebsd-questions@freebsd.org; see http://docs.freebsd.org/cgi/getmsg.cgi?fetch=130033+0+current/freebsd-questions.

After upgrading a package (net/syncthing) in a jail, the host system
rebooted. Because the jail starts at every boot and runs net/syncthing
again, the host apparently rebooted each time it started the jail.
A vicious circle. Because of this, I was nearly unable to
connect to the server.

By continuously pinging the server and sending a

	ssh <server> ezjail-admin config -r norun <jailname>

as soon as the server was back online, I could break the circle and
finally log in to the server again.
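
The ping-then-ssh approach described above can be sketched as a small script. This is only an illustration, not what was actually run: the helper names (`wait_then_run`, `host_answers_ping`, `disable_jail_autostart`) and the host and jail names are made up, and it assumes `ping -c 1` and key-based `ssh` access work from the machine running it.

```python
import subprocess
import time


def wait_then_run(is_up, action, interval=1.0, max_tries=None, sleep=time.sleep):
    """Poll is_up() until it returns True, then call action() once.

    Returns the result of action(), or None if max_tries polls all failed.
    """
    tries = 0
    while not is_up():
        tries += 1
        if max_tries is not None and tries >= max_tries:
            return None
        sleep(interval)
    return action()


def host_answers_ping(host):
    # Send a single echo request; exit status 0 means a reply arrived.
    status = subprocess.call(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return status == 0


def disable_jail_autostart(host, jail):
    # Same ezjail-admin command as in the report, sent over ssh.
    return subprocess.call(
        ["ssh", host, "ezjail-admin", "config", "-r", "norun", jail]
    )


if __name__ == "__main__":
    # Placeholder names; substitute the real server and jail.
    HOST = "server.example.org"
    JAIL = "myjail"
    wait_then_run(
        lambda: host_answers_ping(HOST),
        lambda: disable_jail_autostart(HOST, JAIL),
    )
```

A short polling interval matters here, because the window between the host coming up and the jail starting is small.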

I guess that something's wrong with net/syncthing (or its configuration),
but what worries me more is this: how can a userspace program in a malfunctioning jail cause an
entire host system to reboot? (Please also see the discussion on the mailing list.)

The upgrade of net/syncthing was 0.11.18 -> 0.11.23

$ uname -a
FreeBSD tank.<server> 10.1-RELEASE-p19 FreeBSD 10.1-RELEASE-p19 #0: Sat Aug 22 03:55:09 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

I am able to reproduce the behaviour by simply starting the jail; shortly afterwards the system crashes. This allowed me to collect some core dumps:

$ ls -hl /var/crash
total 11552056
-rw-r--r--  1 root  wheel     2B Sep  8 21:15 bounds
-rw-------  1 root  wheel   357K Sep  8 21:09 core.txt.0
-rw-------  1 root  wheel   176K Sep  8 21:12 core.txt.1
-rw-------  1 root  wheel   193K Sep  8 21:16 core.txt.2
-rw-------  1 root  wheel   502B Sep  8 21:08 info.0
-rw-------  1 root  wheel   502B Sep  8 21:11 info.1
-rw-------  1 root  wheel   501B Sep  8 21:15 info.2
lrwxr-xr-x  1 root  wheel     6B Sep  8 21:15 info.last -> info.2
-rw-r--r--  1 root  wheel     5B Jan 16  2014 minfree
-rw-------  1 root  wheel   2.0G Sep  8 21:09 vmcore.0
-rw-------  1 root  wheel   1.9G Sep  8 21:12 vmcore.1
-rw-------  1 root  wheel   1.9G Sep  8 21:15 vmcore.2
lrwxr-xr-x  1 root  wheel     8B Sep  8 21:15 vmcore.last -> vmcore.2

Please find attached `/var/log/messages` of the jail and its host. These are extracted from the very first failure. I also have a backup of `/var/log/messages` from the crash that I deliberately triggered this evening.

I can also share the coredumps but would need some advice on how to do so without disclosing sensitive information.
Comment 1 Peter Wemm freebsd_committer freebsd_triage 2015-09-08 20:17:17 UTC
syncthing does something strange with its use of local ipv6 multicast and crashes the kernel when it exits.  This is definitely a kernel bug - userland apps should not be able to crash the kernel, even if they are doing something odd.

In the meantime, you can work around it by DISABLING local announce.

If you can't do it through the UI, you can do it by editing the config.xml while syncthing is not running.  Normally it is in ~/.config/syncthing/config.xml but you can configure it to be elsewhere.

change:
        <localAnnounceEnabled>true</localAnnounceEnabled>
to
        <localAnnounceEnabled>false</localAnnounceEnabled>

When you next start up syncthing, it won't do the weird thing with this line:
        <localAnnounceMCAddr>[ff32::5222]:21026</localAnnounceMCAddr>
.. which is causing the kernel panic.
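
For anyone who prefers to script the config.xml change above, here is a minimal sketch. It is an illustration rather than an official syncthing tool: the function names are made up, it does a plain text substitution instead of parsing the XML (so the rest of the file is left byte-for-byte untouched), and it assumes the element is written exactly as shown above. Run it only while syncthing is stopped.

```python
import re
from pathlib import Path

# Matches the enabled flag as it appears in the config.xml excerpt above.
_FLAG = re.compile(
    r"(<localAnnounceEnabled>)\s*true\s*(</localAnnounceEnabled>)"
)


def disable_local_announce(text):
    """Return (new_text, count) with localAnnounceEnabled switched to false."""
    return _FLAG.subn(r"\g<1>false\g<2>", text)


def patch_config(path):
    """Patch config.xml in place, keeping a .bak copy of the original.

    Returns the number of replacements made (0 means nothing was changed).
    """
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    patched, count = disable_local_announce(original)
    if count:
        backup = path.with_name(path.name + ".bak")
        backup.write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count


if __name__ == "__main__":
    # Default location mentioned above; adjust if configured elsewhere.
    default = Path.home() / ".config" / "syncthing" / "config.xml"
    print(patch_config(default), "replacement(s) made")
```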
Comment 2 Peter Wemm freebsd_committer freebsd_triage 2015-09-08 20:25:35 UTC
There's a panic trace in bug #201913
Comment 3 Peter Wemm freebsd_committer freebsd_triage 2015-09-08 20:34:22 UTC
There are several upstream bugs, but this one has discussion of using the wrong address: https://github.com/syncthing/syncthing/issues/2090

There are a few others. If you search the issues list for multicast, you'll see that you can also delete the contents of <localAnnounceMCAddr> and leave local announce enabled. That limits syncthing local announce to IPv4 and, as a side effect, to one daemon per machine. It can't run like that in a jail anyway, as jails can't bind to interface broadcast addresses.

Anyway, I'm mentioning it here because it is possible to do limited ipv4 local announce if you need it.
Comment 4 Niklaas Baudet von Gersdorff 2015-09-09 06:54:40 UTC
Thank you for the workaround. It works.

In case you need further information to debug, please don't hesitate to contact me.
Comment 5 Adam Baxter 2016-08-15 11:57:06 UTC
Cannot reproduce with the kernel from 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Have tested using syncthing v0.13.4 (portdowngrade net/syncthing r415988) and syncthing v0.11.23 (portdowngrade net/syncthing r395632, also requires portdowngrade lang/go r395390)

I believe this bug, plus #200846 and #201913 can be closed, although someone with more experience in the kernel itself might want to check when exactly this was resolved, seeing as it's affected 10.1, 10.3 and 11-CURRENT at various stages.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:37:29 UTC
Re-open. Will add more context/information shortly.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:58:54 UTC
The underlying issue (userland causing a kernel panic) is being tracked in the original bug 200846, of which this bug is a duplicate.

*** This bug has been marked as a duplicate of bug 200846 ***