Bug 200846 - Stopping net/syncthing (userland) service causes kernel panic in mld_change_state() on 10.1-10.3-3, 11-CURRENT
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: Normal Affects Many People
Assignee: George V. Neville-Neil
URL:
Keywords: crash, needs-patch, needs-qa
Duplicates: 201913 202978 213953
Depends on:
Blocks:
 
Reported: 2015-06-14 05:27 UTC by Steve Wills
Modified: 2019-09-11 08:51 UTC
CC List: 14 users

See Also:
koobs: maintainer-feedback? (gnn)
koobs: mfc-stable10?
koobs: mfc-stable11?


Attachments
panic backtrace 10.2-RELEASE / syncthing: 0.12.17 (6.55 KB, text/plain)
2016-08-15 15:07 UTC, Kubilay Kocak

Description Steve Wills freebsd_committer 2015-06-14 05:27:04 UTC
I'm running 11-CURRENT r283640. Starting syncthing (net/syncthing) works fine, but killing it produces a panic in mld_change_state(). There is an image of the panic here:

https://people.freebsd.org/~swills/panic_2015061401.jpg

It seems to be reliably reproducible.

Disabling localAnnounceEnabled should avoid it; syncthing announces on [ff32::5222]:21026.
Comment 1 freebsdbugzilla 2016-02-28 07:33:42 UTC
This bug is really annoying and makes it close to impossible to use syncthing on a local network. I'm happy to help with debugging if necessary.

Steps to reproduce:
1. Start syncthing
2. Leave local peer discovery enabled (it is enabled by default)
3. Stop or restart syncthing
4. The system crashes roughly 9 times out of 10
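
For anyone trying to narrow this down, the steps above can be approximated without syncthing by joining and leaving the multicast group directly. This is only a sketch: nothing in this report establishes that a bare join/leave is enough to trigger the panic, the helper names are made up for illustration, and the interface name is whatever your system uses. Only run it on a machine you can afford to crash.

```python
import socket
import struct


def build_ipv6_mreq(group: str, ifindex: int) -> bytes:
    """Pack a struct ipv6_mreq: 16-byte group address + unsigned interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def join_then_leave(group: str, ifname: str, port: int) -> None:
    """Join an IPv6 multicast group on one interface, then leave it again.

    Hypothetical helper for testing; not taken from syncthing's code.
    """
    ifindex = socket.if_nametoindex(ifname)
    mreq = build_ipv6_mreq(group, ifindex)
    with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("::", port))
        # Joining the group makes the kernel start MLD reporting for it.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)
        # Leaving (or closing the socket) makes the kernel tear that state down.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_LEAVE_GROUP, mreq)
```

A call would look like join_then_leave("ff32::5222", "em0", 21026), where the group and port come from the description above and "em0" is a placeholder interface name.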
Comment 2 Charles Ross 2016-06-27 15:54:15 UTC
The same problem exists on 10.3-RELEASE-p5 using any recent version of Syncthing (from the version available in the 2016Q2 ports branch/pkg repo to the latest version available on Syncthing's website at the time of this posting). Syncthing developer states this is a FreeBSD kernel bug.

Also confirmed that disabling the 'localAnnounceEnabled' option is a temporary workaround, but makes the product somewhat useless in LAN environments.
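
A minimal sketch of applying that workaround programmatically, assuming the setting is stored as a localAnnounceEnabled element under options in syncthing's config.xml. The function name is made up for illustration, and the location of config.xml depends on the installation, so the function works on the XML text rather than a path.

```python
import xml.etree.ElementTree as ET


def disable_local_announce(config_xml: str) -> str:
    """Return the config XML with localAnnounceEnabled set to false.

    Assumes the layout <configuration><options><localAnnounceEnabled>;
    missing elements are created rather than treated as an error.
    """
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    elem = options.find("localAnnounceEnabled")
    if elem is None:
        elem = ET.SubElement(options, "localAnnounceEnabled")
    elem.text = "false"
    return ET.tostring(root, encoding="unicode")
```

The change only avoids the trigger; it does nothing about the underlying kernel problem.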
Comment 3 freebsdbugzilla 2016-06-27 16:02:25 UTC
Hi Charles, 

it's no longer reproducible on 32-bit FreeBSD 11 with the latest syncthing installed via pkg.
Comment 4 Brad Davis freebsd_committer 2016-08-15 12:33:15 UTC
Fixed as noted in PR 202978.
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:36:23 UTC
Re-open. Will add more context/information shortly.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:57:02 UTC
Canonicalize summary including info from bug 202978 and bug 201913 (duplicates)

Note 1: This was reproducible using net/syncthing > 0.11.18 (including 0.11.23), noticed after a port upgrade between those two versions, until some later version when upstream issue #2090 [1] was resolved.

Note 2: The upstream commit [2] *only* changed the default IPv6 multicast address. This issue tracks the underlying problem: a userland process being able to crash the kernel/host.

[1] https://github.com/syncthing/syncthing/issues/2090
[2] https://github.com/syncthing/syncthing/commit/40d01001322a8da682038937c3c1f2b8c17c63d8

This probably needs much more attention given the range of versions reported to be affected, and the apparent triviality of the trigger.

@George, can you please cc individuals / re-assign the issue as necessary?
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:57:58 UTC
*** Bug 201913 has been marked as a duplicate of this bug. ***
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 12:58:54 UTC
*** Bug 202978 has been marked as a duplicate of this bug. ***
Comment 9 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 15:07:10 UTC
Created attachment 173704 [details]
panic backtrace 10.2-RELEASE / syncthing: 0.12.17

Add upstream forum thread reference (containing stacktrace) and attach here for completeness
Comment 10 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-15 15:29:27 UTC
In my comment 6, "until some later version when upstream issue #2090 [1] was resolved" may not be the case, as per Charles' comment 2.

@Charles, can you please provide more detail on your system configuration that is affected? In particular:

- The latest version of freebsd you have reproduced the issue on
- The latest version of net/syncthing you have reproduced the issue with (Please also specify whether: port, package, latest or quarterly, or upstream)

If you can provide a gdb backtrace as an attachment, that would also be fantastic.
Comment 11 Bruce M Simpson freebsd_committer 2016-08-15 16:04:38 UTC
We don't do anything with the P (Prefix) or T (Transient) bits in IPv6 multicast addresses in FreeBSD. So it's unclear how changing them could have resolved the issue, and this is what makes it very difficult to draw any conclusions from the upstream change for #2090 apparently resolving it.

Consider: the multicast address scope does not change; the first 16 bits of the address syncthing uses remain FFX2. The FF denotes multicast; the nibble 2 denotes link-local scope.

However, the bits in nibble X do change. Link-local groups normally set X=0. syncthing pivots between X=1 (ipv6 group is transient and not well known) and X=3 (transient group, based on unicast prefix).

But nothing I've seen in FreeBSD directly references these bits.
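
To make the nibble layout described above concrete, here is a small sketch that splits the second byte of an IPv6 multicast address into its flags nibble (X) and scope nibble. The function name is made up for illustration; ff32::5222 is the address from the description, and ff12::1234 is just an arbitrary example with X=1.

```python
import ipaddress


def multicast_flags_and_scope(addr: str) -> dict:
    """Decode the flags and scope nibbles of an IPv6 multicast address."""
    packed = ipaddress.IPv6Address(addr).packed
    if packed[0] != 0xFF:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    flags = packed[1] >> 4      # the X nibble: 0 R P T
    scope = packed[1] & 0x0F    # 2 means link-local
    return {
        "flags": flags,
        "scope": scope,
        "transient": bool(flags & 0x1),     # T bit
        "prefix_based": bool(flags & 0x2),  # P bit
        "link_local": scope == 0x2,
    }
```

Both multicast_flags_and_scope("ff32::5222") and multicast_flags_and_scope("ff12::1234") report link-local scope; only the flags nibble differs (3 versus 1).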

Disclaimer: I haven't observed or reproduced the issue myself, and it's been many, many years since I wrote this code. It seems to me that it could have been triggered by a race elsewhere; obviously, this isn't going to show up in the kernel backtrace posted to syncthing's support forums, as is the nature of races.
Comment 12 Andrey V. Elsukov freebsd_committer 2016-08-15 16:47:36 UTC
Is there anyone who is able to reproduce this panic and can provide some debug info?
Several questions:
1. Do you use some sort of VPN that creates/destroys interfaces?
2. Can you save a core dump from this panic, and then run 
# kgdb /boot/kernel/kernel /var/crash/vmcore.N
(kgdb) l *0xfffffffxxxxxxx 

where 0xfffffffxxxxxxx is the address from the panic message line "instruction pointer     = 0x20:0xfffffffxxxxxxx"
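
If the panic text has been saved to a file, pulling the address out by hand is easy to get wrong. The sketch below extracts it and builds the kgdb list command; it assumes the panic message uses the "instruction pointer = 0x20:0x..." form quoted above, and the function name is made up for illustration.

```python
import re

# Matches e.g. "instruction pointer     = 0x20:0x<address>" and captures the address.
IP_LINE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")


def kgdb_list_command(panic_text: str):
    """Return the kgdb 'l *ADDR' command for the faulting address, or None."""
    match = IP_LINE.search(panic_text)
    if match is None:
        return None
    return f"l *{match.group(1)}"
```

Paste the returned string at the (kgdb) prompt after loading the kernel and vmcore as shown above.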
Comment 13 Steve Wills freebsd_committer 2017-09-17 21:05:32 UTC
*** Bug 213953 has been marked as a duplicate of this bug. ***