200846 – Stopping net/syncthing (userland) service causes kernel panic in mld_change_state() on 10.1-10.3-3, 11-CURRENT

Status:	Closed Feedback Timeout

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	Normal Affects Many People
Assignee:	Kubilay Kocak

URL:
Keywords:	crash, needs-patch, needs-qa

Duplicates (3):	201913 202978 213953
Depends on:
Blocks:

Reported:	2015-06-14 05:27 UTC by Steve Wills
Modified:	2021-05-20 02:12 UTC
CC List:	14 users

See Also:	https://github.com/syncthing/syncthing/issues/2090 228412

Flags:	koobs: mfc-stable13? koobs: mfc-stable12? koobs: mfc-stable11?

Attachments
panic backtrace 10.2-RELEASE / syncthing: 0.12.17 (6.55 KB, text/plain) 2016-08-15 15:07 UTC, Kubilay Kocak

Description Steve Wills freebsd_committer

2015-06-14 05:27:04 UTC

I'm running 11-CURRENT r283640. Starting syncthing (net/syncthing) works fine, but killing it produces a panic in mld_change_state(). There is an image of the panic here:

https://people.freebsd.org/~swills/panic_2015061401.jpg

It seems to be reliably reproducible.

Disabling localAnnounceEnabled should avoid it; syncthing announces on [ff32::5222]:21026.
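
For anyone trying to narrow this down without syncthing itself, the userland side of local announce boils down to joining an IPv6 multicast group and later dropping the membership. The following is a minimal Python sketch of that sequence, using the group and port quoted above. The function names and the interface argument are hypothetical, and it has not been verified that this sequence alone triggers the panic.

```python
import socket
import struct

GROUP = "ff32::5222"  # group address quoted in this report
PORT = 21026          # port quoted in this report


def build_ipv6_mreq(group: str, ifindex: int) -> bytes:
    """Pack a struct ipv6_mreq: 16-byte group address plus interface index."""
    return struct.pack("@16sI", socket.inet_pton(socket.AF_INET6, group), ifindex)


def join_then_close(ifname: str) -> None:
    """Join the group on one interface, then close the socket.

    Closing the socket drops the membership implicitly, which is
    roughly what happens when the daemon is stopped or killed.
    """
    ifindex = socket.if_nametoindex(ifname)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("::", PORT))
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                        build_ipv6_mreq(GROUP, ifindex))
    finally:
        sock.close()
```

Calling join_then_close() with the name of a local interface (for example "em0", an assumed name) performs one join/leave cycle. Run it only on a test machine where a panic is acceptable.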

Comment 1 freebsdbugzilla 2016-02-28 07:33:42 UTC

This bug is really annoying and makes it nearly impossible to run syncthing on a local network. I'm happy to help with debugging if necessary.

Steps to reproduce:
1. Turn on syncthing.
2. Leave local peer discovery enabled (it is on by default).
3. Turn off or restart syncthing.
4. The system crashes about 9 times out of 10.

Comment 2 Charles Ross 2016-06-27 15:54:15 UTC

The same problem exists on 10.3-RELEASE-p5 using any recent version of Syncthing (from the version available in the 2016Q2 ports branch/pkg repo to the latest version available on Syncthing's website at the time of this posting). Syncthing developer states this is a FreeBSD kernel bug.

Also confirmed that disabling the 'localAnnounceEnabled' option is a temporary workaround, but makes the product somewhat useless in LAN environments.
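
The workaround can be scripted. The following Python sketch flips the option in a config.xml string. It assumes the option is stored as a localAnnounceEnabled element under options, which matches the option name used in this report but is otherwise an assumption about the file layout. The sample document and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed config used only for this example.
SAMPLE_CONFIG = """<configuration>
    <options>
        <localAnnounceEnabled>true</localAnnounceEnabled>
    </options>
</configuration>"""


def disable_local_announce(config_xml: str) -> str:
    """Return the config with localAnnounceEnabled set to false."""
    root = ET.fromstring(config_xml)
    node = root.find("./options/localAnnounceEnabled")
    if node is None:
        # The assumed layout does not match this file; do not guess.
        raise ValueError("localAnnounceEnabled element not found")
    node.text = "false"
    return ET.tostring(root, encoding="unicode")
```

Stop syncthing before rewriting its real configuration file, and keep a backup copy of the original.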

Comment 3 freebsdbugzilla 2016-06-27 16:02:25 UTC

Hi Charles, 

it's no longer reproducible on FreeBSD 11 (32-bit) with the latest syncthing installed by pkg.

Comment 4 Brad Davis freebsd_committer

2016-08-15 12:33:15 UTC

Fixed as noted in PR 202978.

Comment 5 Kubilay Kocak freebsd_committer

2016-08-15 12:36:23 UTC

Re-open. Will add more context/information shortly.

Comment 6 Kubilay Kocak freebsd_committer

2016-08-15 12:57:02 UTC

Canonicalize summary including info from bug 202978 and bug 201913 (duplicates)

Note 1: This was reproducible using net/syncthing > 0.11.18 (including 0.11.23), noticed after a port upgrade between those two versions, until some later version when upstream issue #2090 [1] was resolved.

Note 2: The upstream commit [2] *only* changed the default IPv6 multicast address. This issue tracks the underlying problem: a userland program being able to crash the kernel/host.

[1] https://github.com/syncthing/syncthing/issues/2090
[2] https://github.com/syncthing/syncthing/commit/40d01001322a8da682038937c3c1f2b8c17c63d8

This probably needs much more attention given the range of versions reported to be affected, and the apparent triviality of the trigger.

@George, can you cc individuals / re-assign the issue as necessary please.

Comment 7 Kubilay Kocak freebsd_committer

2016-08-15 12:57:58 UTC

*** Bug 201913 has been marked as a duplicate of this bug. ***

Comment 8 Kubilay Kocak freebsd_committer

2016-08-15 12:58:54 UTC

*** Bug 202978 has been marked as a duplicate of this bug. ***

Comment 9 Kubilay Kocak freebsd_committer

2016-08-15 15:07:10 UTC

Created attachment 173704
panic backtrace 10.2-RELEASE / syncthing: 0.12.17

Add upstream forum thread reference (containing stacktrace) and attach here for completeness

Comment 10 Kubilay Kocak freebsd_committer

2016-08-15 15:29:27 UTC

In my comment 6, "until some later version when upstream issue #2090 [1] was resolved" may not be the case, as per Charles's comment 2.

@Charles, can you please provide more detail on your system configuration that is affected? In particular:

- The latest version of freebsd you have reproduced the issue on
- The latest version of net/syncthing you have reproduced the issue with (Please also specify whether: port, package, latest or quarterly, or upstream)

If you can provide a gdb backtrace as an attachment, that would also be fantastic.

Comment 11 Bruce M Simpson freebsd_committer

2016-08-15 16:04:38 UTC

We don't do anything with the P (Prefix) or T (Transient) bits of IPv6 multicast addresses in FreeBSD. So it's unclear how the address change could have resolved the issue, which makes it very difficult to draw any conclusions from the upstream change for #2090 apparently resolving it.

Consider: the multicast address scope does not change; the first 16 bits of the address syncthing uses remain FFX2. The FF denotes multicast; the 2 nibble denotes link-local.

However, the bits in nibble X do change. Link-local groups normally set X=0. syncthing pivots between X=1 (IPv6 group is transient and not well known) and X=3 (transient group, based on unicast prefix).

But nothing I've seen in FreeBSD directly references these bits.

Disclaimer: I haven't observed or reproduced the issue myself, and it's been many, many years since I wrote this code. It seems to me that it could have been triggered by a race elsewhere; obviously, this isn't going to show up in the kernel backtrace posted to syncthing's support forums, as is the nature of races.
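
The FFXS layout described in this comment can be checked mechanically. The following Python sketch decodes the flags nibble (X) and scope nibble of an IPv6 multicast address. ff32::5222 is the address from the original report, ff02::1 is the well-known all-nodes group, and ff12::1 is a made-up address chosen only to show X=1. The function name and the scope table are illustrative.

```python
import ipaddress

# Commonly used scope values; anything else is reported as "other".
SCOPES = {
    1: "interface-local",
    2: "link-local",
    4: "admin-local",
    5: "site-local",
    8: "organization-local",
    14: "global",
}


def describe_multicast(addr: str) -> dict:
    """Split the second byte of an IPv6 multicast address into flags and scope."""
    packed = ipaddress.IPv6Address(addr).packed
    if packed[0] != 0xFF:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    flags = packed[1] >> 4    # the X nibble
    scope = packed[1] & 0x0F  # the scope nibble
    return {
        "flags": flags,
        "transient": bool(flags & 0x1),     # T bit
        "prefix_based": bool(flags & 0x2),  # P bit
        "scope": scope,
        "scope_name": SCOPES.get(scope, "other"),
    }


for example in ("ff02::1", "ff12::1", "ff32::5222"):
    print(example, describe_multicast(example))
```

All three examples decode to link-local scope and differ only in the flags nibble, which is the point made above: the scope the kernel acts on is the same before and after the upstream change.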

Comment 12 Andrey V. Elsukov freebsd_committer

2016-08-15 16:47:36 UTC

Is there someone who is able to reproduce this panic and can provide some debug info?
Several questions:
1. Do you use some sort of VPN that creates/destroys interfaces?
2. Can you save a core dump from this panic, and then run 
# kgdb /boot/kernel/kernel /var/crash/vmcore.N
(kgdb) l *0xfffffffxxxxxxx 

where 0xfffffffxxxxxxx is the address from the panic message line "instruction pointer     = 0x20:0xfffffffxxxxxxx"
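
To avoid copying the address by hand, the kgdb command requested in this comment can be derived from the saved panic text. The following is a small Python sketch. The regular expression assumes the message format quoted above, and the sample address is invented for illustration, not taken from a real crash.

```python
import re

# Matches e.g. "instruction pointer     = 0x20:0x..." and captures the
# part after the segment selector.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")


def kgdb_list_command(panic_text: str) -> str:
    """Build the kgdb 'l *ADDR' command from a panic message."""
    match = IP_RE.search(panic_text)
    if match is None:
        raise ValueError("no 'instruction pointer' line found in panic text")
    return f"l *{match.group(1)}"


# Hypothetical panic excerpt; the address is made up.
sample = "instruction pointer     = 0x20:0xffffffff80123456"
print(kgdb_list_command(sample))
```

The printed command is what would be typed at the (kgdb) prompt after loading the kernel and the vmcore file.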

Comment 13 Steve Wills freebsd_committer

2017-09-17 21:05:32 UTC

*** Bug 213953 has been marked as a duplicate of this bug. ***

Comment 14 Kubilay Kocak freebsd_committer

2021-05-20 02:12:38 UTC

^Triage: Resolving as feedback timeout. If this issue is still reproducible with current (latest) versions of syncthing and supported (non-EoL) FreeBSD versions, please re-open the issue with additional details.