Bug 257302 - net/syncthing: Panic in in6_getmulti at /usr/src/sys/netinet6/in6_mcast.c:451
Summary: net/syncthing: Panic in in6_getmulti at /usr/src/sys/netinet6/in6_mcast.c:451
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-07-21 01:50 UTC by Alex Vasylenko
Modified: 2021-07-24 00:33 UTC (History)

See Also:
koobs: maintainer-feedback? (shurd)
koobs: mfc-stable13?
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
proposed patch (untested) (648 bytes, patch)
2021-07-22 12:40 UTC, Andrey V. Elsukov

Description Alex Vasylenko 2021-07-21 01:50:39 UTC
It appears that syncthing (installed from port net/syncthing) manages to panic the system shortly after start-up as follows:

Jul 20 13:43:02 foam syslogd: kernel boot file is /boot/kernel/kernel
Jul 20 13:43:02 foam kernel:
Jul 20 13:43:02 foam syslogd: last message repeated 1 times
Jul 20 13:43:02 foam kernel: Fatal trap 12: page fault while in kernel mode
Jul 20 13:43:02 foam kernel: cpuid = 1; apic id = 01
Jul 20 13:43:02 foam kernel: fault virtual address      = 0x28
Jul 20 13:43:02 foam kernel: fault code         = supervisor read data, page not present
Jul 20 13:43:02 foam kernel: instruction pointer        = 0x20:0xffffffff80e04a0e
Jul 20 13:43:02 foam kernel: stack pointer              = 0x28:0xfffffe005c61cfa0
Jul 20 13:43:02 foam kernel: frame pointer              = 0x28:0xfffffe005c61d060
Jul 20 13:43:02 foam kernel: code segment               = base rx0, limit 0xfffff, type 0x1b
Jul 20 13:43:02 foam kernel:                    = DPL 0, pres 1, long 1, def32 0, gran 1
Jul 20 13:43:02 foam kernel: processor eflags   = interrupt enabled, resume, IOPL = 0
Jul 20 13:43:02 foam kernel: current process            = 1226 (syncthing)
Jul 20 13:43:02 foam kernel: trap number                = 12
Jul 20 13:43:02 foam kernel: panic: page fault
Jul 20 13:43:02 foam kernel: cpuid = 1
Jul 20 13:43:02 foam kernel: time = 1626802891
Jul 20 13:43:02 foam kernel: KDB: stack backtrace:
Jul 20 13:43:02 foam kernel: #0 0xffffffff80c0ae35 at kdb_backtrace+0x65
Jul 20 13:43:02 foam kernel: #1 0xffffffff80bbf0eb at vpanic+0x17b
Jul 20 13:43:02 foam kernel: #2 0xffffffff80bbef63 at panic+0x43
Jul 20 13:43:02 foam kernel: #3 0xffffffff8108f941 at trap_fatal+0x391
Jul 20 13:43:02 foam kernel: #4 0xffffffff8108f99f at trap_pfault+0x4f
Jul 20 13:43:02 foam kernel: #5 0xffffffff8108efe6 at trap+0x286
Jul 20 13:43:02 foam kernel: #6 0xffffffff81066d48 at calltrap+0x8
Jul 20 13:43:02 foam kernel: #7 0xffffffff80e06e3d at ip6_setmoptions+0x101d
Jul 20 13:43:02 foam kernel: #8 0xffffffff80e13929 at ip6_ctloutput+0x229
Jul 20 13:43:02 foam kernel: #9 0xffffffff80c584d6 at sosetopt+0xe6
Jul 20 13:43:02 foam kernel: #10 0xffffffff80c5da70 at kern_setsockopt+0xb0
Jul 20 13:43:02 foam kernel: #11 0xffffffff80c5d9b4 at sys_setsockopt+0x24
Jul 20 13:43:02 foam kernel: #12 0xffffffff810904f7 at amd64_syscall+0x387

(kgdb) where
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bbed05 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bbf143 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0xffffffff80bbef63 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807
#5  0xffffffff8108f941 in trap_fatal (frame=0xfffffe005c3ceee0, eva=40) at /usr/src/sys/amd64/amd64/trap.c:921
#6  0xffffffff8108f99f in trap_pfault (frame=0xfffffe005c3ceee0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739
#7  0xffffffff8108efe6 in trap (frame=0xfffffe005c3ceee0) at /usr/src/sys/amd64/amd64/trap.c:405
#8  <signal handler called>
#9  0xffffffff80e04a0e in in6_getmulti (ifp=<optimized out>, group=0xfffffe005c3cf118, pinm=<optimized out>) at /usr/src/sys/netinet6/in6_mcast.c:451
#10 in6_joingroup_locked (ifp=<optimized out>, mcaddr=0xfffffe005c3cf118, imf=0xfffff80012748dc0, pinm=0xfffff80012748dd8, delay=0) at /usr/src/sys/netinet6/in6_mcast.c:1241
#11 0xffffffff80e06e3d in in6p_join_group (inp=0xfffff800129673d0, sopt=<optimized out>) at /usr/src/sys/netinet6/in6_mcast.c:2089
#12 ip6_setmoptions (inp=0xfffff800129673d0, sopt=<optimized out>) at /usr/src/sys/netinet6/in6_mcast.c:2685
#13 0xffffffff80e13929 in ip6_ctloutput (so=0xfffff80012986a38, sopt=0xfffffe005c3cfb98) at /usr/src/sys/netinet6/ip6_output.c:1929
#14 0xffffffff80c584d6 in sosetopt (so=0xfffff80012986a38, sopt=0xfffffe005c3cfb98) at /usr/src/sys/kern/uipc_socket.c:2761
#15 0xffffffff80c5da70 in kern_setsockopt (td=0xfffff800126f9000, s=<optimized out>, level=<optimized out>, name=<optimized out>, val=<optimized out>, valseg=<optimized out>, valsize=136) at /usr/src/sys/kern/uipc_syscalls.c:1272
#16 0xffffffff80c5d9b4 in sys_setsockopt (td=0xfffff80003023b40, uap=<optimized out>) at /usr/src/sys/kern/uipc_syscalls.c:1233
#17 0xffffffff810904f7 in syscallenter (td=0xfffff800126f9000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#18 amd64_syscall (td=0xfffff800126f9000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1163
#19 <signal handler called>
#20 0x00000000004c310a in ?? ()

(kgdb) f 9
#9  0xffffffff80e04a0e in in6_getmulti (ifp=<optimized out>, group=0xfffffe005c3cf118, pinm=<optimized out>) at /usr/src/sys/netinet6/in6_mcast.c:451
451             inm->in6m_mli = MLD_IFINFO(ifp);

(kgdb) p inm->in6m_ifp.if_dname
$1 = 0xffffffff8249c850 <ipfwname> "ipfw"

I suspect the code path in the app is:

- list all network interfaces having MULTICAST flag:
https://github.com/syncthing/syncthing/blob/main/lib/upnp/upnp.go#L91
https://github.com/syncthing/syncthing/blob/main/lib/upnp/upnp.go#L103

- join
https://github.com/syncthing/syncthing/blob/main/lib/upnp/upnp.go#L163
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2021-07-21 02:04:41 UTC
^Triage: Request feedback from Stephen who may be able to shed light on this area of the code
Comment 2 Alex Vasylenko 2021-07-21 02:27:38 UTC
I was mistaken in attributing this to an mcast join on ipfw0: my simple program doing just that worked fine.

Although it's unclear what the user code is doing, we could look at the crash dump to understand what's going on in the kernel.
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2021-07-21 02:41:02 UTC
Thanks Alex. It's fine for the summary to be the first relevant frame in the backtrace for now, unless/until isolated further.

And thank you for your report.
Comment 4 Alex Vasylenko 2021-07-21 04:55:14 UTC
The following Go program reproduces the issue; no root is required. It was an mcast join on ipfw0 after all, but it has to be an IPv6 join.

$ cat mcast6_join.go
package main

import (
        "fmt"
        "net"

        "golang.org/x/net/ipv6"
)

func main() {
        // IPv6 multicast group and UDP port to join.
        addr := "[ff12::8384]:21027"

        gaddr, err := net.ResolveUDPAddr("udp6", addr)
        if err != nil {
                fmt.Println(err)
                return
        }

        // Open an IPv6 UDP socket to issue the join on.
        conn, err := net.ListenPacket("udp6", addr)
        if err != nil {
                fmt.Println(err)
                return
        }
        defer conn.Close()

        // ipfw0 is the interface that triggers the panic.
        intf, err := net.InterfaceByName("ipfw0")
        if err != nil {
                fmt.Println(err)
                return
        }

        pconn := ipv6.NewPacketConn(conn)

        // The join ends up in setsockopt(); on an affected kernel this call panics the system.
        result := pconn.JoinGroup(intf, &net.UDPAddr{IP: gaddr.IP})
        if result != nil {
                fmt.Println("IPv6 join", intf.Name, "failed:", result)
        } else {
                fmt.Println("IPv6 join", intf.Name, "success")
        }
}

net/syncthing has something like the above in https://github.com/syncthing/syncthing/blob/main/lib/beacon/multicast.go#L101 (except that it performs the join on all interfaces in a loop, ignoring the multicast flag, but that's beside the point).

A relevant part of the vmcore that I failed to include in the original report is this:
(kgdb) f 14
#14 0xffffffff80c584d6 in sosetopt (so=0xfffff80012986a38, sopt=0xfffffe005c3cfb98) at /usr/src/sys/kern/uipc_socket.c:2761
2761    error = (*so->so_proto->pr_ctloutput)(so, sopt);
(kgdb) set print pretty
(kgdb) p *sopt
$1 = {
  sopt_dir = SOPT_SET,
  sopt_level = 41,
  sopt_name = 80,
  sopt_val = 0xc000320900,
  sopt_valsize = 136,
  sopt_td = 0xfffff800126f9000
}

optname 80 is MCAST_JOIN_GROUP
Comment 5 Andrey V. Elsukov freebsd_committer 2021-07-22 12:18:59 UTC
(In reply to Alex Vasylenko from comment #0)
>Jul 20 13:43:02 foam kernel: fault virtual address      = 0x28
>Jul 20 13:43:02 foam kernel: fault code         = supervisor read data, page not present
>#9  0xffffffff80e04a0e in in6_getmulti (ifp=<optimized out>, group=0xfffffe005c3cf118, pinm=<optimized out>) at /usr/src/sys/netinet6/in6_mcast.c:451

It is a NULL pointer dereference in the line:
inm->in6m_mli = MLD_IFINFO(ifp);

The MLD_IFINFO() macro tries to dereference if_afdata[AF_INET6]->mld_ifinfo. The fault address 0x28 corresponds to the offset of the mld_ifinfo field:

(kgdb) p/x offsetof(struct in6_ifextra, mld_ifinfo)
$1 = 0x28

The ipfw0 interface does not have a properly initialized if_afdata, since IFT_PFLOG interfaces do not support IPv6 (look at in6_domifattach()).

Thus I think we need to add a check somewhere for interfaces that do not support IPv6 multicast.
Comment 6 Andrey V. Elsukov freebsd_committer 2021-07-22 12:40:19 UTC
Created attachment 226609
proposed patch (untested)

Can you try this patch?
Comment 7 Steve Wills freebsd_committer 2021-07-22 18:51:41 UTC
Is this the same issue that was originally reported in bug #200846?
Comment 8 Alex Vasylenko 2021-07-22 20:47:26 UTC
(In reply to Steve Wills from comment #7)
Looks similar: it has to do with local peer discovery, as in that report.
Comment 9 Alex Vasylenko 2021-07-24 00:33:37 UTC
(In reply to Andrey V. Elsukov from comment #5)
Thanks, Andrey. The patch works. My test program now runs to completion:

% ./SyncThingPanic
IPv6 join ipfw0 failed: setsockopt: operation not supported by device
% uname -r
13.0-STABLE