Bug 242784 - arp: segfault on service netif restart
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Eugene Grosbein
Keywords: crash
Reported: 2019-12-21 22:54 UTC by corvid
Modified: 2020-01-25 02:59 UTC

See Also:
koobs: mfc-stable12+
koobs: mfc-stable11+
koobs: mfc-stable10+


Attachments
dmesg.boot (10.55 KB, text/plain)
2019-12-29 23:33 UTC, corvid
proposed fix (508 bytes, patch)
2020-01-01 07:17 UTC, Eugene Grosbein

Description corvid 2019-12-21 22:54:31 UTC
arp segfaulted after I ran "service netif restart". lo0 was up, and wlan0 was active but I was having problems with it. em0 was there but not connected to anything.

lldb backtrace gave me:

* thread #1, name = 'arp', stop reason = signal SIGSEGV
  * frame #0: 0x0000000000202bb0 arp`___lldb_unnamed_symbol7$$arp + 304
    frame #1: 0x0000000000202a13 arp`___lldb_unnamed_symbol6$$arp + 451
    frame #2: 0x00000000002024a4 arp`___lldb_unnamed_symbol4$$arp + 452
    frame #3: 0x000000000020210f arp`___lldb_unnamed_symbol1$$arp + 271

which didn't tell me a lot.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-12-24 00:00:48 UTC
@Corvid

Can you provide some additional information, including:

- Exact FreeBSD version (uname -a)
- /var/run/dmesg.boot (as an attachment)
- Complete network configuration (/etc/rc.conf and others, sanitized where necessary)

Also, can you describe the reproducibility of this issue. Is it always reproducible? Sometimes? Once?
Comment 2 corvid 2019-12-29 23:33:54 UTC
Created attachment 210316
dmesg.boot
Comment 3 corvid 2019-12-29 23:36:10 UTC
FreeBSD 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC  amd64
Comment 4 corvid 2019-12-29 23:47:38 UTC
rc.conf
wlans_iwn0="wlan0"
ifconfig_wlan0="NOAUTO WPA DHCP"
ifconfig_em0="NOAUTO DHCP"
dhclient_program="/usr/local/sbin/dhclient"

for wpa_supplicant, I can't guess what network I would have been on.

reproducibility: I hadn’t had dns strangeness again, so I hadn’t tried again, figuring the conditions were different, but let’s just go ahead and see what happens right now.

Well. Silly me. It _is_ reproducible. Here’s what happened to show up in the system log:
Dec 29 23:40:53 <kern.info> kernel: in6_purgeaddr: err=65, destination address delete failed
Dec 29 23:40:53 <kern.info> kernel: lo0: link state changed to DOWN
Dec 29 23:40:53 <daemon.info> dhclient[90852]: DHCPRELEASE of [...] on wlan0 to [...] port 67
Dec 29 23:40:53 <kern.info> kernel: wlan0: deletion failed: 3
Dec 29 23:40:53 <kern.info> kernel: wlan0: link state changed to DOWN
Dec 29 23:40:53 <daemon.notice> wpa_supplicant[69228]: wlan0: CTRL-EVENT-DISCONNECTED bssid=[...] reason=3 locally_generated=1
Dec 29 23:40:53 <daemon.notice> wpa_supplicant[69228]: wlan0: CTRL-EVENT-TERMINATING 
Dec 29 23:40:53 <kern.info> kernel: wlan0: bpf attached
Dec 29 23:40:53 <kern.info> syslogd: last message repeated 1 times
Dec 29 23:40:53 <kern.info> kernel: wlan0: Ethernet address: [...]
Dec 29 23:40:53 <kern.info> kernel: lo0: link state changed to UP
Dec 29 23:40:54 <daemon.err> dhclient[90852]: receive_packet failed on wlan0: No error: 0
Dec 29 23:40:54 <kern.info> kernel: pid 43492 (arp), jid 0, uid 0: exited on signal 11 (core dumped)
Comment 5 Eugene Grosbein freebsd_committer freebsd_triage 2019-12-30 07:39:08 UTC
> Dec 29 23:40:54 <kern.info> kernel: pid 43492 (arp), jid 0, uid 0: exited on signal 11 (core dumped)

You have a core dump, so this should be easy to fix if you rebuild the /usr/sbin/arp binary with debugging symbols, provided you have the sources installed:

cd /usr/src/usr.sbin/arp && make clean obj depend && make "DEBUG_FLAGS=-O0 -g" && install /usr/obj/usr/src/usr.sbin/arp/arp /usr/sbin/

Then reproduce the problem to produce a new core dump and post the backtrace:

gdb /usr/sbin/arp arp.core
backtrace
Comment 6 corvid 2019-12-31 18:58:23 UTC
It gets into print_entry(), and there’s a loop with careless errors
in the loop condition:

        for (p = ifnameindex; p && ifnameindex->if_index &&
            ifnameindex->if_name; p++) {
                if (p->if_index == sdl->sdl_index) {
                        xo_emit(" on {:interface/%s}", p->if_name);
                        break;
                }
        }

sdl->sdl_index is 3, and the list of interfaces has indices
1, 2, 4, 0, some big randomish number, etc. So p just keeps running
along until it happens to detect a 3 or segfault.
Comment 7 Eugene Grosbein freebsd_committer freebsd_triage 2020-01-01 07:12:42 UTC
There is a check for zero index in the code. Zero index means end of list. Can you please share the backtrace and debugging arp binary with coredump, too?
Comment 8 Eugene Grosbein freebsd_committer freebsd_triage 2020-01-01 07:17:09 UTC
Created attachment 210368
proposed fix

Never mind. Please try this patch instead.
Comment 9 corvid 2020-01-02 21:37:15 UTC
with the patch, I no longer get segfaults :)
Comment 10 Eugene Grosbein freebsd_committer freebsd_triage 2020-01-09 12:17:52 UTC
(In reply to corvid from comment #9)

I've committed the fix and will merge it in a week.
Comment 11 commit-hook freebsd_committer freebsd_triage 2020-01-16 08:11:56 UTC
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:11:45 UTC 2020
New revision: 356778
URL: https://svnweb.freebsd.org/changeset/base/356778

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory
  past end of an array.

  PR:		242784

Changes:
_U  stable/12/
  stable/12/usr.sbin/arp/arp.c
Comment 12 commit-hook freebsd_committer freebsd_triage 2020-01-16 08:16:57 UTC
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:16:12 UTC 2020
New revision: 356779
URL: https://svnweb.freebsd.org/changeset/base/356779

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory
  past end of an array.

  PR:             242784

Changes:
_U  stable/11/
  stable/11/usr.sbin/arp/arp.c
Comment 13 commit-hook freebsd_committer freebsd_triage 2020-01-16 08:27:59 UTC
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:27:31 UTC 2020
New revision: 356780
URL: https://svnweb.freebsd.org/changeset/base/356780

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory
  past end of an array.

  PR:             242784

Changes:
_U  stable/10/
  stable/10/usr.sbin/arp/arp.c
Comment 14 Eugene Grosbein freebsd_committer freebsd_triage 2020-01-16 08:29:19 UTC
Fixed in all branches down to stable/10. Thank you for the report!
Comment 15 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-25 02:59:00 UTC
^Triage: Track merges