arp segfaulted after I ran "service netif restart". lo0 was up, and wlan0 was active but I was having problems with it. em0 was there but not connected to anything. lldb backtrace gave me:

* thread #1, name = 'arp', stop reason = signal SIGSEGV
  * frame #0: 0x0000000000202bb0 arp`___lldb_unnamed_symbol7$$arp + 304
    frame #1: 0x0000000000202a13 arp`___lldb_unnamed_symbol6$$arp + 451
    frame #2: 0x00000000002024a4 arp`___lldb_unnamed_symbol4$$arp + 452
    frame #3: 0x000000000020210f arp`___lldb_unnamed_symbol1$$arp + 271

which didn't tell me a lot.
@Corvid Can you provide some additional information, including:

- Exact FreeBSD version (uname -a)
- /var/run/dmesg.boot (as an attachment)
- Complete network configuration (/etc/rc.conf and others, sanitized where necessary)

Also, can you describe the reproducibility of this issue? Is it always reproducible? Sometimes? Once?
Created attachment 210316 [details] dmesg.boot
FreeBSD 12.1-RELEASE-p1 FreeBSD 12.1-RELEASE-p1 GENERIC amd64
rc.conf

wlans_iwn0="wlan0"
ifconfig_wlan0="NOAUTO WPA DHCP"
ifconfig_em0="NOAUTO DHCP"
dhclient_program="/usr/local/sbin/dhclient"

for wpa_supplicant, I can't guess what network I would have been on.

reproducibility: I hadn’t had dns strangeness again, so I hadn’t tried again, figuring the conditions were different, but let’s just go ahead and see what happens right now.

Well. Silly me. It _is_ reproducible. Here’s what happened to show up in the system log:

Dec 29 23:40:53 <kern.info> kernel: in6_purgeaddr: err=65, destination address delete failed
Dec 29 23:40:53 <kern.info> kernel: lo0: link state changed to DOWN
Dec 29 23:40:53 <daemon.info> dhclient[90852]: DHCPRELEASE of [...] on wlan0 to [...] port 67
Dec 29 23:40:53 <kern.info> kernel: wlan0: deletion failed: 3
Dec 29 23:40:53 <kern.info> kernel: wlan0: link state changed to DOWN
Dec 29 23:40:53 <daemon.notice> wpa_supplicant[69228]: wlan0: CTRL-EVENT-DISCONNECTED bssid=[...] reason=3 locally_generated=1
Dec 29 23:40:53 <daemon.notice> wpa_supplicant[69228]: wlan0: CTRL-EVENT-TERMINATING
Dec 29 23:40:53 <kern.info> kernel: wlan0: bpf attached
Dec 29 23:40:53 <kern.info> syslogd: last message repeated 1 times
Dec 29 23:40:53 <kern.info> kernel: wlan0: Ethernet address: [...]
Dec 29 23:40:53 <kern.info> kernel: lo0: link state changed to UP
Dec 29 23:40:54 <daemon.err> dhclient[90852]: receive_packet failed on wlan0: No error: 0
Dec 29 23:40:54 <kern.info> kernel: pid 43492 (arp), jid 0, uid 0: exited on signal 11 (core dumped)
> Dec 29 23:40:54 <kern.info> kernel: pid 43492 (arp), jid 0, uid 0: exited on signal 11 (core dumped)

You have a coredump, so this should be easy to fix if you rebuild the /usr/sbin/arp binary with debugging symbols. Provided you have sources installed:

cd /usr/src/usr.sbin/arp && make clean obj depend && make "DEBUG_FLAGS=-O0 -g" && install /usr/obj/usr/src/usr.sbin/arp/arp /usr/sbin/

Then reproduce the problem to make a new coredump and post the backtrace:

gdb /usr/sbin/arp arp.core
backtrace
It gets into print_entry(), and there’s a loop with careless errors in the loop condition:

    for (p = ifnameindex; p && ifnameindex->if_index &&
        ifnameindex->if_name; p++) {
            if (p->if_index == sdl->sdl_index) {
                    xo_emit(" on {:interface/%s}", p->if_name);
                    break;
            }
    }

sdl->sdl_index is 3, and the list of interfaces has indices 1, 2, 4, 0, some big randomish number, etc. So p just keeps running along until it happens to detect a 3 or segfault.
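
For illustration, here is a minimal sketch of how the lookup could be written so that it stops at the terminator. It is not the committed patch. It assumes the table follows the if_nameindex(3) convention, where the last entry has if_index 0 and if_name NULL, and find_ifname is a hypothetical helper name that does not exist in arp.c. The key difference from the loop quoted above is that the condition tests the entry under the cursor (p) rather than the first entry (ifnameindex), which never changes.

```c
#include <stddef.h>
#include <net/if.h>	/* struct if_nameindex */

/*
 * Hypothetical helper (not part of arp.c): return the name of the
 * interface with the given index, or NULL if the table has no such entry.
 * The table is assumed to end with an entry whose if_index is 0 and
 * whose if_name is NULL, as if_nameindex(3) documents.
 */
const char *
find_ifname(const struct if_nameindex *table, unsigned int index)
{
	const struct if_nameindex *p;

	if (table == NULL)
		return (NULL);

	/* Check the current entry, so the loop ends at the terminator. */
	for (p = table; p->if_index != 0 && p->if_name != NULL; p++) {
		if (p->if_index == index)
			return (p->if_name);
	}
	return (NULL);
}
```

With a table holding indices 1, 2, 4 followed by the terminator, a lookup for index 3 returns NULL instead of walking past the end of the array.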
There is a check for zero index in the code. Zero index means end of list. Can you please share the backtrace and debugging arp binary with coredump, too?
Created attachment 210368 [details] proposed fix

Never mind. Please try this patch instead.
with the patch, I no longer get segfaults :)
(In reply to corvid from comment #9) I've committed the fix and will merge it in a week.
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:11:45 UTC 2020
New revision: 356778
URL: https://svnweb.freebsd.org/changeset/base/356778

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory past end of an array.

  PR: 242784

Changes:
_U  stable/12/
  stable/12/usr.sbin/arp/arp.c
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:16:12 UTC 2020
New revision: 356779
URL: https://svnweb.freebsd.org/changeset/base/356779

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory past end of an array.

  PR: 242784

Changes:
_U  stable/11/
  stable/11/usr.sbin/arp/arp.c
A commit references this bug:

Author: eugen
Date: Thu Jan 16 08:27:31 UTC 2020
New revision: 356780
URL: https://svnweb.freebsd.org/changeset/base/356780

Log:
  MFC r356551: arp(8): avoid segfaulting due to out-of-bounds memory access

  Fix obvious mistake that sometimes results in reading memory past end of an array.

  PR: 242784

Changes:
_U  stable/10/
  stable/10/usr.sbin/arp/arp.c
Fixed in all branches down to stable/10. Thank you for the report!
^Triage: Track merges