Bug 209682 - [panic] [netinet] arptimer race
Summary: [panic] [netinet] arptimer race
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Andrey V. Elsukov
URL:
Keywords: crash, patch
Depends on:
Blocks:
 
Reported: 2016-05-21 17:43 UTC by Dmitry Chagin
Modified: 2018-10-17 10:05 UTC (History)
7 users (show)

See Also:
koobs: mfc-stable11+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Chagin freebsd_committer freebsd_triage 2016-05-21 17:43:01 UTC
Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff80774d1d
stack pointer           = 0x28:0xfffffe032dfe9420
frame pointer           = 0x28:0xfffffe032dfe94f0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (swi4: clock (0))


#9  0xffffffff80cc289a in trap_check (frame=0xfffffe032dfe9370) at /home/git/head/sys/amd64/amd64/trap.c:658
#10 <signal handler called>
#11 0xffffffff80774d1d in _rw_wlock_cookie (c=0xdeadc0dedeadc2de,
    file=0xffffffff80e40db0 "/home/git/head/sys/netinet/if_ether.c", line=0x10a)
    at /home/git/head/sys/kern/kern_rwlock.c:264
#12 0xffffffff80a0c0a3 in arptimer (arg=0xfffff8000ffc3600) at /home/git/head/sys/netinet/if_ether.c:266
#13 0xffffffff807a663c in softclock_call_cc (c=0xfffff8000ffc36b8, cc=0xffffffff815f1280 <cc_cpu>, direct=0x0)
    at /home/git/head/sys/kern/kern_timeout.c:723
#14 0xffffffff807a6e6c in softclock (arg=0xffffffff815f1280 <cc_cpu>) at /home/git/head/sys/kern/kern_timeout.c:861
#15 0xffffffff80724c48 in intr_event_execute_handlers (p=0xfffff80003df8000, ie=0xfffff80003e17500)
    at /home/git/head/sys/kern/kern_intr.c:1262
#16 0xffffffff80725a37 in ithread_execute_handlers (p=0xfffff80003df8000, ie=0xfffff80003e17500)
    at /home/git/head/sys/kern/kern_intr.c:1275
#17 0xffffffff80725866 in ithread_loop (arg=0xfffff80003e2e040) at /home/git/head/sys/kern/kern_intr.c:1356
#18 0xffffffff807201f9 in fork_exit (callout=0xffffffff807256f0 <ithread_loop>, arg=0xfffff80003e2e040,
    frame=0xfffffe032dfe9ac0) at /home/git/head/sys/kern/kern_fork.c:1035
#19 <signal handler called>

(kgdb) up 12
#12 0xffffffff80a0c0a3 in arptimer (arg=0xfffff8000ffc3600) at /home/git/head/sys/netinet/if_ether.c:266
266             IF_AFDATA_LOCK(ifp);
(kgdb) l
261
262             callout_stop(&lle->lle_timer);
263
264             /* XXX: LOR avoidance. We still have ref on lle. */
265             LLE_WUNLOCK(lle);
266             IF_AFDATA_LOCK(ifp);
267             LLE_WLOCK(lle);
268
269             /* Guard against race with other llentry_free(). */
270             if (lle->la_flags & LLE_LINKED) {
(kgdb) p *ifp
Cannot access memory at address 0xdeadc0dedeadc0de
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2016-05-21 22:01:03 UTC
Fix is possibly here - waiting for review:

https://reviews.freebsd.org/D4605

Known issue.

--HPS
Comment 2 Hans Petter Selasky freebsd_committer freebsd_triage 2016-10-06 13:26:42 UTC
Reproducable with recent 11-STABLE doing ifconfig up/down in sequence.

--HPS

270                     LLE_REMREF(lle);
271                     lltable_unlink_entry(lle->lle_tbl, lle);
272             }
273             IF_AFDATA_UNLOCK(ifp);
274
275             size_t pkts_dropped = llentry_free(lle);
276
277             ARPSTAT_ADD(dropped, pkts_dropped);
278             ARPSTAT_INC(timeouts);
279
(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:221
#1  0xffffffff80ae755e in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ae7b2b in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ae7963 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80fc0361 in trap_fatal (frame=0xfffffe0f92146720, eva=952) at /usr/src/sys/amd64/amd64/trap.c:841
#5  0xffffffff80fc0553 in trap_pfault (frame=0xfffffe0f92146720, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:691
#6  0xffffffff80fbfafc in trap (frame=0xfffffe0f92146720) at /usr/src/sys/amd64/amd64/trap.c:442
#7  0xffffffff80fa2e41 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff80ae2abb in __rw_wlock_hard (c=<value optimized out>, tid=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/src/sys/kern/kern_rwlock.c:830
#9  0xffffffff80c60086 in arptimer (arg=<value optimized out>) at /usr/src/sys/netinet/if_ether.c:265
#10 0xffffffff80b01d6a in softclock_call_cc (c=<value optimized out>, cc=<value optimized out>, direct=<value optimized out>) at /usr/src/sys/kern/kern_timeout.c:729
#11 0xffffffff80b022e4 in softclock (arg=<value optimized out>) at /usr/src/sys/kern/kern_timeout.c:867
#12 0xffffffff80aa239f in intr_event_execute_handlers (p=<value optimized out>, ie=<value optimized out>) at /usr/src/sys/kern/kern_intr.c:1262
#13 0xffffffff80aa29b6 in ithread_loop (arg=<value optimized out>) at /usr/src/sys/kern/kern_intr.c:1275
#14 0xffffffff80a9efe5 in fork_exit (callout=0xffffffff80aa28f0 <ithread_loop>, arg=0xfffff8000ad70700, frame=0xfffffe0f92146ac0) at /usr/src/sys/kern/kern_fork.c:1038
#15 0xffffffff80fa337e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
#16 0x0000000000000000 in ?? ()
Comment 3 Andriy Voskoboinyk freebsd_committer freebsd_triage 2016-12-04 00:49:17 UTC
> _rw_wlock_cookie

Was seen on r309481 (only once, ifconfig(8) was not touched).
Comment 4 Dave Baukus 2018-01-23 20:15:57 UTC
I have hit this 2x in the last 4 days on stable 11.1:
I'm still looking for a reliable way to recreate.


#7  0xffffffff807835de in trap (frame=0xfffffe000038a7b0) at pcpu.h:222
#8  0xffffffff80767781 in calltrap () at /usr/home/daveb/p4work/SpectraBSD_stable/sys/amd64/amd64/exception.S:236
#9  0xffffffff804ba395 in _rw_wlock_cookie (c=0xdeadc0dedeadc2de, file=0xffffffff808839e0 "/usr/home/daveb/p4work/SpectraBSD_stable/sys/netinet/if_ether.c", line=287) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_rwlock.c:272
#10 0xffffffff805d348f in arptimer (arg=0xfffff8014c37a200) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/netinet/if_ether.c:287
#11 0xffffffff804d5fb8 in softclock_call_cc (c=0xfffff8014c37a2b8, cc=<value optimized out>, direct=<value optimized out>) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_timeout.c:729
#12 0xffffffff804d639c in softclock (arg=0xffffffff80ed1100) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_timeout.c:867
#13 0xffffffff80483fc9 in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff8011f130000) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_intr.c:1262
#14 0xffffffff80484666 in ithread_loop (arg=0xfffff8011f0fd5c0) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_intr.c:1275
#15 0xffffffff804816b4 in fork_exit (callout=0xffffffff804845b0 <ithread_loop>, arg=0xfffff8011f0fd5c0, frame=0xfffffe000038aac0) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/kern/kern_fork.c:1042
#16 0xffffffff80767cbe in fork_trampoline () at /usr/home/daveb/p4work/SpectraBSD_stable/sys/amd64/amd64/exception.S:611
#17 0x0000000000000000 in ?? ()
(kgdb) frame 10
#10 0xffffffff805d348f in arptimer (arg=0xfffff8014c37a200) at /usr/home/daveb/p4work/SpectraBSD_stable/sys/netinet/if_ether.c:287
287             IF_AFDATA_LOCK(ifp);


(kgdb) set $lle = (struct llentry *)arg
(kgdb) p $lle
$6 = (struct llentry *) 0xfffff8014c37a200
(kgdb) p *$lle
$7 = {
  lle_next = {
    le_next = 0x0, 
    le_prev = 0xfffff80c29487a00
  }, 
  r_l3addr = {
    addr4 = {
      s_addr = 0
    }, 
    addr6 = {
      __u6_addr = {
        __u6_addr8 = 0xfffff8014c37a210 "", 
        __u6_addr16 = 0xfffff8014c37a210, 
        __u6_addr32 = 0xfffff8014c37a210
      }
    }
  }, 
  r_linkdata = 0xfffff8014c37a220 "$\001�#\234�", 
  r_hdrlen = 14 '\016', 
  spare0 = 0xfffff8014c37a239 "", 
  r_flags = 1, 
  r_skip_req = 1, 
  lle_tbl = 0xfffff80c6a454d00, 
  lle_head = 0xfffff80c29487a00, 
  lle_free = 0xffffffff805db670 <in_lltable_destroy_lle>, 
  la_hold = 0x0, 
  la_numheld = 0, 
  la_expire = 11654, 
  la_flags = 1, 
  la_asked = 0, 
  la_preempt = 5, 
  ln_state = 2, 
  ln_router = 0, 
  ln_ntick = 0, 
  lle_remtime = 0, 
  lle_hittime = 0, 
  lle_refcnt = 1, 
  ll_addr = 0xfffff8014c37a220 "$\001�#\234�", 
  lle_chain = {
    le_next = 0x0, 
    le_prev = 0x0
  }, 
  lle_timer = {
    c_links = {
      le = {
        le_next = 0x0, 
        le_prev = 0xffffffff80ed1218
      }, 
      sle = {
        sle_next = 0x0
      }, 
      tqe = {
        tqe_next = 0x0, 
        tqe_prev = 0xffffffff80ed1218
      }
    }, 
    c_time = 50057632637741, 
    c_precision = 268435437, 
    c_arg = 0xfffff8014c37a200, 
    c_func = 0xffffffff805d31f0 <arptimer>, 
    c_lock = 0x0, 
    c_flags = 0, 
    c_iflags = 144, 
    c_cpu = 0
  }, 
  lle_lock = {
    lock_object = {
      lo_name = 0xffffffff8087f95e "lle", 
      lo_flags = 90374144, 
      lo_data = 0, 
      lo_witness = 0xfffffe00009fe180
    }, 
    rw_lock = 1
  }, 
  req_mtx = {
    lock_object = {
      lo_name = 0xffffffff808848ce "lle req", 
      lo_flags = 16973824, 
      lo_data = 0, 
      lo_witness = 0xfffffe00009fe200
    }, 
    mtx_lock = 4
  }
}
(kgdb) p *$lle->lle_tbl
$9 = {
  llt_link = {
    sle_next = 0xdeadc0dedeadc0de
  }, 
  llt_af = -559038242, 
  llt_hsize = -559038242, 
  lle_head = 0xdeadc0dedeadc0de, 
  llt_ifp = 0xdeadc0dedeadc0de, 
  llt_lookup = 0xdeadc0dedeadc0de, 
  llt_alloc_entry = 0xdeadc0dedeadc0de, 
  llt_delete_entry = 0xdeadc0dedeadc0de, 
  llt_prefix_free = 0xdeadc0dedeadc0de, 
  llt_dump_entry = 0xdeadc0dedeadc0de, 
  llt_hash = 0xdeadc0dedeadc0de, 
  llt_match_prefix = 0xdeadc0dedeadc0de, 
  llt_free_entry = 0xdeadc0dedeadc0de, 
  llt_foreach_entry = 0xdeadc0dedeadc0de, 
  llt_link_entry = 0xdeadc0dedeadc0de, 
  llt_unlink_entry = 0xdeadc0dedeadc0de, 
  llt_fill_sa_entry = 0xdeadc0dedeadc0de, 
  llt_free_tbl = 0xdeadc0dedeadc0de
}
(kgdb)
Comment 5 Hans Petter Selasky freebsd_committer freebsd_triage 2018-01-23 22:42:39 UTC
I think you can reproduce like this:

DEF_ROUTE=x.x.x.x # replace with IP of your default route
vmstat -m | grep llt
arp -d $DEF_ROUTE
ping $DEF_ROUTE
vmstat -m | grep llt
arp -d $DEF_ROUTE
ping $DEF_ROUTE
Comment 6 Dave Baukus 2018-01-24 18:18:05 UTC
(In reply to Hans Petter Selasky from comment #5)

I am unable to reproduce with that test.
Comment 7 Hans Petter Selasky freebsd_committer freebsd_triage 2018-01-24 18:41:24 UTC
OK, but do you see the memory leakage. Try to run some other software which causes kernel memory to be allocated and freed.
Comment 8 Hans Petter Selasky freebsd_committer freebsd_triage 2018-01-24 18:41:57 UTC
... while running the script above.
Comment 9 Dave Baukus 2018-01-26 00:17:46 UTC
(In reply to Hans Petter Selasky from comment #8)
I've tried your test and more in an infinite loop; all I get is various crashes in the IPv6 code (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225438). I even recompiled without INET6; no arptimer() crashes yet.
Comment 10 Hans Petter Selasky freebsd_committer freebsd_triage 2018-01-26 07:43:08 UTC
Does applying the fix at:

https://reviews.freebsd.org/D4605

resolve those IPv6 based panics?

It might be you're just lucky that the IPv4 arptimer is not panicing.

--HPS
Comment 11 Dave Baukus 2018-03-01 23:03:00 UTC
After 38 days of quiet, this bug has reemerged in our CI tests.
I have hit this again 4 times in the past 24 hours (2x on stable 11.0 and 2x on stable 11.1). 

I'm going to try Hans Peter Salasky's proposed patch ...
Comment 12 Dave Baukus 2018-06-29 16:24:15 UTC
Back in March, 2018, I added hselasky's patch to our 11.1 base. I and have not seen this issues since. I did not add the patch to our 11.0 release; this bug reared its head in our 11.0 based release stream.

I do know what triggers this event; I can not reproduce at will, but its seems that Hans Petter's patch is worthy.

#6  0xffffffff8081908c in trap (frame=0xfffffe07c4ff37b0) at sys/amd64/amd64/trap.c:442
#7  0xffffffff807fd341 in calltrap () at sys/amd64/amd64/exception.S:236
#8  0xffffffff80625e64 in arptimer (arg=<value optimized out>) at sys/netinet/if_ether.c:265
#9  0xffffffff8050dd0a in softclock_call_cc (c=<value optimized out>, cc=<value optimized out>, direct=<value optimized out>)

Again the ifp from lle->lle_tbl->llt_ifp is invalid:

p *$lle->lle_tbl
$4 = {
  llt_link = {
    sle_next = 0x0
  }, 
  llt_af = 0, 
  llt_hsize = 0, 
  lle_head = 0xfffff800286a6000, 
  llt_ifp = 0x2, 
  llt_lookup = 0xc1, 
  llt_alloc_entry = 0, 
  llt_delete_entry = 0x9, 
  llt_prefix_free = 0xc0, 
  llt_dump_entry = 0x1667, 
  llt_hash = 0, 
  llt_match_prefix = 0x1d97, 
  llt_free_entry = 0x1a667b, 
  llt_foreach_entry = 0x1, 
  llt_link_entry = 0, 
  llt_unlink_entry = 0, 
  llt_fill_sa_entry = 0, 
  llt_free_tbl = 0
}
Comment 13 Hans Petter Selasky freebsd_committer freebsd_triage 2018-07-11 20:48:41 UTC
Alexander: Any news? When will the patches discussed hit the tree?
Comment 14 commit-hook freebsd_committer freebsd_triage 2018-07-17 11:34:05 UTC
A commit references this bug:

Author: ae
Date: Tue Jul 17 11:33:23 UTC 2018
New revision: 336405
URL: https://svnweb.freebsd.org/changeset/base/336405

Log:
  Move invoking of callout_stop(&lle->lle_timer) into llentry_free().

  This deduplicates the code a bit, and also implicitly adds missing
  callout_stop() to in[6]_lltable_delete_entry() functions.

  PR:		209682, 225927
  Submitted by:	hselasky (previous version)
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D4605

Changes:
  head/sys/net/if_llatbl.c
  head/sys/netinet/in.c
  head/sys/netinet6/in6.c
Comment 15 Kubilay Kocak freebsd_committer freebsd_triage 2018-07-18 09:54:58 UTC
Assign to committer resolving
Comment 16 commit-hook freebsd_committer freebsd_triage 2018-08-08 16:10:13 UTC
A commit references this bug:

Author: ae
Date: Wed Aug  8 16:09:29 UTC 2018
New revision: 337460
URL: https://svnweb.freebsd.org/changeset/base/337460

Log:
  MFC r336405:
    Move invoking of callout_stop(&lle->lle_timer) into llentry_free().

    This deduplicates the code a bit, and also implicitly adds missing
    callout_stop() to in[6]_lltable_delete_entry() functions.

    PR:		209682, 225927

Changes:
_U  stable/11/
  stable/11/sys/net/if_llatbl.c
  stable/11/sys/netinet/in.c
  stable/11/sys/netinet6/in6.c
Comment 17 Andrey V. Elsukov freebsd_committer freebsd_triage 2018-10-15 08:36:30 UTC
Fixed in head/ and stable/11. Thanks!