Bug 279653 - Page fault in in6_selecthlim
Summary: Page fault in in6_selecthlim
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-06-10 19:56 UTC by Daniel Ponte
Modified: 2024-12-07 08:20 UTC
CC: 9 users

See Also:
Flags: tom+fbsdbugzilla: needs_errata?


Attachments
test nh_ifp (499 bytes, patch)
2024-06-17 09:35 UTC, Zhenlei Huang
shell script that reproduces the crash (1.11 KB, application/x-shellscript)
2024-06-22 03:38 UTC, takahiro.kurosawa

Description Daniel Ponte 2024-06-10 19:56:13 UTC
14-STABLE eff27c3872300e594e0b410364a02302fc555121 built 4 June.

This machine is a gateway and does indeed use IPv6. It runs dns/blocky (a filtering resolver similar to Pi-hole, written in Go) in a jail that lives on ZFS; the rest of the system is on UFS. I had just rolled back the jail to an old snapshot when this happened, but I'm not positive that is related, even though it appears to have crashed right after I hit enter on the zfs rollback command. It looks like it crashed when blocky went to close a TCP connection (the upstream resolver is DNS-over-HTTPS over IPv6).

Message buffer:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b10416
stack pointer           = 0x28:0xfffffe00b4245980
frame pointer           = 0x28:0xfffffe00b42459b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11116 (blocky)
rdi: fffff8004c742000 rsi: 000000000000001c rdx: fffff801dba0a278
rcx: fffff8004c742000  r8: 00000000ffffffbd  r9: 0000000000000018
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00b42459b0
r10: fffff8004ca20e20 r11: fffff8005ec6b880 r12: fffff8003fb4e898
r13: 0000000000000000 r14: fffffe00b424598c r15: 0000000000010480
trap number             = 12
panic: page fault
cpuid = 3
time = 1718033759
KDB: stack backtrace:
#0 0xffffffff808b899d at kdb_backtrace+0x5d
#1 0xffffffff8086b701 at vpanic+0x131
#2 0xffffffff8086b5c3 at panic+0x43
#3 0xffffffff80d6325b at trap_fatal+0x40b
#4 0xffffffff80d632a6 at trap_pfault+0x46
#5 0xffffffff80d3b718 at calltrap+0x8
#6 0xffffffff80adda9a at tcp_default_output+0x1cda
#7 0xffffffff80aef193 at tcp_usr_disconnect+0x83
#8 0xffffffff8090ff05 at soclose+0x75
#9 0xffffffff8080a5c1 at _fdrop+0x11
#10 0xffffffff8080d82a at closef+0x24a
#11 0xffffffff8080cee6 at fdescfree+0x4e6
#12 0xffffffff8081fa2e at exit1+0x49e
#13 0xffffffff8081f58d at sys_exit+0xd
#14 0xffffffff80d63b15 at amd64_syscall+0x115
#15 0xffffffff80d3c02b at fast_syscall_common+0xf8

kgdb backtrace:
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff8086b297 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
#3  0xffffffff8086b76e in vpanic (fmt=0xffffffff80e79c24 "%s", ap=ap@entry=0xfffffe00b42457e0) at /usr/src/sys/kern/kern_shutdown.c:967
#4  0xffffffff8086b5c3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
#5  0xffffffff80d6325b in trap_fatal (frame=0xfffffe00b42458c0, eva=16) at /usr/src/sys/amd64/amd64/trap.c:952
#6  0xffffffff80d632a6 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#7  <signal handler called>
#8  0xffffffff80b10416 in in6_selecthlim (inp=inp@entry=0xfffff8005ea2b540, ifp=ifp@entry=0x0) at /usr/src/sys/netinet6/in6_src.c:850
#9  0xffffffff80adda9a in tcp_default_output (tp=0xfffff8005ea2b540) at /usr/src/sys/netinet/tcp_output.c:1444
#10 0xffffffff80aef193 in tcp_usr_disconnect (so=<optimized out>) at /usr/src/sys/netinet/tcp_usrreq.c:705
#11 0xffffffff8090ff05 in sodisconnect (so=0xfffff80136b683c0) at /usr/src/sys/kern/uipc_socket.c:1436
#12 soclose (so=0xfffff80136b683c0) at /usr/src/sys/kern/uipc_socket.c:1271
#13 0xffffffff8080a5c1 in fo_close (fp=0xfffff8004c742000, fp@entry=0xfffff8019bc50730, td=0x1c, td@entry=0xfffff8019bc50730) at /usr/src/sys/sys/file.h:392
#14 _fdrop (fp=0xfffff8004c742000, fp@entry=0xfffff8019bc50730, td=0x1c, td@entry=0xfffff801db4cb000) at /usr/src/sys/kern/kern_descrip.c:3670
#15 0xffffffff8080d82a in closef (fp=fp@entry=0xfffff8019bc50730, td=td@entry=0xfffff801db4cb000) at /usr/src/sys/kern/kern_descrip.c:2843
#16 0xffffffff8080cee6 in fdescfree_fds (td=0xfffff801db4cb000, fdp=0xfffffe00b1260860) at /usr/src/sys/kern/kern_descrip.c:2566
#17 fdescfree (td=td@entry=0xfffff801db4cb000) at /usr/src/sys/kern/kern_descrip.c:2609
#18 0xffffffff8081fa2e in exit1 (td=0xfffff801db4cb000, rval=<optimized out>, signo=signo@entry=0) at /usr/src/sys/kern/kern_exit.c:404
#19 0xffffffff8081f58d in sys_exit (td=0xfffff8004c742000, uap=<optimized out>) at /usr/src/sys/kern/kern_exit.c:210
#20 0xffffffff80d63b15 in syscallenter (td=0xfffff801db4cb000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:191
#21 amd64_syscall (td=0xfffff801db4cb000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1194
#22 <signal handler called>
#23 0x000000000047398b in ?? ()
Backtrace stopped: Cannot access memory at address 0x8702b7ee8
Comment 1 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-11 01:51:22 UTC
(In reply to Daniel Ponte from comment #0)
The stack trace is weird. The caller, `sys/netinet/tcp_output.c`:
```
1444                 ip6->ip6_hlim = in6_selecthlim(inp, NULL);
```

The callee, `sys/netinet6/in6_src.c`:

```
843 int
844 in6_selecthlim(struct inpcb *inp, struct ifnet *ifp)
845 {
846 
847         if (inp && inp->in6p_hops >= 0)
848                 return (inp->in6p_hops);
849         else if (ifp)
850                 return (ND_IFINFO(ifp)->chlim);
851         else if (inp && !IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
...
    }
```

Line 850 should never be hit since `ifp` is NULL; the backtrace also shows that clearly.

That is quite odd... Is it possible that kgdb reports the wrong line number?
Comment 2 Andrey V. Elsukov freebsd_committer freebsd_triage 2024-06-11 10:09:57 UTC
(In reply to Zhenlei Huang from comment #1)

fault virtual address = 0x10 corresponds to the offset of the nd_ifinfo field in the struct in6_ifextra that if_getafdata() returns a pointer to.
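
For reference, a minimal sketch of why the fault address lines up with nd_ifinfo (layout paraphrased from sys/netinet6/in6_var.h; offsets assume 8-byte pointers on amd64 and are illustrative, not authoritative):

```
/*
 * Approximate layout of the per-ifnet IPv6 afdata.  With 8-byte
 * pointers, nd_ifinfo is the third member, i.e. offset 0x10.
 */
struct in6_ifextra {
	counter_u64_t		*in6_ifstat;	/* offset 0x00 */
	counter_u64_t		*icmp6_ifstat;	/* offset 0x08 */
	struct nd_ifinfo	*nd_ifinfo;	/* offset 0x10 <- fault address */
	struct scope6_id	*scope6_id;
	struct lltable		*lltable;
	struct mld_ifsoftc	*mld_ifinfo;
};

/*
 * ND_IFINFO(ifp) effectively evaluates
 *   ((struct in6_ifextra *)if_getafdata(ifp, AF_INET6))->nd_ifinfo
 * so if if_afdata[AF_INET6] is NULL, the load reads virtual address 0x10,
 * which matches rax = 0 in the trap frame above.
 */
```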
Comment 3 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-15 01:12:45 UTC
(In reply to Andrey V. Elsukov from comment #2)
Hmm, I guess we will have to disassemble the kernel to figure out what is happening behind the scenes, if this cannot be reproduced.
Comment 4 Daniel Ponte 2024-06-15 21:28:50 UTC
ffffffff80b10380 <in6_selecthlim>:
ffffffff80b10380: 55                    pushq   %rbp
ffffffff80b10381: 48 89 e5              movq    %rsp, %rbp
ffffffff80b10384: 41 56                 pushq   %r14
ffffffff80b10386: 53                    pushq   %rbx
ffffffff80b10387: 48 83 ec 20           subq    $0x20, %rsp
ffffffff80b1038b: 48 85 ff              testq   %rdi, %rdi
ffffffff80b1038e: 74 74                 je      0xffffffff80b10404 <in6_selecthlim+0x84>
ffffffff80b10390: 0f b7 87 04 01 00 00  movzwl  0x104(%rdi), %eax
ffffffff80b10397: 66 85 c0              testw   %ax, %ax
ffffffff80b1039a: 0f 89 9a 00 00 00     jns     0xffffffff80b1043a <in6_selecthlim+0xba>
ffffffff80b103a0: 48 85 f6              testq   %rsi, %rsi
ffffffff80b103a3: 75 64                 jne     0xffffffff80b10409 <in6_selecthlim+0x89>
ffffffff80b103a5: 83 bf 94 00 00 00 00  cmpl    $0x0, 0x94(%rdi)
ffffffff80b103ac: 75 1b                 jne     0xffffffff80b103c9 <in6_selecthlim+0x49>
ffffffff80b103ae: 83 bf 98 00 00 00 00  cmpl    $0x0, 0x98(%rdi)
ffffffff80b103b5: 75 12                 jne     0xffffffff80b103c9 <in6_selecthlim+0x49>
ffffffff80b103b7: 83 bf 9c 00 00 00 00  cmpl    $0x0, 0x9c(%rdi)
ffffffff80b103be: 75 09                 jne     0xffffffff80b103c9 <in6_selecthlim+0x49>
ffffffff80b103c0: 83 bf a0 00 00 00 00  cmpl    $0x0, 0xa0(%rdi)
ffffffff80b103c7: 74 57                 je      0xffffffff80b10420 <in6_selecthlim+0xa0>
ffffffff80b103c9: 0f b7 9f 8e 00 00 00  movzwl  0x8e(%rdi), %ebx
ffffffff80b103d0: 48 81 c7 94 00 00 00  addq    $0x94, %rdi
ffffffff80b103d7: 4c 8d 75 dc           leaq    -0x24(%rbp), %r14
ffffffff80b103db: 48 8d 55 ec           leaq    -0x14(%rbp), %rdx
ffffffff80b103df: 4c 89 f6              movq    %r14, %rsi
ffffffff80b103e2: e8 19 dd 01 00        callq   0xffffffff80b2e100 <in6_splitscope>
ffffffff80b103e7: 8b 55 ec              movl    -0x14(%rbp), %edx
ffffffff80b103ea: 89 df                 movl    %ebx, %edi
ffffffff80b103ec: 4c 89 f6              movq    %r14, %rsi
ffffffff80b103ef: 31 c9                 xorl    %ecx, %ecx
ffffffff80b103f1: 45 31 c0              xorl    %r8d, %r8d
ffffffff80b103f4: e8 07 46 ff ff        callq   0xffffffff80b04a00 <fib6_lookup>
ffffffff80b103f9: 48 85 c0              testq   %rax, %rax
ffffffff80b103fc: 74 22                 je      0xffffffff80b10420 <in6_selecthlim+0xa0>
ffffffff80b103fe: 48 8b 78 20           movq    0x20(%rax), %rdi
ffffffff80b10402: eb 08                 jmp     0xffffffff80b1040c <in6_selecthlim+0x8c>
ffffffff80b10404: 48 85 f6              testq   %rsi, %rsi
ffffffff80b10407: 74 17                 je      0xffffffff80b10420 <in6_selecthlim+0xa0>
ffffffff80b10409: 48 89 f7              movq    %rsi, %rdi
ffffffff80b1040c: be 1c 00 00 00        movl    $0x1c, %esi
ffffffff80b10411: e8 0a a3 e8 ff        callq   0xffffffff8099a720 <if_getafdata>
ffffffff80b10416: 48 8b 40 10           movq    0x10(%rax), %rax
ffffffff80b1041a: 0f b6 40 1c           movzbl  0x1c(%rax), %eax
ffffffff80b1041e: eb 1a                 jmp     0xffffffff80b1043a <in6_selecthlim+0xba>
ffffffff80b10420: 65 48 8b 04 25 00 00 00 00    movq    %gs:0x0, %rax
ffffffff80b10429: 48 8b 80 90 06 00 00  movq    0x690(%rax), %rax
ffffffff80b10430: 48 8b 40 28           movq    0x28(%rax), %rax
ffffffff80b10434: 8b 80 48 5c 33 81     movl    -0x7ecca3b8(%rax), %eax
ffffffff80b1043a: 48 83 c4 20           addq    $0x20, %rsp
ffffffff80b1043e: 5b                    popq    %rbx
ffffffff80b1043f: 41 5e                 popq    %r14
ffffffff80b10441: 5d                    popq    %rbp
ffffffff80b10442: c3                    retq
ffffffff80b10443: 66 66 66 66 2e 0f 1f 84 00 00 00 00 00        nopw    %cs:(%rax,%rax)
Comment 5 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-17 09:35:40 UTC
Created attachment 251522
test nh_ifp
Comment 6 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-17 09:36:55 UTC
(In reply to Daniel Ponte from comment #4)
I do not see any problems with the disassembled code, as far as my limited x86-64 assembly knowledge goes.

There are only two paths that reach ffffffff80b10416. One is
> ffffffff80b103a0: 48 85 f6              testq   %rsi, %rsi
> ffffffff80b103a3: 75 64                 jne     0xffffffff80b10409 <in6_selecthlim+0x89>

and the other is
> ffffffff80b103fe: 48 8b 78 20           movq    0x20(%rax), %rdi
> ffffffff80b10402: eb 08                 jmp     0xffffffff80b1040c <in6_selecthlim+0x8c>

So I suspect the line number 850 reported by kgdb is wrong, and the correct one should be 861.

I have no evidence, but could you please give the patch a try?
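
For context, the code around line 861 is the nexthop-lookup branch of in6_selecthlim(). Roughly (paraphrased from stable/14 sys/netinet6/in6_src.c; exact line numbers may differ):

```
	else if (inp && !IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_faddr)) {
		struct nhop_object *nh;
		struct in6_addr dst;
		uint32_t scopeid;

		in6_splitscope(&inp->in6p_faddr, &dst, &scopeid);
		nh = fib6_lookup(inp->inp_inc.inc_fibnum, &dst, scopeid, 0, 0);
		if (nh != NULL)
			/* ~line 861: nh->nh_ifp's afdata is what faults */
			return (ND_IFINFO(nh->nh_ifp)->chlim);
	}
	return (V_ip6_defhlim);
```

This is consistent with the disassembly: the in6_splitscope()/fib6_lookup() calls, then movq 0x20(%rax),%rdi (loading what looks like nh_ifp) leading into the if_getafdata() call.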
Comment 7 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-17 10:01:08 UTC
The x86-64 C calling convention I am referring to: https://people.freebsd.org/~obrien/amd64-elf-abi.pdf
Comment 8 takahiro.kurosawa 2024-06-17 23:08:27 UTC
(In reply to Daniel Ponte from comment #4)
Can you show me the output of
  print ((struct ifnet *)0xfffff8004c742000)->if_afdata[28]
  print *(struct ifnet *)0xfffff8004c742000 
on kgdb?

%rdi probably still held the ifnet pointer at the time of the fatal fault, because if_getafdata() is a tiny function
(I can confirm this if the disassembly of if_getafdata is provided).
Comment 9 Daniel Ponte 2024-06-18 13:40:15 UTC
kgdb output:

(kgdb) print ((struct ifnet *)0xfffff8004c742000)->if_afdata[28]
$1 = (void *) 0x0
(kgdb) print *(struct ifnet *)0xfffff8004c742000
$2 = {if_link = {cstqe_next = 0x0}, if_clones = {le_next = 0x0, le_prev = 0xfffff8004c897828}, if_groups = {cstqh_first = 0x0, cstqh_last = 0xfffff8004c742018}, if_alloctype = 6 '\006',
  if_numa_domain = 255 '\377', if_softc = 0x0, if_llsoftc = 0x0, if_l2com = 0x0, if_dname = 0xffffffff834e2000 <epairname> "epair", if_dunit = 0, if_index = 23, if_idxgen = 0,
  if_xname = "epair0b\000\000\000\000\000\000\000\000", if_description = 0x0, if_flags = 2131970, if_drv_flags = 0, if_capabilities = 8, if_capabilities2 = 0, if_capenable = 8, if_capenable2 = 0,
  if_linkmib = 0x0, if_linkmiblen = 0, if_refcount = 4, if_type = 6 '\006', if_addrlen = 6 '\006', if_hdrlen = 14 '\016', if_link_state = 1 '\001', if_mtu = 1500, if_metric = 0, if_baudrate = 10000000000,
  if_hwassist = 0, if_epoch = 77, if_lastchange = {tv_sec = 1718033759, tv_usec = 498647}, if_snd = {ifq_head = 0x0, ifq_tail = 0x0, ifq_len = 0, ifq_maxlen = 50, ifq_mtx = {lock_object = {
        lo_name = 0xfffff8004c742058 "epair0b", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, ifq_drv_head = 0x0, ifq_drv_tail = 0x0, ifq_drv_len = 0, ifq_drv_maxlen = 50, altq_type = 0,
    altq_flags = 1, altq_disc = 0x0, altq_ifp = 0xfffff8004c742000, altq_enqueue = 0x0, altq_dequeue = 0x0, altq_request = 0x0, altq_tbr = 0x0, altq_cdnr = 0x0}, if_linktask = {ta_link = {stqe_next = 0x0},
    ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0xffffffff8099ab60 <do_link_state_change>, ta_context = 0xfffff8004c742000}, if_addmultitask = {ta_link = {stqe_next = 0x0},
    ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0xffffffff8099add0 <if_siocaddmulti>, ta_context = 0xfffff8004c742000}, if_addr_lock = {lock_object = {
      lo_name = 0xffffffff80e985c6 "if_addr_lock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, if_addrhead = {cstqh_first = 0x0, cstqh_last = 0xfffff8004c7421c0}, if_multiaddrs = {
    cstqh_first = 0x0, cstqh_last = 0xfffff8004c7421d0}, if_amcount = 0, if_addr = 0xfffff8004c921000, if_hw_addr = 0xfffff80007d7e7d0,
  if_broadcastaddr = 0xffffffff80fa0530 <etherbroadcastaddr> "\377\377\377\377\377\377", if_afdata_lock = {lock_object = {lo_name = 0xffffffff80eea36d "if_afdata", lo_flags = 16973824, lo_data = 0,
      lo_witness = 0x0}, mtx_lock = 0}, if_afdata = {0x0 <repeats 44 times>}, if_afdata_initialized = 0, if_fib = 0, if_vnet = 0xfffff80016c43580, if_home_vnet = 0xfffff800010af9c0, if_vlantrunk = 0x0,
  if_bpf = 0xffffffff80f9f0b0 <dead_bpf_if>, if_pcount = 0, if_bridge = 0x0, if_lagg = 0x0, if_pf_kif = 0x0, if_carp = 0x0, if_label = 0x0, if_netmap = 0x0, if_output = 0xffffffff809a3760 <ifdead_output>,
  if_input = 0xffffffff809a3780 <ifdead_input>, if_bridge_input = 0x0, if_bridge_output = 0x0, if_bridge_linkstate = 0x0, if_start = 0xffffffff809a3790 <ifdead_start>,
  if_ioctl = 0xffffffff809a37a0 <ifdead_ioctl>, if_init = 0xffffffff834e1020 <epair_init>, if_resolvemulti = 0xffffffff809a37b0 <ifdead_resolvemulti>, if_qflush = 0xffffffff809a37d0 <ifdead_qflush>,
  if_transmit = 0xffffffff809a37e0 <ifdead_transmit>, if_reassign = 0xffffffff809a5070 <ether_reassign>, if_get_counter = 0xffffffff809a3800 <ifdead_get_counter>,
  if_requestencap = 0xffffffff809a4fa0 <ether_requestencap>, if_counters = {0xfffffe012c2c88b8, 0xfffffe012c2c88b0, 0xfffffe012c2c8878, 0xfffffe012c2c8870, 0xfffffe012c2c8868, 0xfffffe012c2c8860,
    0xfffffe012c2c8858, 0xfffffe012c2c8850, 0xfffffe012c2c8848, 0xfffffe012c2c8840, 0xfffffe012c2c8838, 0xfffffe012c2c8830}, if_hw_tsomax = 65518, if_hw_tsomaxsegcount = 35, if_hw_tsomaxsegsize = 2048,
  if_snd_tag_alloc = 0xffffffff809a3810 <ifdead_snd_tag_alloc>, if_ratelimit_query = 0xffffffff809a3820 <ifdead_ratelimit_query>, if_ratelimit_setup = 0x0, if_pcp = 255 '\377', if_debugnet_methods = 0x0,
  if_epoch_ctx = {data = {0x0, 0x0}}, if_ispare = {0, 0, 0, 0}}

As for testing the patch, I can build with it, but this probably won't be reproducible anyway; I'm not totally certain what was happening when it crashed.
Comment 10 Andrey V. Elsukov freebsd_committer freebsd_triage 2024-06-19 09:16:07 UTC
(In reply to Daniel Ponte from comment #9)

It looks like an epair(4) device was detached while some packets destined to go through it were still delayed. if_afdata[AF_INET6] was then freed as part of the epair ifnet detach, and accessing this freed data triggers the panic.
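
To summarize the suspected sequence (an illustration inferred from the backtrace and the kgdb dump in comment #9, not literal kernel code):

```
/*
 * thread A (blocky exiting)                thread B (epair destroy)
 * -------------------------                ------------------------
 * soclose() -> tcp_usr_disconnect()
 *   -> tcp_default_output()
 *   -> in6_selecthlim():
 *        fib6_lookup() still returns a
 *        nexthop whose nh_ifp is epair0b   if_detach_internal():
 *                                            routes flushed, then
 *                                            in6_domifdetach() frees
 *                                            ifp->if_afdata[AF_INET6]
 *        if_getafdata(nh_ifp, AF_INET6)
 *          now returns NULL
 *        ND_IFINFO() reads NULL + 0x10 -> page fault
 */
```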
Comment 11 Kristof Provost freebsd_committer freebsd_triage 2024-06-19 09:52:22 UTC
We've been running into this in pfSense for a while as well: https://redmine.pfsense.org/issues/14431

We wound up applying this band-aid: https://github.com/pfsense/FreeBSD-src/commit/9834d8bb0d3344cd82552c3cd16e5b2d84543d8f 
That's very much not a fix, but it does seem to mitigate the panics.
Comment 12 takahiro.kurosawa 2024-06-20 03:07:26 UTC
I have not reproduced the crash, but I guess the following patch for
if_detach_internal() would fix the problem:

----
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -1235,6 +1235,8 @@ if_detach_internal(struct ifnet *ifp, bool vmove)
 #ifdef VIMAGE
 finish_vnet_shutdown:
 #endif
+       epoch_wait_preempt(net_epoch_preempt);
+       NET_EPOCH_DRAIN_CALLBACKS();
        /*
         * We cannot hold the lock over dom_ifdetach calls as they might
         * sleep, for example trying to drain a callout, thus open up the
----

The routing entries related to the detaching ifnet are removed in
if_purgeaddrs() and rt_flushifroutes().  It seems that the transport
layer protects itself against freed objects with NET_EPOCH_ENTER/EXIT,
so there should be no threads that still reference nhop objects
related to the ifnet after rt_flushifroutes() + epoch_wait_preempt().
I am not sure that NET_EPOCH_DRAIN_CALLBACKS() is required, but it is
probably harmless.
Comment 13 takahiro.kurosawa 2024-06-22 03:38:13 UTC
Created attachment 251613
shell script that reproduces the crash

I have been able to reproduce the crash with the attached script (repro.sh).
With the patch from comment #12 applied, the crash has not occurred so far.
Comment 14 takahiro.kurosawa 2024-06-22 04:26:30 UTC
Patch posted on the Phabricator, review D45690.
Comment 15 Zhenlei Huang freebsd_committer freebsd_triage 2024-06-23 01:35:03 UTC
Xref another report by bz@ https://lists.freebsd.org/archives/freebsd-net/2024-May/004981.html
Comment 16 takahiro.kurosawa 2024-07-20 07:22:56 UTC
I've retracted review D45690 because it does not completely fix the problem.
Comment 17 tom+fbsdbugzilla 2024-10-19 19:57:22 UTC
What about upstreaming pfSense's workaround until a proper solution can be found? https://github.com/pfsense/FreeBSD-src/commit/9834d8bb0d3344cd82552c3cd16e5b2d84543d8f