Bug 203409

Summary:

page fault in tcp_do_segment (r287759 suspected)

Product:

Base System

Reporter:

Andriy Gapon <avg>

Component:

kern

Assignee:

George V. Neville-Neil <gnn>

Status:

Closed FIXED

Severity:

Affects Only Me

CC:

gnn, hiren, jch

Priority:

---

Version:

CURRENT

Hardware:

Any

OS:

Any

Attachments:

Description	Flags
fix / work-around	none

Description Andriy Gapon freebsd_committer

2015-09-28 11:19:01 UTC

I got the following panic on amd64 r288066:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8073dd68
stack pointer           = 0x28:0xfffffe02b4cb9640
frame pointer           = 0x28:0xfffffe02b4cb9700
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq260: re0)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8041c9fb = db_trace_self_wrapper+0x2b/frame
0xfffffe02b4cb9110
kdb_backtrace() at 0xffffffff80668239 = kdb_backtrace+0x39/frame 0xfffffe02b4cb91c0
vpanic() at 0xffffffff806334d2 = vpanic+0x152/frame 0xfffffe02b4cb9200
panic() at 0xffffffff80633213 = panic+0x43/frame 0xfffffe02b4cb9260
trap_fatal() at 0xffffffff8081fdc3 = trap_fatal+0x343/frame 0xfffffe02b4cb92c0
trap_pfault() at 0xffffffff8081fff6 = trap_pfault+0x206/frame 0xfffffe02b4cb9350
trap() at 0xffffffff8081f70a = trap+0x4ca/frame 0xfffffe02b4cb9560
trap_check() at 0xffffffff8082011a = trap_check+0x2a/frame 0xfffffe02b4cb9580
calltrap() at 0xffffffff80807db3 = calltrap+0x8/frame 0xfffffe02b4cb9580
--- trap 0xc, rip = 0xffffffff8073dd68, rsp = 0xfffffe02b4cb9650, rbp =
0xfffffe02b4cb9700 ---
tcp_do_segment() at 0xffffffff8073dd68 = tcp_do_segment+0xbc8/frame
0xfffffe02b4cb9700
tcp_input() at 0xffffffff8073c899 = tcp_input+0x999/frame 0xfffffe02b4cb9810
ip_input() at 0xffffffff80733cbe = ip_input+0xbe/frame 0xfffffe02b4cb9860
netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame
0xfffffe02b4cb98d0
netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame
0xfffffe02b4cb98e0
ether_demux() at 0xffffffff8070984b = ether_demux+0x13b/frame 0xfffffe02b4cb9910
ether_input_internal() at 0xffffffff8070a3ec = ether_input_internal+0x32c/frame
0xfffffe02b4cb9950
ether_nh_input() at 0xffffffff8070a093 = ether_nh_input+0x23/frame
0xfffffe02b4cb9960
netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame
0xfffffe02b4cb99d0
netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame
0xfffffe02b4cb99e0
ether_input() at 0xffffffff80709b3c = ether_input+0x2c/frame 0xfffffe02b4cb9a00
re_rxeof() at 0xffffffff8049b858 = re_rxeof+0x228/frame 0xfffffe02b4cb9a60
re_intr_msi() at 0xffffffff8049d78f = re_intr_msi+0xbf/frame 0xfffffe02b4cb9aa0
intr_event_execute_handlers() at 0xffffffff805fec1f =
intr_event_execute_handlers+0x12f/frame 0xfffffe02b4cb9b00
ithread_execute_handlers() at 0xffffffff805ff74c =
ithread_execute_handlers+0x2c/frame 0xfffffe02b4cb9b20
ithread_loop() at 0xffffffff805ff5bb = ithread_loop+0x5b/frame 0xfffffe02b4cb9b80
fork_exit() at 0xffffffff805fc23b = fork_exit+0xdb/frame 0xfffffe02b4cb9bf0
fork_trampoline() at 0xffffffff808082ee = fork_trampoline+0xe/frame
0xfffffe02b4cb9bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

(kgdb) fr 9
#9  0xffffffff8073dd68 in tcp_do_segment (m=<value optimized out>,
th=0xfffff80052387022, so=0xfffff80207149000, tp=<value optimized out>,
drop_hdrlen=60, tlen=<value optimized out>, iptos=<value optimized out>,
ti_locked=Cannot access memory at address 0x1
)
    at /usr/src/sys/netinet/tcp_input.c:3103
3103                    INP_INFO_RUNLOCK(&V_tcbinfo);
(kgdb) list
3098                    tcp_dropwithreset(m, th, NULL, tlen, rstreason);
3099            return;
3100
3101    drop:
3102            if (ti_locked == TI_RLOCKED) {
3103                    INP_INFO_RUNLOCK(&V_tcbinfo);
3104                    ti_locked = TI_UNLOCKED;
3105            }
3106    #ifdef INVARIANTS
3107

But judging from the disassembly the fault happens right after calling rw_runlock():

0xffffffff8073dd50 <tcp_do_segment+2992>:       mov    $0xffffffff80ff84d0,%rdi
0xffffffff8073dd57 <tcp_do_segment+2999>:       mov    $0xffffffff809ba28f,%rsi
0xffffffff8073dd5e <tcp_do_segment+3006>:       mov    $0xc1f,%edx
0xffffffff8073dd63 <tcp_do_segment+3011>:       callq  0xffffffff806309c0
<_rw_runlock_cookie>
0xffffffff8073dd68 <tcp_do_segment+3016>:       mov    0x10(%r12),%rdx
0xffffffff8073dd6d <tcp_do_segment+3021>:       mov    %r15,%rdi
0xffffffff8073dd70 <tcp_do_segment+3024>:       mov    %r14,%rsi

That code actually looks like the following DTrace probe a few lines below:

TCP_PROBE3(debug__input, tp, th, mtod(m, const char *));

So, it seems like 'm' could be NULL here.
I see two places in tcp_do_segment() where m gets assigned with NULL followed by
goto drop.  If I had to guess then my guess would be that one of those code
paths was taken.
Since those NULL assignments were there for more than a year, then I would guess
that the addition of the probe is to blame:
https://svnweb.freebsd.org/base?view=revision&revision=287759

Comment 1 Andriy Gapon freebsd_committer

2015-09-28 11:34:32 UTC

Created attachment 161483 [details]
fix / work-around

The attached trivial patch seems to make the issue go away.
But I am not sure if that's a correct fix as not firing the probe at all could result in an incomplete event trail.

As a side note, in my opinion the use of mtod() with the SDT probes in tcp_do_segment() is slightly against the recommended SDT usage.  Typically an SDT probe's arguments are values that are actually used near the probe and thus have a high chance of being in CPU registers or in the L1 cache.  In tcp_do_segment() there does not seem to be any access to m_data, so the probes have a bigger overhead because of the extra memory access.  m_data's value might still be in the L1 cache, though.

Comment 2 Hiren Panchasara freebsd_committer

2015-09-28 16:47:59 UTC

(In reply to Andriy Gapon from comment #1)
Assigning this bug to George.

Andriy: Your patch is any day better than hitting page faults. :-) 
I'd say go ahead and commit this patch if George/others don't have any immediate better fix.

You raise a good/valid point about the efficiency. I hope we could prove/disprove the theory somehow.

Comment 3 Andriy Gapon freebsd_committer

2015-10-23 08:08:37 UTC

ping

Comment 4 Hiren Panchasara freebsd_committer

2016-02-25 04:43:17 UTC

Is this still an issue that needs fixing?

Comment 5 Andriy Gapon freebsd_committer

2016-02-25 07:37:26 UTC

I believe so.

Comment 6 Hiren Panchasara freebsd_committer

2017-01-03 19:07:01 UTC

Ping. Still an issue? if so, we should fix it with proposed patch or ask someone more familiar with dtrace to come-up with a better way.

Comment 7 George V. Neville-Neil freebsd_committer

2017-01-03 19:54:53 UTC

Actually the safest answer is a bit more complex, we can always pass 0 to TCP_PROBE3 in the right place  I'll implement a more complete solution and put up a review..

Comment 8 commit-hook freebsd_committer

2017-01-04 02:20:15 UTC

A commit references this bug:

Author: gnn
Date: Wed Jan  4 02:19:13 UTC 2017
New revision: 311225
URL: https://svnweb.freebsd.org/changeset/base/311225

Log:
  Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and
  dangerous.  Those wanting data from an mbuf should use DTrace itself to get
  the data.

  PR:	203409
  Reviewed by:	hiren
  MFC after:	1 week
  Sponsored by:	Limelight Networks
  Differential Revision:	https://reviews.freebsd.org/D9035

Changes:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c

Comment 9 Eitan Adler freebsd_committer

2018-05-28 19:46:33 UTC

batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.