203409 – page fault in tcp_do_segment (r287759 suspected)

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	George V. Neville-Neil

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-09-28 11:19 UTC by Andriy Gapon
Modified:	2018-05-29 07:40 UTC
CC List:	3 users

See Also:

Attachments
fix / work-around (386 bytes, patch) 2015-09-28 11:34 UTC, Andriy Gapon

Description Andriy Gapon freebsd_committer

2015-09-28 11:19:01 UTC

I got the following panic on amd64 r288066:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8073dd68
stack pointer           = 0x28:0xfffffe02b4cb9640
frame pointer           = 0x28:0xfffffe02b4cb9700
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq260: re0)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff8041c9fb = db_trace_self_wrapper+0x2b/frame 0xfffffe02b4cb9110
kdb_backtrace() at 0xffffffff80668239 = kdb_backtrace+0x39/frame 0xfffffe02b4cb91c0
vpanic() at 0xffffffff806334d2 = vpanic+0x152/frame 0xfffffe02b4cb9200
panic() at 0xffffffff80633213 = panic+0x43/frame 0xfffffe02b4cb9260
trap_fatal() at 0xffffffff8081fdc3 = trap_fatal+0x343/frame 0xfffffe02b4cb92c0
trap_pfault() at 0xffffffff8081fff6 = trap_pfault+0x206/frame 0xfffffe02b4cb9350
trap() at 0xffffffff8081f70a = trap+0x4ca/frame 0xfffffe02b4cb9560
trap_check() at 0xffffffff8082011a = trap_check+0x2a/frame 0xfffffe02b4cb9580
calltrap() at 0xffffffff80807db3 = calltrap+0x8/frame 0xfffffe02b4cb9580
--- trap 0xc, rip = 0xffffffff8073dd68, rsp = 0xfffffe02b4cb9650, rbp = 0xfffffe02b4cb9700 ---
tcp_do_segment() at 0xffffffff8073dd68 = tcp_do_segment+0xbc8/frame 0xfffffe02b4cb9700
tcp_input() at 0xffffffff8073c899 = tcp_input+0x999/frame 0xfffffe02b4cb9810
ip_input() at 0xffffffff80733cbe = ip_input+0xbe/frame 0xfffffe02b4cb9860
netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame 0xfffffe02b4cb98d0
netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame 0xfffffe02b4cb98e0
ether_demux() at 0xffffffff8070984b = ether_demux+0x13b/frame 0xfffffe02b4cb9910
ether_input_internal() at 0xffffffff8070a3ec = ether_input_internal+0x32c/frame 0xfffffe02b4cb9950
ether_nh_input() at 0xffffffff8070a093 = ether_nh_input+0x23/frame 0xfffffe02b4cb9960
netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame 0xfffffe02b4cb99d0
netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame 0xfffffe02b4cb99e0
ether_input() at 0xffffffff80709b3c = ether_input+0x2c/frame 0xfffffe02b4cb9a00
re_rxeof() at 0xffffffff8049b858 = re_rxeof+0x228/frame 0xfffffe02b4cb9a60
re_intr_msi() at 0xffffffff8049d78f = re_intr_msi+0xbf/frame 0xfffffe02b4cb9aa0
intr_event_execute_handlers() at 0xffffffff805fec1f = intr_event_execute_handlers+0x12f/frame 0xfffffe02b4cb9b00
ithread_execute_handlers() at 0xffffffff805ff74c = ithread_execute_handlers+0x2c/frame 0xfffffe02b4cb9b20
ithread_loop() at 0xffffffff805ff5bb = ithread_loop+0x5b/frame 0xfffffe02b4cb9b80
fork_exit() at 0xffffffff805fc23b = fork_exit+0xdb/frame 0xfffffe02b4cb9bf0
fork_trampoline() at 0xffffffff808082ee = fork_trampoline+0xe/frame 0xfffffe02b4cb9bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

(kgdb) fr 9
#9  0xffffffff8073dd68 in tcp_do_segment (m=<value optimized out>,
th=0xfffff80052387022, so=0xfffff80207149000, tp=<value optimized out>,
drop_hdrlen=60, tlen=<value optimized out>, iptos=<value optimized out>,
ti_locked=Cannot access memory at address 0x1
)
    at /usr/src/sys/netinet/tcp_input.c:3103
3103                    INP_INFO_RUNLOCK(&V_tcbinfo);
(kgdb) list
3098                    tcp_dropwithreset(m, th, NULL, tlen, rstreason);
3099            return;
3100
3101    drop:
3102            if (ti_locked == TI_RLOCKED) {
3103                    INP_INFO_RUNLOCK(&V_tcbinfo);
3104                    ti_locked = TI_UNLOCKED;
3105            }
3106    #ifdef INVARIANTS
3107

But judging from the disassembly, the fault happens right after the call to rw_runlock():

0xffffffff8073dd50 <tcp_do_segment+2992>:       mov    $0xffffffff80ff84d0,%rdi
0xffffffff8073dd57 <tcp_do_segment+2999>:       mov    $0xffffffff809ba28f,%rsi
0xffffffff8073dd5e <tcp_do_segment+3006>:       mov    $0xc1f,%edx
0xffffffff8073dd63 <tcp_do_segment+3011>:       callq  0xffffffff806309c0 <_rw_runlock_cookie>
0xffffffff8073dd68 <tcp_do_segment+3016>:       mov    0x10(%r12),%rdx
0xffffffff8073dd6d <tcp_do_segment+3021>:       mov    %r15,%rdi
0xffffffff8073dd70 <tcp_do_segment+3024>:       mov    %r14,%rsi

The faulting instruction actually looks like it belongs to the following DTrace probe a few lines below:

TCP_PROBE3(debug__input, tp, th, mtod(m, const char *));

The faulting load reads from 0x10(%r12) and the fault address is 0x10, so %r12 is zero, which suggests that 'm' is NULL here.
I see two places in tcp_do_segment() where m is set to NULL and followed by
goto drop.  My guess is that one of those code paths was taken.
Since those NULL assignments have been there for more than a year, I would guess
that the addition of the probe is to blame:
https://svnweb.freebsd.org/base?view=revision&revision=287759
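
A minimal sketch of why a NULL 'm' would produce exactly this fault address. It assumes a simplified stand-in for struct mbuf in which the data pointer follows two link pointers; `sketch_mbuf` and `sketch_mtod_load_address` are hypothetical names used only for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for struct mbuf: only the two
 * leading link pointers and the data pointer are modeled (assumption).
 */
struct sketch_mbuf {
	struct sketch_mbuf *m_next;
	struct sketch_mbuf *m_nextpkt;
	char *m_data;
};

/*
 * Address that an mtod()-style load of m->m_data reads from, given the
 * numeric value of the mbuf pointer.  Computed arithmetically, so no
 * invalid pointer is ever dereferenced.
 */
static uintptr_t
sketch_mtod_load_address(uintptr_t m)
{
	return (m + offsetof(struct sketch_mbuf, m_data));
}
```

With m equal to 0 on an LP64 target such as amd64, the function returns 0x10 under this assumed layout, which matches both the displacement in the faulting instruction and the reported fault virtual address.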

Comment 1 Andriy Gapon freebsd_committer

2015-09-28 11:34:32 UTC

Created attachment 161483
fix / work-around

The attached trivial patch seems to make the issue go away.
But I am not sure that it is the correct fix, because not firing the probe at all could result in an incomplete event trail.

As a side note, in my opinion the use of mtod() with the SDT probes in tcp_do_segment() is slightly against the recommended SDT usage.  Typically an SDT probe's arguments are values that are actually used near the probe and thus have a high chance of being in CPU registers or in the L1 cache.  In tcp_do_segment() there does not seem to be any access to m_data, so the probes have a bigger overhead because of the extra memory access.  m_data's value might still be in the L1 cache, though.

Comment 2 Hiren Panchasara freebsd_committer

2015-09-28 16:47:59 UTC

(In reply to Andriy Gapon from comment #1)
Assigning this bug to George.

Andriy: Your patch is any day better than hitting page faults. :-)
I'd say go ahead and commit this patch if George/others don't have an immediate better fix.

You raise a good/valid point about the efficiency. I hope we can prove/disprove the theory somehow.

Comment 3 Andriy Gapon freebsd_committer

2015-10-23 08:08:37 UTC

ping

Comment 4 Hiren Panchasara freebsd_committer

2016-02-25 04:43:17 UTC

Is this still an issue that needs fixing?

Comment 5 Andriy Gapon freebsd_committer

2016-02-25 07:37:26 UTC

I believe so.

Comment 6 Hiren Panchasara freebsd_committer

2017-01-03 19:07:01 UTC

Ping. Still an issue? If so, we should fix it with the proposed patch or ask someone more familiar with DTrace to come up with a better way.

Comment 7 George V. Neville-Neil freebsd_committer

2017-01-03 19:54:53 UTC

Actually, the safest answer is a bit more complex: we can always pass 0 to TCP_PROBE3 in the right place. I'll implement a more complete solution and put up a review.
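
For illustration only, a rough sketch of the two options discussed in this thread: skipping the probe when the mbuf is gone versus always firing it with 0 as the data argument. Everything here is hypothetical (`sketch_mbuf`, `sketch_trail`, `sketch_probe3` and the two drop helpers are made-up stand-ins, not TCP_PROBE3 or any kernel API), and it is not meant to represent the change that was eventually committed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct mbuf (assumption). */
struct sketch_mbuf {
	struct sketch_mbuf *m_next;
	struct sketch_mbuf *m_nextpkt;
	char *m_data;
};

/* Hypothetical event trail: counts firings and remembers the last arg2. */
struct sketch_trail {
	int fired;
	uintptr_t last_arg2;
};

/* Hypothetical three-argument probe; it only records that it fired. */
static void
sketch_probe3(struct sketch_trail *t, uintptr_t arg0, uintptr_t arg1,
    uintptr_t arg2)
{
	(void)arg0;
	(void)arg1;
	t->fired++;
	t->last_arg2 = arg2;
}

/* Option A: do not fire at all when m is NULL (the trail loses an event). */
static void
sketch_drop_skip(struct sketch_trail *t, const struct sketch_mbuf *m)
{
	if (m != NULL)
		sketch_probe3(t, 0, 0, (uintptr_t)m->m_data);
}

/* Option B: always fire, passing 0 when there is no mbuf to read from. */
static void
sketch_drop_pass_zero(struct sketch_trail *t, const struct sketch_mbuf *m)
{
	sketch_probe3(t, 0, 0, m != NULL ? (uintptr_t)m->m_data : 0);
}

/* Run one drop strategy against a fresh trail and return the result. */
static struct sketch_trail
sketch_run(void (*drop)(struct sketch_trail *, const struct sketch_mbuf *),
    const struct sketch_mbuf *m)
{
	struct sketch_trail t = { 0, 0 };

	drop(&t, m);
	return (t);
}
```

Calling `sketch_run(sketch_drop_skip, NULL)` leaves the trail empty, while `sketch_run(sketch_drop_pass_zero, NULL)` records one event whose data argument is 0. In this sketch neither option dereferences a NULL mbuf, but only the second keeps the event trail complete.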

Comment 8 commit-hook freebsd_committer

2017-01-04 02:20:15 UTC

A commit references this bug:

Author: gnn
Date: Wed Jan  4 02:19:13 UTC 2017
New revision: 311225
URL: https://svnweb.freebsd.org/changeset/base/311225

Log:
  Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary and
  dangerous.  Those wanting data from an mbuf should use DTrace itself to get
  the data.

  PR:	203409
  Reviewed by:	hiren
  MFC after:	1 week
  Sponsored by:	Limelight Networks
  Differential Revision:	https://reviews.freebsd.org/D9035

Changes:
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_subr.c

Comment 9 Eitan Adler freebsd_committer

2018-05-28 19:46:33 UTC

batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.