Bug 215715 - cond_test:cond_timedwait_race crashes with SIGSEGV periodically when calling binuptime
Summary: cond_test:cond_timedwait_race crashes with SIGSEGV periodically when calling ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-01-02 17:46 UTC by Enji Cooper
Modified: 2017-07-27 20:38 UTC (History)
2 users (show)

See Also:


Attachments
test program, corefile, and debug symbols (52.94 KB, application/x-gzip)
2017-01-02 18:44 UTC, Enji Cooper
no flags Details
Be extra careful to not unmap any /dev/hpet mappings possibly seen by other threads (2.70 KB, patch)
2017-01-03 12:18 UTC, Konstantin Belousov
no flags Details | Diff
A test program illustrating bug with two threads executing gettimeofday() concurrently (2.40 KB, text/plain)
2017-01-04 12:10 UTC, Konstantin Belousov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Enji Cooper freebsd_committer freebsd_triage 2017-01-02 17:46:55 UTC
I've seen lib/libthr/cond_test:cond_timedwait_race fail periodically:

cond_test:cond_timedwait_race  ->  broken: Premature exit; test case received signal 11 (core dumped)  [0.264s]

I'm not sure why it happens, yet.
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2017-01-02 17:59:07 UTC
Looks like an issue with the gettimeofday change when calculating the result of binuptime:

Standard output
                            
   Test case did not write anything to stdout.

Standard error

 Process with PID 7058 exited with signal 11 and dumped core; attempting to gather stack trace
 [New LWP 100585]
 [New LWP 100200]
 [New LWP 100586]
 [New LWP 100587]
 [New LWP 100595]
 [New LWP 100599]
 [New LWP 100607]
 Core was generated by `/usr/tests/lib/libthr/cond_test -vunprivileged-user=tests -r/tmp/kyua.e5qVqp/6/r'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  __vdso_gettc (th=, tc=) at /usr/src/svn/lib/libc/x86/sys/__vdso_gettc.c:231
 231                     *tc = *(volatile uint32_t *)(hpet_dev_map + HPET_MAIN_COUNTER);
 [Current thread is 1 (LWP 100585)]
 #0  __vdso_gettc (th=, tc=) at /usr/src/svn/lib/libc/x86/sys/__vdso_gettc.c:231
 #1  0x0000000800da3785 in tc_delta (th=0x3, delta=) at /usr/src/svn/lib/libc/sys/__vdso_gettimeofday.c:43
 #2  binuptime (bt=, tk=, abs=1) at /usr/src/svn/lib/libc/sys/__vdso_gettimeofday.c:74
 #3  0x0000000800da38b6 in __vdso_clock_gettime (clock_id=0, ts=0x7fffdfffdf90) at /usr/src/svn/lib/libc/sys/__vdso_gettimeofday.c:149
 #4  0x0000000800da3001 in __clock_gettime (clock_id=0, ts=0x7fffdfffdf90) at /usr/src/svn/lib/libc/sys/clock_gettime.c:46
 #5  0x000000000040267d in pthread_cond_timedwait_func (arg=) at /usr/src/svn/contrib/netbsd-tests/lib/libpthread/t_cond.c:316
 #6  0x0000000800a44ba5 in thread_start (curthread=) at /usr/src/svn/lib/libthr/thread/thr_create.c:289
 #7  0x0000000000000000 in ?? ()
 Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
 GDB exited successfully
 Files left in work directory after failure: cond_test.core
Comment 2 Enji Cooper freebsd_committer freebsd_triage 2017-01-02 18:09:44 UTC
$ sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 2195235000
kern.eventtimer.et.LAPIC.flags: 7
$ uname -a
FreeBSD fbsd12 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r310984: Sat Dec 31 13:07:36 PST 2016     ngie@fbsd12:/usr/obj/usr/src/svn/sys/GENERIC  amd64

Attaching the core and test program soon.
Comment 3 Enji Cooper freebsd_committer freebsd_triage 2017-01-02 18:44:55 UTC
Created attachment 178455 [details]
test program, corefile, and debug symbols
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2017-01-03 12:17:28 UTC
I do not see how the backtrace you posted has anything to do with eventtimers.  But it might be relevant for timecounter HPET hardware.  Please show sysctl kern.timecounter output on your test machine.

In fact, I do see an issue in multithreaded program which happens to execute e.g. gettimeofday(2) in two threads in parallel, regardless of your test program.  If your machine uses HPET for timecounter, please test the patch attached.

In fact, using LAPIC for eventtimers and HPET for timecounter is very weird configuration.  Also weird is that there is no HPET in the eventtimers hw list. Is it VM ? Show verbose dmesg.

BTW, core file does not have any use outside of your machine.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2017-01-03 12:18:06 UTC
Created attachment 178471 [details]
Be extra careful to not unmap any /dev/hpet mappings possibly seen by other threads
Comment 6 Konstantin Belousov freebsd_committer freebsd_triage 2017-01-04 12:10:37 UTC
Created attachment 178503 [details]
A test program illustrating bug with two threads executing gettimeofday() concurrently

The program makes an easy reproduction case for the bug I fixed in the patch.  The timecounter should be set to HPET, then the program run.  It forks children which try to execute gettimeofday() concurrently, and at the end print stats of normally exited and signalled children.
Comment 7 commit-hook freebsd_committer freebsd_triage 2017-01-04 16:11:27 UTC
A commit references this bug:

Author: kib
Date: Wed Jan  4 16:10:52 UTC 2017
New revision: 311287
URL: https://svnweb.freebsd.org/changeset/base/311287

Log:
  __vdso_gettc(): be extra careful with /dev/hpet mappings, never unmap
  the mapping which might be accessed by other threads.

  If a pointer to the /dev/hpet register page mapping was stored into
  the hpet_dev_map, other threads might access the page at any time.
  Never unmap it, instead, keep track of mappings for all hpet units in
  smal array.  Store pointer to the newly mapped registers page using
  CAS, to detect parallel mappings.

  It appeared relatively easy to demonstrate the problem by arranging
  two threads which perform gettimeofday(2) concurently, first time in
  the process address space, when HPET is used for timecounter.

  PR:	215715
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week

Changes:
  head/lib/libc/x86/sys/__vdso_gettc.c
Comment 8 Ed Maste freebsd_committer freebsd_triage 2017-07-27 02:07:38 UTC
Does this still need MFC?
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2017-07-27 07:18:33 UTC
I believe that this is fixed but I never received the confirmation.
Comment 10 Enji Cooper freebsd_committer freebsd_triage 2017-07-27 20:38:22 UTC
(In reply to Konstantin Belousov from comment #9)

Yes, it's been fixed for a while (apologies for not following up on the bug): https://ci.freebsd.org/job/FreeBSD-head-amd64-test/3646/testReport/lib.libthr/cond_test/cond_timedwait_race/ .