Bug 187238 - [amd64] [patch] vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Summary: [amd64] [patch] vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Konstantin Belousov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-04 01:00 UTC by Craig Rodrigues
Modified: 2022-01-13 07:35 UTC
CC List: 4 users

See Also:


Attachments
pr187238.patch (1.71 KB, patch)
2014-03-23 12:03 UTC, Henrik Gulbrandsen

Description Craig Rodrigues freebsd_committer freebsd_triage 2014-03-04 01:00:01 UTC
Hi,

As part of the Jenkins deployment in the FreeBSD cluster
the jenkins-admin team observed that when running Jenkins on FreeBSD 10,
the Java virtual machine from the openjdk6 or openjdk7 port
would coredump regularly.

See item #10 here:  https://wiki.freebsd.org/Jenkins

On the advice of Jung-uk Kim, I put the following in /boot/loader.conf:

vm.pmap.pcid_enabled="0"

and rebooted.

After that, the Java coredumping problems went away.

Can someone with VM expertise look into this problem and suggest a fix?

There are many reports of Java coredumping on FreeBSD 10, such as this
one:  http://lists.freebsd.org/pipermail/freebsd-java/2014-March/010606.html

It would be good to fix this, so that Java works "out of the box" on
FreeBSD 10.  It's not good when kernel tunables need to be set
so that Java can work. :(

Thanks.
Comment 1 Henrik Gulbrandsen 2014-03-23 12:03:00 UTC
This is the most time-consuming bug I've encountered in my life, and not
only because I started looking for it in the JVM, but now it seems to have
been hiding in plain sight. I'm pretty sure that pmap->pm_save is handled
incorrectly in the current kernel. Judging from the code, it's supposed to
include all CPUs where the pmap has been active since the latest call to
pmap_invalidate_all(...). However, that means that it should always be a
superset of pmap->pm_active, since any CPU where the pmap is active may
cache pmap information at any time. Currently, this is not the case, and
since only CPUs in pmap->pm_save are targeted in the TLB shootdown, we
are left with inconsistencies that crash the process soon afterwards.

The attached patch solves this by only clearing a CPU from pmap->pm_save
if it is not currently included in pmap->pm_active. As far as I can tell,
that eliminates the bug. The patch is against STABLE, since that's what
I'm currently running, but CURRENT should be pretty close, except for the
default setting of pmap_pcid_enabled.
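
The invariant described above (pm_save must stay a superset of pm_active)
can be illustrated with a small user-space sketch. This is not kernel code:
the names toy_pmap, cpumask_t, flush_done_old, flush_done_patched and
save_covers_active are hypothetical, and a plain 64-bit mask stands in for
the kernel's cpuset_t. It only models the difference between clearing a
CPU's bit unconditionally and clearing it when the pmap is not active there.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpuset_t: one bit per CPU. */
typedef uint64_t cpumask_t;

/* Toy model of the two pmap fields discussed in this report. */
struct toy_pmap {
	cpumask_t pm_active;	/* CPUs currently running with this pmap */
	cpumask_t pm_save;	/* CPUs that may still cache its translations */
};

/* Pre-patch behaviour: clear the CPU from pm_save after every flush. */
static void
flush_done_old(struct toy_pmap *p, unsigned cpu)
{
	p->pm_save &= ~((cpumask_t)1 << cpu);
}

/* Patched behaviour: keep the bit while the pmap is active on that CPU. */
static void
flush_done_patched(struct toy_pmap *p, unsigned cpu)
{
	cpumask_t bit = (cpumask_t)1 << cpu;

	if ((p->pm_active & bit) == 0)
		p->pm_save &= ~bit;
}

/* Returns 1 when every active CPU is also present in pm_save. */
static int
save_covers_active(const struct toy_pmap *p)
{
	return ((p->pm_active & ~p->pm_save) == 0);
}
```

With pm_active = 0x3 and pm_save = 0x7, calling flush_done_old() for CPU 1
leaves pm_save = 0x5, so CPU 1 is active but would be skipped by a shootdown
that targets only pm_save. flush_done_patched() leaves the mask at 0x7 for
CPU 1 and clears only the bit of an inactive CPU such as CPU 2.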

By the way, the logic in the invalidation functions is a bit messy now
and can probably be simplified. Also, is there a good reason for ignoring
the pmap argument in smp_masked_invltlb(...)?

/Henrik

P.S. After five days it turns out that mx1.FreeBSD.org has been rejecting
this email due to a slight misconfiguration of my mail server. I hope that
I haven't caused too many hours of frustration by this failure to report
the bug fix in due time. Anyway, in the meantime my test (java/openjdk6
building itself) has been running continuously in the background. It used
to fail almost every single time, but has now gone through 765 iterations
without a single crash. I believe that indicates that the bug is fixed.
Comment 2 Craig Rodrigues freebsd_committer freebsd_triage 2014-03-23 18:45:18 UTC
Responsible Changed
From-To: freebsd-bugs->kib

kib, can you please take a look?
Comment 3 Mark Linimon 2014-03-23 19:22:32 UTC
----- Forwarded message from Craig Rodrigues <rodrigc@FreeBSD.org> -----

Date: Sun, 23 Mar 2014 11:49:25 -0700
From: Craig Rodrigues <rodrigc@FreeBSD.org>
To: Henrik Gulbrandsen <henrik@gulbra.net>
Cc: Alan Cox <alc@freebsd.org>, Konstantin Belousov <kib@freebsd.org>, freebsd-java <freebsd-java@freebsd.org>
Subject: Re: kern/187238: vm.pmap.pcid_enabled="1" causes Java to coredump in FBSD 10

Henrik,

Thanks for your persistence in analyzing the problem and coming up with a patch.
I've assigned PR 187238 to kib@ for review.

--
Craig

----- End forwarded message -----
Comment 4 Kostik Belousov 2014-03-24 19:11:38 UTC
On Sun, Mar 23, 2014 at 01:03:00PM +0100, Henrik Gulbrandsen wrote:
> This is the most time-consuming bug I've encountered in my life, and not
> only because I started looking for it in the JVM, but now it seems to have
> been hiding in plain sight. I'm pretty sure that pmap->pm_save is handled
> incorrectly in the current kernel. Judging from the code, it's supposed to
> include all CPUs where the pmap has been active since the latest call to
> pmap_invalidate_all(...). However, that means that it should always be a
> superset of pmap->pm_active, since any CPU where the pmap is active may
> cache pmap information at any time. Currently, this is not the case, and
> since only CPUs in pmap->pm_save are targeted in the TLB shootdown, we
> are left with inconsistencies that crash the process soon afterwards.
> 
> The attached patch solves this by only clearing a CPU from pmap->pm_save
> if it is not currently included in pmap->pm_active. As far as I can tell,
> that eliminates the bug. The patch is against STABLE, since that's what
> I'm currently running, but CURRENT should be pretty close, except for the
> default setting of pmap_pcid_enabled.

Yes, I think that the analysis and the patch (for stable/10) is correct.
Thank you for tracking this down.

> 
> By the way, the logic in the invalidation functions is a bit messy now
> and can probably be simplified. Also, is there a good reason for ignoring
> the pmap argument in smp_masked_invltlb(...)?

I think this was an omission.

I adapted the patch to HEAD, and apparently the rewrite of the page
invalidation IPI handlers in C introduced a regression, so it took
some more time to find the issue. The invlrng_handler() did the
checks for special cases in the wrong order, doing the
invpcid(INVPCID_ADDR) before checking for special PCIDs.
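
The ordering problem can be sketched in isolation as a pure decision
function. This is only an illustration of the check order, not the handler
itself: the enum inv_action and both function names are hypothetical, and
the real handler performs the invalidation instead of returning a label.
The special PCID values 0 (kernel pmap) and (uint64_t)-1 are taken from the
patch below.

```c
#include <stdint.h>

/* Hypothetical labels for the paths invlrng_handler() can take. */
enum inv_action {
	INV_INVLPG_RANGE,	/* pcid == 0: kernel pmap, use invlpg */
	INV_FLUSH_GLOBPCID,	/* pcid == -1: invltlb_globpcid() */
	INV_INVPCID_ADDR,	/* ordinary PCID, INVPCID available */
	INV_RELOAD_CR3		/* ordinary PCID, fall back to %cr3 switch */
};

/* Regressed order: INVPCID is tried before the special PCIDs are checked. */
static enum inv_action
choose_action_regressed(uint64_t pcid, int invpcid_works)
{
	if (invpcid_works)
		return (INV_INVPCID_ADDR);
	if (pcid == 0)
		return (INV_INVLPG_RANGE);
	if (pcid == (uint64_t)-1)
		return (INV_FLUSH_GLOBPCID);
	return (INV_RELOAD_CR3);
}

/* Fixed order: special PCIDs first, INVPCID only for ordinary PCIDs. */
static enum inv_action
choose_action_fixed(uint64_t pcid, int invpcid_works)
{
	if (pcid == 0)
		return (INV_INVLPG_RANGE);
	if (pcid == (uint64_t)-1)
		return (INV_FLUSH_GLOBPCID);
	if (invpcid_works)
		return (INV_INVPCID_ADDR);
	return (INV_RELOAD_CR3);
}
```

On a CPU where INVPCID works, the regressed order sends a request for PCID 0
or -1 down the per-address INVPCID path, while the fixed order routes it to
the special-case handling. The two orders agree only for ordinary PCIDs and
for CPUs without INVPCID.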

Show me the first lines of the verbose dmesg for your machine.

Below is the cumulative patch for HEAD.  If somebody can test this
with jdk build on HEAD, it would be useful.
For testing, the tunable vm.pmap.pcid_enabled must be set to 1
from the loader prompt.

diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index 80b4e12..afc1bef 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -1257,7 +1257,7 @@ smp_masked_invltlb(cpuset_t mask, pmap_t pmap)
 {
 
 	if (smp_started) {
-		smp_targeted_tlb_shootdown(mask, IPI_INVLTLB, NULL, 0, 0);
+		smp_targeted_tlb_shootdown(mask, IPI_INVLTLB, pmap, 0, 0);
 #ifdef COUNT_XINVLTLB_HITS
 		ipi_masked_global++;
 #endif
@@ -1517,6 +1517,7 @@ void
 invltlb_pcid_handler(void)
 {
 	uint64_t cr3;
+	u_int cpuid;
 #ifdef COUNT_XINVLTLB_HITS
 	xhits_gbl[PCPU_GET(cpuid)]++;
 #endif /* COUNT_XINVLTLB_HITS */
@@ -1524,14 +1525,13 @@ invltlb_pcid_handler(void)
 	(*ipi_invltlb_counts[PCPU_GET(cpuid)])++;
 #endif /* COUNT_IPIS */
 
-	cr3 = rcr3();
 	if (smp_tlb_invpcid.pcid != (uint64_t)-1 &&
 	    smp_tlb_invpcid.pcid != 0) {
-
 		if (invpcid_works) {
 			invpcid(&smp_tlb_invpcid, INVPCID_CTX);
 		} else {
 			/* Otherwise reload %cr3 twice. */
+			cr3 = rcr3();
 			if (cr3 != pcid_cr3) {
 				load_cr3(pcid_cr3);
 				cr3 |= CR3_PCID_SAVE;
@@ -1541,8 +1541,11 @@ invltlb_pcid_handler(void)
 	} else {
 		invltlb_globpcid();
 	}
-	if (smp_tlb_pmap != NULL)
-		CPU_CLR_ATOMIC(PCPU_GET(cpuid), &smp_tlb_pmap->pm_save);
+	if (smp_tlb_pmap != NULL) {
+		cpuid = PCPU_GET(cpuid);
+		if (!CPU_ISSET(cpuid, &smp_tlb_pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &smp_tlb_pmap->pm_save);
+	}
 
 	atomic_add_int(&smp_tlb_wait, 1);
 }
@@ -1608,7 +1611,10 @@ invlpg_range(vm_offset_t start, vm_offset_t end)
 void
 invlrng_handler(void)
 {
+	struct invpcid_descr d;
 	vm_offset_t addr;
+	uint64_t cr3;
+	u_int cpuid;
 #ifdef COUNT_XINVLTLB_HITS
 	xhits_rng[PCPU_GET(cpuid)]++;
 #endif /* COUNT_XINVLTLB_HITS */
@@ -1618,15 +1624,7 @@ invlrng_handler(void)
 
 	addr = smp_tlb_invpcid.addr;
 	if (pmap_pcid_enabled) {
-		if (invpcid_works) {
-			struct invpcid_descr d;
-
-			d = smp_tlb_invpcid;
-			do {
-				invpcid(&d, INVPCID_ADDR);
-				d.addr += PAGE_SIZE;
-			} while (d.addr < smp_tlb_addr2);
-		} else if (smp_tlb_invpcid.pcid == 0) {
+		if (smp_tlb_invpcid.pcid == 0) {
 			/*
 			 * kernel pmap - use invlpg to invalidate
 			 * global mapping.
@@ -1635,12 +1633,18 @@ invlrng_handler(void)
 		} else if (smp_tlb_invpcid.pcid == (uint64_t)-1) {
 			invltlb_globpcid();
 			if (smp_tlb_pmap != NULL) {
-				CPU_CLR_ATOMIC(PCPU_GET(cpuid),
-				    &smp_tlb_pmap->pm_save);
+				cpuid = PCPU_GET(cpuid);
+				if (!CPU_ISSET(cpuid, &smp_tlb_pmap->pm_active))
+					CPU_CLR_ATOMIC(cpuid,
+					    &smp_tlb_pmap->pm_save);
 			}
+		} else if (invpcid_works) {
+			d = smp_tlb_invpcid;
+			do {
+				invpcid(&d, INVPCID_ADDR);
+				d.addr += PAGE_SIZE;
+			} while (d.addr <= smp_tlb_addr2);
 		} else {
-			uint64_t cr3;
-
 			cr3 = rcr3();
 			if (cr3 != pcid_cr3)
 				load_cr3(pcid_cr3 | CR3_PCID_SAVE);
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 51ac9be..183a456 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -838,7 +838,7 @@ pmap_bootstrap(vm_paddr_t *firstaddr)
 	kernel_pmap->pm_pml4 = (pdp_entry_t *)PHYS_TO_DMAP(KPML4phys);
 	kernel_pmap->pm_cr3 = KPML4phys;
 	CPU_FILL(&kernel_pmap->pm_active);	/* don't allow deactivation */
-	CPU_ZERO(&kernel_pmap->pm_save);
+	CPU_FILL(&kernel_pmap->pm_save);	/* always superset of pm_active */
 	TAILQ_INIT(&kernel_pmap->pm_pvchunk);
 	kernel_pmap->pm_flags = pmap_flags;
 
@@ -1494,7 +1494,8 @@ pmap_invalidate_all(pmap_t pmap)
 		} else {
 			invltlb_globpcid();
 		}
-		CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
+		if (!CPU_ISSET(cpuid, &pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
 		smp_invltlb(pmap);
 	} else {
 		other_cpus = all_cpus;
@@ -1528,7 +1529,8 @@ pmap_invalidate_all(pmap_t pmap)
 			}
 		} else if (CPU_ISSET(cpuid, &pmap->pm_active))
 			invltlb();
-		CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
+		if (!CPU_ISSET(cpuid, &pmap->pm_active))
+			CPU_CLR_ATOMIC(cpuid, &pmap->pm_save);
 		if (pmap_pcid_enabled)
 			CPU_AND(&other_cpus, &pmap->pm_save);
 		else
Comment 5 Henrik Gulbrandsen 2014-03-25 11:35:01 UTC
On 2014-03-24 20:11, Konstantin Belousov wrote:
> Yes, I think that the analysis and the patch (for stable/10) is correct.
> Thank you for tracking this down.

I'm glad to hear that. I was a bit worried when I encountered a very
similar crash in another Java port yesterday, but that was independent
of the vm.pmap.pcid_enabled tunable and turned out to be a simple
ABI version mismatch.

> Show me the first lines of the verbose dmesg for your machine.

I'm sorry - was this a request for me or Craig? I'm not quite sure
what information you're looking for.

> Below is the cumulative patch for HEAD.  If somebody can test this
> with jdk build on HEAD, it would be useful.
> For testing, the tunable vm.pmap.pcid_enabled must be set to 1
> from the loader prompt.

I'd be happy to test, but at the moment I'm stuck doing some real work
on my only suitable machine, so it will have to wait. If anyone else
wants to do it - feel free to contribute! :-)

/Henrik
Comment 6 Peter Holm 2014-03-25 11:45:30 UTC
On Tue, Mar 25, 2014 at 12:35:01PM +0100, Henrik Gulbrandsen wrote:
> On 2014-03-24 20:11, Konstantin Belousov wrote:
> > Yes, I think that the analysis and the patch (for stable/10) is correct.
> > Thank you for tracking this down.
> 
> I'm glad to hear that. I was a bit worried when I encountered a very
> similar crash in another Java port yesterday, but that was independent
> of the vm.pmap.pcid_enabled tunable and turned out to be a simple
> ABI version mismatch.
> 
> > Show me the first lines of the verbose dmesg for your machine.
> 
> I'm sorry - was this a request for me or Craig? I'm not quite sure
> what information you're looking for.
> 
> > Below is the cumulative patch for HEAD.  If somebody can test this
> > with jdk build on HEAD, it would be useful.
> > For testing, the tunable vm.pmap.pcid_enabled must be set to 1
> > from the loader prompt.
> 
> I'd be happy to test, but at the moment I'm stuck doing some real work
> on my only suitable machine, so it will have to wait. If anyone else
> wants to do it - feel free to contribute! :-)
> 
> /Henrik

I'm testing this right now. Uptime is 3 hours so far.
-- 
Peter
Comment 7 Craig Rodrigues freebsd_committer freebsd_triage 2014-03-28 17:58:33 UTC
On Mon, Mar 24, 2014 at 12:11 PM, Konstantin Belousov
<kostikbel@gmail.com> wrote:
>
> Show me the first lines of the verbose dmesg for your machine.
>

Thank you to Henrik and Konstantin for looking at this.  I haven't had
time this week to test this but will get back to you in the next few days.
Here is the machine in the FreeBSD cluster where I was able to reproduce
the problem:

CPU: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz (2400.13-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16571891712 (15804 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads


--
Craig
Comment 8 pguyot 2014-03-31 21:08:40 UTC
Thank you for this patch. Indeed, we experienced random crashes with Erlang (both R16B and R17-RC2) on FreeBSD 10-RELEASE under some stress. The crashes were very random at first, and the JVM also crashed randomly when compiling some packages of ours (this is how I came across this PR). Eventually, today, the crash became regular: a virtual machine on one of our 10-RELEASE boxes, with a lot of work and putting the server under high load (12) and stress (1160% CPU on 12 cores) would crash every 60 seconds.

pmap_pcid_enabled=0 worked, and so does Konstantin's patch applied to 10/stable.

The process has been running for 20 minutes so far.

Thanks again.

Paul
Comment 9 pguyot 2014-04-01 08:55:14 UTC
I am sorry to report that crashes still happen, although much less frequently, with Konstantin's patch.

Here is an excerpt of grep -E 'FreeBSD 10|signal' /var/log/messages:

Mar 29 22:18:20 capensis syslogd: exiting on signal 15
Mar 29 22:21:08 capensis kernel: FreeBSD 10.0-RELEASE #0 r263922M: Sat Mar 29 22:04:03 UTC 2014
Mar 29 22:40:02 capensis kernel: pid 19800 (beam.smp), uid 1001: exited on signal 11 (core dumped)
Mar 30 20:32:01 capensis kernel: pid 18345 (mecab_drv), uid 1004: exited on signal 11 (core dumped)
Mar 30 20:32:01 capensis kernel: pid 18346 (mecab_drv), uid 1004: exited on signal 11 (core dumped)
Mar 30 20:32:01 capensis kernel: pid 18347 (mecab_drv), uid 1004: exited on signal 11 (core dumped)
Mar 30 20:36:11 capensis kernel: pid 19739 (java), uid 1004: exited on signal 6 (core dumped)
Mar 30 20:46:44 capensis kernel: pid 20695 (java), uid 1004: exited on signal 6 (core dumped)
Mar 30 21:07:19 capensis kernel: pid 24094 (beam.smp), uid 1004: exited on signal 11 (core dumped)
Mar 31 06:51:32 capensis kernel: pid 72144 (collectd), uid 0: exited on signal 6 (core dumped)
Mar 31 08:00:24 capensis kernel: pid 51990 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:00:40 capensis kernel: pid 78257 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:01:01 capensis kernel: pid 78386 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:01:22 capensis kernel: pid 78488 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:01:44 capensis kernel: pid 78643 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:02:15 capensis kernel: pid 78774 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:02:46 capensis kernel: pid 79051 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:03:23 capensis kernel: pid 79407 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:04:04 capensis kernel: pid 79778 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:04:53 capensis kernel: pid 79946 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:05:05 capensis kernel: pid 79694 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:05:48 capensis kernel: pid 80098 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:06:45 capensis kernel: pid 80283 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:07:50 capensis kernel: pid 80441 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:07:59 capensis kernel: pid 80200 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:09:03 capensis kernel: pid 80605 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:09:28 capensis kernel: pid 80829 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:09:43 capensis kernel: pid 80705 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:09:55 capensis kernel: pid 80958 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:10:24 capensis kernel: pid 81126 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:10:58 capensis kernel: pid 81252 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:11:36 capensis kernel: pid 81393 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:12:08 capensis kernel: pid 81059 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:12:22 capensis kernel: pid 81522 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:13:11 capensis kernel: pid 81737 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:13:52 capensis kernel: pid 81651 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:14:09 capensis kernel: pid 81867 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:15:11 capensis kernel: pid 82081 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:16:03 capensis kernel: pid 81996 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:16:21 capensis kernel: pid 82219 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:16:41 capensis kernel: pid 82387 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:16:43 capensis kernel: pid 82449 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:17:05 capensis kernel: pid 82594 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:17:19 capensis kernel: pid 82582 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:17:30 capensis kernel: pid 82757 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:17:59 capensis kernel: pid 82912 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:18:24 capensis kernel: pid 82857 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:18:30 capensis kernel: pid 83042 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:19:06 capensis kernel: pid 83250 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:19:58 capensis kernel: pid 83194 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:20:20 capensis kernel: pid 83389 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:21:10 capensis kernel: pid 83605 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:21:25 capensis kernel: pid 83549 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:22:04 capensis kernel: pid 83763 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:22:39 capensis kernel: pid 83943 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:23:14 capensis kernel: pid 84040 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:23:29 capensis kernel: pid 84257 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:23:43 capensis kernel: pid 84170 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:23:47 capensis kernel: pid 84387 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:24:21 capensis kernel: pid 84544 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:24:52 capensis kernel: pid 84758 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:25:07 capensis kernel: pid 84488 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:25:26 capensis kernel: pid 84906 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:26:04 capensis kernel: pid 85092 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:26:41 capensis kernel: pid 85222 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:27:19 capensis kernel: pid 85352 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:27:39 capensis kernel: pid 85008 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:28:05 capensis kernel: pid 85482 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:28:45 capensis kernel: pid 85642 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:29:11 capensis kernel: pid 85719 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:30:14 capensis kernel: pid 86038 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:30:28 capensis kernel: pid 85973 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:31:21 capensis kernel: pid 86205 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:32:35 capensis kernel: pid 86420 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:33:00 capensis kernel: pid 86575 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:33:25 capensis kernel: pid 86716 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:33:54 capensis kernel: pid 86845 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:34:09 capensis kernel: pid 86333 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:34:27 capensis kernel: pid 86953 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:35:04 capensis kernel: pid 87137 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:35:12 capensis kernel: pid 87075 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:35:46 capensis kernel: pid 87269 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:36:33 capensis kernel: pid 87477 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:37:04 capensis kernel: pid 87391 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:37:26 capensis kernel: pid 87607 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:38:26 capensis kernel: pid 87799 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:38:58 capensis kernel: pid 87736 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:39:34 capensis kernel: pid 87951 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:40:50 capensis kernel: pid 88196 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:41:22 capensis kernel: pid 88383 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:41:36 capensis kernel: pid 88517 (beam.smp), uid 1004: exited on signal 6
Mar 31 08:41:57 capensis kernel: pid 88080 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:42:10 capensis kernel: pid 88645 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:42:30 capensis kernel: pid 88831 (beam.smp), uid 1004: exited on signal 6
Mar 31 08:42:50 capensis kernel: pid 88747 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:43:11 capensis kernel: pid 88931 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:43:59 capensis kernel: pid 89120 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:44:13 capensis kernel: pid 89060 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:44:56 capensis kernel: pid 89299 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:45:11 capensis kernel: pid 89421 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:45:28 capensis kernel: pid 89989 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:46:04 capensis kernel: pid 89751 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:46:19 capensis kernel: pid 90277 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:46:37 capensis kernel: pid 90411 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:47:00 capensis kernel: pid 90512 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:47:21 capensis kernel: pid 90642 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:47:35 capensis kernel: pid 90086 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:47:48 capensis kernel: pid 90750 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:48:06 capensis kernel: pid 90872 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:48:16 capensis kernel: pid 90930 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:48:50 capensis kernel: pid 91115 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:49:07 capensis kernel: pid 91058 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:49:28 capensis kernel: pid 91251 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:50:16 capensis kernel: pid 91489 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:50:24 capensis kernel: pid 91383 (beam.smp), uid 1004: exited on signal 10
Mar 31 08:51:05 capensis kernel: pid 91667 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:51:58 capensis kernel: pid 98018 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:52:22 capensis kernel: pid 91822 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:53:10 capensis kernel: pid 98164 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:53:37 capensis kernel: pid 8564 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:53:58 capensis kernel: pid 8817 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:54:22 capensis kernel: pid 9075 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:54:44 capensis kernel: pid 9447 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:55:10 capensis kernel: pid 9543 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:55:48 capensis kernel: pid 10021 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:56:31 capensis kernel: pid 10693 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:57:31 capensis kernel: pid 10996 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:58:17 capensis kernel: pid 9759 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:58:32 capensis kernel: pid 11406 (beam.smp), uid 1004: exited on signal 11
Mar 31 08:59:41 capensis kernel: pid 11887 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:00:02 capensis kernel: pid 12375 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:00:26 capensis kernel: pid 12666 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:00:56 capensis kernel: pid 12874 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:01:23 capensis kernel: pid 11707 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:01:32 capensis kernel: pid 13921 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:01:56 capensis kernel: pid 14181 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:02:13 capensis kernel: pid 14305 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:02:53 capensis kernel: pid 14506 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:03:00 capensis kernel: pid 14639 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:04:04 capensis kernel: pid 14977 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:04:21 capensis kernel: pid 14898 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:05:08 capensis kernel: pid 16103 (beam.smp), uid 1004: exited on signal 10
Mar 31 09:07:04 capensis kernel: pid 17116 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:07:23 capensis kernel: pid 16283 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:07:52 capensis kernel: pid 18422 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:08:38 capensis kernel: pid 18590 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:08:49 capensis kernel: pid 18736 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:09:45 capensis kernel: pid 19205 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:11:03 capensis kernel: pid 28347 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:11:23 capensis kernel: pid 29734 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:11:49 capensis kernel: pid 32253 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:12:19 capensis kernel: pid 36057 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:12:58 capensis kernel: pid 44017 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:13:38 capensis kernel: pid 46303 (beam.smp), uid 1004: exited on signal 10
Mar 31 09:14:28 capensis kernel: pid 46513 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:15:21 capensis kernel: pid 46788 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:16:24 capensis kernel: pid 47230 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:17:36 capensis kernel: pid 47544 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:18:01 capensis kernel: pid 48293 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:18:32 capensis kernel: pid 48520 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:19:07 capensis kernel: pid 48841 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:19:46 capensis kernel: pid 49740 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:20:37 capensis kernel: pid 50077 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:20:55 capensis kernel: pid 18984 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:21:38 capensis kernel: pid 50389 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:22:21 capensis kernel: pid 50581 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:22:49 capensis kernel: pid 50957 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:23:25 capensis kernel: pid 52011 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:23:44 capensis kernel: pid 51858 (beam.smp), uid 1004: exited on signal 10
Mar 31 09:23:50 capensis kernel: pid 53145 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:24:23 capensis kernel: pid 54037 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:24:59 capensis kernel: pid 54258 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:25:40 capensis kernel: pid 54499 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:25:56 capensis kernel: pid 53965 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:26:36 capensis kernel: pid 54893 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:27:11 capensis kernel: pid 55074 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:27:50 capensis kernel: pid 55275 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:28:09 capensis kernel: pid 55749 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:28:37 capensis kernel: pid 55898 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:29:06 capensis kernel: pid 56348 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:29:38 capensis kernel: pid 56643 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:30:14 capensis kernel: pid 56947 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:31:05 capensis kernel: pid 57851 (beam.smp), uid 1004: exited on signal 10
Mar 31 09:32:04 capensis kernel: pid 58856 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:33:17 capensis kernel: pid 59191 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:33:37 capensis kernel: pid 59588 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:34:05 capensis kernel: pid 59773 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:34:39 capensis kernel: pid 59976 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:34:57 capensis kernel: pid 55574 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:35:19 capensis kernel: pid 60197 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:36:02 capensis kernel: pid 60459 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:36:51 capensis kernel: pid 60609 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:37:43 capensis kernel: pid 60767 (beam.smp), uid 1004: exited on signal 11
Mar 31 09:38:44 capensis kernel: pid 60924 (beam.smp), uid 1004: exited on signal 11
Mar 31 15:39:08 capensis kernel: pid 71364 (beam.smp), uid 1004: exited on signal 11 (core dumped)
Mar 31 15:58:09 capensis kernel: pid 73674 (java), uid 1004: exited on signal 6 (core dumped)
Mar 31 16:39:22 capensis kernel: pid 76478 (beam.smp), uid 1004: exited on signal 11 (core dumped)
Mar 31 16:56:50 capensis kernel: pid 78202 (beam.smp), uid 1004: exited on signal 11
Mar 31 19:14:56 capensis kernel: pid 85209 (java), uid 1004: exited on signal 6 (core dumped)
Mar 31 19:28:22 capensis syslogd: exiting on signal 15
Mar 31 19:31:15 capensis kernel: FreeBSD 10.0-RELEASE #0 r263922M: Sat Mar 29 22:04:03 UTC 2014
Mar 31 19:31:17 capensis kernel: pid 1389 (memcached), uid 65534: exited on signal 11
Mar 31 19:48:51 capensis syslogd: exiting on signal 15
Mar 31 19:51:39 capensis kernel: FreeBSD 10.0-RELEASE #1 r263922M: Mon Mar 31 19:45:04 UTC 2014
Mar 31 19:51:41 capensis kernel: pid 1379 (memcached), uid 65534: exited on signal 11
Mar 31 20:10:51 capensis kernel: pid 1389 (collectd), uid 0: exited on signal 6 (core dumped)
Mar 31 21:58:56 capensis kernel: pid 1629 (beam.smp), uid 1004: exited on signal 11
Mar 31 21:59:04 capensis kernel: pid 15195 (beam.smp), uid 1004: exited on signal 11 (core dumped)
Mar 31 21:59:45 capensis kernel: pid 15169 (beam.smp), uid 1004: exited on signal 10
Mar 31 22:00:44 capensis kernel: pid 15321 (beam.smp), uid 1004: exited on signal 11
Mar 31 22:00:53 capensis syslogd: exiting on signal 15
Mar 31 22:03:41 capensis kernel: FreeBSD 10.0-RELEASE #1 r263922M: Mon Mar 31 19:45:04 UTC 2014
Mar 31 22:03:43 capensis kernel: pid 1389 (memcached), uid 65534: exited on signal 11

As you can see, on 10/stable, beam.smp (Erlang) crashed very often: about once per minute while the stressing process was running (from 08:00 to 09:38).
At 15:39, I experienced crashes with Erlang and java while *compiling* code (actually compiling Erlang R17 RC2, not even our own code).
At 19:28, I restarted with vm.pmap.pcid_enabled=0 and started the process. No crash, except memcached, but this might be totally unrelated.
At 19:48, I restarted with vm.pmap.pcid_enabled=1 and Konstantin's patch and started the process again. No crash except memcached and collectd at first, but eventually around 22:00 a lot of crashes happened.
At 22:00, I restarted with vm.pmap.pcid_enabled=0 and started the process again. No crash since.

Unfortunately, all the stressful work is now done and the process is using much less CPU now.

All times are UTC, no crash for 9 hours with vm.pmap.pcid_enabled=0.

On another box (same hardware), with less work:

Mar 31 17:03:31 blacki kernel: pid 645 (beam.smp), uid 1004: exited on signal 11
Mar 31 17:05:29 blacki kernel: pid 2535 (beam.smp), uid 1004: exited on signal 11
Mar 31 17:08:11 blacki kernel: pid 2907 (beam.smp), uid 1004: exited on signal 11
Mar 31 21:03:00 blacki kernel: pid 28897 (beam.smp), uid 1004: exited on signal 11
Mar 31 21:21:21 blacki syslogd: exiting on signal 15
Mar 31 21:24:41 blacki kernel: FreeBSD 10.0-RELEASE #1 r262886M: Mon Mar 31 21:09:43 UTC 2014
Mar 31 21:24:44 blacki kernel: pid 1443 (memcached), uid 65534: exited on signal 11
Mar 31 21:38:57 blacki kernel: pid 1678 (beam.smp), uid 1004: exited on signal 11
Apr  1 01:29:05 blacki kernel: pid 38101 (beam.smp), uid 1004: exited on signal 11
Apr  1 01:38:05 blacki kernel: pid 38387 (beam.smp), uid 1004: exited on signal 11

This box has been running Konstantin's patch since 21:24:41, and Erlang still crashes, though much less frequently.
Comment 10 Vick Khera 2014-06-16 17:24:02 UTC
I can reliably reproduce the crash using poudriere to build openjdk7 on 10.0-p5. The bootstrap jdk crashes every time.
Comment 11 Ed Maste freebsd_committer freebsd_triage 2014-06-24 20:43:54 UTC
This is now addressed (by disabling PCID) in 10.0-p6
http://svnweb.freebsd.org/base?view=revision&revision=267829