Bug 261338 - [PATCH] kernel panic "bad pte" under heavy CPU load on 12.2 and 12.3 (i386)
Summary: [PATCH] kernel panic "bad pte" under heavy CPU load on 12.2 and 12.3 (i386)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.3-RELEASE
Hardware: i386 Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, patch
: 261284 (view as bug list)
Depends on:
Blocks:
 
Reported: 2022-01-19 16:15 UTC by Dmitry K.
Modified: 2022-10-12 00:50 UTC (History)
8 users (show)

See Also:


Attachments
Panic screenshot (185.19 KB, image/png)
2022-01-19 16:15 UTC, Dmitry K.
no flags Details
Proposed patch (323 bytes, patch)
2022-01-19 16:16 UTC, Dmitry K.
no flags Details | Diff
proposed patch (431 bytes, patch)
2022-01-20 10:31 UTC, Andriy Gapon
no flags Details | Diff
Proposed patch (i386 & amd64) (578 bytes, patch)
2022-01-20 11:58 UTC, Dmitry K.
no flags Details | Diff
Proposed patch v2 (i386 & amd64) (1.37 KB, patch)
2022-01-23 05:48 UTC, Dmitry K.
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry K. 2022-01-19 16:15:23 UTC
Created attachment 231160 [details]
Panic screenshot

After updating to 12.2p12 and 12.3p1 I noticed kernel panic under heavy multi-core CPU load.
As an example of heavy load is building kernel in multi-threaded mode.

Affected systems:
- 12.2p12 i386
- 12.3p1 i386

12.X amd64 is not affected, 13.0 is not affected at all.

Tested hardware:
- Virtual machine 8 vCPU 4 GB vRAM under VMWare ESXi 6.7
- HP MicroServer Gen8 Intel Xeon E3-1265Lv2 16 GB RAM
- PC Intel Core i5-4690 16 GB RAM

Steps to reproduce:
# cd /usr/src
# make -s -j`sysctl -n hw.ncpu` KERNCONF=GENERIC buildkernel

And after some time the system hangs with panic like:
TPTE at 0x2857f14  IS ZERO @ VA 247c5000
panic: bad pte
cpuid = 7
time = 1642334372
KDB: stack backtrace:
#0 0x10438ee at kdb_backtrace+0x4e
#1 0xffdb68 at vpanic+0x118
#2 0xffda44 at panic+0x14
#3 0x155b6d5 at pmap_remove_pages+0x5a5
#4 0x12fceb4 at vmspace_exit+0x94
#5 0xfbe0f3 at exit1+0x593
#6 0xfbdb52 at sys_sys_exit+0x12
#7 0x1561b79 at syscall+0x3e9
#8 0xffc033e7 at PTDpde+0x43ef

Additional stack info:
#0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
370		savectx(&dumppcb);
(kgdb) #0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
#1  0x00ffd831 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0x00ffdbbf in vpanic (fmt=0x15d448a "bad pte", ap=0x1ff80a10 "")
    at /usr/src/sys/kern/kern_shutdown.c:881
#3  0x00ffda44 in panic (fmt=0x15d448a "bad pte")
    at /usr/src/sys/kern/kern_shutdown.c:808
#4  0x0155b6d5 in pmap_remove_pages (pmap=0x22a0354c)
    at /usr/src/sys/i386/i386/pmap.c:4845
#5  0x012fceb4 in vmspace_exit (td=0x1bb57380) at /usr/src/sys/vm/vm_map.c:411
#6  0x00fbe0f3 in exit1 (td=0x1bb57380, rval=0, signo=0)
    at /usr/src/sys/kern/kern_exit.c:399
#7  0x00fbdb52 in sys_sys_exit (td=0x1bb57380, uap=0x1bb57604)
    at /usr/src/sys/kern/kern_exit.c:176
#8  0x01561b79 in syscall (frame=0x1ff80ba8)
    at src/sys/i386/i386/../../kern/subr_syscall.c:144
#9  0xffc033e7 in ?? ()
#10 0x00000033 in ?? ()

I made some research on the kernel code and found the problem appeared in the recent changes of SMP processing in mp_x86.c:
https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8#diff-b34ee41e14f87fb2b18fdf77337237f336830ae88aac2a02e1c32aa45e43b4de
https://reviews.freebsd.org/D33413

The problem is in the function smp_targeted_tlb_shootdown():
-	sched_pin();
+	KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
Under some circumstances the function is not pinned, which later causes PTE panic.
I recompiled GENERIC kernel with INVARIANTS options and added the function name to the assertion text for additional info and got an immediate panic during the boot (see attached image panic_not_pinned.png).

So the fix is to revert this line back:
-	KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
+	sched_pin();

I attached the patch mp_x86.c.patch to fix the problem.
After recompiling the kernel with this patch, I no longer see panics on both 12.2 and 12.3 when recompiling the kernel further.
Comment 1 Dmitry K. 2022-01-19 16:16:00 UTC
Created attachment 231161 [details]
Proposed patch
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2022-01-19 16:44:35 UTC
That's strange.
In i386 pmap.c there is a single call to smp_masked_invlpg (from the stack in the screenshot). It's in pmap_invalidate_page() and there are sched_pin() and sched_unpin() calls around the smp_masked_invlpg() call.
Comment 3 Dmitry K. 2022-01-19 17:32:15 UTC
Function smp_targeted_tlb_shootdown() has calls to sched_unpin(). So sched_pin() and sched_unpin() are not paired during multiple calls. Is it the problem?
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2022-01-19 17:52:28 UTC
(In reply to Dmitry K. from comment #3)
Yes, I think that's the problem.
When creating a patch for stable/12 I totally missed the fact that in main the smp_*_shootdown functions got split into i386 and amd64 variants.  The i386 smp_targeted_tlb_shootdown has a paired sched_pin / sched_unpin while the amd64 smp_targeted_tlb_shootdown has only sched_unpin and expects the current thread to be already pinned.

So, I merged the amd64 variant into the shared x86 smp shootdown code of stable/12.  That works well with the merged amd64 pmap code, but it broke the i386 pmap code.

Let me think how to fix the breakage.
It seems that maybe it would be sufficient to make the sched_unpin calls in smp_targeted_tlb_shootdown specific to amd64.  If I am not mistaken, on i386 smp_targeted_tlb_shootdown is always called with the current thread pinned, so internal pinning (which got removed by me) and unpinning should be unnecessary.
Comment 5 Dmitry K. 2022-01-20 04:34:14 UTC
smp_targeted_tlb_shootdown() is invoked in smp_masked_invl*() and smp_cache_flush(). 

As I can see smp_masked_invl*() are invoked with correct sched_pin() / sched_unpin() order in pmap_*().

But smp_cache_flush() is invoked by pmap_invalidate_cache(), which is called in multiple places in pmap_*(). Some functions do not have outer sched_pin() / sched_unpin() guarding when calling pmap_invalidate_cache(), for example, in pmap_flush_page(). And it leads to the wrong order of pins in smp_targeted_tlb_shootdown().

So I would suggest to call sched_pin() / sched_unpin() explicitly in smp_targeted_tlb_shootdown() to make sure (as it was in the previous version).
Comment 6 Dmitry K. 2022-01-20 04:42:26 UTC
Besides, is ok to have nested paired calls if they occur:
sched_pin()
...
sched_pin()
...
sched_unpin()
..
sched_unpin()
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2022-01-20 10:29:08 UTC
(In reply to Dmitry K. from comment #6)
Thank you for the additional analysis.

The nested pinning is indeed not a problem at all.
I just thought that it was redundant but apparently it is only partially so.

So, your proposed patch looks mostly good to me except that it would break amd64.
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2022-01-20 10:31:38 UTC
Created attachment 231173 [details]
proposed patch

Could you please test this patch?

Would you be able to test it on a system (e.g. a VM) with a single CPU (with SMP enabled kernel)?
Comment 9 Dmitry K. 2022-01-20 10:36:06 UTC
You are right, it would break amd64.
I will test it with the following configs:
- 1 core i386
- N cores i386
- 1 core amd64
- N cores amd64
It will take additional time.
Comment 10 Richard Frewin 2022-01-20 11:23:10 UTC
Could the panics seemingly introduced with the upgrade from the 12.2-RELEASE-p7 to 12.2-RELEASE-p12 kernel reported in Bug 261233 be related to this or are they a different issue ?
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2022-01-20 11:24:43 UTC
(In reply to Richard Frewin from comment #10)
Looks sufficiently similar.
Could you please test the patch as well?
Comment 12 Richard Frewin 2022-01-20 11:34:28 UTC
Will take some time as I don't have an i386 build environment set up - been relying on freebsd-update ...
Comment 13 Dmitry K. 2022-01-20 11:58:50 UTC
Created attachment 231174 [details]
Proposed patch (i386 & amd64)
Comment 14 Dmitry K. 2022-01-20 12:01:20 UTC
(In reply to Andriy Gapon from comment #8)
I modified patch for i386 and amd64. Assertion for amd64 is moved before if statement to detect unpinned calls and calling sched_unpin() during debug with INVARIANTS.
Will test it later.
Comment 15 Dmitry K. 2022-01-20 12:06:09 UTC
"and PREVENT calling sched_unpin()"
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2022-01-20 12:11:25 UTC
(In reply to Dmitry K. from comment #14)
I don't think that the change in position of that KASSERT actually matters.
Also, it does not need to be guarded by ifdef amd64 as long as it's i386's sched_pin, but that does not hurt either.
Comment 17 Dmitry K. 2022-01-20 16:11:38 UTC
(In reply to Andriy Gapon from comment #16)
Changing position of KASSERT will help to debug the function under amd64. Imagine for some reason smp_targeted_tlb_shootdown() is called unpinned and the first condition is satisfied (active kernel debug or booting). In this case we go to local_cb and shed_unpin() is called. If KASSERT is placed after the if statement, we will not receive the descriptive panic message "curthread not pinned" and unpinning will expose strange panics again.

My tests are finished successfully.
I tested the latest patch under i386 and amd64 with 1 and 4 vCPUs building the kernel with -j4.
Comment 18 Andriy Gapon freebsd_committer freebsd_triage 2022-01-21 07:06:33 UTC
(In reply to Dmitry K. from comment #17)
Thank you for all your work on this issue!
Comment 19 Richard Frewin 2022-01-21 12:57:22 UTC
I installed a VirtualBox VM from 12.2 i386 ISO and used freebsd-update to update to 12.2-p12.  At this point I tried to recreate problem but could not get it to panic as I'm seeing on bare-metal systems.

Then built a custom kernel with the patch in Attachment 231174 [details].  By the way, that applied with "Hunk #1 succeeded at 1663 (offset -6 lines)."

I've copied the resultant /boot/kernel to a production system and after a reboot (caused by another panic) it's now running the patched kernel.  Initial tests suggest it's fixed the issue, as it didn't panic during a couple of forced backups.
Comment 20 Martin Birgmeier 2022-01-21 19:47:32 UTC
The patch in attachment 231174 [details] also seems to fix bug #261284, a installkernel/installworld now finished successfully.

However, could it be that multiprocessor networking has taken a hit? Doing a lot of parallel RPC calls (in fact, showmount -e) from the i386 VM to multiple external hosts produces a load spike which I do not remember to have seen previously.

-- Martin
Comment 21 Martin Birgmeier 2022-01-22 07:43:54 UTC
*** Bug 261284 has been marked as a duplicate of this bug. ***
Comment 22 Dmitry K. 2022-01-23 05:48:16 UTC
Created attachment 231245 [details]
Proposed patch v2 (i386 & amd64)
Comment 23 Dmitry K. 2022-01-23 05:53:06 UTC
(In reply to Andriy Gapon from comment #18)
Andriy, could you take a look at the latest patch #231245?
I moved sched_pin() from pmap functions to smp_targeted_tlb_shootdown().
It looks more consistent now without ifdefs.
All the test are ok.
Comment 24 Dmitry K. 2022-01-23 06:10:17 UTC
(In reply to Martin Birgmeier from comment #20)
I think it would be a good idea to create a new bug report with additional info about before and after the regression.
Comment 25 Andriy Gapon freebsd_committer freebsd_triage 2022-01-23 10:16:07 UTC
(In reply to Dmitry K. from comment #23)
This version is no go.
amd64 has pinning in pmap for a good reason.
If you want to make the patch "cleaner", then it would be better to move in the opposite direction: add missing sched pins in the i386 code.
Comment 26 Dmitry K. 2022-01-23 13:57:11 UTC
Comment on attachment 231245 [details]
Proposed patch v2 (i386 & amd64)

NOT WORKING
Comment 27 Andriy Gapon freebsd_committer freebsd_triage 2022-01-25 10:38:20 UTC
(In reply to Martin Birgmeier from comment #20)
> However, could it be that multiprocessor networking has taken a hit?

What versions of FreeBSD are you comparing?
Could you please re-test and confirm?
Comment 28 commit-hook freebsd_committer freebsd_triage 2022-01-25 10:39:26 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c

commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2022-01-25 10:34:08 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2022-01-25 10:34:08 +0000

    smp_targeted_tlb_shootdown has to pin the CPU on i386

    This should fix a regression in 1820ca215461 which happened
    because pmap -> shootdown contracts on amd64 and i386 diverged.
    On amd64 the pmap code always pins the CPU before calling the shootdown
    code and expects it to unpin on return.
    On i386 the pmap code either has pins and unpins around the shootdown
    calls or does not pin at all.
    This change should account for that difference.

    In main and stable/13 the contracts are also different, but the
    shootdown code is split into the i386 and amd64 variants and each
    variant is tailored towards the platform's pmap.

    PR:             261338
    Reported by:    Dmitry K. <thedix@yandex.ru>
    Debugged by:    Dmitry K. <thedix@yandex.ru>
    Tested by:      Dmitry K. <thedix@yandex.ru>
    Fixes:  1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations
    Reviewed by:    kib
    X-Pointyhat to: avg
    Differential Revision:  https://reviews.freebsd.org/D33980

 sys/x86/x86/mp_x86.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 29 Andriy Gapon freebsd_committer freebsd_triage 2022-01-25 10:42:24 UTC
We want to issue an erratum for this.
Comment 30 Dmitry K. 2022-01-25 11:06:53 UTC
(In reply to Andriy Gapon from comment #25)
Regarding making the patch more "cleaner".
I'm not sure I can make these changes in i386 fast enough.
Perhaps this can be rewritten in the future.

As I see you committed changes to stable/12.
Are you considering to move KASSERT before if statement for the reason described in comment #17?
Comment 31 Andriy Gapon freebsd_committer freebsd_triage 2022-01-25 11:11:33 UTC
(In reply to Dmitry K. from comment #30)
No. I don't think that that is important enough.
Comment 32 Martin Birgmeier 2022-01-25 16:30:02 UTC
(In reply to Andriy Gapon from comment #27)

Please forget it. It was only a subjective impression and most likely not really substantial.

Sorry for the noise.

-- Martin
Comment 33 commit-hook freebsd_committer freebsd_triage 2022-02-01 19:12:29 UTC
A commit in branch releng/12.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=feac4f1bdf22742885fcc94aad5b96e053e93fa9

commit feac4f1bdf22742885fcc94aad5b96e053e93fa9
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2022-01-25 10:34:08 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2022-02-01 17:54:53 +0000

    smp_targeted_tlb_shootdown has to pin the CPU on i386

    This should fix a regression in 1820ca215461 which happened
    because pmap -> shootdown contracts on amd64 and i386 diverged.
    On amd64 the pmap code always pins the CPU before calling the shootdown
    code and expects it to unpin on return.
    On i386 the pmap code either has pins and unpins around the shootdown
    calls or does not pin at all.
    This change should account for that difference.

    In main and stable/13 the contracts are also different, but the
    shootdown code is split into the i386 and amd64 variants and each
    variant is tailored towards the platform's pmap.

    PR:             261338
    Reported by:    Dmitry K. <thedix@yandex.ru>
    Debugged by:    Dmitry K. <thedix@yandex.ru>
    Tested by:      Dmitry K. <thedix@yandex.ru>
    Fixes:  1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations
    Reviewed by:    kib
    X-Pointyhat to: avg
    Differential Revision:  https://reviews.freebsd.org/D33980

    (cherry picked from commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c)

    Approved by:    so
    Security:       FreeBSD-EN-22:08.i386

 sys/x86/x86/mp_x86.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 34 commit-hook freebsd_committer freebsd_triage 2022-02-01 19:13:32 UTC
A commit in branch releng/12.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1e71fa9e4150431d665ab9a82251aada6933cf4a

commit 1e71fa9e4150431d665ab9a82251aada6933cf4a
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2022-01-25 10:34:08 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2022-02-01 17:50:43 +0000

    smp_targeted_tlb_shootdown has to pin the CPU on i386

    This should fix a regression in 1820ca215461 which happened
    because pmap -> shootdown contracts on amd64 and i386 diverged.
    On amd64 the pmap code always pins the CPU before calling the shootdown
    code and expects it to unpin on return.
    On i386 the pmap code either has pins and unpins around the shootdown
    calls or does not pin at all.
    This change should account for that difference.

    In main and stable/13 the contracts are also different, but the
    shootdown code is split into the i386 and amd64 variants and each
    variant is tailored towards the platform's pmap.

    PR:             261338
    Reported by:    Dmitry K. <thedix@yandex.ru>
    Debugged by:    Dmitry K. <thedix@yandex.ru>
    Tested by:      Dmitry K. <thedix@yandex.ru>
    Fixes:  1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations
    Reviewed by:    kib
    X-Pointyhat to: avg
    Differential Revision:  https://reviews.freebsd.org/D33980

    (cherry picked from commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c)

    Approved by:    so
    Security:       FreeBSD-EN-22:08.i386

 sys/x86/x86/mp_x86.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 35 Andriy Gapon freebsd_committer freebsd_triage 2022-02-01 20:37:24 UTC
Thank you very much, Gordon.