Created attachment 231160 [details] Panic screenshot After updating to 12.2p12 and 12.3p1 I noticed kernel panic under heavy multi-core CPU load. As an example of heavy load is building kernel in multi-threaded mode. Affected systems: - 12.2p12 i386 - 12.3p1 i386 12.X amd64 is not affected, 13.0 is not affected at all. Tested hardware: - Virtual machine 8 vCPU 4 GB vRAM under VMWare ESXi 6.7 - HP MicroServer Gen8 Intel Xeon E3-1265Lv2 16 GB RAM - PC Intel Core i5-4690 16 GB RAM Steps to reproduce: # cd /usr/src # make -s -j`sysctl -n hw.ncpu` KERNCONF=GENERIC buildkernel And after some time the system hangs with panic like: TPTE at 0x2857f14 IS ZERO @ VA 247c5000 panic: bad pte cpuid = 7 time = 1642334372 KDB: stack backtrace: #0 0x10438ee at kdb_backtrace+0x4e #1 0xffdb68 at vpanic+0x118 #2 0xffda44 at panic+0x14 #3 0x155b6d5 at pmap_remove_pages+0x5a5 #4 0x12fceb4 at vmspace_exit+0x94 #5 0xfbe0f3 at exit1+0x593 #6 0xfbdb52 at sys_sys_exit+0x12 #7 0x1561b79 at syscall+0x3e9 #8 0xffc033e7 at PTDpde+0x43ef Additional stack info: #0 0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370 370 savectx(&dumppcb); (kgdb) #0 0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370 #1 0x00ffd831 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0x00ffdbbf in vpanic (fmt=0x15d448a "bad pte", ap=0x1ff80a10 "") at /usr/src/sys/kern/kern_shutdown.c:881 #3 0x00ffda44 in panic (fmt=0x15d448a "bad pte") at /usr/src/sys/kern/kern_shutdown.c:808 #4 0x0155b6d5 in pmap_remove_pages (pmap=0x22a0354c) at /usr/src/sys/i386/i386/pmap.c:4845 #5 0x012fceb4 in vmspace_exit (td=0x1bb57380) at /usr/src/sys/vm/vm_map.c:411 #6 0x00fbe0f3 in exit1 (td=0x1bb57380, rval=0, signo=0) at /usr/src/sys/kern/kern_exit.c:399 #7 0x00fbdb52 in sys_sys_exit (td=0x1bb57380, uap=0x1bb57604) at /usr/src/sys/kern/kern_exit.c:176 #8 0x01561b79 in syscall (frame=0x1ff80ba8) at src/sys/i386/i386/../../kern/subr_syscall.c:144 #9 0xffc033e7 in ?? () #10 0x00000033 in ?? () I made some research on the kernel code and found the problem appeared in the recent changes of SMP processing in mp_x86.c: https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8#diff-b34ee41e14f87fb2b18fdf77337237f336830ae88aac2a02e1c32aa45e43b4de https://reviews.freebsd.org/D33413 The problem is in the function smp_targeted_tlb_shootdown(): - sched_pin(); + KASSERT(curthread->td_pinned > 0, ("curthread not pinned")); Under some circumstances the function is not pinned, which later causes PTE panic. I recompiled GENERIC kernel with INVARIANTS options and added the function name to the assertion text for additional info and got an immediate panic during the boot (see attached image panic_not_pinned.png). So the fix is to revert this line back: - KASSERT(curthread->td_pinned > 0, ("curthread not pinned")); + sched_pin(); I attached the patch mp_x86.c.patch to fix the problem. After recompiling the kernel with this patch, I no longer see panics on both 12.2 and 12.3 when recompiling the kernel further.
Created attachment 231161 [details] Proposed patch
That's strange. In i386 pmap.c there is a single call to smp_masked_invlpg (from the stack in the screenshot). It's in pmap_invalidate_page() and there are sched_pin() and sched_unpin() calls around the smp_masked_invlpg() call.
Function smp_targeted_tlb_shootdown() has calls to sched_unpin(). So sched_pin() and sched_unpin() are not paired during multiple calls. Is it the problem?
(In reply to Dmitry K. from comment #3) Yes, I think that's the problem. When creating a patch for stable/12 I totally missed the fact that in main the smp_*_shootdown functions got split into i386 and amd64 variants. The i386 smp_targeted_tlb_shootdown has a paired sched_pin / sched_unpin while the amd64 smp_targeted_tlb_shootdown has only sched_unpin and expects the current thread to be already pinned. So, I merged the amd64 variant into the shared x86 smp shootdown code of stable/12. That works well with the merged amd64 pmap code, but it broke the i386 pmap code. Let me think how to fix the breakage. It seems that maybe it would be sufficient to make the sched_unpin calls in smp_targeted_tlb_shootdown specific to amd64. If I am not mistaken, on i386 smp_targeted_tlb_shootdown is always called with the current thread pinned, so internal pinning (which got removed by me) and unpinning should be unnecessary.
smp_targeted_tlb_shootdown() is invoked in smp_masked_invl*() and smp_cache_flush(). As I can see smp_masked_invl*() are invoked with correct sched_pin() / sched_unpin() order in pmap_*(). But smp_cache_flush() is invoked by pmap_invalidate_cache(), which is called in multiple places in pmap_*(). Some functions do not have outer sched_pin() / sched_unpin() guarding when calling pmap_invalidate_cache(), for example, in pmap_flush_page(). And it leads to the wrong order of pins in smp_targeted_tlb_shootdown(). So I would suggest to call sched_pin() / sched_unpin() explicitly in smp_targeted_tlb_shootdown() to make sure (as it was in the previous version).
Besides, is ok to have nested paired calls if they occur: sched_pin() ... sched_pin() ... sched_unpin() .. sched_unpin()
(In reply to Dmitry K. from comment #6) Thank you for the additional analysis. The nested pinning is indeed not a problem at all. I just thought that it was redundant but apparently it is only partially so. So, your proposed patch looks mostly good to me except that it would break amd64.
Created attachment 231173 [details] proposed patch Could you please test this patch? Would you be able to test it on a system (e.g. a VM) with a single CPU (with SMP enabled kernel)?
You are right, it would break amd64. I will test it with the following configs: - 1 core i386 - N cores i386 - 1 core amd64 - N cores amd64 It will take additional time.
Could the panics seemingly introduced with the upgrade from the 12.2-RELEASE-p7 to 12.2-RELEASE-p12 kernel reported in Bug 261233 be related to this or are they a different issue ?
(In reply to Richard Frewin from comment #10) Looks sufficiently similar. Could you please test the patch as well?
Will take some time as I don't have an i386 build environment set up - been relying on freebsd-update ...
Created attachment 231174 [details] Proposed patch (i386 & amd64)
(In reply to Andriy Gapon from comment #8) I modified patch for i386 and amd64. Assertion for amd64 is moved before if statement to detect unpinned calls and calling sched_unpin() during debug with INVARIANTS. Will test it later.
"and PREVENT calling sched_unpin()"
(In reply to Dmitry K. from comment #14) I don't think that the change in position of that KASSERT actually matters. Also, it does not need to be guarded by ifdef amd64 as long as it's i386's sched_pin, but that does not hurt either.
(In reply to Andriy Gapon from comment #16) Changing position of KASSERT will help to debug the function under amd64. Imagine for some reason smp_targeted_tlb_shootdown() is called unpinned and the first condition is satisfied (active kernel debug or booting). In this case we go to local_cb and shed_unpin() is called. If KASSERT is placed after the if statement, we will not receive the descriptive panic message "curthread not pinned" and unpinning will expose strange panics again. My tests are finished successfully. I tested the latest patch under i386 and amd64 with 1 and 4 vCPUs building the kernel with -j4.
(In reply to Dmitry K. from comment #17) Thank you for all your work on this issue!
I installed a VirtualBox VM from 12.2 i386 ISO and used freebsd-update to update to 12.2-p12. At this point I tried to recreate problem but could not get it to panic as I'm seeing on bare-metal systems. Then built a custom kernel with the patch in Attachment 231174 [details]. By the way, that applied with "Hunk #1 succeeded at 1663 (offset -6 lines)." I've copied the resultant /boot/kernel to a production system and after a reboot (caused by another panic) it's now running the patched kernel. Initial tests suggest it's fixed the issue, as it didn't panic during a couple of forced backups.
The patch in attachment 231174 [details] also seems to fix bug #261284, a installkernel/installworld now finished successfully. However, could it be that multiprocessor networking has taken a hit? Doing a lot of parallel RPC calls (in fact, showmount -e) from the i386 VM to multiple external hosts produces a load spike which I do not remember to have seen previously. -- Martin
*** Bug 261284 has been marked as a duplicate of this bug. ***
Created attachment 231245 [details] Proposed patch v2 (i386 & amd64)
(In reply to Andriy Gapon from comment #18) Andriy, could you take a look at the latest patch #231245? I moved sched_pin() from pmap functions to smp_targeted_tlb_shootdown(). It looks more consistent now without ifdefs. All the test are ok.
(In reply to Martin Birgmeier from comment #20) I think it would be a good idea to create a new bug report with additional info about before and after the regression.
(In reply to Dmitry K. from comment #23) This version is no go. amd64 has pinning in pmap for a good reason. If you want to make the patch "cleaner", then it would be better to move in the opposite direction: add missing sched pins in the i386 code.
Comment on attachment 231245 [details] Proposed patch v2 (i386 & amd64) NOT WORKING
(In reply to Martin Birgmeier from comment #20) > However, could it be that multiprocessor networking has taken a hit? What versions of FreeBSD are you comparing? Could you please re-test and confirm?
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2022-01-25 10:34:08 +0000 Commit: Andriy Gapon <avg@FreeBSD.org> CommitDate: 2022-01-25 10:34:08 +0000 smp_targeted_tlb_shootdown has to pin the CPU on i386 This should fix a regression in 1820ca215461 which happened because pmap -> shootdown contracts on amd64 and i386 diverged. On amd64 the pmap code always pins the CPU before calling the shootdown code and expects it to unpin on return. On i386 the pmap code either has pins and unpins around the shootdown calls or does not pin at all. This change should account for that difference. In main and stable/13 the contracts are also different, but the shootdown code is split into the i386 and amd64 variants and each variant is tailored towards the platform's pmap. PR: 261338 Reported by: Dmitry K. <thedix@yandex.ru> Debugged by: Dmitry K. <thedix@yandex.ru> Tested by: Dmitry K. <thedix@yandex.ru> Fixes: 1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations Reviewed by: kib X-Pointyhat to: avg Differential Revision: https://reviews.freebsd.org/D33980 sys/x86/x86/mp_x86.c | 4 ++++ 1 file changed, 4 insertions(+)
We want to issue an erratum for this.
(In reply to Andriy Gapon from comment #25) Regarding making the patch more "cleaner". I'm not sure I can make these changes in i386 fast enough. Perhaps this can be rewritten in the future. As I see you committed changes to stable/12. Are you considering to move KASSERT before if statement for the reason described in comment #17?
(In reply to Dmitry K. from comment #30) No. I don't think that that is important enough.
(In reply to Andriy Gapon from comment #27) Please forget it. It was only a subjective impression and most likely not really substantial. Sorry for the noise. -- Martin
A commit in branch releng/12.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=feac4f1bdf22742885fcc94aad5b96e053e93fa9 commit feac4f1bdf22742885fcc94aad5b96e053e93fa9 Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2022-01-25 10:34:08 +0000 Commit: Gordon Tetlow <gordon@FreeBSD.org> CommitDate: 2022-02-01 17:54:53 +0000 smp_targeted_tlb_shootdown has to pin the CPU on i386 This should fix a regression in 1820ca215461 which happened because pmap -> shootdown contracts on amd64 and i386 diverged. On amd64 the pmap code always pins the CPU before calling the shootdown code and expects it to unpin on return. On i386 the pmap code either has pins and unpins around the shootdown calls or does not pin at all. This change should account for that difference. In main and stable/13 the contracts are also different, but the shootdown code is split into the i386 and amd64 variants and each variant is tailored towards the platform's pmap. PR: 261338 Reported by: Dmitry K. <thedix@yandex.ru> Debugged by: Dmitry K. <thedix@yandex.ru> Tested by: Dmitry K. <thedix@yandex.ru> Fixes: 1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations Reviewed by: kib X-Pointyhat to: avg Differential Revision: https://reviews.freebsd.org/D33980 (cherry picked from commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c) Approved by: so Security: FreeBSD-EN-22:08.i386 sys/x86/x86/mp_x86.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch releng/12.2 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1e71fa9e4150431d665ab9a82251aada6933cf4a commit 1e71fa9e4150431d665ab9a82251aada6933cf4a Author: Andriy Gapon <avg@FreeBSD.org> AuthorDate: 2022-01-25 10:34:08 +0000 Commit: Gordon Tetlow <gordon@FreeBSD.org> CommitDate: 2022-02-01 17:50:43 +0000 smp_targeted_tlb_shootdown has to pin the CPU on i386 This should fix a regression in 1820ca215461 which happened because pmap -> shootdown contracts on amd64 and i386 diverged. On amd64 the pmap code always pins the CPU before calling the shootdown code and expects it to unpin on return. On i386 the pmap code either has pins and unpins around the shootdown calls or does not pin at all. This change should account for that difference. In main and stable/13 the contracts are also different, but the shootdown code is split into the i386 and amd64 variants and each variant is tailored towards the platform's pmap. PR: 261338 Reported by: Dmitry K. <thedix@yandex.ru> Debugged by: Dmitry K. <thedix@yandex.ru> Tested by: Dmitry K. <thedix@yandex.ru> Fixes: 1820ca215461 MFC r368649 / 3fd989da by kib: amd64 pmap: fix PCID mode invalidations Reviewed by: kib X-Pointyhat to: avg Differential Revision: https://reviews.freebsd.org/D33980 (cherry picked from commit e0cc1ce7c0866d6a5c42ef09cfca9582c4a8343c) Approved by: so Security: FreeBSD-EN-22:08.i386 sys/x86/x86/mp_x86.c | 4 ++++ 1 file changed, 4 insertions(+)
Thank you very much, Gordon.