Bug 248659

Summary: Specific generations of Intel chips lock up with P-states active
Product: Base System
Reporter: Dries Michiels <driesm.michiels>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Many People
CC: cem, chris, fbsd_bugzilla, grahamperrin, oleg.nauman, pi, rkoberman, sdalu, serzh, sreeharisreedev1
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
URL: https://lists.freebsd.org/pipermail/freebsd-current/2020-September/077050.html

Description Dries Michiels 2020-08-14 18:27:06 UTC
When doing very low-intensity work (like checking out the ports tree over WiFi g at about 6 MB/s) on my laptop (Lenovo T490) running KDE5, my disk I/O seems to partially stall for a few seconds and then just continues. I have also observed frequent system freezes/deadlocks with no debug info whatsoever, no kernel dump, etc. I'm leaning towards VFS deadlocks, although I'm rather in the dark on how to proceed with debugging this issue. The reason I suspect the VFS stack is that I also observe the random freezes when not running any GUI, just console-based interaction. Maybe a driver that I'm loading?

See video link for what I'm experiencing (https://youtu.be/1_ll4OBefjo).

I am running 13-CURRENT and have a Samsung NVMe drive (PM981a) with a UFS file system. I have disabled SU journaling, but that doesn't seem to help. I really like the look and feel of KDE5 on my laptop, although it's just not usable in this state, with frequent data loss due to the system freezes and hard resets. TRIM is disabled through tunefs on this drive, as I've read that it could be a cause of the problem I describe. None of the filesystem settings seem to help, though.

I'd very much appreciate someone bearing with me in debugging this issue.
I have also tried disabling all the debugging-related kernel features that are normally disabled in stable branches, but that didn't help either, so I will probably re-enable them to debug.
Comment 1 Dries Michiels 2020-08-14 18:35:17 UTC
Re-uploaded the video in better quality.
Comment 2 Dries Michiels 2020-08-19 07:00:40 UTC
The NVMe hiccups are resolved after disabling TRIM (the drive was probably being thrashed with TRIM requests).
I still have the random system freezes and have tried to limit the scope of it.
ATM the system freezes also occur in single user mode (RO mount of my rootfs).
Comment 3 rkoberman 2020-09-12 21:09:19 UTC
Seeing the same issue on my Lenovo L15 running current on a Comet Lake CPU. Problem occurs in single-user mode and is mitigated by keyboard activity. This is a regression as the problem does not occur on 12.1-RELEASE-P8. Problem has occurred twice during boot, most recently after lo0 came up but before em0 or rtwn were started. Happens when system is busy (compiles on all cores) or idle. GENERIC-NODEBUG kernel except using SCHED_4BSD. Will shortly build GENERIC-SCHED_4BSD.

System gets very warm after the freeze and then cools, probably due to the firmware slowing the processor. This seems to indicate a possible loop in the kernel.

As long as I keep the keyboard reasonably active, the system does not seem to hang. While several times the freeze occurred when I was not closely watching the system, I am not aware of the system staying up longer than 10 minutes.

When running in graphic mode (X11/MATE), it has frozen when I was typing on at least one occasion. This could have been simple bad luck. Seems somewhat more stable when in graphic mode. 

From dmesg:
FreeBSD 13.0-CURRENT #2 r365481M: Tue Sep  8 20:16:02 PDT 2020
    root@ptavv:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
VT(efifb): resolution 1920x1080
CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2112.11-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0xa0660  Family=0x6  Model=0xa6  Stepping=0
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c67af<FSGSBASE,TSCADJ,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000400<MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x2b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 4294967296 (4096 MB)
avail memory = 3746226176 (3572 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-R17  >
Comment 4 Dries Michiels 2020-09-12 21:24:07 UTC
I can confirm that this summarizes my issue perfectly.
Comment 5 rkoberman 2020-09-17 23:31:54 UTC
I've continued to analyze the problem. I don't know if this will help track it down, but I have noted the following:
System is substantially more stable under X (Mate) than under just VT. Once I start Mate, I often have the system stay up and running for over an hour. When X11 is not running, I have not seen the system stay operational for over 10 minutes when the keyboard is not active. Also, when working on X, I have had the system lock up even when the keyboard is active.

I have not tested the reliability of the system when using a vty while X is also running.

The freeze is not instantaneous. With my system monitor running and a bulk disk-to-disk data transfer running (rsync of 190GB of media files) from a USB disk to the system disk, I note that the transfer slows dramatically a few seconds before the complete freeze. The write rate slowly declines from over 50MBps to zero. When it reaches zero, the system may be barely alive. I have managed to do a sync(1) once or twice, which greatly reduces the number of corrections needed by fsck when I reboot. Even when the keyboard stops responding, my system monitor (gkrellm) will continue to update for a few seconds, and on a couple of occasions, Ctrl-C to a frozen process caused the system to become responsive again for a couple of seconds, with the system monitor updating and commands that had been typed but not even echoed appearing in terminal windows.

My system has only 4 GB of RAM, so it is rather restricted. I have not been able to even try running a VM on it. I have 16 GB on order, but it is lost in the mail.

The system has a WD Black 2TB drive. Could a drive issue be at the root? The initial report was also on a fairly recent Lenovo system. Could a bad disk batch be the trigger? But, if it is a disk issue, why no problems on 12.1?
Comment 6 rkoberman 2020-09-23 23:19:13 UTC
I had an epiphany yesterday and may have figured out something that will help track this down. It appears that this is likely tied to the system disk, which is an ATAPI (SATA) drive.

I can keep the system from freezing indefinitely by either typing or moving the cursor. I realized that these two devices (kbd and psm) are, to the best of my knowledge, the last physical devices still GIANT locked and the last ISA devices as well. It is my suspicion that the GIANT lock being taken now and then clears out something that would eventually livelock the system.

I previously noted that the disk transfer rate would deteriorate over time, leading to a livelock. I can now report that, if I see the transfer rate declining, I can suspend the job and, after a few seconds, the system returns to normal. If I let the problem continue for more than a few seconds, the keyboard will be locked up and the system will be livelocked and require a power down.

I am unable to un-tar the firefox source tarball or any other large tarball. Even with typing or moving the mouse, the transfer rate will start declining. I can suspend that, but it seems to start declining again very soon when the tar is resumed, and I was unable to complete the restore. I have seen similar behavior with other large tarballs. Oddly, when copying from USB disk to system disk, I see similar issues, but a suspend seems to allow them to return to full speed for a while upon resume. Copies from the system disk to the system disk seem to be the worst problem.

It also appears that an inactive system usually does not lock up. I can boot to single-user or the login prompt and the system will remain usable for a long time. It does seem to eventually lock up, but it can take hours. It had never locked up until I left it at the single-user prompt after it finished fscking the system. When I got back to the system today, after about 13 hours, it was frozen.

Any suggestions on tracking this down would really be appreciated as this system is replacing an old one which has a failing fan and may become useless at any time.
Comment 7 rkoberman 2021-01-14 06:29:27 UTC
This has been going on for quite a while. Always Lenovo systems running CURRENT, but different models (X1 Carbon, T490, and L15). I'd really like to know if all are running similar processors. Can others post their CPU model? In more human-readable terms, mine is a 10th Gen i5-10210U CPU (1.60 GHz) with Comet Lake graphics.

I previously reported a disk slow-down. This turned out to be a bad disk drive. The disk was literally slowing down. A hot smell led to tracking that down. It's been replaced and the slow-down is gone.
Comment 8 Marco 2021-01-14 22:19:25 UTC
Chipping in for my ThinkPad X1 Carbon 7th gen (type 20QDCTO1WW) which has the following:

CPU : i7-8665U (1.90 GHz)
Intel UHD Graphics 620 (Whiskey Lake)
ACPI APIC table: <LENOVO TP-N2H  >

I have been trying the latest 13-CURRENT AMD64 memstick images, but all of them show the same behavior: the system just freezes at random intervals while idle, using the default English keyboard mapping.
Sometimes it can run idle for 7 hours, whereas I've also seen it lock up within a minute of starting the live CD.

Downgrading to the latest 12.2-STABLE memstick.img (r368787) has helped.
The system hasn't locked up or frozen yet after more than 24 hours of idle running.
Comment 9 Dries Michiels 2021-01-15 17:03:32 UTC
My Laptop T490

CPU : i5-8365U (1.60 GHz)
Intel UHD Graphics 620 (Whiskey Lake)
ACPI APIC table: <LENOVO TP-N2I  >

Meanwhile I found this wiki page:
https://wiki.freebsd.org/Laptops/Thinkpad_X1_Carbon
It mentions that 13-CURRENT freezes after what appears to be a random amount of time. The possible workaround is hint.hwpstate_intel.0.disabled="1" in loader.conf. I'm testing this workaround as we speak.
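
For anyone else trying this workaround, here is a minimal sketch of a check that the hint is actually set in the loader configuration before rebooting. The function name has_hwpstate_hint is made up for illustration, and the path /boot/loader.conf in the usage comment is assumed to be where the hint was added; only the hint line itself comes from the wiki page.

```sh
#!/bin/sh
# has_hwpstate_hint FILE  (hypothetical helper, not part of FreeBSD)
# Returns 0 if FILE contains an uncommented line that sets
# hint.hwpstate_intel.0.disabled to 1 (with or without quotes),
# and non-zero otherwise.
has_hwpstate_hint() {
    grep -Eq '^[[:space:]]*hint\.hwpstate_intel\.0\.disabled="?1"?[[:space:]]*(#.*)?$' "$1"
}

# Example use (path is an assumption):
# has_hwpstate_hint /boot/loader.conf && echo "workaround present"
```

A commented-out line does not count as set, so the check fails if the hint was only disabled with a leading "#".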
Comment 10 rkoberman 2021-01-16 02:00:45 UTC
So, two Whiskey Lake systems. Mine is a Comet Lake, and I now have a verified hardware issue with the keyboard, so I suspect that my system, while demonstrating similar symptoms, actually has an entirely hardware-related problem. The main board and disk have been replaced, and I will be sending it in to get the keyboard/mousepad replaced. I will update if that turns out to be the issue.

It is conceivable that there is an unusual keyboard failure mode that might impact other systems.
Comment 11 Dries Michiels 2021-01-18 08:37:08 UTC
Did you ever test 12-STABLE? That is how I ruled out a hardware issue. My system was perfectly stable on 12-STABLE but not on 13-CURRENT. Marco has observed the same.
Comment 12 rkoberman 2021-01-18 15:52:51 UTC
I did try 12.2 and did not see this issue, but I tried it only briefly as I discovered that 12.2 did not support my graphics. As the problem has deteriorated over time, it may be just luck that I did not see a problem during the short time I had 12.2 running. When I get my system back, I plan to install 12.2 and update to 12-STABLE immediately. I can't do anything until then as I generally can't even break into BIOS Setup or boot to single user. On the one occasion when I made it into BIOS Setup, it prompted me to reload the default configuration and I discovered that neither the mousepad nor the TrackPoint worked. I could move the pointer for a couple of seconds and then it jumped back to the lower left corner of the screen.

It is, of course, possible that 13 is sensitive to the keyboard/mousepad issue and 12.2 didn't hang even though I would still call that a regression.
Comment 13 Marco 2021-01-24 21:24:43 UTC
(In reply to Dries Michiels from comment #9)

How did your test with hint.hwpstate_intel.0.disabled="1" in loader.conf go?
Based on the info on https://gist.github.com/AnotherKamila/c14c3ebd66ac6a25c0193f8e103e66e3 they speak of running 13-CURRENT with hint.hwpstate_intel.0.disabled="1" as a workaround on both a gen7 and gen8 X1 Carbon without it hard-wedging.
Comment 14 Dries Michiels 2021-01-24 21:29:43 UTC
I have not run into any freezes yet, so for now I'd say successful.
Comment 15 rkoberman 2021-01-29 07:32:16 UTC
My L15 arrived today with a new keyboard/mousepad. I set hint.hwpstate_intel.0.disabled="1" in loader.conf and I have had no freezes. I am currently updating the system to 13-STABLE and, once that is done and the system is stable, I will remove the hint and see if freezes reappear. If they do, we can add my 10th gen i5 to the systems that need this tweak. I'm also going to look into just what this hint does. Certainly hardware issues can lock up the system.
Comment 16 Marco 2021-02-01 21:43:19 UTC
I did the src upgrade from the 12.2-STABLE snapshot to 
13.0-ALPHA3 #0 stable/13-c256220-g76dd854f47f4

$ dmesg | head -50
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-ALPHA3 #0 stable/13-c256220-g76dd854f47f4: Sun Jan 31 15:10:40 UTC 2021
    root@harbinger.fritz.box:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(efifb): resolution 3840x2160
CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (2112.08-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x806ec  Family=0x6  Model=0x8e  Stepping=12
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c6fbf<FSGSBASE,TSCADJ,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000600<MCUOPT,MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xab<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TSX_CTRL>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16221478912 (15470 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-N2H  >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads

Without hint.hwpstate_intel.0.disabled="1" in loader.conf the system still randomly freezes.
With the hint set, the system runs stable: not a single freeze has happened after over 24 hours of idling.

So we can add this gen 7 X1 Carbon to the list of systems that need the tweak.
Comment 17 rkoberman 2021-02-01 23:22:40 UTC
(In reply to Marco from comment #16)
This is another 8th Gen processor. So we now have reports of 7th, 8th, and 10th Gen Lenovo systems with this issue. It looks like something that Lenovo is doing (BIOS) is triggering this, as there are a LOT of 8th and 9th gen systems out there and the only reports are on Lenovo systems.

I really hate losing P-State power management, but it's a lot better than having the system lock up.

I don't think disabling P-States is a complete fix. I have had ONE freeze that appears the same (from what little I can tell) in the past 4 days on my 10th Gen L15 system.
Comment 18 Sergei Masharov 2021-02-24 13:56:14 UTC
I have a Gen8 Lenovo. 12.2-RELEASE and several previous releases were working fine, but 13.0-BETA2 and BETA3 freeze at random moments, sometimes even during kernel boot.

hint.hwpstate_intel.0.disabled="1" has helped me.
Comment 19 Sergei Masharov 2021-02-24 14:00:30 UTC
(In reply to Sergei Masharov from comment #18)

---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-BETA3 #0 releng/13.0-n244525-150b4388d3b: Fri Feb 19 04:04:34 UTC 2021
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(efifb): resolution 640x480
CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (2112.08-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x806ec  Family=0x6  Model=0x8e  Stepping=12
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c6fbf<FSGSBASE,TSCADJ,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000600<MCUOPT,MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xab<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TSX_CTRL>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16255057920 (15502 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-N2H  >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-119
Launching APs: 1 7 4 3 2 5 6
Timecounter "TSC" frequency 2112082482 Hz quality 1000
KTLS: Initialized 8 threads
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 13.0.
kbd1 at kbdmux0
000.000062 [4350] netmap_init               netmap: loaded module
mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)
nexus0
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
cryptosoft0: <software crypto>
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <LENOVO TP-N2H>
acpi_ec0: <Embedded Controller: GPE 0x16, ECDT> port 0x62,0x66 on acpi0
acpi0: Power Button (fixed)
unknown: memory range not supported
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 24000000 Hz quality 950
Event timer "HPET" frequency 24000000 Hz quality 550
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x2000-0x203f mem 0xe2000000-0xe2ffffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xe4620000-0xe462ffff irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <memory, RAM> at device 20.2 (no driver attached)
pci0: <network> at device 20.3 (no driver attached)
pci0: <serial bus> at device 21.0 (no driver attached)
pci0: <serial bus> at device 21.1 (no driver attached)
pci0: <simple comms> at device 22.0 (no driver attached)
pci0: <simple comms, UART> at device 22.3 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 29.0 on pci0
pci1: <ACPI PCI bus> on pcib1
nvme0: <Generic NVMe Device> mem 0xe4500000-0xe4503fff irq 16 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 29.4 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <PCI-PCI bridge> irq 16 at device 0.0 on pci3
pci4: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> irq 17 at device 1.0 on pci3
pcib6: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci3
pci5: <ACPI PCI bus> on pcib6
xhci1: <XHCI (generic) USB 3.0 controller> mem 0xe0000000-0xe000ffff irq 18 at device 0.0 on pci5
xhci1: 32 bytes context size, 64-bit DMA
usbus1 on xhci1
usbus1: 5.0Gbps Super Speed USB v3.0
pcib7: <PCI-PCI bridge> irq 16 at device 4.0 on pci3
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
hdac0: <Intel Cannon Lake HDA Controller> mem 0xe463c000-0xe463ffff,0xe4400000-0xe44fffff irq 16 at device 31.3 on pci0
pci0: <serial bus> at device 31.5 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection> mem 0xe4600000-0xe461ffff irq 16 at device 31.6 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using an MSI interrupt
em0: Ethernet address: 98:fa:9b:ad:d0:ff
em0: netmap queues/slots: TX 1/1024, RX 1/1024
acpi_button0: <Sleep Button> on acpi0
acpi_lid0: <Control Method Lid Switch> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 13.0.
psm0: model Generic PS/2 mouse, device ID 0
acpi_syscontainer0: <System Container> on acpi0
acpi_acad0: <AC Adapter> on acpi0
battery0: <ACPI Control Method Battery> on acpi0
atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
atrtc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ugen0.1: <0x8086 XHCI root HUB> at usbus0
ugen1.1: <0x8086 XHCI root HUB> at usbus1
uhub0 on usbus1
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
uhub1 on usbus0
uhub1: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
nvd0: <SAMSUNG MZVLB512HBJQ-000L7> NVMe namespace
nvd0: 488386MB (1000215216 512 byte sectors)
hdacc0: <Realtek ALC285 HDA CODEC> at cad 0 on hdac0
hdaa0: <Realtek ALC285 Audio Function Group> at nid 1 on hdacc0
pcm0: <Realtek ALC285 (Analog 3.1+HP/2.0)> at nid 20,23,33 and 25 on hdaa0
hdacc1: <Intel Kaby Lake HDA CODEC> at cad 2 on hdac0
hdaa1: <Intel Kaby Lake Audio Function Group> at nid 1 on hdacc1
pcm1: <Intel Kaby Lake (HDMI/DP 8ch)> at nid 3 on hdaa1
Comment 20 Sergei Masharov 2021-02-26 16:25:48 UTC
Also I have noticed these differences:

With hint.hwpstate_intel.0.disabled="1" sysctl dev.cpu.0 looks exactly the same as 12.2-RELEASE and all releases before:

# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.0.cx_usage_counters: 1151952 0 0
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 15982us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/151 C3/3/1034
dev.cpu.0.freq_levels: 2101/15000 2100/15000 1900/13193 1800/12317 1700/11459 1600/10759 1500/9936 1400/9127 1200/7696 1100/6937 1000/6196 800/4888 700/4193 600/3514 500/2850 400/2325
dev.cpu.0.freq: 2101
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PR00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

but without this option it is completely different:
 
# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.0.cx_usage_counters: 5944 0 0
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 10276us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/151 C3/3/1034
dev.cpu.0.freq_levels: 2112/-1
dev.cpu.0.freq: 1005
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PR00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU
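
The difference between the two outputs can also be checked from a script. The following is only a rough sketch based on the two outputs above: it assumes that a single freq_levels entry with a power value of -1 (as in 2112/-1) means the hint is not in effect, which is an inference from this one machine and not documented behavior. Both function names are hypothetical.

```sh
#!/bin/sh
# count_freq_levels "LEVELS"  (hypothetical helper)
# Prints how many freq/power pairs a dev.cpu.0.freq_levels value contains.
count_freq_levels() {
    # Deliberately unquoted: split the value on whitespace and count the fields.
    set -- $1
    echo "$#"
}

# single_unknown_power_level "LEVELS"  (hypothetical helper)
# Returns 0 when the value is exactly one pair whose power field is -1,
# as in the "2112/-1" output above; returns 1 otherwise.
single_unknown_power_level() {
    [ "$(count_freq_levels "$1")" -eq 1 ] || return 1
    case $1 in
        */-1) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example use on a live system:
# single_unknown_power_level "$(sysctl -n dev.cpu.0.freq_levels)" \
#     && echo "freq_levels looks like the 2112/-1 case"
```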
Comment 21 Sergei Masharov 2021-03-01 07:47:58 UTC
I have updated to 13.0-BETA4. The problem persists.
One time the system hung during kernel boot even with hint.hwpstate_intel.0.disabled="1"

but maybe this hint is only applied at a later stage of the boot.
Comment 22 rkoberman 2021-03-01 17:04:08 UTC
I had a second freeze on my system between 23:10 and 23:15. The system was pretty much idle. It was the second freeze in three weeks.

I suspect P-state operations may make the freeze more frequent, but are not the actual cause. If I understand P-States, they operate largely independently of the OS and adjust CPU frequency and voltage, but my knowledge is minimal.
Comment 23 Sreehari S 2021-03-08 03:49:36 UTC
Pretty sure this bug is related/the same thing: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253288
Comment 24 Sreehari S 2021-03-08 04:35:11 UTC
Yeah I can confirm that disabling hardware p-states is *not* an ideal solution to this problem, but it is all we can do until we figure out what difference FreeBSD's hardware P-state handling has that makes it lacking (some kind of edge case that we can find in Linux code or in the actual specification)
Comment 25 Sergei Masharov 2021-03-13 06:06:17 UTC
I just upgraded 13.0 from RC1 to RC2 and the system hung during the kernel boot. The last messages on the console were about ugen0.7 ugen0.8 ugen0.9 ugen0.10

hint.hwpstate_intel.0.disabled="1" is present in /boot/loader.conf

It seems that this hint is only applied at a later stage. In RC1 I saw similar behavior, but once the kernel has booted successfully, it works perfectly normally with the above hint.

Second boot attempt was successful and the system is now working fine with this hint.
Comment 26 rkoberman 2021-03-13 07:01:19 UTC
(In reply to Sergei Masharov from comment #25)
Note that I reported three failures with P-States disabled. Clearly, there can be failures, but they are rare with P-States disabled. I have a 10th gen system not supported on 12, so have been running 13 since before disabling P-States was suggested.
Comment 28 Sergei Masharov 2021-04-03 13:08:55 UTC
I have updated a couple more systems to RC5 and none of them have such strange dev.cpu.0.freq_levels.

As I have mentioned before, on the problematic Lenovo system it looks like 
dev.cpu.0.freq_levels: 2112/-1

If I try to change dev.cpu.0.freq, it looks even stranger:
root@lenovo:/etc # sysctl dev.cpu.0.freq=99999
dev.cpu.0.freq: 1005 -> 936
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 1005
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 4222
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 3911
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 4222


Is this normal and expected behavior? With hint.hwpstate_intel.0.disabled="1" it looks like it does on any other system and on versions before 13.0.

If the system is idle, it can now work for up to several hours, but under any significant CPU load it freezes within just a few minutes.
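
To make the odd dev.cpu.0.freq results above easier to compare, here is a small sketch that pulls the value printed after "->" out of each sysctl output line and checks it against what was requested. It assumes the "name: old -> new" line format seen in the transcript above; the function names are hypothetical.

```sh
#!/bin/sh
# reported_new_value "LINE"  (hypothetical helper)
# Prints the value after "->" in a line such as "dev.cpu.0.freq: 1005 -> 936".
reported_new_value() {
    # Strip everything up to and including the last "-> ".
    echo "${1##*-> }"
}

# honoured_request REQUESTED "LINE"  (hypothetical helper)
# Returns 0 if the value reported after the write equals REQUESTED.
honoured_request() {
    [ "$(reported_new_value "$2")" = "$1" ]
}

# Example with one of the lines from above:
# honoured_request 2112 "dev.cpu.0.freq: 1005 -> 4222" \
#     || echo "requested 2112, but a different value was reported"
```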