Summary: | Specific intel chips lock up with P-states active | ||
---|---|---|---|
Product: | Base System | Reporter: | Dries Michiels <driesm> |
Component: | kern | Assignee: | freebsd-amd64 (Nobody) <amd64> |
Status: | Closed DUPLICATE | ||
Severity: | Affects Many People | CC: | ati.sharma+freebsd, cem, chris, dacrackerx64, fbsd_bugzilla, grahamperrin, jason, jon, oleg.nauman, pi, rashey, rkoberman, sdalu, serzh, sreeharisreedev1, uqs |
Priority: | --- | Keywords: | regression |
Version: | CURRENT | ||
Hardware: | amd64 | ||
OS: | Any | ||
URL: | https://lists.freebsd.org/pipermail/freebsd-current/2020-September/077050.html |
Description
Dries Michiels
2020-08-14 18:27:06 UTC
Re-uploaded the video in better quality. The NVMe hiccups were resolved by disabling TRIM (the drive was probably being thrashed with TRIM requests). I still have the random system freezes and have tried to narrow down their scope. At the moment, the freezes also occur in single-user mode (with my root file system mounted read-only).

Seeing the same issue on my Lenovo L15 running CURRENT on a Comet Lake CPU. The problem occurs in single-user mode and is mitigated by keyboard activity. This is a regression, as the problem does not occur on 12.1-RELEASE-p8. It has occurred twice during boot, most recently after lo0 came up but before em0 or rtwn were started. It happens both when the system is busy (compiling on all cores) and when it is idle. I am running a GENERIC-NODEBUG kernel, except with SCHED_4BSD, and will shortly build GENERIC-SCHED_4BSD. The system gets very warm after the freeze and then cools, probably because the firmware slows the processor; this suggests a possible loop in the kernel. As long as I keep the keyboard reasonably active, the system does not seem to hang. Several freezes occurred when I was not closely watching the system, but I am not aware of it ever staying up longer than 10 minutes. When running in graphical mode (X11/MATE), it froze while I was typing on at least one occasion, which could have been simple bad luck. The system seems somewhat more stable in graphical mode.
From dmesg:
FreeBSD 13.0-CURRENT #2 r365481M: Tue Sep 8 20:16:02 PDT 2020
    root@ptavv:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-rc2-0-g414f32a9e86)
VT(efifb): resolution 1920x1080
CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2112.11-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0xa0660 Family=0x6 Model=0xa6 Stepping=0
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c67af<FSGSBASE,TSCADJ,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000400<MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x2b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 4294967296 (4096 MB)
avail memory = 3746226176 (3572 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-R17 >

I can confirm that this summarizes my issue perfectly.

I've continued to analyze the problem. I don't know if this will help track it down, but I have noted the following: the system is substantially more stable under X (MATE) than on the plain vt console. Once I start MATE, the system often stays up and running for over an hour. When X11 is not running, I have never seen the system stay operational for more than 10 minutes while the keyboard is inactive.
Also, when working in X, I have had the system lock up even while the keyboard was active. I have not tested the reliability of the system when using a vty while X is also running. The freeze is not instantaneous. With my system monitor running and a bulk disk-to-disk data transfer in progress (an rsync of 190 GB of media files from a USB disk to the system disk), I noticed that the transfer slows dramatically a few seconds before the complete freeze: the write rate slowly declines from over 50 MB/s to zero. When it reaches zero, the system may be barely alive. I have managed to run sync(1) once or twice, which greatly reduced the number of corrections fsck needed when I rebooted. Even when the keyboard stops responding, my system monitor (gkrellm) continues to update for a few seconds. On a couple of occasions, sending Ctrl-C to a frozen process made the system responsive again for a couple of seconds: the system monitor updated, and commands that had been typed but never echoed appeared in the terminal windows.

My system has only 4 GB of RAM, so it is rather restricted; I have not even been able to try running a VM on it. I have 16 GB on order, but it has been lost in the mail. The system has a WD Black 2 TB drive. Could a drive issue be at the root of this? The initial report was also on a fairly recent Lenovo system. Could a bad batch of disks be the trigger? But if it is a disk issue, why are there no problems on 12.1?

I had an epiphany yesterday and may have figured out something that could help track this down. It appears likely that this is tied to the system disk, which is an ATAPI (SATA) drive. I can keep the system from freezing indefinitely by either typing or moving the cursor. I realized that these two devices (kbd and psm) are, to the best of my knowledge, the last physical devices still Giant-locked, and the last ISA devices as well. I suspect that the Giant lock being taken now and then clears out something that would otherwise eventually livelock the system.
I previously noted that the disk transfer rate would deteriorate over time, leading to a livelock. I can now report that if I see the transfer rate declining, I can suspend the job and, after a few seconds, the system returns to normal. If I let the problem continue for more than a few seconds, the keyboard locks up, the system is livelocked, and it requires a power-off. I am unable to untar the Firefox source tarball or any other large tarball. Even while I am typing or moving the mouse, the transfer rate starts declining. I can suspend tar, but the rate seems to start declining again very soon after tar is resumed, and I was unable to complete the restore. I have seen similar behavior with other large tarballs. Oddly, when copying from the USB disk to the system disk, I see similar issues, but a suspend seems to let the copy return to full speed for a while upon resume. Copies from the system disk to the system disk seem to be the worst case.

It also appears that an inactive system usually does not lock up. I can boot to the single-user prompt or the login prompt and the system will remain usable for a long time. It does seem to lock up eventually, but that can take hours. It had never locked up that way until I left it at the single-user prompt after fsck finished checking the system; when I got back to it today, after about 13 hours, it was frozen. Any suggestions on tracking this down would be much appreciated, as this system is replacing an old one that has a failing fan and may become useless at any time.

This has been going on for quite a while. It is always Lenovo systems running CURRENT, but different models (X1 Carbon, T490, and L15). I'd really like to know whether they are all running similar processors. Can others post their CPU models?

In more human-readable terms, it's a 10th Gen i5-10210U CPU (1.60 GHz) with a Comet Lake graphics processor.

I previously reported a disk slowdown. This turned out to be a bad disk drive: the disk was literally slowing down.
A hot smell led me to track that down. The disk has been replaced and the slowdown is gone.

Chipping in for my ThinkPad X1 Carbon 7th gen (type 20QDCTO1WW), which has the following:
CPU: i7-8665U (1.90 GHz)
Intel UHD Graphics 620 (Whiskey Lake)
ACPI APIC table: <LENOVO TP-N2H >
I have been trying the latest 13-CURRENT amd64 memstick images, but all of them show the same behavior: the system freezes at random intervals while idle, using the default English keyboard mapping. Sometimes it runs idle for 7 hours, whereas I have also seen it lock up one minute after starting the live CD. Downgrading to the latest 12.2-STABLE memstick.img (r368787) has helped: the system has not locked up or frozen after more than 24 hours of idle running.

My laptop is a T490:
CPU: i5-8365U (1.60 GHz)
Intel UHD Graphics 620 (Whiskey Lake)
ACPI APIC table: <LENOVO TP-N2I >
Meanwhile, I found the wiki page https://wiki.freebsd.org/Laptops/Thinkpad_X1_Carbon which mentions that 13-CURRENT freezes after what appears to be a random amount of time. A possible workaround is setting hint.hwpstate_intel.0.disabled="1" in loader.conf. I'm testing this workaround as we speak.

So, that makes two Whiskey Lake systems. Mine is Comet Lake, and I now have a verified hardware issue with the keyboard, so I suspect that my system, while showing similar symptoms, actually has an entirely hardware-related problem. The main board and disk have been replaced, and I will be sending it in to get the keyboard/mousepad replaced. I will update if that turns out to be the issue. It is conceivable that there is an unusual keyboard failure mode that might affect other systems.

Did you ever test 12-STABLE? That is how I ruled out a hardware issue. My system was perfectly stable on 12-STABLE but not on 13-CURRENT. Marco has observed the same.

I did try 12.2 and did not see this issue, but I tried it only briefly, as I discovered that 12.2 did not support my graphics.
As the problem has worsened over time, it may be just luck that I did not see it during the short time I had 12.2 running. When I get my system back, I plan to install 12.2 and update to 12-STABLE immediately. I can't do anything until then, as I generally can't even break into BIOS Setup or boot to single-user mode. The one time I made it into BIOS Setup, it prompted me to reload the default configuration, and I discovered that neither the mousepad nor the TrackPoint worked: I could move the pointer for a couple of seconds, and then it jumped back to the lower left corner of the screen. It is, of course, possible that 13 is sensitive to the keyboard/mousepad issue and 12.2 simply didn't hang, although I would still call that a regression.

(In reply to Dries Michiels from comment #9) How did your test with hint.hwpstate_intel.0.disabled="1" in loader.conf go? The gist at https://gist.github.com/AnotherKamila/c14c3ebd66ac6a25c0193f8e103e66e3 describes running 13-CURRENT with hint.hwpstate_intel.0.disabled="1" as a workaround on both a gen7 and a gen8 X1 Carbon without hard-wedging.

I have not run into any freezes yet, so for now I'd call it a success.

My L15 arrived today with a new keyboard/mousepad. I set hint.hwpstate_intel.0.disabled="1" in loader.conf and have had no freezes. I am currently updating the system to 13-STABLE and, once that is done and the system is stable, I will remove the hint and see whether the freezes reappear. If they do, we can add my 10th gen i5 to the list of systems that need this tweak. I'm also going to look into what exactly this hint does. Certainly, hardware issues can lock up the system.

I did the src upgrade from the 12.2-STABLE snapshot to 13.0-ALPHA3 #0 stable/13-c256220-g76dd854f47f4:
$ dmesg | head -50
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-ALPHA3 #0 stable/13-c256220-g76dd854f47f4: Sun Jan 31 15:10:40 UTC 2021
    root@harbinger.fritz.box:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(efifb): resolution 3840x2160
CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (2112.08-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x806ec Family=0x6 Model=0x8e Stepping=12
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c6fbf<FSGSBASE,TSCADJ,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000600<MCUOPT,MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xab<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TSX_CTRL>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 17179869184 (16384 MB)
avail memory = 16221478912 (15470 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-N2H >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads

Without hint.hwpstate_intel.0.disabled="1" in loader.conf, the system still freezes randomly. With the hint set, the system runs stably: not a single freeze in over 24 hours of idling. So we can add this gen 7 X1 Carbon to the list of systems that need the tweak.

(In reply to Marco from comment #16) This is another 8th Gen processor.
So we now have reports of 7th, 8th, and 10th Gen Lenovo systems with this issue. It looks like something Lenovo is doing (in the BIOS) is triggering this, as there are a LOT of 8th and 9th gen systems out there and the only reports are on Lenovo systems. I really hate losing P-state power management, but it's a lot better than having the system lock up.

I don't think disabling P-states is a complete fix. I have had ONE freeze that appears to be the same (from what little I can tell) in the past 4 days on my 10th Gen L15 system.

I have a Gen8 Lenovo. 12.2-RELEASE and several previous releases worked fine, but 13.0-BETA2 and BETA3 freeze at random moments, sometimes even during kernel boot. Setting hint.hwpstate_intel.0.disabled="1" has helped me.

(In reply to Sergei Masharov from comment #18)
---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-BETA3 #0 releng/13.0-n244525-150b4388d3b: Fri Feb 19 04:04:34 UTC 2021
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(efifb): resolution 640x480
CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (2112.08-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x806ec Family=0x6 Model=0x8e Stepping=12
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x121<LAHF,ABM,Prefetch>
  Structured Extended Features=0x29c6fbf<FSGSBASE,TSCADJ,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,NFPUSG,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PROCTRACE>
  Structured Extended Features3=0xbc000600<MCUOPT,MD_CLEAR,IBPB,STIBP,L1DFL,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0xab<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TSX_CTRL>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 17179869184 (16384 MB)
avail memory = 16255057920 (15502 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <LENOVO TP-N2H >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-119
Launching APs: 1 7 4 3 2 5 6
Timecounter "TSC" frequency 2112082482 Hz quality 1000
KTLS: Initialized 8 threads
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 13.0.
kbd1 at kbdmux0
000.000062 [4350] netmap_init netmap: loaded module
mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)
nexus0
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
cryptosoft0: <software crypto>
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS>
acpi0: <LENOVO TP-N2H>
acpi_ec0: <Embedded Controller: GPE 0x16, ECDT> port 0x62,0x66 on acpi0
acpi0: Power Button (fixed)
unknown: memory range not supported
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 24000000 Hz quality 950
Event timer "HPET" frequency 24000000 Hz quality 550
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x2000-0x203f mem 0xe2000000-0xe2ffffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xe4620000-0xe462ffff irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <memory, RAM> at device 20.2 (no driver attached)
pci0: <network> at device 20.3 (no driver attached)
pci0: <serial bus> at device 21.0 (no driver attached)
pci0: <serial bus> at device 21.1 (no driver attached)
pci0: <simple comms> at device 22.0 (no driver attached)
pci0: <simple comms, UART> at device 22.3 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 29.0 on pci0
pci1: <ACPI PCI bus> on pcib1
nvme0: <Generic NVMe Device> mem 0xe4500000-0xe4503fff irq 16 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 29.4 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <PCI-PCI bridge> irq 16 at device 0.0 on pci3
pci4: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> irq 17 at device 1.0 on pci3
pcib6: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci3
pci5: <ACPI PCI bus> on pcib6
xhci1: <XHCI (generic) USB 3.0 controller> mem 0xe0000000-0xe000ffff irq 18 at device 0.0 on pci5
xhci1: 32 bytes context size, 64-bit DMA
usbus1 on xhci1
usbus1: 5.0Gbps Super Speed USB v3.0
pcib7: <PCI-PCI bridge> irq 16 at device 4.0 on pci3
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
hdac0: <Intel Cannon Lake HDA Controller> mem 0xe463c000-0xe463ffff,0xe4400000-0xe44fffff irq 16 at device 31.3 on pci0
pci0: <serial bus> at device 31.5 (no driver attached)
em0: <Intel(R) PRO/1000 Network Connection> mem 0xe4600000-0xe461ffff irq 16 at device 31.6 on pci0
em0: Using 1024 TX descriptors and 1024 RX descriptors
em0: Using an MSI interrupt
em0: Ethernet address: 98:fa:9b:ad:d0:ff
em0: netmap queues/slots: TX 1/1024, RX 1/1024
acpi_button0: <Sleep Button> on acpi0
acpi_lid0: <Control Method Lid Switch> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
WARNING: Device "psm" is Giant locked and may be deleted before FreeBSD 13.0.
psm0: model Generic PS/2 mouse, device ID 0
acpi_syscontainer0: <System Container> on acpi0
acpi_acad0: <AC Adapter> on acpi0
battery0: <ACPI Control Method Battery> on acpi0
atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
atrtc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ugen0.1: <0x8086 XHCI root HUB> at usbus0
ugen1.1: <0x8086 XHCI root HUB> at usbus1
uhub0 on usbus1
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
uhub1 on usbus0
uhub1: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
nvd0: <SAMSUNG MZVLB512HBJQ-000L7> NVMe namespace
nvd0: 488386MB (1000215216 512 byte sectors)
hdacc0: <Realtek ALC285 HDA CODEC> at cad 0 on hdac0
hdaa0: <Realtek ALC285 Audio Function Group> at nid 1 on hdacc0
pcm0: <Realtek ALC285 (Analog 3.1+HP/2.0)> at nid 20,23,33 and 25 on hdaa0
hdacc1: <Intel Kaby Lake HDA CODEC> at cad 2 on hdac0
hdaa1: <Intel Kaby Lake Audio Function Group> at nid 1 on hdacc1
pcm1: <Intel Kaby Lake (HDMI/DP 8ch)> at nid 3 on hdaa1

I have also noticed these differences. With hint.hwpstate_intel.0.disabled="1", sysctl dev.cpu.0 looks exactly the same as on 12.2-RELEASE and all earlier releases:
# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.0.cx_usage_counters: 1151952 0 0
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 15982us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/151 C3/3/1034
dev.cpu.0.freq_levels: 2101/15000 2100/15000 1900/13193 1800/12317 1700/11459 1600/10759 1500/9936 1400/9127 1200/7696 1100/6937 1000/6196 800/4888 700/4193 600/3514 500/2850 400/2325
dev.cpu.0.freq: 2101
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PR00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU
But without this option, it is completely different:
# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
dev.cpu.0.cx_usage_counters: 5944 0 0
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 10276us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/151 C3/3/1034
dev.cpu.0.freq_levels: 2112/-1
dev.cpu.0.freq: 1005
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PR00
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

I have updated to 13.0-BETA4. The problem persists. One time the system hung during kernel boot even with hint.hwpstate_intel.0.disabled="1", but maybe this hint is applied at a later stage of the boot.

I had a second freeze on my system between 23:10 and 23:15. The system was pretty much idle. It was the second freeze in three weeks. I suspect P-state operations may make the freezes more frequent but are not the actual cause. If I understand P-states correctly, they operate largely independently of the OS and adjust CPU frequency and voltage, but my knowledge is minimal.

I'm pretty sure this bug is related to, or the same as, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253288

Yeah, I can confirm that disabling hardware P-states is *not* an ideal solution to this problem, but it is all we can do until we figure out where FreeBSD's hardware P-state handling falls short (some kind of edge case that we might find in the Linux code or in the actual specification).

I just upgraded 13.0 from RC1 to RC2 and the system hung during kernel boot; the last console messages were about ugen0.7, ugen0.8, ugen0.9, and ugen0.10. hint.hwpstate_intel.0.disabled="1" is present in /boot/loader.conf. It seems that this hint is applied at a later stage. I saw similar behavior with RC1, but once the kernel has booted successfully, the system works perfectly normally with the above hint. The second boot attempt was successful, and the system is now working fine with this hint.

(In reply to Sergei Masharov from comment #25) Note that I reported three failures with P-states disabled. Clearly there can be failures, but they are rare with P-states disabled. I have a 10th gen system that is not supported on 12, so I have been running 13 since before disabling P-states was suggested.
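For anyone landing here, a minimal sketch of the workaround discussed in this report, as a /boot/loader.conf fragment. loader(8) places loader.conf hints into the kernel environment before the kernel starts; judging by the outputs in this thread, disabling the hwpstate_intel (Intel Speed Shift) driver lets CPU frequency control fall back to the est (Enhanced SpeedStep) driver, as in the dmesg above where est0 attaches on cpu0.

```sh
# /boot/loader.conf
# Workaround reported in this bug: keep the hwpstate_intel (Intel Speed Shift)
# driver from attaching, so frequency control falls back to est/ACPI P-states.
hint.hwpstate_intel.0.disabled="1"
```

After a reboot, running kenv hint.hwpstate_intel.0.disabled should print 1 if the loader passed the hint to the kernel, and sysctl dev.cpu.0.freq_levels should list a table of frequency/power pairs rather than a single entry. As the reports above show, this reduces the freezes but does not necessarily eliminate them.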
I have updated a couple more systems to RC5, and none of them show such strange dev.cpu.0.freq_levels. As I mentioned before, on the problematic Lenovo system it looks like this:
dev.cpu.0.freq_levels: 2112/-1
If I try to change dev.cpu.0.freq, it looks even stranger:
root@lenovo:/etc # sysctl dev.cpu.0.freq=99999
dev.cpu.0.freq: 1005 -> 936
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 1005
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 4222
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 3911
root@lenovo:/etc # sysctl dev.cpu.0.freq=2112
dev.cpu.0.freq: 1005 -> 4222
Is this normal and expected behavior? With hint.hwpstate_intel.0.disabled="1", it looks the same as on any other system and on versions before 13.0. If the system is idle, it can now run for up to several hours, but under any significant CPU load it freezes within a few minutes.

Quick question: has anyone seen this problem on non-Lenovo systems? It looks like all reports are on Lenovo systems, though some don't call out the manufacturer. If this is limited to Lenovo, it is likely tied to their BIOS, ACPI, or EC. I bring this up because, several years ago, there was an issue where a combination of C-states and CPU frequency adjustments under TCC could freeze my Lenovo. It occurred in a deep C-state (C3 or deeper) at a very low CPU frequency. This was also a Lenovo, and it was "fixed" by disabling TCC. I should also note that turning off TCC on my L15 did not help.

I have only owned ThinkPads, but I'll try disabling TCC. I don't recall which BIOS version I'm currently on, but I haven't updated it since posting my issue on the forums.
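The freq_levels comparisons in this report suggest a quick way to tell, on a running system, which frequency driver is in charge. Below is a minimal POSIX sh sketch. The helper name classify_freq_levels and its output labels are hypothetical, and the rule it encodes (a single entry whose power field is -1, such as 2112/-1, means hwpstate_intel attached; a multi-entry table such as 2101/15000 2100/15000 ... means est/ACPI P-states are in use) is an assumption drawn only from the sysctl outputs quoted here, not a documented interface.

```sh
#!/bin/sh
# classify_freq_levels: HYPOTHETICAL helper, not part of FreeBSD.
# Heuristic based only on the sysctl outputs quoted in this report:
#   "2112/-1"                   -> one level, power unknown -> "hwp"
#   "2101/15000 2100/15000 ..." -> table of levels          -> "pstate-table"
classify_freq_levels() {
    # Split the "freq/power freq/power ..." string on whitespace
    # into the function's positional parameters.
    set -- $1
    if [ "$#" -eq 0 ]; then
        echo "unknown"        # empty input, e.g. the sysctl node is missing
    elif [ "$#" -eq 1 ] && [ "${1#*/}" = "-1" ]; then
        echo "hwp"            # single entry whose power field is -1
    else
        echo "pstate-table"   # several levels, or one level with a known power
    fi
}

# On FreeBSD, feed it the live value (assumes dev.cpu.0 exists):
#   classify_freq_levels "$(sysctl -n dev.cpu.0.freq_levels)"
```

The check only inspects the string, so it can be tried on any system by pasting in values from the reports above.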
I'm currently running 13.0-STABLE #5 stable/13-n245210-3bec9180c9e7 without powerdxx or, for that matter, any other power settings in rc.conf, and my X1 Carbon 7th gen feels pretty warm to the touch. With coretemp loaded, sysctl dev.cpu.{0-7}.temperature sits above 60°C most of the time. Are you guys running your systems with any power settings, given that you can't use P-states?

(In reply to Marco from comment #30) I was posting just to point out that some power management combinations can cause system lockups, not because I thought TCC was necessarily the cause of this issue. Also, to the best of my recollection, the TCC issue was not Lenovo-specific. Still, it's worth a try. I'd love to have working P-states.

From https://wiki.freebsd.org/TuningPowerConsumption I'm reading: "Both ACPI and P4TCC throttling are now disabled by default in new installations"

^Triage: reclassify to see if this gathers more attention this way.

I recently purchased a T490. The system crashes unless hint.hwpstate_intel.0.disabled=1 is added to /boot/loader.conf; fortunately, I somehow made it through the install of 13.0-RELEASE! I've been able to get KDE5 to sleep, wake up, and auto-connect to WLAN, and I even played a bit with and connected to some Bluetooth devices. I've switched from the quarterly to the current pkg repo and, using boot environments, I've done several pkg upgrades without any issues.
Local package database:
	Installed packages: 1175
	Disk space occupied: 11 GiB
I am a bit confused: support for this very new hardware has improved so remarkably over the last few releases, yet this strange issue has still not been addressed in 13.x. Thank you, everyone; hopefully this issue will soon be tracked down.
Kind Regards!
FreeBSD t490 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz (2112.11-MHz K8-class CPU)
CPU microcode: updated from 0xd6 to 0xea
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
real memory = 34359738368 (32768 MB)
hostb0@pci0:0:0:0: class=0x060000 rev=0x0c hdr=0x00 vendor=0x8086 device=0x3e34 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Coffee Lake HOST and DRAM Controller'
    class = bridge
    subclass = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x3ea0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'WhiskeyLake-U GT2 [UHD Graphics 620]'
    class = display
    subclass = VGA
none0@pci0:0:4:0: class=0x118000 rev=0x0c hdr=0x00 vendor=0x8086 device=0x1903 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem'
    class = dasp
none1@pci0:0:8:0: class=0x088000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x1911 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model'
    class = base peripheral
pchtherm0@pci0:0:18:0: class=0x118000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9df9 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP Thermal Controller'
    class = dasp
xhci0@pci0:0:20:0: class=0x0c0330 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9ded subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP USB 3.1 xHCI Controller'
    class = serial bus
    subclass = USB
none2@pci0:0:20:2: class=0x050000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9def subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP Shared SRAM'
    class = memory
    subclass = RAM
iwm0@pci0:0:20:3: class=0x028000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9df0 subvendor=0x8086 subdevice=0x0030
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP CNVi [Wireless-AC]'
    class = network
ig4iic0@pci0:0:21:0: class=0x0c8000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9de8 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP Serial IO I2C Controller'
    class = serial bus
none3@pci0:0:22:0: class=0x078000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9de0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP MEI Controller'
    class = simple comms
pcib1@pci0:0:28:0: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0x9db8 subvendor=0x0000 subdevice=0x0000
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP PCI Express Root Port'
    class = bridge
    subclass = PCI-PCI
pcib2@pci0:0:28:4: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0x9dbc subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP PCI Express Root Port'
    class = bridge
    subclass = PCI-PCI
pcib7@pci0:0:29:0: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0x9db0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP PCI Express Root Port'
    class = bridge
    subclass = PCI-PCI
pcib8@pci0:0:29:4: class=0x060400 rev=0xf0 hdr=0x01 vendor=0x8086 device=0x9db4 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP PCI Express Root Port'
    class = bridge
    subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9d84 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP LPC Controller'
    class = bridge
    subclass = PCI-ISA
hdac0@pci0:0:31:3: class=0x040380 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9dc8 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP High Definition Audio Controller'
    class = multimedia
    subclass = HDA
none4@pci0:0:31:4: class=0x0c0500 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9da3 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP SMBus Controller'
    class = serial bus
    subclass = SMBus
none5@pci0:0:31:5: class=0x0c8000 rev=0x30 hdr=0x00 vendor=0x8086 device=0x9da4 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'Cannon Point-LP SPI Controller'
    class = serial bus
pcib3@pci0:2:0:0: class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x15c0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016]'
    class = bridge
    subclass = PCI-PCI
pcib4@pci0:3:0:0: class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x15c0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016]'
    class = bridge
    subclass = PCI-PCI
pcib5@pci0:3:1:0: class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x15c0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016]'
    class = bridge
    subclass = PCI-PCI
pcib6@pci0:3:2:0: class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x15c0 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 Bridge (Low Power) [Alpine Ridge LP 2016]'
    class = bridge
    subclass = PCI-PCI
none6@pci0:4:0:0: class=0x088000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x15bf subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 NHI (Low Power) [Alpine Ridge LP 2016]'
    class = base peripheral
xhci1@pci0:58:0:0: class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x15c1 subvendor=0x17aa subdevice=0x2279
    vendor = 'Intel Corporation'
    device = 'JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016]'
    class = serial bus
    subclass = USB
nvme0@pci0:61:0:0: class=0x010802 rev=0x03 hdr=0x00 vendor=0x8086 device=0xf1a6 subvendor=0x8086 subdevice=0x390b
    vendor = 'Intel Corporation'
    device = 'SSD Pro 7600p/760p/E 6100p Series'
    class = mass storage
    subclass = NVM

It looks like the issue was fixed in 13.1-BETA1.

How did you verify this?

(In reply to Sergei Masharov from comment #35) Is this just a case of "now it works", or is there some indication of a fix? In any case, I'll update to BETA1 and give it a go.

(In reply to Dries Michiels from comment #36) I have commented out the two lines below in device.hints, and the system is running as it should. Also, dev.cpu.0.freq_levels shows sane values.
#hint.acpi_throttle.0.disabled="1"
#hint.p4tcc.0.disabled="1"

My bad, the issue still exists in 13.1-BETA1. I completely forgot about hint.hwpstate_intel.0.disabled=1 in /boot/loader.conf.

This crash/lockup also happens on my ThinkPad X260 on FreeBSD 13.1-RELEASE amd64. If I close the lid on AC power, everything works fine. If I close the lid on battery power, the system locks up immediately. Adding "hint.hwpstate_intel.0.disabled=1" to /boot/loader.conf was the only thing that "fixed" it.

*** This bug has been marked as a duplicate of bug 253288 ***