I run FreeBSD on an Intel Atom D525MWV board. On recent FreeBSD versions I am observing unusual load averages reported by 'top' even when the system is 100% idle. It happens in a normal system, in single-user mode, and on official USB install images. I have not yet confirmed whether the load is real or whether the kernel is just reporting the values wrong.

Here are the kernels I have tried. Four of them are from the currently available USB memstick installers; the 2011 one is a version I had been using successfully for a year until downgrading recently.

8.3-RELEASE  amd64 2012-04-09           -> load 0.02 after 5 minutes idle
9.0-RELEASE  amd64 2012-01-03           -> load 0.02 after 5 minutes idle
10.0-CURRENT amd64 2011-11-12 (cvs)     -> load 0.02 after 5 minutes idle
9.1-RC3      amd64 2012-10-30 (r242324) -> load 0.60 after 5 minutes idle
10.0-CURRENT amd64 2012-11-02 (r242464) -> load 0.60 after 5 minutes idle

How-To-Repeat:
1) boot the system on affected hardware(?)
2) leave 'top' running for a few minutes
3) observe the load averages
A night without sleep and a chain of manual bisection steps later:

stable/9 r232058 2012-02-23 - pass
stable/9 r232129 2012-02-24 - fail

(will continue to narrow it down later)
The problem first occurs at r232086 (original head revision r231161): http://svnweb.freebsd.org/base?view=revision&revision=232086

It contains some ACPI-related and tick-source-related changes regarding their probing order/priority. I observed that this does indeed change the way they are reported in /var/log/messages. In particular, the order changes and they are initialized much earlier in the startup process. Diff attached.
The offending revision where the issue can first be observed in stable/9 is r232086 (MFC of r231161) - http://svnweb.freebsd.org/base?view=revision&revision=232086

The consequence of that change can be observed in /var/log/messages - the timer sources get initialized much earlier and in a different order. Diff attached.
Hi!

I'm getting the exact same issue. Upgraded from FreeBSD 9.0 to FreeBSD 9.1 and my load avg jumped from 0.01 to 0.60, while the server is doing pretty much nothing at all. I'm also seeing the timer differences in dmesg as described earlier in the bug report.

# sysctl -a | egrep "kern.event|clock"
kern.clockrate: { hz = 1000, tick = 1000, profhz = 8126, stathz = 127 }
kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 0
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.periodic: 0
kern.eventtimer.timer: HPET
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
debug.acpi.reset_clock: 1
debug.clocktime: 0
hw.clockrate: 2133
machdep.wall_cmos_clock: 0
dev.atrtc.0.%desc: AT realtime clock

# dmesg | more
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012
    root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz (2133.45-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f2  Family = 6  Model = f  Stepping = 2
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 4296015872 (4097 MB)
avail memory = 4101558272 (3911 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <COMPAQ GLENWOOD>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 1
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <HPQOEM SLIC-WKS> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, dff00000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet1: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
device_attach: hpet0 attach returned 12
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xf808-0xf80b on acpi0

Regards
Johan
For my system's kernel I just reverted that one revision I mentioned earlier. No ill effects observed. Since the change is so low-level and there is no rationale provided in the commit message, I do not know what improvement it was supposed to achieve. I tried e-mailing the author of that commit, but got no response.

Also, due to the low-power nature of my system, I can't tell whether the 0.60 load is real or just a by-product of broken time accounting. Either way, it might be affecting the process scheduler.
Hi!

I see. I got around the problem in FreeBSD 9.1 by changing the event timer, like this:

# sysctl -w kern.eventtimer.timer=LAPIC

The load avg dropped to 0.00 again. You can try the different timer sources available to you and see what works best. Some sources might increase the number of interrupts (like the RTC) and some might cause more context switching or CPU load. For me the LAPIC works best. You can list your timer sources using:

# sysctl kern.eventtimer.choice

From what I understand, measuring time can be tricky because new systems can regulate core frequency on the fly (in HW), and virtualization also increases the difficulty... It seems the order and/or weight of the timer sources has changed. When I have a chance I will reboot into the old kernel and see what has been changed.

I'm totally new to FreeBSD so I don't know the normal handling of these bugs, but I'll put the author of the patch on CC as well :)

Good luck!
Johan
I'm seeing the same issue in FreeBSD 10.0-RELEASE. A VM has a load avg of about 0.5; a physical install shows a load average of ~0.3. My tests were even done in single-user mode.

https://forums.freebsd.org/viewtopic.php?f=4&t=44600
(Warning: below is a long comment with a large amount of detail. I am CC'ing mav@ because I believe this may have to do with his eventtimer changes made sometime in 2012[1]. I'll provide some information near the end that may greatly help narrow down the root cause.)

I reported this problem, specific to stable/10 (i.e. the issue I see may be different from the one reported here), back in June 2014[2]. A visual representation of the difference between stable/10 and stable/9 on said hardware (bare metal, not virtualised) is available[3]. The system in question still runs stable/9 (r289992) today without any load oddities. All of my KTR details and so on from that system are available here[4], where Adrian Chadd attempted to help me figure out what was going on. The mail thread (see [1] again) consisted of many people chiming in with useful details. At some point during this thread, maybe not publicly (I would need to review my mail archives), someone (John Baldwin? I'm not certain, my memory is not as good as it used to be) informed me that stable/10 had its load calculation algorithm changed.

Since July 2014, I have been able to reproduce this problem with stable/10 on several different types of systems:

- Supermicro X7SBA (bare metal)
- VMware Workstation 11.x (FreeBSD stable/10 is a guest)
- VPS provider Vultr.com (using QEMU) (FreeBSD stable/10 is a guest)

There are, however, some individuals running stable/10 (e.g. a friend of mine using FreeNAS) on bare-metal hardware that do not experience this problem.

Today I spent some time on a stable/10 (r289827) guest under VMware Workstation 11.x where I was able to reproduce the issue and found something surprising: kern.eventtimer.periodic=1 completely alleviates the problem -- the instant this sysctl is set to 1 ("periodic mode") the load begins to plummet and eventually hits 0.00, only increasing when there is "real" load. You can even set kern.eventtimer.periodic=0 ("one-shot") maybe 5-6 minutes after that and still see the system load at 0.00, although over the next 10+ minutes it will begin to gradually increase again.

The downside to using "periodic mode": vmstat -i shows a cpuX:timer rate of almost 2000 on both CPU cores, which, if I understand eventtimers(4) correctly, would indicate extremely high interrupt usage (in my case LAPIC, i.e. the local APIC).

I just find it very interesting that somehow kern.eventtimer.periodic=1 is able to alleviate the issue. This leads me to wonder if the issue is with the eventtimers(4) changes, or with the load calculation algorithm (or some "intermediary" piece that isn't taking certain things into consideration).
On the VM in question, the eventtimer choices are:

root@test-freebsd:~ # sysctl -a | grep kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 33000101
kern.eventtimer.et.LAPIC.flags: 7

On a Vultr.com VPS (QEMU), the eventtimer choices are (according to stable/9, which I use there):

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.LAPIC.frequency: 500017353
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)

And finally, on the aforementioned Supermicro X7SBA box, the eventtimer choices (per stable/9) are:

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.LAPIC.frequency: 166681842
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)

Per hpet(4) I have tried the following in /boot/loader.conf on all of the above VM systems, to no avail (probably because they do not use HPET):

hint.hpet.X.legacy_route=1
hint.attimer.0.clock=0
hint.atrtc.0.clock=0

Another thing I found interesting on stable/10 under the VMware VM was that this message is printed:

Event timer "HPET" frequency 14318180 Hz quality 550

Yet the eventtimer is nowhere to be found.

Anyway, this problem has come up *repeatedly* on multiple FreeBSD mailing lists and on the forums yet has gotten no traction[5][6][7][8][9][10]. I feel, respectfully, that this has gone on long enough. This problem greatly diminishes the feasibility of using FreeBSD as a server operating system (when you can no longer trust or rely on the load average, what can you believe?). End users and even skilled/senior administrators do not know how to diagnose, troubleshoot and rectify this problem given where it lies in kernel space.

Please remember: the problem is 100% reproducible. I am happy to pay for a low-end Vultr.com dedicated VPS (includes console access as long as you have a web browser w/ Javascript) and/or set up a dedicated VMware Workstation VM (w/ remote VNC console) for any developer who wants to take it on, on my own dime. I cannot provide the aforementioned Supermicro X7SBA box for testing (it's my FreeBSD box and must remain usable).

Alexander (mav@) -- what can we provide you that can help narrow this down?

P.S. -- The Importance field needs to be changed from "Affects Only Me" to "Affects Some People", as this is definitely not specific to just Viktor.
Also, I would strongly suggest changing Depends On to reference bug 192315, which references eventtimer issues (but may be a different problem -- I'll let mav@ decide).

[1]: https://lists.freebsd.org/pipermail/freebsd-net/2012-April/031893.html
[2]: https://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079386.html
[3]: http://jdc.koitsu.org/freebsd/releng10_perf_issue/loadavg_comparison.png
[4]: http://jdc.koitsu.org/freebsd/releng10_perf_issue/
[5]: https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002835.html
[6]: https://forums.freebsd.org/threads/high-idle-load-in-vps.52688/
[7]: https://forums.freebsd.org/threads/high-cpu-utilization-by-average-load-values-while-no-processes-in-top-output-have-high-cpu-load.48933/
[8]: https://lists.freebsd.org/pipermail/freebsd-bugs/2013-January/051385.html
[9]: https://lists.freebsd.org/pipermail/freebsd-xen/2014-January/002006.html
[10]: https://forums.freebsd.org/threads/high-load-average-with-idle-state.38757/
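For anyone who wants to watch the kern.eventtimer.periodic effect described above in a repeatable way, here is a minimal user-space sketch (illustrative only, not part of the base system; the program name is made up) that logs the load averages next to the current kern.eventtimer.periodic value every five seconds, using only getloadavg(3) and sysctlbyname(3). Toggle the sysctl from another terminal and watch the logged figures respond.

/*
 * loadwatch.c -- illustrative sketch only, not part of the base system.
 * Logs the 1/5/15-minute load averages together with the current
 * kern.eventtimer.periodic value.  Build with: cc -o loadwatch loadwatch.c
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
        for (;;) {
                double la[3];
                int periodic = -1;
                size_t len = sizeof(periodic);

                if (getloadavg(la, 3) != 3)
                        la[0] = la[1] = la[2] = -1.0;
                if (sysctlbyname("kern.eventtimer.periodic", &periodic,
                    &len, NULL, 0) != 0)
                        periodic = -1;

                printf("periodic=%d  load averages: %.2f %.2f %.2f\n",
                    periodic, la[0], la[1], la[2]);
                fflush(stdout);
                sleep(5);       /* roughly the same cadence as the in-kernel sampler */
        }
}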
I don't know whether it is the only cause of such behavior, but I have one logical explanation. The problem stems from the sampled nature of our load average calculation. To reduce power usage, the new clock code tries to aggregate events into bursts to reduce the number of CPU wakeups. As a result, if the system is idle, its CPU may wake up as few as two times per second. And if that CPU has some other events waking it up, it is quite likely they will be aggregated with the load average sampling. As a result, there is a high enough probability that the sampler will see some "load" to account, while it has no idea that that load will disappear in a microsecond for another second. Enabling periodic interrupts breaks the event aliasing -- the load average sampling will more likely fire in moments when the system is really idle. But do we want to consume several extra watts of power just for one good-looking number? To solve this we should probably change our mechanism of load average calculation somehow. If anybody has ideas -- I am all ears.
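To make the aliasing effect above concrete, here is a small user-space sketch (illustrative only, not kernel code; the constants mirror FSHIFT == 11 from sys/param.h and cexp == exp(-1/12) * FSCALE for the 1-minute figure from sys/kern/kern_synch.c). It feeds the kernel's fixed-point decay step two streams of samples: one where a microsecond-long wakeup is runnable at the instant of every 5-second sample (the aliased case), and one where the sampler never catches it. The CPU time actually consumed is negligible and identical in both cases, yet the reported figures diverge completely: the aliased stream climbs toward 1.00 while the other stays at 0.00.

/*
 * Illustrative sketch only -- not kernel code.  Reuses the fixed-point
 * EWMA step from loadav() to show how event aliasing can inflate the
 * reported figure: a task that is runnable for only microseconds, but
 * always at the instant the sampler fires, is indistinguishable from a
 * task that runs all the time.
 */
#include <stdio.h>
#include <stdint.h>

#define FSHIFT  11                      /* as in sys/param.h */
#define FSCALE  (1 << FSHIFT)

static const uint64_t cexp_1min = 1884; /* 0.9200444 * FSCALE, i.e. exp(-1/12) */

/* One 5-second sampling step, same fixed-point form as the kernel's loadav(). */
static uint64_t
loadav_step(uint64_t ldavg, unsigned nrun)
{
        return ((cexp_1min * ldavg +
            (uint64_t)nrun * FSCALE * (FSCALE - cexp_1min)) >> FSHIFT);
}

int
main(void)
{
        uint64_t aliased = 0, unaliased = 0;

        /* 120 samples x 5 seconds == 10 simulated minutes. */
        for (int i = 0; i < 120; i++) {
                aliased = loadav_step(aliased, 1);      /* sampler always catches the blip */
                unaliased = loadav_step(unaliased, 0);  /* sampler never catches it */
        }
        printf("aliased sampling:   %.2f\n", (double)aliased / FSCALE);
        printf("unaliased sampling: %.2f\n", (double)unaliased / FSCALE);
        return (0);
}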
On 10.2-RELEASE, my long-term load average is around 0.10, which may be closer to reality because of powerd (225-1800MHz frequency scaling range). The currently chosen timecounter is TSC and the eventtimer is HPET. Unfortunately I forgot to note down the previous values. I can try an older kernel if needed.
The load average has been known to be broken for a long time, pre-dating this report by some time, I think:

https://lists.freebsd.org/pipermail/freebsd-current/2011-September/027138.html

I think mav's observation is basically what's happening above.
Created attachment 167688 [details] broken loadavgs, a fix, then fixed loadavgs
Created attachment 167689 [details] verbose boot Lenovo X200
The first 40% of the attached is a study of the "0.6" erroneous loadavgs experienced also by others, shown in some detail over long periods at virtually complete idle as well as with a number of 100% CPU-bound tasks running. This shows that the oft-mentioned "0.6" figure (seemingly randomly variable over a range of ~0.45 to ~0.75, with some outliers even beyond that) is an ADDITIVE increment to the real loadavg, for real 100% CPU loads of up to 11 here.

At 40% through this report a pair of cpuset(1) commands is applied, which on this machine completely fixes the broken loadavgs, permanently until reboot. I'll refrain from further speculation on what might cause this to happen.

The rest of the report illustrates the well-behaved nature of HPET-derived loadavgs (both one-shot and periodic) at idle and at real loads of 1, 2, 4 and 11 CPU-bound tasks, after this particular magic spell has been cast.

A verbose dmesg is also attached.
Thought I'd add that I checked 11.0-BETA4 for this problem: it still exists.
Jeremy, you probably shouldn't test the LA on a -BETA, since it's built with INVARIANTS and WITNESS enabled. I'm not an expert, but I wouldn't be surprised if these options affect the LA. I've just checked 11.0-RC1 (in which the kernel is built without these options) as described above and got an LA of 0.03. Since the system never does literally nothing (there are the system clock, the process scheduler, etc.), I would say it's pretty much expected behavior. At least that's what I see on my hardware (Fujitsu Lifebook E733).
The problem varies per hardware/system (see comment #8), including multiple types/implementations of VMs. Ian Lepore (smithi@nimnet.asn.au) already covered part of this in his "broken loadavg, a fix, then fixed loadavgs" attachment (it has a lengthy explanation). The issue almost certainly has something to do with eventtimers or timecounters. Ian and I chatted for a bit off-ticket about some of the possibilities, but neither of us could narrow down the root cause. I'm happy to wait for 11.0-RELEASE and try that, but I'm inclined to believe the issue is still present unless a committer can point to a commit that actually fixed it; the only commits I follow presently are to stable/9, and a search of stable/10 for the word "load" turns up no commits that look relevant. So, call it sysadmin + FreeBSD-user-since-1997 gut instinct. :-)
Problem described continues to exist on the following present-day FreeBSD releases (I've tested all of them; I did not test x86):

* 10.3-RELEASE amd64
* 11.0-RELEASE amd64
* 12.0-CURRENT amd64 (r309773)

Getting traction on this is important due to the EoL of the 9.x series as of January 1st 2017, compounded by the ports infrastructure strongly advocating ceasing use of 9.x (see ports r431746 for proof).
Apparently ACPI-related. Notifying the committer of r232086 -- although this commit was several years ago.
For what it's worth: below is a spreadsheet of all the relevant systems I've tested, including OS version, timecounters available (and chosen), eventtimers available (and chosen), etc.:

https://docs.google.com/spreadsheets/d/1xsKZ6yNMWW6TA_ZaYMwzZ-vI3C-pSQ9QAJk_yjPBOIQ/edit?usp=sharing
I wanted to add this piece of data (reflected in the spreadsheet): I was able to reproduce this problem on Amazon EC2 using official FreeBSD AMIs (both 10.3-STABLE and 11.0-RELEASE) on a t2.micro instance. Launch, wait several minutes, then look at the system. There, the timecounter is XENTIMER and the eventtimer is XENTIMER.
Hi, I have the same problem on pretty much all of the VPSes I use at different providers. On a brand-new FreeBSD 11.0 amd64 install at Leaseweb, VULTR, Liteserver.nl, Amazon, etc., I see high loads (sometimes hitting up to 2.00) on idle virtual servers with nothing installed. I've seen this high-load issue reported many times on various sites and forums. It makes it hard to accurately see just how much CPU or load my virtual servers are using, since I can't rely on FreeBSD's load report.

Thank you,
- Nick
^Triage: Update to latest -RELEASE version referenced as affected.

A user on IRC just reported 1.00 LAs on an Oracle (13.0-p7 amd64, micro, free) VPS. Default timecounter: HPET. No change with ACPI-fast, but LAPIC increases the idle load average to 2.00 (from 1.00).
(In reply to Kubilay Kocak from comment #23)

Switching the timecounter to i8254 dropped the idle load average from 1.00 to 0.1-0.3 ("normal").
^Triage: Re-triage given the latest report on 13.0.

Note also (explicitly) the original bisection to base r232086 (an MFC of base r231161).

I cannot identify anyone substantially involved in timecounters since then modulo jkim (the original committer); requesting feedback from them and kib@ for advice and potential resolution, given that an original regression commit was identified.
For anyone (still) affected or able to reproduce this issue, please include:

- /var/run/dmesg.boot output (as attachments)
@Kubilay Kocak, I don't think this should be related to timecounters; it must be a coincidence. As I wrote 7 years ago in comment 9, I believe it is a result of some synchronization between the load average callouts and some other events. Since you've mentioned HPET, I'd think not of HPET as a timecounter, but of HPET as an eventtimer. Depending on the hardware and the number of CPUs, the kernel may have to use a single HPET interrupt to deliver timer interrupts to multiple CPUs. As a result, it is very easy for two or more CPUs to wake up from a long sleep at the same time (the one the HPET interrupt is routed to and another where the timer event is actually scheduled), and that may cause the load average callout to sample simultaneous threads.
Created attachment 233113 [details] ps shows idle 1000% cpu
Created attachment 233114 [details] dmesg.boot as requested 13.1-RC2
HP Z2 mini, 6-core Xeon. The X system monitor shows 1 of 12 CPUs at 60-100%. It didn't do this a week or two ago, on the previous 13.1-BETA3 and 13.1-RC1. Possibly caused by the kde-plasma5 install? Today is the first time I have heard the CPU fan run up, when the all-core average appeared >60%. Maybe worse since I enabled KDE HighDPI; will disable.
So, having turned off sddm/KDE HighDPI and rebooted, ps still shows high idle. The KDE system monitor reports one CPU at 60-100%, while its process list doesn't show idle and, sorted, shows the worst performer at 1% CPU. The GNOME system monitor shows 100% idle, with the CPU split over several CPUs on the graph between 0-10%.
Is there any interest in debugging this further? I can reproduce this on my home PC, a Supermicro X12STL-IF with a Xeon E-2314. Freshly installed, with nothing running and the CPU 100% idle, the load hovers between 0.20 and 0.40.
It was fixed with timer updates in stable/12 and stable/13 a few months ago.
@Gordon (In reply to Gordon Bergling from comment #33) Cool, do you have the revision handy by any chance?
Does anyone know the commit that fixed this, and what the root cause was? I still see the problem with:

$ uname -a
FreeBSD nas 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n253535-4a8af507ebe5: Fri Jan 20 03:21:10 UTC 2023     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

last pid:  1545;  load averages:  0.31,  0.11,  0.04    up 0+00:02:34  11:30:02
128 threads:    5 running, 103 sleeping, 20 waiting
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 15M Active, 2260K Inact, 229M Wired, 22M Buf, 31G Free
Swap: 3962M Total, 3962M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31     0B    64K RUN      3   2:33 100.00% idle{idle: cpu3}
   11 root        155 ki31     0B    64K CPU2     2   2:33  99.95% idle{idle: cpu2}
   11 root        155 ki31     0B    64K CPU0     0   2:33  99.53% idle{idle: cpu0}
   11 root        155 ki31     0B    64K CPU1     1   2:34  97.14% idle{idle: cpu1}
    0 root        -76    -     0B   640K -        1   0:00   0.04% kernel{if_config_tqg_0}
Context: aarch64 HoneyComb (so: 16 cores), main non-debug build, no video hardware, serial console and ssh access in use.

I see the likes of:

load averages: 0.20, 0.22, 0.25

in top -SIaocpu. The list of non-idle, non-zero WCPU% columns is usually something like:

 0.05% top -SIaocpu
 0.03% [clock]
 0.01% [intr]
 0.01% [usb]

Sometimes there is also something like:

 0.03% [acpi_thermal]

or:

 0.01% [kernel]

Less often (it varies more widely but is non-zero less often):

 0.15% [zfskern]

There are also a number of processes listed with 0.00%, such as (expect some overlaps with the above):

 0.00% /usr/sbin/mountd -r /etc/exports /etc/zfs/exports
 0.00% sshd: markmi@pts/1 (sshd)
 0.00% /usr/sbin/ntpd -p /var/db/ntp/ntpd.pid -c /etc/ntp.conf -f /var/db/ntp/ntpd.drift -g
 0.00% [intr]
 0.00% [pagedaemon]
 0.00% [bufdaemon]
 0.00% [rand_harvestq]
 0.00% [zfskern]
 0.00% [syncer]
 0.00% [vnlru]
 0.00% [acpi_cooling0]

Truly zero-CPU-usage processes are not listed. The system is definitely doing somewhat more than nothing, as is normal for so-called "idle" as far as I know. What about this would make the load averages unreasonable? (I'm also not aware of any sort of technical accuracy criteria being claimed for the load averages. More general judgments about being good enough are involved instead.)

For reference:

# uname -apKU
# long output line split for readability
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #75
main-n259950-5723e5ac6d76-dirty: Fri Jan 6 01:50:53 PST 2023
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400077 1400077
(In reply to Mark Millard from comment #36)

https://svnweb.freebsd.org/base/head/sys/kern/kern_synch.c?view=markup#l87

As far as I understand, the OS looks at the run queue every 5 seconds and then averages it out over 1, 5 and 15 minutes. A load of 0.20 for the 1-minute counter would mean that, out of the 6 times it looked at the run queue, there was 1 job there in 2-3 of them.
(In reply to Stavros Filargyropoulos from comment #37)

5-second intervals for sampling: sampling 6 intervals spans 30 seconds, not 60 seconds.

For an arithmetic mean of 0.2 over 12 values, the total of the 12 values would be 2.4. (But the code does not use the arithmetic mean, so that is only suggestive.)

I still do not see how this is to be classified as unreasonable for the background activity present, even if it were an arithmetic mean.

You may want to ignore the following . . .

But it does not keep track of the 12 values for a minute and then find the arithmetic mean: "average" here is more complicated. For i==0 (i.e., the 1-minute context), first setting up context for the code quoted later, cexp[i] is listed as:

0.9200444146293232 * FSCALE, /* exp(-1/12) */

The 0.9200444146293232 is exp(-1/12), and the 12 looks to come from 5 sec * 12 == 60 sec. FSCALE is 2048; 2048 in turn is 1<<FSHIFT, with FSHIFT==11. Given all that, the new-sample update looks like (i==0 for the 1-minute context):

avg->ldavg[i] = (cexp[i] * avg->ldavg[i] +
    nrun * FSCALE * (FSCALE - cexp[i])) >> FSHIFT;

nrun is the new sample. It does not look like a uniform 1-minute average. It is more like a figure in which the nrun values from more than 12 steps back have only a small influence, and even the 12 more recent nrun values do not have a uniform influence on the result, just a larger one than the older values. (Not a very detailed classification, sorry.)
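To put numbers on that, here is a small illustrative program (not kernel code). In real-valued terms each step of the recurrence quoted above computes ldavg = c*ldavg + (1-c)*nrun with c = exp(-1/12), so unrolling it, a sample k steps (k*5 seconds) old contributes to the "1 minute" figure with weight (1-c)*c^k:

/*
 * Illustrative only: per-sample weights implied by the loadav() recurrence
 * quoted above, for the "1 minute" figure.
 * Build with: cc -o weights weights.c -lm
 */
#include <stdio.h>
#include <math.h>

int
main(void)
{
        const double c = exp(-1.0 / 12.0);      /* decay per 5-second sampling step */

        for (int k = 0; k <= 36; k += 6)        /* samples 0 s .. 180 s old */
                printf("sample %3d s old: weight %.4f\n",
                    k * 5, (1.0 - c) * pow(c, k));
        return (0);
}

The most recent sample carries only about 8% of the figure (1 - exp(-1/12) is roughly 0.08), and samples two or three minutes old still contribute measurably, which is why "1-minute average" is only a loose description.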
(In reply to Mark Millard from comment #38)

> I still do not see how this is to be classified as unreasonable for the background activity present, even if it were an arithmetic mean.

One argument is to look at what 9.0-RELEASE used to report, which would be an order of magnitude less.

Another argument is to try to visualize what a load of 0.20 would mean if we had a single process and a very simple kernel that wouldn't run anything other than idle/scheduling and that process. Ignoring the more complex computation and taking just the arithmetic mean, a 0.20 load would mean that we would have a loop running, since it's the only process, for 20% of a minute. That means it would run for 12 seconds and sleep for 48. I don't think that background activity is worth 12 seconds per minute. FreeBSD would have serious performance issues if it were.

Yet another argument is to run something equivalent to the background activity. If things were fine and it's just our perception that is wrong, then I would expect the load to double. It doesn't.
Just some more observations. OpenBSD uses the same formula:

https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/uvm/uvm_meter.c

And load reported under similar usage is:

load averages:  0.00,  0.01,  0.00    openbsd 05:14:10
51 threads: 50 idle, 1 on processor    up  0:03
CPU states:  0.0% user,  0.0% nice,  0.0% sys,  0.0% spin,  0.0% intr,  100% idle
Memory: Real: 9012K/723M act/tot Free: 246M Cache: 194M Swap: 0K/256M

  PID     TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
96848  330739 -22    0    0K 1080K sleep     -         3:13  0.00% idle0
    0  100000 -18    0    0K 1080K sleep     schedul   0:01  0.00% swapper
 5169  407016  10    0    0K 1080K sleep     bored     0:01  0.00% systqmp
(In reply to Stavros Filargyropoulos from comment #39)

I've not looked at anything approaching a single-hardware-thread environment to observe what the load averages would be like (for the same type of cores run at the same sort of rates, other things being comparable). The closest I could test is 4 cores instead of 16 (MACCHIATObin Double Shot vs. the HoneyComb, same boot media usable, still 4 GiBytes of RAM per core: 16 GiBytes). It is not obvious to me that varying just the hardware thread count leaves the load average as it is in my context. The above test would vary somewhat more than just the hardware thread count, but it would be fairly close as far as I know.

I can say that a full load (no backlog) on the HoneyComb would be a load average of 16: the core count (nothing analogous to hyperthreading is involved). 0.20/16 == 0.0125 == 1/80, so 79 of 80 "units" are available for other uses in such a crude comparison. I'm never likely to notice the difference vs. 80 out of 80 "units" being available for use.

If 4 cores also got 0.20 for the load average: 0.20/4 == 0.05 == 1/20 == 4/80, so 76 of 80 "units" left. That would be less left for other activity, but I'm still not likely to significantly notice vs. 80 out of 80. Of course, if 4 cores got, say, 0.80 for a load average when "idle", that would be different. But I do not remember noticing such load averages on idle 4-core systems that are similar to the HoneyComb in other ways. Maybe I'll test at some point so that general memory from activity not focused on the issue is not involved. I'm not claiming 0.20 or 0.80 would be good approximations: illustrations only.

Another point: if the machine is not near or beyond being fully in use (all hardware threads), I'm even less likely to notice the activity associated with idle: the parallelism avoids most direct-tradeoff time conflicts.
(In reply to Mark Millard from comment #41)

The only thing I can advise for your case is to run NetBSD/OpenBSD/Linux on the same setup and see the load being reported. A 0.40 load for the usage you posted is not correct. There is a bug; it may be a benign one. I am not impressed, however, that this was closed without any further information.
0.20, not 0.40. Sorry, typing on phone.
(In reply to Stavros Filargyropoulos from comment #40)

Previously you listed (FreeBSD context):

128 threads: 5 running, 103 sleeping, 20 waiting

vs. for the OpenBSD context now:

51 threads: 50 idle, 1 on processor

Focusing just on the thread counts (the other figures are not likely to be as stable) . . .

I cannot tell how much of 128 vs. 50 is a mismatched context/configuration vs. how much is differences in software design/implementation. Whatever the mix, the 128 vs. 50 is notable all by itself.

That actually points out an issue for the 4-core test: some types of thread counts do scale with core counts in order to be able to put more cores to use if needed. (It does not mean that they are busy.) I'm not sure how similar the thread counts would be. I might need to stop nfs or some such to have a more similar context relative to thread counts.
FWIW, the dual-core HT Atom mini-PC I originally reported this for has been showing realistic averages (0.01 to 0.10) since my last post around 10.2 in late 2015, and has remained that way until now, running 13.1. Here are the system vars that might be affecting processor time accounting:

kern.timecounter.choice: TSC(1000) ACPI-fast(900) HPET(950) i8254(0)
kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(100) i8254(100) RTC(0)
dev.cpufreq.0.freq_driver: p4tcc0
dev.cpu.0.freq_levels: 1800/-1 1575/-1 1350/-1 1125/-1 900/-1 675/-1 450/-1 225/-1
dev.cpu.0.cx_method: C1/mwait/hwc
dev.cpu.0.cx_supported: C1/1/3
It is a long-known problem of very idle systems. To save power, the kernel tries to group different timer events together to reduce the number of CPU wakeups from the idle state. The problem is that load average accounting is also based on timer events, and if the system is idle, each time it fires there is often something else to do, which gets accounted into the load average.
(In reply to Alexander Motin from comment #46)

Alexander, in the comment above gbe@FreeBSD.org suggested that this has been fixed. I tried to have a look through the recent commits in ACPI but can't seem to find anything that would have fixed this. Do you happen to know which commit Gordon was referring to?
I am not sure what he means. There could be some optimizations, but I am not sure it can be easily fixed, unless we implement some mechanism to strictly prevent other timer events from grouping with the load average ones.
(In reply to Alexander Motin from comment #48)

Thanks, Alexander. It gives me some more confidence to know the root cause and to understand that it shouldn't functionally affect the system in any other way.
(In reply to Alexander Motin from comment #46)

Quoting your #9:

. . . if the system is idle, its CPU may wake up as few as two times per second.

But the load average code is based on fixed 5-second sampling, and then a small variability is tolerated/introduced to avoid accidentally matching a (nearly) fixed-frequency activity. The "fixed" turns into a fixed number of adjustments, such as 12, 60, or 180. Having anything like 30 sec between steppings of the load average adjustment spreads the 12-step case (60 sec at 5 sec each intended) out to something like 360 sec total for the 12 steps. (It also has implications for widely variable interval sizes.)

If that is really the case, not much else can matter as long as the 5-sec interval is not well approximated, and calling the average a "1 minute" average can be highly misleading. Descriptions that would be accurate are "12-step load figure", "60-step load figure", and "180-step load figure". No fixed conversion to time intervals is available -- and they are not really arithmetic means.

Sound right?

Note: For what I use the "load averages" for, the "N-step load figure" interpretation works fine without knowing specific times. Inaccuracies in small figures (relative to the hardware thread count involved) make no noticeable difference. Nor do small percentage errors in the larger figures: rule-of-thumb material for making some judgments that are not highly sensitive to the details.
(In reply to Mark Millard from comment #50)

> 30 sec

Where did you get the 30 secs from? He said twice per second, not per minute.
(In reply to Stavros Filargyropoulos from comment #51)

Good question: a bad substitution in my head that I did not notice, despite "second" being in what I quoted. So ignore my #50. Sorry for the noise. Too bad I cannot delete it to avoid confusing anyone besides myself . . .
(In reply to Gordon Bergling from comment #33)

Gordon, could you please provide a date and link for the revision you refer to? I updated to 12.4 yesterday, assuming that "updates in stable/12 and stable/13 a few months ago" should mean something preceding 12.4 being spun off stable/12 on the 20th of October, but I can see no change at all in the idle behaviour of the load averages.

(Alexander, more when I know more ...)
(In reply to Stavros Filargyropoulos from comment #42)

FYI for comparison/contrast: I booted a new USB3 installation of OpenBSD 7.2 on the HoneyComb and let it sit for over 20 minutes with top -1SCHIocpu running. A sequence of occasional extractions from after that:

load averages:  0.50,  0.35,  0.23
load averages:  0.26,  0.31,  0.22
load averages:  0.39,  0.34,  0.24
load averages:  0.24,  0.30,  0.23
load averages:  0.41,  0.33,  0.26
load averages:  0.21,  0.30,  0.25

That gives an idea of the variability. The list of non-idle processes is almost always empty but on occasion shows an acpi0 line (and never any others that I've observed).

Looks to me like it reports generally larger load averages than FreeBSD on this specific platform. (I've no clue what all contributes to that status.)
(In reply to Stavros Filargyropoulos from comment #42)

FYI: a booted-NetBSD-on-HoneyComb comparison/contrast done in a similar way to the earlier OpenBSD example. Very different results, always:

load averages:  0.00,  0.00,  0.00