Bug 135447

Summary: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported
Product: Base System
Reporter: Dmitry Kubov <dk>
Component: i386
Assignee: freebsd-acpi (Nobody) <acpi>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 7.2-STABLE   
Hardware: Any   
OS: Any   
Attachments: file.txt (flags: none)

Description Dmitry Kubov 2009-06-10 17:00:11 UTC
Intel Turbo Boost Technology does not work, and the reintroduced Intel Hyper-Threading Technology causes a performance drop.

sysctl -a output is attached.

It is not possible to see CPU P0-state/C3-state usage, which is required to enable Turbo Boost Technology; see http://www.intel.com/technology/turboboost/

Fix: Not using Hyper-Threading avoids the drop, but there is still no Turbo Boost benefit.

Patch attached with submission follows:
How-To-Repeat:
1. Disable both Turbo Boost and Hyper-Threading.
2. Run benchmarks (both single-threaded and with 8-16 threads).
3. Enable Turbo Boost only.
4. Run the benchmarks again: same results (1-2% difference).
5. Enable both Turbo Boost and Hyper-Threading.
6. Run the benchmarks again: 10-15% performance drop.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2009-06-12 06:42:26 UTC
State Changed
From-To: open->suspended

Mark suspended awaiting someone to test some code.
Comment 2 Bruce Cran freebsd_committer freebsd_triage 2010-09-12 11:31:54 UTC
State Changed
From-To: suspended->closed



Comment 3 Bruce Cran freebsd_committer freebsd_triage 2010-09-12 11:31:54 UTC
Responsible Changed
From-To: freebsd-amd64->brucec
Comment 4 Bruce Cran freebsd_committer freebsd_triage 2010-09-12 11:35:23 UTC
State Changed
From-To: closed->open

This does appear to be a bug, but only in the ACPI code: Hyper-Threading
isn't guaranteed to improve performance for every workload, and
TurboBoost only shows up once lower Cx states are enabled.


Comment 5 Bruce Cran freebsd_committer freebsd_triage 2010-09-12 11:35:23 UTC
Responsible Changed
From-To: brucec->freebsd-acpi

Over to the acpi list. 
This appears to be a problem with ACPI on the machine, because C2 and C3  
states should probably be available.
Comment 6 Andriy Gapon 2010-09-12 16:52:14 UTC
First of all, the PR was created improperly: the sysctl -a output is in the "Fix"
section for some reason.  Please don't do that in the future.

Second, it's impossible to tell if "C2 and C3 states should probably be
available" unless we see acpidump -dt output and dmesg with ACPI debugging
enabled (ACPI_PROCESSOR x ACPI_DB_INFO):
http://www.freebsd.org/doc/handbook/acpi-debug.html

P.S.
That is, we generally expect to see C2 and C3 levels, but not universally.
The BIOS or BIOS configuration, motherboard peculiarities and OS-supported features
are among the things that affect C2+ availability.

-- 
Andriy Gapon
Comment 7 Jaakko Heinonen freebsd_committer freebsd_triage 2010-09-19 09:38:54 UTC
State Changed
From-To: open->feedback

Note that the submitter has been asked for feedback.
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2010-09-20 15:27:02 UTC
on 20/09/2010 17:06 Dmitry Kubov said the following:
> Maybe I need some kind of powerd running? There is no information about tuning TurboBoost on
> FreeBSD.

So you do have the levels reported in cx_supported?
This is not what you attached to the PR.  And this is not what I tried to debug.
I am not sure what changed in your environment, but you should have said that you
have those levels reported and not wasted my time on this.

Please ask the above on questions@.

> # sysctl dev.cpu
> dev.cpu.0.%desc: ACPI CPU
> dev.cpu.0.%driver: cpu
> dev.cpu.0.%location: handle=\_PR_.P001
> dev.cpu.0.%pnpinfo: _HID=none _UID=0
> dev.cpu.0.%parent: acpi0
> dev.cpu.0.freq: 2934
> dev.cpu.0.freq_levels: 2934/105000 2800/82000 2667/71000 2533/64000 2400/55000
> 2267/48000 2133/41000 2000/36000 1867/31000 1733/27000 1600/23000 1400/20125
> 1200/17250 1000/14375 800/11500 600/8625 400/5750 200/2875
> dev.cpu.0.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.0.cx_lowest: C1
> dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.1.%desc: ACPI CPU
> dev.cpu.1.%driver: cpu
> dev.cpu.1.%location: handle=\_PR_.P002
> dev.cpu.1.%pnpinfo: _HID=none _UID=0
> dev.cpu.1.%parent: acpi0
> dev.cpu.1.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.1.cx_lowest: C1
> dev.cpu.1.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.2.%desc: ACPI CPU
> dev.cpu.2.%driver: cpu
> dev.cpu.2.%location: handle=\_PR_.P003
> dev.cpu.2.%pnpinfo: _HID=none _UID=0
> dev.cpu.2.%parent: acpi0
> dev.cpu.2.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.2.cx_lowest: C1
> dev.cpu.2.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.3.%desc: ACPI CPU
> dev.cpu.3.%driver: cpu
> dev.cpu.3.%location: handle=\_PR_.P004
> dev.cpu.3.%pnpinfo: _HID=none _UID=0
> dev.cpu.3.%parent: acpi0
> dev.cpu.3.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.3.cx_lowest: C1
> dev.cpu.3.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.4.%desc: ACPI CPU
> dev.cpu.4.%driver: cpu
> dev.cpu.4.%location: handle=\_PR_.P005
> dev.cpu.4.%pnpinfo: _HID=none _UID=0
> dev.cpu.4.%parent: acpi0
> dev.cpu.4.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.4.cx_lowest: C1
> dev.cpu.4.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.5.%desc: ACPI CPU
> dev.cpu.5.%driver: cpu
> dev.cpu.5.%location: handle=\_PR_.P006
> dev.cpu.5.%pnpinfo: _HID=none _UID=0
> dev.cpu.5.%parent: acpi0
> dev.cpu.5.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.5.cx_lowest: C1
> dev.cpu.5.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.6.%desc: ACPI CPU
> dev.cpu.6.%driver: cpu
> dev.cpu.6.%location: handle=\_PR_.P007
> dev.cpu.6.%pnpinfo: _HID=none _UID=0
> dev.cpu.6.%parent: acpi0
> dev.cpu.6.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.6.cx_lowest: C1
> dev.cpu.6.cx_usage: 100.00% 0.00% 0.00% last 500us
> dev.cpu.7.%desc: ACPI CPU
> dev.cpu.7.%driver: cpu
> dev.cpu.7.%location: handle=\_PR_.P008
> dev.cpu.7.%pnpinfo: _HID=none _UID=0
> dev.cpu.7.%parent: acpi0
> dev.cpu.7.cx_supported: C1/3 C2/205 C3/245
> dev.cpu.7.cx_lowest: C1
> dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 500us



-- 
Andriy Gapon
Comment 9 Dmitry Kubov 2010-09-20 15:41:08 UTC
> on 20/09/2010 17:06 Dmitry Kubov said the following:
>> Maybe I need some kind of powerd running? There is no information about tuning TurboBoost on
>> FreeBSD.
> So you do have the levels reported in cx_supported?
> This is not what you attached to the PR.  And this is not what I tried to debug.
> I am not sure what changed in your environment, but you should have said that you
> have those levels reported and not wasted my time on this.
>
> Please ask the above on questions@.

The change is that 7.2 was upgraded to 8.1.
Comment 10 Andriy Gapon freebsd_committer freebsd_triage 2010-09-20 15:48:07 UTC
on 20/09/2010 17:41 Dmitry Kubov said the following:
> 
> 
>> on 20/09/2010 17:06 Dmitry Kubov said the following:
>>> Maybe I need some kind of powerd running? There is no information about tuning TurboBoost on
>>> FreeBSD.
>> So you do have the levels reported in cx_supported?
>> This is not what you attached to the PR.  And this is not what I tried to debug.
>> I am not sure what changed in your environment, but you should have said that you
>> have those levels reported and not wasted my time on this.
>>
>> Please ask the above on questions@.
> 
> The change is that 7.2 was upgraded to 8.1.

Great!  That was a wise decision on your part.
But you should have told us/me that (e.g. followed up to your own PR).
OK, now you can enable use of lower Cx states via hw.acpi.cpu.cx_lowest sysctl and
the PR can be closed.

-- 
Andriy Gapon
Comment 11 Dmitry Kubov 2010-09-20 15:54:41 UTC
>> The change is that 7.2 was upgraded to 8.1.
> Great!  That was a wise decision on your part.
> But you should have told us/me that (e.g. followed up to your own PR).
> OK, now you can enable use of lower Cx states via hw.acpi.cpu.cx_lowest sysctl and
> the PR can be closed.

Well,
#sysctl hw.acpi.cpu.cx_lowest="C3"
hw.acpi.cpu.cx_lowest: C1 -> C3

A few minutes later, top statistics:
last pid: 90967;  load averages:  3.91,  3.32,  2.32   up 15+21:14:21  18:52:30
86 processes:  5 running, 81 sleeping
CPU: 43.8% user,  0.0% nice,  3.3% system,  0.1% interrupt, 52.7% idle

# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.P001
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 2934
dev.cpu.0.freq_levels: 2934/105000 2800/82000 2667/71000 2533/64000 2400/55000
2267/48000 2133/41000 2000/36000 1867/31000 1733/27000 1600/23000
1400/20125 1200/17250 1000/14375 800/11500 600/8625 400/5750 200/2875
dev.cpu.0.cx_supported: C1/3 C2/205 C3/245
dev.cpu.0.cx_lowest: C3
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.P002
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.cx_supported: C1/3 C2/205 C3/245
dev.cpu.1.cx_lowest: C3
dev.cpu.1.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.P003
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.cx_supported: C1/3 C2/205 C3/245
dev.cpu.2.cx_lowest: C3
dev.cpu.2.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.P004
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.cx_supported: C1/3 C2/205 C3/245
dev.cpu.3.cx_lowest: C3
dev.cpu.3.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.4.%desc: ACPI CPU
dev.cpu.4.%driver: cpu
dev.cpu.4.%location: handle=\_PR_.P005
dev.cpu.4.%pnpinfo: _HID=none _UID=0
dev.cpu.4.%parent: acpi0
dev.cpu.4.cx_supported: C1/3 C2/205 C3/245
dev.cpu.4.cx_lowest: C3
dev.cpu.4.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.5.%desc: ACPI CPU
dev.cpu.5.%driver: cpu
dev.cpu.5.%location: handle=\_PR_.P006
dev.cpu.5.%pnpinfo: _HID=none _UID=0
dev.cpu.5.%parent: acpi0
dev.cpu.5.cx_supported: C1/3 C2/205 C3/245
dev.cpu.5.cx_lowest: C3
dev.cpu.5.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.6.%desc: ACPI CPU
dev.cpu.6.%driver: cpu
dev.cpu.6.%location: handle=\_PR_.P007
dev.cpu.6.%pnpinfo: _HID=none _UID=0
dev.cpu.6.%parent: acpi0
dev.cpu.6.cx_supported: C1/3 C2/205 C3/245
dev.cpu.6.cx_lowest: C3
dev.cpu.6.cx_usage: 100.00% 0.00% 0.00% last 500us
dev.cpu.7.%desc: ACPI CPU
dev.cpu.7.%driver: cpu
dev.cpu.7.%location: handle=\_PR_.P008
dev.cpu.7.%pnpinfo: _HID=none _UID=0
dev.cpu.7.%parent: acpi0
dev.cpu.7.cx_supported: C1/3 C2/205 C3/245
dev.cpu.7.cx_lowest: C3
dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 500us

C2/C3 are not used at all.
Comment 12 Andriy Gapon freebsd_committer freebsd_triage 2010-09-20 16:11:20 UTC
on 20/09/2010 17:54 Dmitry Kubov said the following:
> dev.cpu.7.cx_supported: C1/3 C2/205 C3/245
Note these------------------------^^^----^^^
> dev.cpu.7.cx_lowest: C3
> dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 500us
And this --------------------------------------^^^^^
> C2/C3 are not used at all.

And now there is this code in acpi_cpu.c:

    /* Find the lowest state that has small enough latency. */
    cx_next_idx = 0;
    for (i = sc->cpu_cx_lowest; i >= 0; i--) {
        if (sc->cpu_cx_states[i].trans_lat * 3 <= sc->cpu_prev_sleep) {
            cx_next_idx = i;
            break;
        }
    }

205 * 3 and 245 * 3 are both greater than 500, so this is the reason why they are
never entered.

Perhaps Alexander can give some advice here.

-- 
Andriy Gapon
Comment 13 Dmitry Kubov 2010-09-20 16:29:12 UTC

> 205 * 3 and 245 * 3 are both greater than 500, so this is the reason why they are
> never entered.
>
> Perhaps Alexander can give some advice here.


Can I simply update src to 8-STABLE?

Revision 1.79.2.10 of src/sys/dev/acpica/acpi_cpu.c
Mon Sep 20 05:39:50 2010 UTC by avg
Branches: RELENG_8
Diff to previous 1.79.2.9: <http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/acpica/acpi_cpu.c.diff?r1=1.79.2.9;r2=1.79.2.10>
Changes since revision 1.79.2.9: +1 -9 lines

SVN rev 212887 on 2010-09-20 05:39:50Z by avg

MFC r212549: acpi_cpu: do not apply P_LVLx_LAT rules to latencies
returned by _CST

Comment 14 Alexander Motin freebsd_committer freebsd_triage 2010-09-20 16:42:57 UTC
Andriy Gapon wrote:
> on 20/09/2010 17:54 Dmitry Kubov said the following:
>> dev.cpu.7.cx_supported: C1/3 C2/205 C3/245
> Note these------------------------^^^----^^^
>> dev.cpu.7.cx_lowest: C3
>> dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 500us
> And this --------------------------------------^^^^^
>> C2/C3 are not used at all.
> 
> 205 * 3 and 245 * 3 are both greater than 500, so this is the reason why they are
> never entered.

The only way to enter C-states with such high latency is to significantly
increase the CPUs' continuous sleep time. The sleep time of 500us there is
artificial and calculated as 1000000/(2*hz). 8.1 is not yet able to
measure the real sleep time in C1, but 2*hz is quite a realistic estimate
for an idle system.

Recently I have committed to 9-CURRENT a large set of patches that stop idle
CPUs from waking up on timer interrupts when it is not needed. This allows
idle CPUs to sleep for as long as 100000us, making any available C-state
effectively usable. I can confirm that TurboBoost on my Core i7
870 gives about a 10% benefit when only one physical core is used:
http://docs.freebsd.org/cgi/mid.cgi?4C959830.3060808

There are requests to merge these changes into 8-STABLE, and I would like to,
but most likely it won't happen in the next few months, as the code is very
new and requires more testing.

Until then, I recommend following this guide:
http://wiki.freebsd.org/TuningPowerConsumption
It is primarily oriented toward laptops, but effective usage of C2/C3 states
was one of its goals. Also, on my Core i7 870 the LAPIC timer stops working in C2/C3
states, so consider migrating to the i8254 timer, as also described in this
guide.

-- 
Alexander Motin
Comment 15 Alexander Motin freebsd_committer freebsd_triage 2010-09-20 16:49:54 UTC
Dmitry Kubov wrote:
>> 205 * 3 and 245 * 3 are both greater than 500, so this is the reason why they are
>> never entered.
>>
>> Perhaps Alexander can give some advice here.
> 
> Can I simply update src to 8-STABLE?
> 
> SVN rev 212887 on 2010-09-20 05:39:50Z by avg
> 
> MFC r212549: acpi_cpu: do not apply P_LVLx_LAT rules to latencies
> returned by _CST

No, that's a different case. This won't help you.

-- 
Alexander Motin
Comment 16 Dmitry Kubov 2010-09-20 17:39:54 UTC
> Until then, I recommend following this guide:
> http://wiki.freebsd.org/TuningPowerConsumption
> It is primarily oriented toward laptops, but effective usage of C2/C3 states
> was one of its goals. Also, on my Core i7 870 the LAPIC timer stops working in C2/C3
> states, so consider migrating to the i8254 timer, as also described in this
> guide.

Thanks for the help, I'll start with this guide.
Comment 17 Dmitry Kubov 2010-09-21 12:12:52 UTC
OK, I was able to activate the C3 state after loader.conf tweaks. According
to http://www.intel.com/technology/turboboost/

Intel Turbo Boost Technology is activated when the Operating System (OS) 
requests the highest processor performance state (P0).

I have no idea how to activate the P0 state on FreeBSD.
Comment 18 Alexander Motin freebsd_committer freebsd_triage 2010-09-21 12:16:46 UTC
Dmitry Kubov wrote:
> OK, I was able to activate the C3 state after loader.conf tweaks. According
> to http://www.intel.com/technology/turboboost/
> 
> Intel Turbo Boost Technology is activated when the Operating System (OS)
> requests the highest processor performance state (P0).
> 
> I have no idea how to activate the P0 state on FreeBSD.

P0 is simply the highest available CPU frequency. If you are not using
powerd, it should be set all the time. If you are using powerd, it
will set it within a fraction of a second after load appears.

-- 
Alexander Motin
Comment 19 Alexander Motin freebsd_committer freebsd_triage 2010-09-23 13:34:23 UTC
Dmitry Kubov wrote:
>> It would be
>> interesting to repeat same test if you updated to 8-STABLE or at least
>> apply patch from SVN rev 209897 on 2010-07-11 11:58:46Z.
> 
> New system:
> CPU: Intel(R) Xeon(R) CPU           X5680  @ 3.33GHz (3333.47-MHz
> K8-class CPU)
> FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
> FreeBSD/SMP: 2 package(s) x 6 core(s)
> HT disabled in BIOS.

This CPU has only a 266MHz TurboBoost speedup, and some part of it
(probably half) could be enabled all the time. This benefit could still
be outweighed by the C-state latency penalty. It could be interesting
to test some other workloads, like compilation with different numbers of
threads.

> Note 3333/3334 difference:
> TurboBoost disabled:
> dev.cpu.0.freq: 3333
> dev.cpu.0.freq_levels: 3333/130000 3200/117000 3067/105000 2933/94000
> 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000 2000/43000
> 1867/39000 1733/35000 1600/32000 1400/28000 1200/24000 1000/20000 800/16000
> 600/12000 400/8000 200/4000
> dev.est.0.freq_settings: 3333/130000 3200/117000 3067/105000 2933/94000
> 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000 2000/43000
> 1867/39000 1733/35000 1600/32000
> 
> TurboBoost enabled:
> dev.cpu.0.freq: 3334
> dev.cpu.0.freq_levels: 3334/143000 3200/117000 3067/105000 2933/94000
> 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000 2000/43000
> 1867/39000 1733/35000 1600/32000 1400/28000 1200/24000 1000/20000 800/16000
> 600/12000 400/8000 200/4000
> dev.est.0.freq_settings: 3334/143000 3333/130000 3200/117000 3067/105000
> 2933/94000 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000
> 2000/43000 1867/39000 1733/35000 1600/32000

Intel writes that the BIOS may report an additional P-state 1MHz above the
nominal frequency, to allow the OS to control TurboBoost. Dropping such close
frequencies is just a behavior/limitation of the cpufreq subsystem. Actually, I
am not sure how this additional P-state could be used, except for testing.

> In short: no 60% disk I/O performance drop in 8.1-STABLE. Other tests
> give the same results as 8.1-RELEASE, a 5% average CPU performance drop.

The disk performance fix makes sense. Some recent improvements in
9-CURRENT should improve it even more. As for ubench, try some
different load.

-- 
Alexander Motin
Comment 20 Dmitry Kubov 2010-09-23 13:48:18 UTC
> This CPU has only a 266MHz TurboBoost speedup, and some part of it
> (probably half) could be enabled all the time. This benefit could still
> be outweighed by the C-state latency penalty. It could be interesting
> to test some other workloads, like compilation with different numbers of
> threads.
>

Actually, I tested 8.1-RELEASE with both TurboBoost settings in the BIOS:

TurboBoost OFF
Ubench Single CPU:   451935 (0.40s)
Ubench Single CPU:   450927 (0.40s)
Ubench Single CPU:   450486 (0.40s)

TurboBoost ON
Ubench Single CPU:   450890 (0.40s)
Ubench Single CPU:   450890 (0.40s)
Ubench Single CPU:   449926 (0.40s)

The C-state latency penalty is a reasonable idea, but it looks like the P0 state
is not activated at all.
And what about the very high percentage for the C3 state during heavy load:
dev.cpu.0.cx_usage: 0.17% 0.06% 99.75% last 7560us

> The disk performance fix makes sense. Some recent improvements in
> 9-CURRENT should improve it even more. As for ubench, try some
> different load.
>
Can you suggest another CPU-only benchmark?

make -j 16 buildworld
cannot load all cores; idle never drops below 11%.
Comment 21 Alexander Motin freebsd_committer freebsd_triage 2010-09-23 14:01:09 UTC
Dmitry Kubov wrote:
> 
>> This CPU has only 266MHz TurboBoost speedup. And some part of it
>> (probably half) could be enabled all the time. This benefit still could
>> be overweighted by C-states latencies penalty. It could be interesting
>> to test some other workloads, like compilation with different number of
>> threads.
>>
> 
> Actually, I tested 8.1-RELEASE with both TurboBoost settings in the BIOS:
> 
> TurboBoost OFF
> Ubench Single CPU:   451935 (0.40s)
> Ubench Single CPU:   450927 (0.40s)
> Ubench Single CPU:   450486 (0.40s)
> 
> TurboBoost ON
> Ubench Single CPU:   450890 (0.40s)
> Ubench Single CPU:   450890 (0.40s)
> Ubench Single CPU:   449926 (0.40s)
> 
> The C-state latency penalty is a reasonable idea, but it looks like the P0 state
> is not activated at all.

Try killing powerd and manually setting the highest CPU frequency. The 0.40s test
time looks a bit suspicious, as powerd may simply not react in time to set
the P0 state.

> And what about the very high percentage for the C3 state during heavy load:
> dev.cpu.0.cx_usage: 0.17% 0.06% 99.75% last 7560us

It's not really strange. These numbers count the number of entries into each
state, so when a CPU is completely busy they won't be updated. The main case
when the C1 state should be actively used/counted is loads with a high
interrupt rate or heavy context switching, such as disk I/O or network load.

>> The disk performance fix makes sense. Some recent improvements in
>> 9-CURRENT should improve it even more. As for ubench, try some
>> different load.
>>
> Can you suggest another CPU-only benchmark?
> 
> make -j 16 buildworld
> cannot load all cores; idle never drops below 11%.

I don't think completely loading all CPUs is the main goal. But this
test is realistic and gives a really usable result.

-- 
Alexander Motin
Comment 22 Dmitry Kubov 2010-09-23 14:06:22 UTC
> Try killing powerd and manually setting the highest CPU frequency. The 0.40s test
> time looks a bit suspicious, as powerd may simply not react in time to set
> the P0 state.
>
powerd is not enabled. Where/how do I set the highest CPU frequency?
Comment 23 Alexander Motin freebsd_committer freebsd_triage 2010-09-23 14:07:32 UTC
Dmitry Kubov wrote:
>> Try killing powerd and manually setting the highest CPU frequency. The 0.40s test
>> time looks a bit suspicious, as powerd may simply not react in time to set
>> the P0 state.
>>
> powerd is not enabled. Where/how do I set the highest CPU frequency?

sysctl dev.cpu |grep freq

-- 
Alexander Motin
Comment 24 Dmitry Kubov 2010-09-23 14:10:05 UTC
>>> Try killing powerd and manually setting the highest CPU frequency. The 0.40s test
>>> time looks a bit suspicious, as powerd may simply not react in time to set
>>> the P0 state.
>>>
>> powerd is not enabled. Where/how do I set the highest CPU frequency?
> sysctl dev.cpu |grep freq
>
# sysctl dev.cpu |grep freq
dev.cpu.0.freq: 3334
dev.cpu.0.freq_levels: 3334/143000 3200/117000 3067/105000 2933/94000 
2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000 
2000/43000 1867/39000 1733/35000 1600/32000 1400/28000 1200/24000 
1000/20000 800/16000 600/12000 400/8000 200/4000

So it is already at the maximum frequency.
Comment 25 Dmitry Kubov 2010-09-24 08:18:01 UTC
Is it possible to keep running threads on the same CPU core for a longer time
to avoid the C-state latency penalty?
Comment 26 Alexander Motin freebsd_committer freebsd_triage 2010-09-24 08:22:41 UTC
Dmitry Kubov wrote:
> Is it possible to keep running threads on the same CPU core for a longer time
> to avoid the C-state latency penalty?

man 1 cpuset

-- 
Alexander Motin
Comment 27 Dmitry Kubov 2010-09-24 08:34:51 UTC
> Dmitry Kubov wrote:
>> Is it possible to keep running threads on the same CPU core for a longer time
>> to avoid the C-state latency penalty?
> man 1 cpuset
>
It's a static assignment; it requires scheduling all tasks by hand.

cpuset -l 7 ubench -cs
Ubench Single CPU:   453051 (0.40s)
Less than a 1% boost.
Comment 28 Andriy Gapon freebsd_committer freebsd_triage 2010-09-24 08:35:27 UTC
State Changed
From-To: feedback->closed

The issue as described in this PR is not present in stable/8. 
ACPI in stable/7 is not going to be updated. 
In stable/8 C3 is reported but is never used in the default
configuration, because the processor never sleeps long enough
to justify entering the C3 state with its long enter/exit latency.
The reporter is tuning his system for optimal performance.