I plan to upgrade our server to a Ryzen 9 5950X system (16 cores, 3400 MHz base frequency, 128 GB RAM) and ran into an issue while testing. For reference I will use a very simple command:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc

which, using a Linux rescue system (i.e. nothing else running), consistently and repeatedly completes in about:

1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.67561 s, 224 MB/s

Now for FreeBSD 12.2-RELEASE, on a basic boot not running anything except ssh: if I _disable_ hyperthreading using machdep.hyperthreading_allowed=0, I consistently and repeatedly get approximately the following result:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc
1048576000 bytes transferred in 4.874335 secs (215121876 bytes/sec)

Slightly slower than Linux; not sure whether that is down to how bzip2 is compiled etc., but nothing that worries me. However, if I _enable_ hyperthreading, i.e. the default I started with, then I get:

dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc
1048576000 bytes transferred in 4.887522 secs (214541450 bytes/sec)
1048576000 bytes transferred in 7.507138 secs (139677190 bytes/sec)
1048576000 bytes transferred in 6.227179 secs (168386989 bytes/sec)
1048576000 bytes transferred in 7.590263 secs (138147516 bytes/sec)
1048576000 bytes transferred in 7.421037 secs (141297776 bytes/sec)
1048576000 bytes transferred in 4.922986 secs (212995935 bytes/sec)
1048576000 bytes transferred in 4.945138 secs (212041827 bytes/sec)
1048576000 bytes transferred in 7.671600 secs (136682828 bytes/sec)
1048576000 bytes transferred in 7.673428 secs (136650273 bytes/sec)

In other words, the results vary widely and reproducibly, with large differences between commands executed immediately after one another (and no other load whatsoever). I'm curious why this is happening. I am not running powerd and have not touched any of the CPU settings.
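For reference, a minimal sketch of how the SMT toggle and the benchmark can be driven (assuming the tunable quoted above is set at boot via /boot/loader.conf):

# /boot/loader.conf (sketch): disable SMT for the next boot
machdep.hyperthreading_allowed="0"

# after rebooting, repeat the benchmark a few times on the idle system;
# dd prints its transfer summary on stderr, so it remains visible
for run in 1 2 3 4 5; do
    dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc > /dev/null
done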
Booting _without_ hyperthreading:

CPU: AMD Ryzen 9 5950X 16-Core Processor (3393.70-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0xa20f10  Family=0x19  Model=0x21  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,<b30>>
  Structured Extended Features=0x219c97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x40068c<UMIP,PKU,VAES,VPCLMULQDQ,RDPID>
  Structured Extended Features3=0x10
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x111ef657<CLZERO,IRPerf,XSaveErPtr>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 137434759168 (131068 MB)
avail memory = 133793423360 (127595 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 8 core(s) x 2 hardware threads
FreeBSD/SMP Online: 1 package(s) x 2 cache groups x 8 core(s)

# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 11922 0
dev.cpu.0.cx_usage: 100.00% 0.00% last 43430us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/18
dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980
dev.cpu.0.freq: 3400
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

Booting _with_ hyperthreading:

CPU: AMD Ryzen 9 5950X 16-Core Processor (3393.69-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0xa20f10  Family=0x19  Model=0x21  Stepping=0
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,<b30>>
  Structured Extended Features=0x219c97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
  Structured Extended Features2=0x40068c<UMIP,PKU,VAES,VPCLMULQDQ,RDPID>
  Structured Extended Features3=0x10
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x111ef657<CLZERO,IRPerf,XSaveErPtr>
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 137434759168 (131068 MB)
avail memory = 133793423360 (127595 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 8 core(s) x 2 hardware threads

# sysctl dev.cpu.0
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 3232 0
dev.cpu.0.cx_usage: 100.00% 0.00% last 65311us
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_supported: C1/1/1 C2/2/18
dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980
dev.cpu.0.freq: 3400
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

zenstates.py reports:

# ./zenstates.py -l
P0 - Enabled - FID = 88 - DID = 8 - VID = 48 - IDD = 22( / 1 ) - Ratio = 34.00 - vCore = 1.10000
P1 - Enabled - FID = 8C - DID = A - VID = 58 - IDD = 1C( / 1 ) - Ratio = 28.00 - vCore = 1.00000
P2 - Enabled - FID = 84 - DID = C - VID = 68 - IDD = 16( / 1 ) - Ratio = 22.00 - vCore = 0.90000
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
Core Performance Boost - Enabled
C6 State - Package - Disabled
C6 State - Core - Enabled

FWIW, if I disable Core Performance Boost the two typical execution times shift from ~4.8 and ~7.8 seconds to ~6.5 and ~10.8 seconds respectively, i.e. the same behaviour, just slower.

I was hoping someone could explain why this is happening (note that it does not happen on Linux), whether it is expected, and/or how it can be worked around or fixed, or where the problem might lie (p-states?). Happy to test anything.

PS - I tried a 13-BETA2 rescue boot, which has HT enabled, and it behaves exactly the same.
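Since p-states are one of the suspects, frequency switching can at least be ruled out from the OS side; a sketch, assuming the dev.cpu frequency control shown in the sysctl output above is writable while powerd is not running:

# list the advertised P-state levels (MHz/power), then pin the frequency;
# dev.cpu.0.freq acts as the shared frequency control for all cores
sysctl dev.cpu.0.freq_levels
sysctl dev.cpu.0.freq=3400    # pin to the nominal (non-boost) level
# re-run the dd | bzip2 test, then set the frequency back or reboot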
Do you think that this is the same issue as bug 256594? Also, did you try disabling PBO (Precision Boost Overdrive) as opposed to CPB?
Interesting result, which I could reproduce. But I'd be very surprised if this was connected to bug 256594.

I have performed a few tests on -CURRENT, with different numbers of processes running in parallel:

$ t () {
      for i in $(jot ${1:-1}); do
          dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc > /dev/null &
      done 2>&1 | grep transferred
      wait
  }
$ t 4
1048576000 bytes transferred in 5.601800 secs (187185555 bytes/sec)
1048576000 bytes transferred in 5.649712 secs (185598117 bytes/sec)
1048576000 bytes transferred in 5.695707 secs (184099356 bytes/sec)
1048576000 bytes transferred in 9.145955 secs (114649153 bytes/sec)
$ t 4
1048576000 bytes transferred in 5.615530 secs (186727884 bytes/sec)
1048576000 bytes transferred in 6.000599 secs (174745209 bytes/sec)
1048576000 bytes transferred in 8.281161 secs (126621862 bytes/sec)
1048576000 bytes transferred in 8.982560 secs (116734646 bytes/sec)
$ t 4
1048576000 bytes transferred in 5.597056 secs (187344204 bytes/sec)
1048576000 bytes transferred in 8.940248 secs (117287131 bytes/sec)
1048576000 bytes transferred in 8.962013 secs (117002282 bytes/sec)
1048576000 bytes transferred in 8.975112 secs (116831521 bytes/sec)

There are only two typical throughput value ranges: 175 to 188 MB/s and 115 to 125 MB/s (in powers of 10, not 2). This is roughly a factor of 3/2 ...

$ t 16
1048576000 bytes transferred in 7.537053 secs (139122806 bytes/sec)
1048576000 bytes transferred in 7.643938 secs (137177468 bytes/sec)
1048576000 bytes transferred in 7.658221 secs (136921619 bytes/sec)
1048576000 bytes transferred in 7.676633 secs (136593217 bytes/sec)
1048576000 bytes transferred in 7.684927 secs (136445807 bytes/sec)
1048576000 bytes transferred in 7.692365 secs (136313868 bytes/sec)
1048576000 bytes transferred in 7.785566 secs (134682056 bytes/sec)
1048576000 bytes transferred in 7.869853 secs (133239594 bytes/sec)
1048576000 bytes transferred in 7.887814 secs (132936190 bytes/sec)
1048576000 bytes transferred in 7.902913 secs (132682214 bytes/sec)
1048576000 bytes transferred in 7.901557 secs (132704990 bytes/sec)
1048576000 bytes transferred in 7.918014 secs (132429169 bytes/sec)
1048576000 bytes transferred in 7.964384 secs (131658150 bytes/sec)
1048576000 bytes transferred in 7.973078 secs (131514575 bytes/sec)
1048576000 bytes transferred in 7.992037 secs (131202601 bytes/sec)
1048576000 bytes transferred in 8.074766 secs (129858370 bytes/sec)

Now all results are between 130 and 140 MB/s, and this outcome is stable over multiple runs.
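A possible refinement (not something from this thread, just a sketch): wrapping each pipeline in /usr/bin/time -l additionally reports voluntary and involuntary context switches, which can hint at how often the jobs were preempted or migrated:

# variant of t() that also prints per-pipeline rusage via BSD time(1) -l
tt () {
    for i in $(jot ${1:-1}); do
        /usr/bin/time -l sh -c \
            'dd if=/dev/zero bs=1M count=1000 | bzip2 - | wc > /dev/null' &
    done 2>&1 | grep -E 'transferred|context switches'
    wait
}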
$ t 32
1048576000 bytes transferred in 11.279196 secs (92965495 bytes/sec)
1048576000 bytes transferred in 11.343222 secs (92440755 bytes/sec)
1048576000 bytes transferred in 11.345478 secs (92422376 bytes/sec)
1048576000 bytes transferred in 11.422671 secs (91797797 bytes/sec)
1048576000 bytes transferred in 11.522082 secs (91005777 bytes/sec)
1048576000 bytes transferred in 11.757213 secs (89185763 bytes/sec)
1048576000 bytes transferred in 11.796787 secs (88886578 bytes/sec)
1048576000 bytes transferred in 11.787529 secs (88956389 bytes/sec)
1048576000 bytes transferred in 11.830471 secs (88633499 bytes/sec)
1048576000 bytes transferred in 11.866944 secs (88361080 bytes/sec)
1048576000 bytes transferred in 11.901904 secs (88101537 bytes/sec)
1048576000 bytes transferred in 11.956605 secs (87698475 bytes/sec)
1048576000 bytes transferred in 11.952918 secs (87725524 bytes/sec)
1048576000 bytes transferred in 11.955508 secs (87706519 bytes/sec)
1048576000 bytes transferred in 11.961946 secs (87659316 bytes/sec)
1048576000 bytes transferred in 11.992837 secs (87433521 bytes/sec)
1048576000 bytes transferred in 12.017736 secs (87252376 bytes/sec)
1048576000 bytes transferred in 12.023212 secs (87212632 bytes/sec)
1048576000 bytes transferred in 12.014854 secs (87273302 bytes/sec)
1048576000 bytes transferred in 12.054915 secs (86983277 bytes/sec)
1048576000 bytes transferred in 12.149618 secs (86305266 bytes/sec)
1048576000 bytes transferred in 12.179530 secs (86093302 bytes/sec)
1048576000 bytes transferred in 12.260039 secs (85527952 bytes/sec)
1048576000 bytes transferred in 12.261602 secs (85517046 bytes/sec)
1048576000 bytes transferred in 12.260685 secs (85523445 bytes/sec)
1048576000 bytes transferred in 12.386748 secs (84653048 bytes/sec)
1048576000 bytes transferred in 12.415505 secs (84456972 bytes/sec)
1048576000 bytes transferred in 12.487385 secs (83970822 bytes/sec)
1048576000 bytes transferred in 12.527210 secs (83703871 bytes/sec)
1048576000 bytes transferred in 12.602776 secs (83201986 bytes/sec)
1048576000 bytes transferred in 12.618314 secs (83099532 bytes/sec)
1048576000 bytes transferred in 12.725476 secs (82399749 bytes/sec)

Again similar results on multiple runs, always between 80 and 93 MB/s.

The big variations exist if not all cores are busy, and this might be due to non-optimal scheduling performed by SCHED_ULE. It would be very interesting to repeat this test with SCHED_4BSD instead.

Funny detail: my CPU is reported to have an upper CPU clock of 4000 MHz, not 3400 MHz (no overclocking, but I had powerd running before; it has been stopped for these measurements). And I'm quite sure that I had once seen C2 statistics in the sysctl output, which are missing now:

$ sysctl dev.cpu.0
dev.cpu.0.temperature: 48,6C
dev.cpu.0.cx_method: C1/hlt
dev.cpu.0.cx_usage_counters: 175874
dev.cpu.0.cx_usage: 100.00% last 6552us
dev.cpu.0.cx_lowest: C8
dev.cpu.0.cx_supported: C1/1/0
dev.cpu.0.freq_levels: 4000/3740 2800/2800 2200/1980
dev.cpu.0.freq: 4000
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0 _CID=none
dev.cpu.0.%location: handle=\_SB_.PLTF.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU

Maybe I need to check the energy efficiency settings in the BIOS, but I thought I had enabled all of them again after the last BIOS update ...
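If the missing C2 state is to be checked from the OS side before going into the BIOS, the relevant knobs are visible via sysctl (a sketch based on the variables already shown above):

# which C-states ACPI currently exports for CPU 0, and the deepest one allowed
sysctl dev.cpu.0.cx_supported dev.cpu.0.cx_lowest
# the global limit applied to all CPUs
sysctl hw.acpi.cpu.cx_lowest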
Update after booting a kernel with SCHED_4BSD: this really appears to be a SCHED_ULE issue, as expected. With SCHED_4BSD the results are quite homogeneous within each run:

$ t    # repeated runs ...
1048576000 bytes transferred in 6.720101 secs (156035746 bytes/sec)
1048576000 bytes transferred in 7.037620 secs (148995835 bytes/sec)
1048576000 bytes transferred in 7.031251 secs (149130790 bytes/sec)
1048576000 bytes transferred in 7.032412 secs (149106159 bytes/sec)
1048576000 bytes transferred in 7.202687 secs (145581229 bytes/sec)
1048576000 bytes transferred in 6.918272 secs (151566177 bytes/sec)
1048576000 bytes transferred in 6.391244 secs (164064463 bytes/sec)
1048576000 bytes transferred in 6.686209 secs (156826683 bytes/sec)
1048576000 bytes transferred in 6.668778 secs (157236600 bytes/sec)
1048576000 bytes transferred in 6.914906 secs (151639959 bytes/sec)
1048576000 bytes transferred in 6.835760 secs (153395678 bytes/sec)
1048576000 bytes transferred in 6.755348 secs (155221619 bytes/sec)
$ t 4
1048576000 bytes transferred in 7.574533 secs (138434415 bytes/sec)
1048576000 bytes transferred in 7.776473 secs (134839540 bytes/sec)
1048576000 bytes transferred in 7.839487 secs (133755689 bytes/sec)
1048576000 bytes transferred in 7.856730 secs (133462132 bytes/sec)
$ t 4
1048576000 bytes transferred in 7.481412 secs (140157499 bytes/sec)
1048576000 bytes transferred in 7.557077 secs (138754182 bytes/sec)
1048576000 bytes transferred in 7.676920 secs (136588110 bytes/sec)
1048576000 bytes transferred in 8.040430 secs (130412922 bytes/sec)
$ t 4
1048576000 bytes transferred in 7.484386 secs (140101810 bytes/sec)
1048576000 bytes transferred in 7.581198 secs (138312698 bytes/sec)
1048576000 bytes transferred in 7.710614 secs (135991248 bytes/sec)
1048576000 bytes transferred in 7.736921 secs (135528856 bytes/sec)
$ t 16
1048576000 bytes transferred in 9.879846 secs (106132833 bytes/sec)
1048576000 bytes transferred in 10.087562 secs (103947418 bytes/sec)
1048576000 bytes transferred in 10.232516 secs (102474894 bytes/sec)
1048576000 bytes transferred in 10.237664 secs (102423370 bytes/sec)
1048576000 bytes transferred in 10.320563 secs (101600658 bytes/sec)
1048576000 bytes transferred in 10.375758 secs (101060179 bytes/sec)
1048576000 bytes transferred in 10.443859 secs (100401199 bytes/sec)
1048576000 bytes transferred in 10.481844 secs (100037355 bytes/sec)
1048576000 bytes transferred in 10.494019 secs (99921294 bytes/sec)
1048576000 bytes transferred in 10.510178 secs (99767674 bytes/sec)
1048576000 bytes transferred in 10.559435 secs (99302286 bytes/sec)
1048576000 bytes transferred in 10.638978 secs (98559840 bytes/sec)
1048576000 bytes transferred in 10.678294 secs (98196963 bytes/sec)
1048576000 bytes transferred in 10.808183 secs (97016860 bytes/sec)
1048576000 bytes transferred in 11.084706 secs (94596647 bytes/sec)
1048576000 bytes transferred in 11.231343 secs (93361587 bytes/sec)

Seems that my BIOS settings were not optimal during the tests documented in Comment 2.
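For completeness, the SCHED_4BSD kernel referred to above can be built from a minimal custom config; a sketch (the config name SCHED4BSD is made up here):

# /usr/src/sys/amd64/conf/SCHED4BSD (hypothetical file name)
include         GENERIC
ident           SCHED4BSD
nooptions       SCHED_ULE       # drop the default scheduler
options         SCHED_4BSD      # build with the traditional 4BSD scheduler

# then build, install and reboot:
# make -C /usr/src buildkernel installkernel KERNCONF=SCHED4BSD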
After loading "Optimal Default" settings and re-activation of the energy efficiency options I do get the following sysctl output now: $ sysctl dev.cpu.0 dev.cpu.0.temperature: 29,6C dev.cpu.0.cx_method: C1/hlt C2/io dev.cpu.0.cx_usage_counters: 180 70723 dev.cpu.0.cx_usage: 0.25% 99.74% last 9261us dev.cpu.0.cx_lowest: C8 dev.cpu.0.cx_supported: C1/1/1 C2/2/18 dev.cpu.0.freq_levels: 3400/3740 2800/2800 2200/1980 dev.cpu.0.freq: 3400 dev.cpu.0.%parent: acpi0 dev.cpu.0.%pnpinfo: _HID=ACPI0007 _UID=0 _CID=none dev.cpu.0.%location: handle=\_SB_.PLTF.C000 dev.cpu.0.%driver: cpu dev.cpu.0.%desc: ACPI CPU Anyway, after another reboot back to SCHED_ULE and repeating the tests with new BIOS settings I see: $ t 1 # multiple runs again ... 1048576000 bytes transferred in 7.640816 secs (137233502 bytes/sec) 1048576000 bytes transferred in 6.225996 secs (168418995 bytes/sec) 1048576000 bytes transferred in 4.852763 secs (216078118 bytes/sec) 1048576000 bytes transferred in 4.832574 secs (216980866 bytes/sec) 1048576000 bytes transferred in 4.819031 secs (217590617 bytes/sec) $ t 4 1048576000 bytes transferred in 4.956440 secs (211558309 bytes/sec) 1048576000 bytes transferred in 7.634614 secs (137344998 bytes/sec) 1048576000 bytes transferred in 7.788965 secs (134623280 bytes/sec) 1048576000 bytes transferred in 7.819442 secs (134098574 bytes/sec) $ t 4 1048576000 bytes transferred in 4.853663 secs (216038083 bytes/sec) 1048576000 bytes transferred in 4.857143 secs (215883287 bytes/sec) 1048576000 bytes transferred in 7.721846 secs (135793430 bytes/sec) 1048576000 bytes transferred in 7.792580 secs (134560821 bytes/sec) $ t 4 1048576000 bytes transferred in 4.908570 secs (213621479 bytes/sec) 1048576000 bytes transferred in 7.742029 secs (135439433 bytes/sec) 1048576000 bytes transferred in 7.784341 secs (134703245 bytes/sec) 1048576000 bytes transferred in 7.794096 secs (134534649 bytes/sec) Very similar to the results in the initial bug report ... And it really shows that SCHED_ULE causes the incoherent performance. But the throughput with SCHED_ULE is a lot higher than with SCHED_4BSD, probably due to the immensely higher system overhead of the latter with a large number of cores. While the CPU% with SCHED_ULE is in the order of 2%, with SCHED_4BSD I have seen values up to 60% with 32 parallel tasks. Another observation: WCPU of the bzip2 processes on the kernel with SCHED_4BSD was displayed as way beyond 100% (in the order of 300% in one run I remember). Since bzip2 is not multi-threaded (AFAIK) this seems to be a wrong measurement. The CPU% for the bzip2 processes is always near 100% with SCHED_ULE.
(In reply to Stefan Eßer from comment #3)

Thank you for the tests! Another observation is that if your t function is modified to pin the bzip2 processes to CPUs (e.g. cpuset -l $((2 * i)) bzip2 ...), then the results are much more consistent. In fact, they match the best results from the original test:

$ ~/test.sh 4
1048576000 bytes transferred in 4.903489 secs (213842854 bytes/sec)
1048576000 bytes transferred in 4.904297 secs (213807597 bytes/sec)
1048576000 bytes transferred in 4.908786 secs (213612071 bytes/sec)
1048576000 bytes transferred in 4.915402 secs (213324567 bytes/sec)
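The test.sh itself is not quoted, but based on the description it is presumably close to the following sketch of the modified t function (pinning each bzip2 to an even-numbered logical CPU, following the cpuset example above):

#!/bin/sh
# test.sh (sketch): pin each bzip2 instance to an even-numbered logical CPU
for i in $(jot ${1:-1}); do
    dd if=/dev/zero bs=1M count=1000 | cpuset -l $((2 * i)) bzip2 - | wc > /dev/null &
done 2>&1 | grep transferred
wait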
And a possible "eureka moment" that's consistent with the original description: if I choose odd logical CPUs, (cpuset -l $((2 * i + 1)), then I consistently get the worst results: $ ~/test.sh 4 1048576000 bytes transferred in 7.800822 secs (134418651 bytes/sec) 1048576000 bytes transferred in 7.803095 secs (134379498 bytes/sec) 1048576000 bytes transferred in 7.809778 secs (134264513 bytes/sec) 1048576000 bytes transferred in 7.810560 secs (134251064 bytes/sec) If this is indeed what it is, then several conclusions can be drawn: - hardware threads within a core are not born equal on this hardware - "primary" threads should be preferred - ULE does not do that
Is this still a problem on supported releases? A few years ago I fixed a few scheduler race conditions, mostly in ULE, that caused cores to go idle even with pending work.

I tried running this on a 16-core AMD 7950X3D with hyperthreading enabled, and don't see much variance:

git (main) markj@xinde> for i in $(seq 1 10); do t; done
1048576000 bytes transferred in 4.312122 secs (243169371 bytes/sec)
1048576000 bytes transferred in 4.062625 secs (258103089 bytes/sec)
1048576000 bytes transferred in 4.081073 secs (256936378 bytes/sec)
1048576000 bytes transferred in 4.017203 secs (261021443 bytes/sec)
1048576000 bytes transferred in 4.118516 secs (254600457 bytes/sec)
1048576000 bytes transferred in 4.063185 secs (258067483 bytes/sec)
1048576000 bytes transferred in 4.067748 secs (257778022 bytes/sec)
1048576000 bytes transferred in 4.484794 secs (233806939 bytes/sec)
1048576000 bytes transferred in 4.207453 secs (249218714 bytes/sec)
1048576000 bytes transferred in 3.954746 secs (265143670 bytes/sec)
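For anyone re-testing this on a current release, a minimal sketch of the re-run (t is the helper function defined earlier in this report):

# record the exact kernel/userland versions being tested
freebsd-version -kru
# repeat the single-stream pipeline a number of times and compare the spread
for i in $(seq 1 10); do t; done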