| Summary: | Delays in TCP connection over Gigabit Ethernet connections; Regression from 6.3-RELEASE |
|---|---|
| Product: | Base System |
| Reporter: | Aleksander Derevianko <aeder> |
| Component: | kern |
| Assignee: | freebsd-net (Nobody) <net> |
| Status: | Closed Works As Intended |
| Severity: | Affects Some People |
| CC: | amvandemore, cem, eugen |
| Priority: | --- |
| Keywords: | regression |
| Version: | 10.3-RELEASE |
| Hardware: | i386 |
| OS: | Any |
| Attachments: | |
Description

Aleksander Derevianko 2018-01-29 14:36:08 UTC

After re-reading:
> nanosleep() return with random error up to 4%.
For clarification, I mean it oversleeps by up to ~4% of 0.6 seconds.

amvandemore: Did you try multiqueue?

Eugene Grosbein: Please attach: 1) /var/run/dmesg.boot of a verbose boot for the Moxa DA-820; 2) output of "sysctl kern.timecounter"; 3) output of "sysctl kern.eventtimer"; 4) output of "sysctl kern.hz". If your new system runs with kern.eventtimer.periodic=0 by default, try changing this setting to 1. Also, why do you use version 10.3, which has an "Expected EoL" of April 30, 2018 (10.4 follows on October 31, 2018, as does the whole 10.x line)? Have you considered trying 11.1-RELEASE? It has many more changes, and fixes could land there if found necessary.

Conrad Meyer: What syscalls are you measuring delay over? It seems that the column with the delay differs between your two sets of samples -- does this mean anything? Thanks.

Aleksander Derevianko: Additional info: I have tried the following configuration: Xen (CentOS 7 in domain 0) running on the Moxa hardware, with FreeBSD 10.3-RELEASE i386 under Xen and xn0 configured as a bridge. Between two copies of FreeBSD 10.3 running under Xen on different host computers, the test application produces a much better result:
grep times fspa2_fspa1_virt.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
9863 send_sync 0 0 0 0
38565 send_sync 0 0 0 1
2 send_sync 0 0 0 10
9 send_sync 0 0 0 11
16 send_sync 0 0 0 12
10 send_sync 0 0 0 13
1 send_sync 0 0 0 17
301599 send_sync 0 0 0 2
1 send_sync 0 0 0 233
698526 send_sync 0 0 0 3
5195 send_sync 0 0 0 4
2770 send_sync 0 0 0 5
1361 send_sync 0 0 0 6
14 send_sync 0 0 0 7
5 send_sync 0 0 0 8
4 send_sync 0 0 0 9
1 send_sync 0 0 1 2
1 send_sync 0 0 2 0
4 send_sync 0 0 2 1
51 send_sync 0 0 2 2
79 send_sync 0 0 2 3
6545 send_sync 0 1 0 0
21849 send_sync 0 1 0 1
1 send_sync 0 1 0 10
1 send_sync 0 1 0 11
1 send_sync 0 1 0 12
39671 send_sync 0 1 0 2
244 send_sync 0 1 0 3
23 send_sync 0 1 0 4
8 send_sync 0 1 0 5
1 send_sync 0 1 0 6
1 send_sync 0 1 2 0
1 send_sync 0 14 0 2
17499 send_sync 0 2 0 0
58073 send_sync 0 2 0 1
1 send_sync 0 2 0 10
3 send_sync 0 2 0 11
2 send_sync 0 2 0 12
115462 send_sync 0 2 0 2
572 send_sync 0 2 0 3
7 send_sync 0 2 0 4
4 send_sync 0 2 0 5
1 send_sync 0 2 0 7
2 send_sync 0 2 0 9
7959 send_sync 0 3 0 0
20669 send_sync 0 3 0 1
1 send_sync 0 3 0 11
1 send_sync 0 3 0 12
49220 send_sync 0 3 0 2
267 send_sync 0 3 0 3
3 send_sync 0 3 0 4
2 send_sync 0 3 0 5
1 send_sync 0 3 1 0
23 send_sync 0 4 0 0
35 send_sync 0 4 0 1
54 send_sync 0 4 0 2
2 send_sync 0 4 0 3
1 send_sync 0 49 0 1
1 send_sync 0 5 0 1
1 send_sync 0 5 0 2
1 send_sync 0 6 0 2
1 send_sync 0 7 0 0
1 send_sync 0 7 0 3
1 send_sync 2 0 0 0
2 send_sync 2 0 0 1
16 send_sync 2 0 0 2
41 send_sync 2 0 0 3
1 send_sync 2 1 0 1
1 send_sync 2 1 0 2
Of 1,396,680 samples, only 52 are >= 10 ms, and only 2 are >= 20 ms.
From my POV, this means the problem is somewhere in the OS<->hardware layer, not in the OS itself.
(In reply to amvandemore from comment #2)
No, I have not tried multiqueue. First, it gives an advantage only under high load, and 40 Kbytes * 3 = 120 Kbytes/sec over a 1 Gbit line is a very low load. Second, under 6.3-RELEASE it works fine without multiqueue.
(In reply to Eugene Grosbein from comment #3)
dmesg will be added as an attachment.
root@fspa1:~/clock/new_res # sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 0
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1247191596
kern.timecounter.tc.TSC-low.counter: 233727093
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 10612645
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 14761
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 3051462975
kern.timecounter.tc.HPET.mask: 4294967295
root@fspa1:~/clock/new_res # sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440) HPET3(440) HPET4(440) i8254(100) RTC(0)
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.HPET4.quality: 440
kern.eventtimer.et.HPET4.frequency: 14318180
kern.eventtimer.et.HPET4.flags: 3
kern.eventtimer.et.HPET3.quality: 440
kern.eventtimer.et.HPET3.frequency: 14318180
kern.eventtimer.et.HPET3.flags: 3
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET.quality: 550
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 7
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 49887687
kern.eventtimer.et.LAPIC.flags: 7
root@fspa1:~/clock/new_res # sysctl kern.hz
kern.hz: 1000
root@fspa1:~/clock/new_res # sysctl kern.eventtimer.periodic
kern.eventtimer.periodic: 0
I will try and report results here.
> Also, why do you use version 10.3, which has an "Expected EoL" of April 30, 2018
> (10.4 follows on October 31, 2018, as does the whole 10.x line)?
> Have you considered trying 11.1-RELEASE? It has many more changes,
> and fixes could land there if found necessary.
We have a very long verification/release cycle because this software will be used in critical infrastructure (railway automation). The latest security fixes are not actually important, because it will run in a very closed environment.
I will try to update to 10.4 and 11.1 and check whether it helps.
(In reply to Conrad Meyer from comment #4)
> What syscalls are you measuring delay over? It seems that the column with
> the delay differs between your two sets of samples -- does this mean
> anything? Thanks.
clock_gettime( CLOCK_MONOTONIC, ...);
See attached source code.
Actually, the delay under 6.3-RELEASE is lower (and the evaluation time is higher) because 6.3-RELEASE is running on different hardware. The test load (a Fibonacci calculation) executes faster on modern hardware.
I will try to install 6.3-RELEASE on the Moxa hardware, repeat the test and report the result. Unfortunately, the test needs to run for at least 6 hours before I get reliable results.
Created attachment 190196 [details]
dmesg from verbose boot of Moxa DA-820
I now have tests running on the Moxa DA-820 with kern.eventtimer.periodic=1.
One interesting thing: if you look at the tests with 6.3-RELEASE, 98% of the samples are
1115322 send_sync 0 0 0 0
(columns = send(small data), recv(small data), send(40Kb), recv(40Kb))
- so every send() and recv() call executes in <1 ms.
7% of the samples have
73629 send_sync 0 1 0 0
- so the first recv() has a 1 ms execution time, but that's OK because it is the first recv() after nanosleep(~200ms), and we can't expect nanosleep() to behave exactly the same on different computers. Moreover, the nanosleep() period is evaluated from two local calls to clock_gettime(CLOCK_MONOTONIC, ...) and a subtraction.
But for 10.3-RELEASE we have an approximately even split between:
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
The case with a delay on the first recv() is less than 0.02%:
147 send_sync 0 1 0 0
The trailing 1 in "0 0 0 1" is on the second recv() - and at that point the computers have already been synchronised by the first small recv().
P.S. The test with kern.eventtimer.periodic=1 produces the following results for a short test run:
root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
1484 send_sync 0 0 0 0
1314 send_sync 0 0 0 1
1 send_sync 0 0 0 230
4 send_sync 0 1 0 0
root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
1698 send_sync 0 0 0 0
1134 send_sync 0 0 0 1
1 send_sync 0 0 0 229
11 send_sync 0 1 0 0
A very large delay appeared very quickly.
One notable case:
fspa1:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
fspb:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
The problem arises in the SAME cycle on both computers!
How is that possible? It seems as if, on one of the computers, both send and receive were blocked (buffered) at some level.
Eugene Grosbein: You may want to do "sysctl kern.timecounter.fast_gettime=0" and re-do your tests, because 6.3 did not have the fast_gettime mode. Also, you may want to decrease the influence of the kernel randomness harvester in case it uses the network as a source on your system. Please show the output of "sysctl kern.random". Also, if you use the em0 network interface, please attach the output of "sysctl dev.em.0".

Eugene Grosbein: You may want to disable NIC flow control to eliminate the influence of possible "pause" frames.

(In reply to Eugene Grosbein from comment #8)
Will try with sysctl kern.timecounter.fast_gettime=0 and report here.

(In reply to Eugene Grosbein from comment #9)
root@fspa2:~ # sysctl kern.random
kern.random.sys.harvest.swi: 0
kern.random.sys.harvest.interrupt: 0
kern.random.sys.harvest.point_to_point: 0
kern.random.sys.harvest.ethernet: 0
kern.random.sys.seeded: 1
kern.random.yarrow.slowoverthresh: 2
kern.random.yarrow.slowthresh: 128
kern.random.yarrow.fastthresh: 96
kern.random.yarrow.bins: 10
kern.random.yarrow.gengateinterval: 10
kern.random.live_entropy_sources: Hardware, Intel Secure Key RNG
kern.random.active_adaptor: yarrow
kern.random.adaptors: dummy,yarrow
I had already thought about that, and have the following in /etc/sysctl.conf:
kern.random.sys.harvest.swi=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.ethernet=0

(In reply to Eugene Grosbein from comment #10)
Currently I am trying it with a Realtek card (an additional one). See the attached sysctl.txt.

Created attachment 190201 [details]
Full sysctl -a output from Moxa DA-820
The results of the tests with kern.eventtimer.periodic=0 seem very interesting:
A1 computer:
grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1
3 send_sync 0 0 0 229
352 send_sync 0 1 0 0
A2 computer
root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1
3 send_sync 0 0 0 2
4 send_sync 0 0 0 230
20 send_sync 0 1 0 0
root@fspa1:~/clock/new_res # grep times periodic.txt | grep recv_data\ 229
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
root@fspa1:~/clock/new_res #
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
root@fspa2:~/clock/new_res # grep times periodic.txt| grep recv_data\ 230
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
root@fspa2:~/clock/new_res #
As you can see, in 3 cases the delay was caught on both sides, and in one case only on the A2 computer.
Moreover, it seems the A1 computer has 92% all-zero samples:
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1
At the same time, the A2 computer has only 54% all-zero samples:
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1
I have found a difference in the configs:
A1
root@fspa1:~/clock/new_res # cat /boot/loader.conf
boot_verbose="YES"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns=350
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"
A2
root@fspa2:~/clock/new_res # cat /boot/loader.conf
console="comconsole,vidconsole"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns="350"
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"
Can boot_verbose="YES" or console="comconsole,vidconsole" make such a difference?
(In reply to Eugene Grosbein from comment #10)
Regarding flow control: who would send "pause" frames, if I have only two computers (A1 and A2) connected to this switch, and nothing more? And the load is pretty low, about 40K*3 per second in both directions.

(In reply to Aleksander Derevianko from comment #14)
I would not be so trusting of the correctness of the flow control implementation in both the NIC firmware and the switch. The low volume of traffic means you can turn flow control off without a problem.

(In reply to Aleksander Derevianko from comment #13)
> Can boot_verbose="YES" or console="comconsole,vidconsole" make such a difference?
It can, if your system writes something to the console during the test.

(In reply to Aleksander Derevianko from comment #13)
Why do you disable MSI-X support for the em interfaces?

(In reply to Eugene Grosbein from comment #16)
No, it doesn't write anything to the console. Output from the test itself is redirected to a file on disk (SSD 100Gb INTEL SSDSA2BZ100G3 6PB10362). The results were checked from a remote ssh session.

(In reply to Eugene Grosbein from comment #17)
I switched MSI-X support off in one of the previous attempts to solve this issue. Actually, it doesn't change anything at all in my test case.

-------------------

OK, current results from the still-running test:

Computer A1:
root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
212603 send_sync 0 0 0 0
6437 send_sync 0 0 0 1
10 send_sync 0 0 0 229
4177 send_sync 0 1 0 0
1 send_sync 0 1 0 1

Computer A2:
root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
122102 send_sync 0 0 0 0
100857 send_sync 0 0 0 1
47 send_sync 0 0 0 2
12 send_sync 0 0 0 230
2 send_sync 0 0 0 231
170 send_sync 0 1 0 0

I set kern.timecounter.fast_gettime=0 yesterday on both A1 and A2, and over the last 12 hours 6 more cases of long delays appeared on A1, and 8 on the A2 computer.
I will try to repeat this test with console="comconsole,vidconsole" removed on both the A1 and A2 computers.

With console="comconsole,vidconsole" removed on both the A1 and A2 computers:
root@fspa1:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
5116 send_sync 0 0 0 0
2234 send_sync 0 0 0 1
1 send_sync 0 0 0 229
64 send_sync 0 1 0 0
root@fspa2:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4000 send_sync 0 0 0 0
3235 send_sync 0 0 0 1
1 send_sync 0 0 0 230
6 send_sync 0 1 0 0
Again, on the same cycle.
----------------------------------------------------
With boot_verbose="YES"
root@fspa1:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
6539 send_sync 0 0 0 0
794 send_sync 0 0 0 1
2 send_sync 0 0 0 229
125 send_sync 0 1 0 0
root@fspa2:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4210 send_sync 0 0 0 0
3249 send_sync 0 0 0 1
2 send_sync 0 0 0 230
6 send_sync 0 1 0 0
Now with em0 devices:
A1 computer:
root@fspa1:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
11256 send_sync 0 0 0 0
10968 send_sync 0 0 0 1
234 send_sync 0 1 0 0
1 send_sync 0 1 0 1
A2 computer:
root@fspa2:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
18284 send_sync 0 0 0 0
2490 send_sync 0 0 0 1
1704 send_sync 0 1 0 0
The current results seem just fine. I think the rare 230 ms delays in the previous tests are actually a problem with the Realtek card. It's a cheap TP-Link TG-3468 device, so if the built-in em0 continues to work fine, I will consider the problem solved.
The actual solution seems to be
kern.eventtimer.periodic=1
I will try to set only
kern.eventtimer.periodic=1
and
kern.timecounter.alloweddeviation=0
leaving everything else at the system defaults.
OK, problem solved. A 12-hour test doesn't show any delays longer than 1 ms.

For future reference, only the following parameters must be set in /etc/sysctl.conf:
--------------------------------------------------
# Exact clock for nanosleep()
kern.timecounter.alloweddeviation=0
# Disable random delays in network/adapter code
kern.eventtimer.periodic=1
--------------------------------------------------
All other parameters (boot/loader) can be left at the system defaults.

Closing the defect as "Works As Intended" because it can be solved purely in OS configuration. Thanks everybody for the good support and good advice!