Bug 225535 - Delays in TCP connection over Gigabit Ethernet connections; Regression from 6.3-RELEASE
Summary: Delays in TCP connection over Gigabit Ethernet connections; Regression from 6.3-RELEASE
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: i386 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2018-01-29 14:36 UTC by Aleksander Derevianko
Modified: 2018-02-01 09:04 UTC
CC List: 3 users

See Also:


Attachments
cycle_clock.c (9.64 KB, text/plain), 2018-01-29 14:36 UTC, Aleksander Derevianko
dmesg from verbose boot of Moxa DA-820 (46.22 KB, text/plain), 2018-01-30 12:33 UTC, Aleksander Derevianko
Full sysctl -a output from Moxa DA-820 (94.31 KB, text/plain), 2018-01-30 14:27 UTC, Aleksander Derevianko

Description Aleksander Derevianko 2018-01-29 14:36:08 UTC
Created attachment 190162 [details]
cycle_clock.c

Delays in TCP connection over Gigabit Ethernet connections; Regression from 6.3-RELEASE.

Long description of the use case:

The company I work for develops soft-realtime applications with an internal cycle of about 600 ms (0.6 seconds). For functional safety, the core of the application runs simultaneously on two computers with different architectures (A - FreeBSD/Intel, B - Linux/PowerPC).
Computers A and B are connected via a copper Gigabit switch, with no other devices on the same network.

A normal application cycle looks like this:

1. Process input from the object controllers.
2. Cross-compare the input between computers A and B.
3. Make evaluations (simulate complex automata).
4. Produce output to send to the object controllers.
5. Cross-compare the output between A and B.
6. Send the output to the object controllers.
7. Sleep until the end of the cycle.
8. Go to step 1.

The cross-compare part is done over a TCP connection on Gigabit Ethernet between computers A and B.
If A or B is not able to handle all operations within the appropriate time (600 ms +/- 150 ms), the whole system halts due to internal time checks.

Yes, maybe using UDP packets would be better - but even with this configuration, the last released hardware version works just fine: uptime of ~1 year per installation, approximately 100 installations worldwide. Moreover, no halts due to the internal time checks have been found - the rare halts that did occur were caused by other defects, software or hardware.

So, the old release uses the following industrial computer as A:

CPU: Intel(R) Core(TM)2 Duo CPU     L7400  @ 1.50GHz (1500.12-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
....
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.2> port 0xdf00-0xdf1f mem 0xff880000-0xff89ffff,0xff860000-0xff87ffff irq 16 at device 0.0 on pci4
em0: Ethernet address: 00:30:64:09:6f:ee
....

FreeBSD fspa2 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:45:45 UTC 2008     root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP  i386

and this configuration works just fine.
I made a test application simulating the main cycle of the complex system, and the maximum delay is 24 ms over 1 million cycles.

The test application (running on two different computers) works like this:

1. Establish a TCP connection (one instance listens, the other connects).

2. send() a small, fixed amount of data from both sides.
3. recv() the small, fixed data.
4. send() 40 K of data from both sides.
5. recv() 40 K of data on both sides.
6. Perform a complex but useless computation (to simulate the actual application load).
7. Calculate how long to nanosleep() until the end of the cycle, then call nanosleep().

Go to step 2.

------------------
Every operation is guarded with clock_gettime(), and the durations of send(), recv(), evaluate() and nanosleep() are printed out.

I will attach the test application sources to this ticket; a rough sketch of the measured part of the cycle is shown below.
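
For reference, the measured part of each cycle looks roughly like this (illustrative sketch only: buffer sizes and helper names are assumptions, not taken from the attached cycle_clock.c, and error handling is omitted):
--------------------------------------------------
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

#define SYNC_LEN 64            /* small fixed sync message (assumed size) */
#define DATA_LEN (40 * 1024)   /* 40 K payload, as described above */

/* elapsed milliseconds between two CLOCK_MONOTONIC readings */
static long ms_between(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000L +
           (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* one measured exchange over an already-connected TCP socket fd;
 * real code loops until each buffer is fully sent/received */
static void one_cycle(int fd, long cycle_no)
{
    static char sync_buf[SYNC_LEN], data_buf[DATA_LEN];
    struct timespec t[5];
    long d[4];
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t[0]);
    send(fd, sync_buf, sizeof sync_buf, 0);            /* step 2: small send */
    clock_gettime(CLOCK_MONOTONIC, &t[1]);
    recv(fd, sync_buf, sizeof sync_buf, MSG_WAITALL);  /* step 3: small recv */
    clock_gettime(CLOCK_MONOTONIC, &t[2]);
    send(fd, data_buf, sizeof data_buf, 0);            /* step 4: 40 K send */
    clock_gettime(CLOCK_MONOTONIC, &t[3]);
    recv(fd, data_buf, sizeof data_buf, MSG_WAITALL);  /* step 5: 40 K recv */
    clock_gettime(CLOCK_MONOTONIC, &t[4]);

    for (i = 0; i < 4; i++)
        d[i] = ms_between(&t[i], &t[i + 1]);
    printf("times %ld: send_sync %ld recv_sync %ld send_data %ld recv_data %ld\n",
           cycle_no, d[0], d[1], d[2], d[3]);
    /* then: evaluate (Fibonacci load) and nanosleep() until end-of-cycle */
}
--------------------------------------------------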

For FreeBSD 6.3-RELEASE, running between two identical computers with FreeBSD 6.3 installed, the output looks like this (only selected columns are shown):

fspa2# grep times fspa2_fspa2_vpu.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10}' | sort | uniq -c
1115322 send_sync 0  0 0 0
7425    send_sync 0  0 0 1
73629   send_sync 0  1 0 0
   1    send_sync 0 13 0 0
  66    send_sync 0  2 0 0
   1    send_sync 0 24 0 0
  27    send_sync 0  3 0 0
  17    send_sync 0  4 0 0

As you can see, the maximum delay is 24 milliseconds (0.024 s), and it happens only once.

So now the real problem: using FreeBSD 10.3-RELEASE and much more powerful hardware (Moxa DA-820 industrial computer):

CPU: Intel(R) Core(TM) i7-3555LE CPU @ 2.50GHz (2494.39-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100000<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics

em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6600-0x661f mem 0xc0a00000-0xc0a1ffff,0xc0a27000-0xc0a27fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:90:e8:69:ea:3c


root@fspa2:~/clock/new_res # uname -a
FreeBSD fspa2 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 03:51:29 UTC 2016     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  i386

and with the same test application we get (running between two identical devices, both Moxa DA-820 with FreeBSD 10.3):

root@fspa2:~/clock/new_res # grep times direct_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
   1 send_sync 0 0 0 100
   1 send_sync 0 0 0 102
   1 send_sync 0 0 0 111
   1 send_sync 0 0 0 12
   1 send_sync 0 0 0 125
   2 send_sync 0 0 0 13
   1 send_sync 0 0 0 130
   1 send_sync 0 0 0 131
   1 send_sync 0 0 0 133
   2 send_sync 0 0 0 136
   1 send_sync 0 0 0 148
   2 send_sync 0 0 0 149
   3 send_sync 0 0 0 150
   1 send_sync 0 0 0 156
   1 send_sync 0 0 0 159
   1 send_sync 0 0 0 16
   1 send_sync 0 0 0 161
   1 send_sync 0 0 0 17
   1 send_sync 0 0 0 176
   1 send_sync 0 0 0 19
  18 send_sync 0 0 0 2
   1 send_sync 0 0 0 229
   1 send_sync 0 0 0 23
   1 send_sync 0 0 0 24
   2 send_sync 0 0 0 25
   1 send_sync 0 0 0 26
   1 send_sync 0 0 0 28
   1 send_sync 0 0 0 282
   1 send_sync 0 0 0 29
  14 send_sync 0 0 0 3
   1 send_sync 0 0 0 30
   1 send_sync 0 0 0 31
   1 send_sync 0 0 0 32
   1 send_sync 0 0 0 37
   1 send_sync 0 0 0 38
  14 send_sync 0 0 0 4
   1 send_sync 0 0 0 40
   1 send_sync 0 0 0 41
   1 send_sync 0 0 0 43
   1 send_sync 0 0 0 45
   2 send_sync 0 0 0 46
   4 send_sync 0 0 0 49
  16 send_sync 0 0 0 5
   2 send_sync 0 0 0 53
   2 send_sync 0 0 0 57
   4 send_sync 0 0 0 58
  14 send_sync 0 0 0 59
  17 send_sync 0 0 0 6
  20 send_sync 0 0 0 60
  16 send_sync 0 0 0 61
   8 send_sync 0 0 0 62
   4 send_sync 0 0 0 63
   1 send_sync 0 0 0 64
   1 send_sync 0 0 0 67
   1 send_sync 0 0 0 68
   9 send_sync 0 0 0 7
   1 send_sync 0 0 0 70
   1 send_sync 0 0 0 72
   1 send_sync 0 0 0 79
   5 send_sync 0 0 0 8
   3 send_sync 0 0 0 80
   1 send_sync 0 0 0 81
   1 send_sync 0 0 0 82
   1 send_sync 0 0 0 84
   1 send_sync 0 0 0 89
   3 send_sync 0 0 0 9
   1 send_sync 0 0 0 90
   1 send_sync 0 0 0 93
   1 send_sync 0 0 0 95
   1 send_sync 0 0 0 97
 147 send_sync 0 1 0 0
   1 send_sync 0 33 0 0

As you can see, with only 300,000 cycles (vs. 1,100,000 cycles in the old configuration) we have multiple cases of long and very long delays, including a delay of 229 ms.

The only difference from the standard OS configuration is

sysctl kern.timecounter.alloweddeviation=0

because without it, nanosleep() returns with a random error of up to 4%.

============================================
I have tried everything I could think of:

1. Tweaking em0 sysctls - does not help. Polling mode works much worse.
2. Replacing the Intel em0 devices with a Realtek re0 device - slightly better, but still produces significant delays.
3. Tweaking kernel sysctls - disabling the newer RFC compatibility options - does not help.
4. Replacing the cables and the switch (a different model, straight out of the box) - does not help.


If anybody has any ideas how to fix this, please comment here.
Comment 1 Aleksander Derevianko 2018-01-29 14:40:12 UTC
After re-reading:

> nanosleep() returns with a random error of up to 4%.

For clarification: I mean it oversleeps by up to ~4% of the 0.6 second period.
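The overshoot can be observed with a trivial standalone loop like this (illustrative sketch only, not the attached test itself):
--------------------------------------------------
#include <stdio.h>
#include <time.h>

int main(void)
{
    const struct timespec req = { 0, 600 * 1000000L };  /* 600 ms, the cycle length */
    struct timespec t0, t1;
    int i;

    for (i = 0; i < 100; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* actual elapsed time vs. the requested 600 ms, in microseconds */
        long actual_us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                         (t1.tv_nsec - t0.tv_nsec) / 1000L;
        long req_us = 600000L;
        printf("oversleep: %ld us (%.2f%% of the period)\n",
               actual_us - req_us, 100.0 * (actual_us - req_us) / req_us);
    }
    return 0;
}
--------------------------------------------------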
Comment 2 amvandemore 2018-01-29 15:57:13 UTC
Did you try multiqueue?
Comment 3 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-29 17:27:35 UTC
Please attach:

1) /var/run/dmesg.boot of verbose boot for Moxa DA-820;
2) output of "sysctl kern.timecounter" command;
3) output of "sysctl kern.eventtimer" command;
4) output of "sysctl kern.hz" command.

If your new system runs with kern.eventtimer.periodic=0 by default, try to change this setting to 1.

And why do you use version 10.3, which has an expected EoL of April 30, 2018? 10.4 reaches EoL on October 31, 2018, along with the whole 10.x line.

Have you considered trying 11.1-RELEASE? It is much more likely to get fixes, if any are found to be necessary.
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2018-01-29 17:35:40 UTC
What syscalls are you measuring delay over?  It seems that the column with delay is different between your two sets of samples -- does this mean anything?  Thanks.
Comment 5 Aleksander Derevianko 2018-01-30 12:25:41 UTC
Additional info: I tried the following configuration: Xen (CentOS 7 in domain 0) running on the Moxa hardware, with FreeBSD 10.3-RELEASE i386 under Xen and xn0 configured as a bridge. Between two copies of FreeBSD 10.3 running under Xen on different host computers, the test application produces a much better result:

grep times fspa2_fspa1_virt.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
9863 send_sync 0 0 0 0
38565 send_sync 0 0 0 1
   2 send_sync 0 0 0 10
   9 send_sync 0 0 0 11
  16 send_sync 0 0 0 12
  10 send_sync 0 0 0 13
   1 send_sync 0 0 0 17
301599 send_sync 0 0 0 2
   1 send_sync 0 0 0 233
698526 send_sync 0 0 0 3
5195 send_sync 0 0 0 4
2770 send_sync 0 0 0 5
1361 send_sync 0 0 0 6
  14 send_sync 0 0 0 7
   5 send_sync 0 0 0 8
   4 send_sync 0 0 0 9
   1 send_sync 0 0 1 2
   1 send_sync 0 0 2 0
   4 send_sync 0 0 2 1
  51 send_sync 0 0 2 2
  79 send_sync 0 0 2 3
6545 send_sync 0 1 0 0
21849 send_sync 0 1 0 1
   1 send_sync 0 1 0 10
   1 send_sync 0 1 0 11
   1 send_sync 0 1 0 12
39671 send_sync 0 1 0 2
 244 send_sync 0 1 0 3
  23 send_sync 0 1 0 4
   8 send_sync 0 1 0 5
   1 send_sync 0 1 0 6
   1 send_sync 0 1 2 0
   1 send_sync 0 14 0 2
17499 send_sync 0 2 0 0
58073 send_sync 0 2 0 1
   1 send_sync 0 2 0 10
   3 send_sync 0 2 0 11
   2 send_sync 0 2 0 12
115462 send_sync 0 2 0 2
 572 send_sync 0 2 0 3
   7 send_sync 0 2 0 4
   4 send_sync 0 2 0 5
   1 send_sync 0 2 0 7
   2 send_sync 0 2 0 9
7959 send_sync 0 3 0 0
20669 send_sync 0 3 0 1
   1 send_sync 0 3 0 11
   1 send_sync 0 3 0 12
49220 send_sync 0 3 0 2
 267 send_sync 0 3 0 3
   3 send_sync 0 3 0 4
   2 send_sync 0 3 0 5
   1 send_sync 0 3 1 0
  23 send_sync 0 4 0 0
  35 send_sync 0 4 0 1
  54 send_sync 0 4 0 2
   2 send_sync 0 4 0 3
   1 send_sync 0 49 0 1
   1 send_sync 0 5 0 1
   1 send_sync 0 5 0 2
   1 send_sync 0 6 0 2
   1 send_sync 0 7 0 0
   1 send_sync 0 7 0 3
   1 send_sync 2 0 0 0
   2 send_sync 2 0 0 1
  16 send_sync 2 0 0 2
  41 send_sync 2 0 0 3
   1 send_sync 2 1 0 1
   1 send_sync 2 1 0 2

Out of 1,396,680 samples, only 52 are >= 10 ms, and only 2 are >= 20 ms.
From my POV, this means the problem is somewhere in the OS<->hardware layer, not in the OS itself.



(In reply to amvandemore from comment #2)

No, I have not tried multiqueue - first, it gives an advantage only under high load, and 40 Kbytes * 3 = 120 Kbytes/sec over a 1 Gbit line is a very low load. Second, under 6.3-RELEASE it works fine without multiqueue.

(In reply to Eugene Grosbein from comment #3)
The dmesg will be added as an attachment.

root@fspa1:~/clock/new_res # sysctl kern.timecounter
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC-low(1000) ACPI-fast(900) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: TSC-low
kern.timecounter.alloweddeviation: 0
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC-low.quality: 1000
kern.timecounter.tc.TSC-low.frequency: 1247191596
kern.timecounter.tc.TSC-low.counter: 233727093
kern.timecounter.tc.TSC-low.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 10612645
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 14761
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 3051462975
kern.timecounter.tc.HPET.mask: 4294967295


root@fspa1:~/clock/new_res # sysctl kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) HPET(550) HPET1(440) HPET2(440) HPET3(440) HPET4(440) i8254(100) RTC(0)
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.HPET4.quality: 440
kern.eventtimer.et.HPET4.frequency: 14318180
kern.eventtimer.et.HPET4.flags: 3
kern.eventtimer.et.HPET3.quality: 440
kern.eventtimer.et.HPET3.frequency: 14318180
kern.eventtimer.et.HPET3.flags: 3
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET.quality: 550
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 7
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 49887687
kern.eventtimer.et.LAPIC.flags: 7

root@fspa1:~/clock/new_res # sysctl kern.hz
kern.hz: 1000

root@fspa1:~/clock/new_res # sysctl kern.eventtimer.periodic
kern.eventtimer.periodic: 0

I will try that and report the results here.

> And why do use version 10.3 that has "Expected EoL" in April 30, 2018? And 
> 10.4 in October 31, 2018, as whole line of 10.x versions.

> Have you considered trying 11.1-RELEASE? It has many more changes to get 
> fixes, if found necessary.

We have a very long verification/release cycle because this software will be used in critical infrastructure (railway automation). The latest security fixes are not actually important, because it will be used in a very closed environment.
I will try to update to 10.4 and 11.1 and check whether that helps.

(In reply to Conrad Meyer from comment #4)
> What syscalls are you measuring delay over?  It seems that the column with 
> delay is different between your two sets of samples -- does this mean 
> anything?  Thanks. 

clock_gettime( CLOCK_MONOTONIC, ...);
See attached source code.

Actually, the delay under 6.3-RELEASE is lower (and the evaluate time is higher) because 6.3-RELEASE is running on different hardware. The test load (a Fibonacci calculation) executes faster on the modern hardware.
I will try to install 6.3-RELEASE on the Moxa hardware, repeat the test and report the result. Unfortunately, the test needs to run for at least 6 hours before I get reliable results.
Comment 6 Aleksander Derevianko 2018-01-30 12:33:42 UTC
Created attachment 190196 [details]
dmesg from verbose boot of Moxa DA-820
Comment 7 Aleksander Derevianko 2018-01-30 13:06:08 UTC
I am now running tests on the Moxa DA-820 with kern.eventtimer.periodic=1.

One interesting thing: if you look at the tests with 6.3-RELEASE, 98% of the samples are

1115322 send_sync 0 0 0 0 

= send(small data) recv(small data) send(40Kb) recv(40Kb)

- so every send() and recv() call executes in under 1 ms.

7% of the samples are
73629   send_sync 0  1 0 0  

- so the first recv() takes 1 ms, but that's OK because it is the first recv() after nanosleep(~200 ms) - and we can't expect nanosleep() to behave exactly the same on two different computers. Moreover, the nanosleep() period is evaluated from two local calls to clock_gettime(CLOCK_MONOTONIC, ...) and a subtraction; a rough sketch of that computation follows.
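
(Illustrative sketch only; the helper name and the cycle constant are assumptions, not the exact code from cycle_clock.c.)
--------------------------------------------------
#include <time.h>

#define CYCLE_NS (600 * 1000000LL)   /* 600 ms cycle */

/* sleep for whatever is left of the current cycle, measured from
 * the CLOCK_MONOTONIC timestamp taken at the start of the cycle */
static void sleep_until_cycle_end(const struct timespec *cycle_start)
{
    struct timespec now, rem;
    clock_gettime(CLOCK_MONOTONIC, &now);

    long long elapsed_ns =
        (long long)(now.tv_sec - cycle_start->tv_sec) * 1000000000LL +
        (now.tv_nsec - cycle_start->tv_nsec);
    long long left_ns = CYCLE_NS - elapsed_ns;
    if (left_ns <= 0)
        return;                       /* cycle overrun: do not sleep */

    rem.tv_sec  = left_ns / 1000000000LL;
    rem.tv_nsec = left_ns % 1000000000LL;
    nanosleep(&rem, NULL);
}
--------------------------------------------------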


But for 10.3-RELEASE we have an approximately even split between:
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1

The case with a delay on the first recv() accounts for less than 0.02%:
147 send_sync 0 1 0 0

That trailing 1 is on the second recv() - and at that point the two computers are already synchronised by the first, small recv().


P.S. The test with kern.eventtimer.periodic=1 produces the following results after a short run time:

root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
1484 send_sync 0 0 0 0
1314 send_sync 0 0 0 1
   1 send_sync 0 0 0 230
   4 send_sync 0 1 0 0

root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
1698 send_sync 0 0 0 0
1134 send_sync 0 0 0 1
   1 send_sync 0 0 0 229
  11 send_sync 0 1 0 0

A very large delay, after a very short time.

One more thing:

fspa1:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247

fspb:
times 2550: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2551: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2552: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2553: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 2555: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2556: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247
times 2557: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248
times 2558: send_sync 0 recv_sync 0 send_data 0 recv_data 1 eval 52 sleep 247
times 2559: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 247

The problem arises in the SAME cycle on both computers!
How is that possible? It looks as if, on one of the computers, both send and receive were blocked (buffered) at some level.
Comment 8 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 14:06:02 UTC
You may want to set "sysctl kern.timecounter.fast_gettime=0" and re-do your tests, because 6.3 did not have the fast_gettime mode.
Comment 9 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 14:11:28 UTC
Also, you may want to decrease the influence of the kernel randomness harvester in case it uses the network as an entropy source on your system. Please show the output of "sysctl kern.random".
Comment 10 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 14:22:14 UTC
Also, if you use the em0 network interface, please attach the output of "sysctl dev.em.0".

You may want to disable NIC flow control to eliminate the influence of possible "pause" frames.
Comment 11 Aleksander Derevianko 2018-01-30 14:27:09 UTC
(In reply to Eugene Grosbein from comment #8)
Will try with sysctl kern.timecounter.fast_gettime=0 and report here.

(In reply to Eugene Grosbein from comment #9)
root@fspa2:~ # sysctl kern.random
kern.random.sys.harvest.swi: 0
kern.random.sys.harvest.interrupt: 0
kern.random.sys.harvest.point_to_point: 0
kern.random.sys.harvest.ethernet: 0
kern.random.sys.seeded: 1
kern.random.yarrow.slowoverthresh: 2
kern.random.yarrow.slowthresh: 128
kern.random.yarrow.fastthresh: 96
kern.random.yarrow.bins: 10
kern.random.yarrow.gengateinterval: 10
kern.random.live_entropy_sources: Hardware, Intel Secure Key RNG
kern.random.active_adaptor: yarrow
kern.random.adaptors: dummy,yarrow

I have already thought about that, and have this in /etc/sysctl.conf:
kern.random.sys.harvest.swi=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.ethernet=0


(In reply to Eugene Grosbein from comment #10)
Currently I am trying it with an additional Realtek card.
See the attached sysctl.txt.
Comment 12 Aleksander Derevianko 2018-01-30 14:27:45 UTC
Created attachment 190201 [details]
Full sysctl -a output from Moxa DA-820
Comment 13 Aleksander Derevianko 2018-01-30 14:40:27 UTC
The results of the tests with kern.eventtimer.periodic=0 seem very interesting:

A1 computer:
grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1
   3 send_sync 0 0 0 229
 352 send_sync 0 1 0 0


A2 computer
root@fspa2:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1
   3 send_sync 0 0 0 2
   4 send_sync 0 0 0 230
  20 send_sync 0 1 0 0


root@fspa1:~/clock/new_res # grep times periodic.txt | grep  recv_data\ 229
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 229 eval 52 sleep 19
root@fspa1:~/clock/new_res #
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 0 eval 52 sleep 248

root@fspa2:~/clock/new_res # grep times periodic.txt| grep  recv_data\ 230
times 2554: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 7278: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 10424: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
times 18296: send_sync 0 recv_sync 0 send_data 0 recv_data 230 eval 52 sleep 18
root@fspa2:~/clock/new_res #

As you can see, in 3 cases the delay was caught on both sides, and in one case only on the A2 computer.

Moreover, it seems the A1 computer has 92% all-zero samples:
18989 send_sync 0 0 0 0
1557 send_sync 0 0 0 1

At the same time, the A2 computer has only 54% all-zero samples:
11498 send_sync 0 0 0 0
9621 send_sync 0 0 0 1

I found a difference in the configs:
A1
root@fspa1:~/clock/new_res # cat /boot/loader.conf
boot_verbose="YES"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns=350
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"

A2
root@fspa2:~/clock/new_res # cat /boot/loader.conf
console="comconsole,vidconsole"
coretemp_load="YES"
autoboot_delay="-1"
loader_logo="fbsdbw"
kern.ipc.semmns="350"
hw.em.txd="4096"
hw.em.rxd="4096"
hw.em.rx_process_limit="-1"
hw.em.enable_msix="0"


Can boot_verbose="YES" or console="comconsole,vidconsole" make such a difference?
Comment 14 Aleksander Derevianko 2018-01-30 14:51:37 UTC
(In reply to Eugene Grosbein from comment #10)
Regarding flow control: who would send "pause" frames if I have only two computers (A1 and A2) connected to this switch, and nothing else? And the load is pretty low, about 40 K * 3 per second in each direction.
Comment 15 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 16:37:44 UTC
(In reply to Aleksander Derevianko from comment #14)

I would not be so trusting of the correctness of the flow control implementation in both the NIC firmware and the switch. The low volume of traffic means you can turn flow control off without a problem.
Comment 16 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 16:39:04 UTC
(In reply to Aleksander Derevianko from comment #13)

> Can boot_verbose="YES" or console="comconsole,vidconsole" make such difference?

It can if your system writes something to console during the test.
Comment 17 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-30 16:40:32 UTC
(In reply to Aleksander Derevianko from comment #13)

Why do you disable MSI-X support for em interfaces?
Comment 18 Aleksander Derevianko 2018-01-31 07:29:06 UTC
(In reply to Eugene Grosbein from comment #16)
No, it doesn't write anything to the console. The output of the test itself is redirected to a file on disk (100 GB SSD, INTEL SSDSA2BZ100G3 6PB10362). The results were checked from a remote ssh session.

(In reply to Eugene Grosbein from comment #17)
I switched MSI-X support off in one of my previous attempts to solve this issue. Actually, it doesn't change anything at all in my test case.

-------------------
OK, current results from the still-running test:

Computer A1:
root@fspa1:~/clock/new_res # grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
212603 send_sync 0 0 0 0
6437 send_sync 0 0 0 1
  10 send_sync 0 0 0 229
4177 send_sync 0 1 0 0
   1 send_sync 0 1 0 1


Computer A2:
root@fspa2:~/clock/new_res # !459
grep times periodic.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
122102 send_sync 0 0 0 0
100857 send_sync 0 0 0 1
  47 send_sync 0 0 0 2
  12 send_sync 0 0 0 230
   2 send_sync 0 0 0 231
 170 send_sync 0 1 0 0


I set kern.timecounter.fast_gettime=0 yesterday on both A1 and A2, and over the last 12 hours 6 more cases of long delays appeared on A1, and 8 on A2.

I will try to repeat this test with
console="comconsole,vidconsole"
removed on both the A1 and A2 computers.
Comment 19 Aleksander Derevianko 2018-01-31 08:54:50 UTC
With console="comconsole,vidconsole" removed on both the A1 and A2 computers:

root@fspa1:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
5116 send_sync 0 0 0 0
2234 send_sync 0 0 0 1
   1 send_sync 0 0 0 229
  64 send_sync 0 1 0 0

root@fspa2:~/clock/new_res # grep times periodic_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4000 send_sync 0 0 0 0
3235 send_sync 0 0 0 1
   1 send_sync 0 0 0 230
   6 send_sync 0 1 0 0

Again, in the same cycle.

----------------------------------------------------

With boot_verbose="YES"
root@fspa1:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
6539 send_sync 0 0 0 0
 794 send_sync 0 0 0 1
   2 send_sync 0 0 0 229
 125 send_sync 0 1 0 0

root@fspa2:~/clock/new_res # grep times periodic_2.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
4210 send_sync 0 0 0 0
3249 send_sync 0 0 0 1
   2 send_sync 0 0 0 230
   6 send_sync 0 1 0 0
Comment 20 Aleksander Derevianko 2018-01-31 11:24:05 UTC
Now with em0 devices:

A1 computer:
root@fspa1:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10;}' | sort | uniq -c
11256 send_sync 0 0 0 0
10968 send_sync 0 0 0 1
 234 send_sync 0 1 0 0
   1 send_sync 0 1 0 1

A2 computer:
root@fspa2:~/clock/new_res # grep times periodic_em0.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
18284 send_sync 0 0 0 0
2490 send_sync 0 0 0 1
1704 send_sync 0 1 0 0


The current results seem just fine. I think the rare 230 ms delays in the previous tests are actually a problem with the Realtek card. It's a cheap TP-Link TG-3468 device, so if things continue to work fine with the built-in em0, I will consider the problem solved.

The actual solution seems to be
kern.eventtimer.periodic=1

I will try to set only
kern.eventtimer.periodic=1
and
kern.timecounter.alloweddeviation=0
and leave everything else at the system defaults.
Comment 21 Aleksander Derevianko 2018-02-01 09:04:40 UTC
OK, problem solved. A 12-hour test doesn't show any delays longer than 1 ms.

For future reference, only the following parameters need to be set in /etc/sysctl.conf:
--------------------------------------------------
# Exact clock for nanosleep()
kern.timecounter.alloweddeviation=0

# Disable random delays in network/adapter code
kern.eventtimer.periodic=1
--------------------------------------------------
All other parameters (/boot/loader.conf) can be left at the system defaults.

Closing the defect as "Works As Intended" because the issue can be solved purely through OS configuration.

Thanks everybody for the good support and good advice!